* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
@ 2021-11-27 1:35 kernel test robot
0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-11-27 1:35 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 15037 bytes --]
CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <87mtnavszx.fsf_-_@disp2133>
References: <87mtnavszx.fsf_-_@disp2133>
TO: "Eric W. Biederman" <ebiederm@xmission.com>
Hi "Eric,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[cannot apply to v5.16-rc2 next-20211126]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8fe31e0995f048d16b378b90926793a0aa4af1e5
:::::: branch date: 6 weeks ago
:::::: commit date: 6 weeks ago
config: arm-randconfig-c002-20211017 (https://download.01.org/0day-ci/archive/20211127/202111270907.h62AppsO-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/e042a898defa264b6a95a439b8570486b47bcd49
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
git checkout e042a898defa264b6a95a439b8570486b47bcd49
# save the config file to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
clang-analyzer warnings: (new ones prefixed by >>)
fs/xfs/libxfs/xfs_rmap.c:207:6: note: Left side of '||' is false
fs/xfs/libxfs/xfs_rmap.c:207:15: note: Assuming the condition is false
if (error || !*stat)
^~~~~~
fs/xfs/libxfs/xfs_rmap.c:207:2: note: Taking false branch
if (error || !*stat)
^
fs/xfs/libxfs/xfs_rmap.c:210:6: note: Calling 'xfs_rmap_btrec_to_irec'
if (xfs_rmap_btrec_to_irec(rec, irec))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.c:188:9: note: Calling 'xfs_rmap_irec_offset_unpack'
return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.h:70:6: note: Assuming the condition is true
if (offset & ~(XFS_RMAP_OFF_MASK | XFS_RMAP_OFF_FLAGS))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.h:70:2: note: Taking true branch
if (offset & ~(XFS_RMAP_OFF_MASK | XFS_RMAP_OFF_FLAGS))
^
fs/xfs/libxfs/xfs_rmap.h:71:3: note: Returning without writing to 'irec->rm_flags'
return -EFSCORRUPTED;
^
fs/xfs/libxfs/xfs_rmap.h:71:3: note: Returning the value -117, which participates in a condition later
return -EFSCORRUPTED;
^~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.c:188:9: note: Returning from 'xfs_rmap_irec_offset_unpack'
return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.c:188:2: note: Returning without writing to 'irec->rm_flags'
return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
^
fs/xfs/libxfs/xfs_rmap.c:188:2: note: Returning the value -117, which participates in a condition later
return xfs_rmap_irec_offset_unpack(be64_to_cpu(rec->rmap.rm_offset),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.c:210:6: note: Returning from 'xfs_rmap_btrec_to_irec'
if (xfs_rmap_btrec_to_irec(rec, irec))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
fs/xfs/libxfs/xfs_rmap.c:210:2: note: Taking true branch
if (xfs_rmap_btrec_to_irec(rec, irec))
^
fs/xfs/libxfs/xfs_rmap.c:211:3: note: Control jumps to line 239
goto out_bad_rec;
^
fs/xfs/libxfs/xfs_rmap.c:242:2: note: 4th function call argument is an uninitialized value
xfs_warn(mp,
^
Suppressed 7 warnings (7 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings generated.
Suppressed 2 warnings (2 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings generated.
Suppressed 2 warnings (2 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
5 warnings generated.
include/linux/list.h:808:10: warning: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h') [clang-analyzer-core.NullDereference]
return !h->pprev;
^
kernel/ucount.c:251:23: note: Assuming pointer value is null
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^~~~
kernel/ucount.c:251:2: note: Loop condition is false. Execution continues on line 255
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^
kernel/ucount.c:255:14: note: Passing null pointer value via 1st parameter 'ucounts'
put_ucounts(ucounts);
^~~~~~~
kernel/ucount.c:255:2: note: Calling 'put_ucounts'
put_ucounts(ucounts);
^~~~~~~~~~~~~~~~~~~~
kernel/ucount.c:204:6: note: Assuming the condition is true
if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
^
include/linux/spinlock.h:490:21: note: expanded from macro 'atomic_dec_and_lock_irqsave'
__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:48:28: note: expanded from macro '__cond_lock'
# define __cond_lock(x,c) (c)
^
kernel/ucount.c:204:2: note: Taking true branch
if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
^
kernel/ucount.c:205:18: note: Passing null pointer value via 1st parameter 'n'
hlist_del_init(&ucounts->node);
^~~~~~~~~~~~~~
kernel/ucount.c:205:3: note: Calling 'hlist_del_init'
hlist_del_init(&ucounts->node);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/list.h:865:22: note: Passing null pointer value via 1st parameter 'h'
if (!hlist_unhashed(n)) {
^
include/linux/list.h:865:7: note: Calling 'hlist_unhashed'
if (!hlist_unhashed(n)) {
^~~~~~~~~~~~~~~~~
include/linux/list.h:808:10: note: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h')
return !h->pprev;
^
>> kernel/ucount.c:291:44: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
^
kernel/ucount.c:309:2: note: Loop condition is true. Entering loop body
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^
kernel/ucount.c:310:14: note: Left side of '||' is false
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
^
kernel/ucount.c:310:14: note: Left side of '||' is false
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
^
kernel/ucount.c:310:14: note: Left side of '||' is true
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:291:28: note: expanded from macro '__native_word'
sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
^
kernel/ucount.c:310:14: note: Taking false branch
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
^
kernel/ucount.c:310:14: note: Loop condition is false. Exiting loop
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
do { \
^
kernel/ucount.c:312:7: note: Assuming 'new' is >= 0
if (new < 0 || new > max)
^~~~~~~
kernel/ucount.c:312:7: note: Left side of '||' is false
kernel/ucount.c:312:18: note: Assuming 'new' is <= 'max'
if (new < 0 || new > max)
^~~~~~~~~
kernel/ucount.c:312:3: note: Taking false branch
if (new < 0 || new > max)
^
kernel/ucount.c:314:12: note: 'iter' is equal to 'ucounts'
else if (iter == ucounts)
^~~~
kernel/ucount.c:314:8: note: Taking true branch
else if (iter == ucounts)
^
kernel/ucount.c:316:8: note: Assuming 'new' is not equal to 1
if ((new == 1) && (get_ucounts(iter) != iter))
^~~~~~~~
kernel/ucount.c:316:18: note: Left side of '&&' is false
if ((new == 1) && (get_ucounts(iter) != iter))
^
kernel/ucount.c:309:2: note: Loop condition is true. Entering loop body
vim +291 kernel/ucount.c
21d1c5e386bc751 Alexey Gladkov 2021-04-22 286
e042a898defa264 Eric W. Biederman 2021-10-15 287 static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
e042a898defa264 Eric W. Biederman 2021-10-15 288 struct ucounts *last, enum ucount_type type)
e042a898defa264 Eric W. Biederman 2021-10-15 289 {
e042a898defa264 Eric W. Biederman 2021-10-15 290 struct ucounts *iter;
e042a898defa264 Eric W. Biederman 2021-10-15 @291 for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
e042a898defa264 Eric W. Biederman 2021-10-15 292 long dec = atomic_long_add_return(-1, &iter->ucount[type]);
e042a898defa264 Eric W. Biederman 2021-10-15 293 WARN_ON_ONCE(dec < 0);
e042a898defa264 Eric W. Biederman 2021-10-15 294 if (dec == 0)
e042a898defa264 Eric W. Biederman 2021-10-15 295 put_ucounts(iter);
e042a898defa264 Eric W. Biederman 2021-10-15 296 }
e042a898defa264 Eric W. Biederman 2021-10-15 297 }
e042a898defa264 Eric W. Biederman 2021-10-15 298
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
@ 2021-11-26 15:09 kernel test robot
0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-11-26 15:09 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 15108 bytes --]
CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <87mtnavszx.fsf_-_@disp2133>
References: <87mtnavszx.fsf_-_@disp2133>
TO: "Eric W. Biederman" <ebiederm@xmission.com>
Hi "Eric,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[cannot apply to v5.16-rc2 next-20211126]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8fe31e0995f048d16b378b90926793a0aa4af1e5
:::::: branch date: 6 weeks ago
:::::: commit date: 6 weeks ago
config: arm-randconfig-c002-20211017 (https://download.01.org/0day-ci/archive/20211126/202111262308.9Mq1UEM2-lkp(a)intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/e042a898defa264b6a95a439b8570486b47bcd49
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
git checkout e042a898defa264b6a95a439b8570486b47bcd49
# save the config file to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
clang-analyzer warnings: (new ones prefixed by >>)
fs/notify/fsnotify.c:212:15: note: Left side of '&&' is true
if (unlikely(parent_watched && !p_mask))
^
fs/notify/fsnotify.c:212:33: note: Assuming 'p_mask' is not equal to 0
if (unlikely(parent_watched && !p_mask))
^
include/linux/compiler.h:78:42: note: expanded from macro 'unlikely'
# define unlikely(x) __builtin_expect(!!(x), 0)
^
fs/notify/fsnotify.c:212:2: note: Taking false branch
if (unlikely(parent_watched && !p_mask))
^
fs/notify/fsnotify.c:220:6: note: 'parent_needed' is false
if (parent_needed || parent_interested) {
^~~~~~~~~~~~~
fs/notify/fsnotify.c:220:6: note: Left side of '||' is false
fs/notify/fsnotify.c:220:23: note: Assuming 'parent_interested' is true
if (parent_needed || parent_interested) {
^~~~~~~~~~~~~~~~~
fs/notify/fsnotify.c:220:2: note: Taking true branch
if (parent_needed || parent_interested) {
^
fs/notify/fsnotify.c:222:45: note: Passing null pointer value via 1st parameter 'data'
WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type));
^
include/asm-generic/bug.h:146:18: note: expanded from macro 'WARN_ON_ONCE'
DO_ONCE_LITE_IF(condition, WARN_ON, 1)
^~~~~~~~~
include/linux/once_lite.h:15:27: note: expanded from macro 'DO_ONCE_LITE_IF'
bool __ret_do_once = !!(condition); \
^~~~~~~~~
fs/notify/fsnotify.c:222:25: note: Calling 'fsnotify_data_inode'
WARN_ON_ONCE(inode != fsnotify_data_inode(data, data_type));
^
include/asm-generic/bug.h:146:18: note: expanded from macro 'WARN_ON_ONCE'
DO_ONCE_LITE_IF(condition, WARN_ON, 1)
^~~~~~~~~
include/linux/once_lite.h:15:27: note: expanded from macro 'DO_ONCE_LITE_IF'
bool __ret_do_once = !!(condition); \
^~~~~~~~~
include/linux/fsnotify_backend.h:255:2: note: Control jumps to 'case FSNOTIFY_EVENT_PATH:' at line 258
switch (data_type) {
^
include/linux/fsnotify_backend.h:259:18: note: Access to field 'dentry' results in a dereference of a null pointer (loaded from variable 'data')
return d_inode(((const struct path *)data)->dentry);
^ ~~~~
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings generated.
Suppressed 2 warnings (2 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings generated.
Suppressed 2 warnings (2 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
5 warnings generated.
include/linux/list.h:808:10: warning: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h') [clang-analyzer-core.NullDereference]
return !h->pprev;
^
kernel/ucount.c:251:23: note: Assuming pointer value is null
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^~~~
kernel/ucount.c:251:2: note: Loop condition is false. Execution continues on line 255
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^
kernel/ucount.c:255:14: note: Passing null pointer value via 1st parameter 'ucounts'
put_ucounts(ucounts);
^~~~~~~
kernel/ucount.c:255:2: note: Calling 'put_ucounts'
put_ucounts(ucounts);
^~~~~~~~~~~~~~~~~~~~
kernel/ucount.c:204:6: note: Assuming the condition is true
if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
^
include/linux/spinlock.h:490:21: note: expanded from macro 'atomic_dec_and_lock_irqsave'
__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:48:28: note: expanded from macro '__cond_lock'
# define __cond_lock(x,c) (c)
^
kernel/ucount.c:204:2: note: Taking true branch
if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
^
kernel/ucount.c:205:18: note: Passing null pointer value via 1st parameter 'n'
hlist_del_init(&ucounts->node);
^~~~~~~~~~~~~~
kernel/ucount.c:205:3: note: Calling 'hlist_del_init'
hlist_del_init(&ucounts->node);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/list.h:865:22: note: Passing null pointer value via 1st parameter 'h'
if (!hlist_unhashed(n)) {
^
include/linux/list.h:865:7: note: Calling 'hlist_unhashed'
if (!hlist_unhashed(n)) {
^~~~~~~~~~~~~~~~~
include/linux/list.h:808:10: note: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h')
return !h->pprev;
^
>> kernel/ucount.c:291:44: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
^
kernel/ucount.c:309:2: note: Loop condition is true. Entering loop body
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^
kernel/ucount.c:310:14: note: Left side of '||' is false
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
^
kernel/ucount.c:310:14: note: Left side of '||' is false
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
^
kernel/ucount.c:310:14: note: Left side of '||' is true
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:291:28: note: expanded from macro '__native_word'
sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
^
kernel/ucount.c:310:14: note: Taking false branch
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
^
kernel/ucount.c:310:14: note: Loop condition is false. Exiting loop
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
do { \
^
kernel/ucount.c:312:7: note: Assuming 'new' is >= 0
if (new < 0 || new > max)
^~~~~~~
kernel/ucount.c:312:7: note: Left side of '||' is false
kernel/ucount.c:312:18: note: Assuming 'new' is <= 'max'
if (new < 0 || new > max)
^~~~~~~~~
kernel/ucount.c:312:3: note: Taking false branch
if (new < 0 || new > max)
^
kernel/ucount.c:314:12: note: 'iter' is equal to 'ucounts'
else if (iter == ucounts)
^~~~
kernel/ucount.c:314:8: note: Taking true branch
else if (iter == ucounts)
^
kernel/ucount.c:316:8: note: Assuming 'new' is not equal to 1
if ((new == 1) && (get_ucounts(iter) != iter))
^~~~~~~~
kernel/ucount.c:316:18: note: Left side of '&&' is false
if ((new == 1) && (get_ucounts(iter) != iter))
^
kernel/ucount.c:309:2: note: Loop condition is true. Entering loop body
vim +291 kernel/ucount.c
21d1c5e386bc751 Alexey Gladkov 2021-04-22 286
e042a898defa264 Eric W. Biederman 2021-10-15 287 static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
e042a898defa264 Eric W. Biederman 2021-10-15 288 struct ucounts *last, enum ucount_type type)
e042a898defa264 Eric W. Biederman 2021-10-15 289 {
e042a898defa264 Eric W. Biederman 2021-10-15 290 struct ucounts *iter;
e042a898defa264 Eric W. Biederman 2021-10-15 @291 for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
e042a898defa264 Eric W. Biederman 2021-10-15 292 long dec = atomic_long_add_return(-1, &iter->ucount[type]);
e042a898defa264 Eric W. Biederman 2021-10-15 293 WARN_ON_ONCE(dec < 0);
e042a898defa264 Eric W. Biederman 2021-10-15 294 if (dec == 0)
e042a898defa264 Eric W. Biederman 2021-10-15 295 put_ucounts(iter);
e042a898defa264 Eric W. Biederman 2021-10-15 296 }
e042a898defa264 Eric W. Biederman 2021-10-15 297 }
e042a898defa264 Eric W. Biederman 2021-10-15 298
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
@ 2021-10-17 13:36 kernel test robot
0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2021-10-17 13:36 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 15194 bytes --]
CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
In-Reply-To: <87mtnavszx.fsf_-_@disp2133>
References: <87mtnavszx.fsf_-_@disp2133>
TO: "Eric W. Biederman" <ebiederm@xmission.com>
TO: Rune Kleveland <rune.kleveland@infomedia.dk>
CC: Yu Zhao <yuzhao@google.com>
CC: Alexey Gladkov <legion@kernel.org>
CC: Jordan Glover <Golden_Miller83@protonmail.ch>
CC: LKML <linux-kernel@vger.kernel.org>
CC: linux-mm(a)kvack.org
CC: containers(a)lists.linux-foundation.org
Hi "Eric,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v5.15-rc5 next-20211015]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8fe31e0995f048d16b378b90926793a0aa4af1e5
:::::: branch date: 2 days ago
:::::: commit date: 2 days ago
config: arm-randconfig-c002-20211017 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 8ca4b3ef19fe82d7ad6a6e1515317dcc01b41515)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/0day-ci/linux/commit/e042a898defa264b6a95a439b8570486b47bcd49
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Eric-W-Biederman/ucounts-Fix-signal-ucount-refcounting/20211016-061359
git checkout e042a898defa264b6a95a439b8570486b47bcd49
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=arm clang-analyzer
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
clang-analyzer warnings: (new ones prefixed by >>)
include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
^
fs/gfs2/recovery.c:114:8: note: Loop condition is false. Exiting loop
rr = list_first_entry(head, struct gfs2_revoke_replay, rr_list);
^
include/linux/list.h:522:2: note: expanded from macro 'list_first_entry'
list_entry((ptr)->next, type, member)
^
include/linux/list.h:511:2: note: expanded from macro 'list_entry'
container_of(ptr, type, member)
^
include/linux/kernel.h:495:2: note: expanded from macro 'container_of'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^
note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
do { \
^
fs/gfs2/recovery.c:115:3: note: Calling 'list_del'
list_del(&rr->rr_list);
^~~~~~~~~~~~~~~~~~~~~~
include/linux/list.h:146:2: note: Calling '__list_del_entry'
__list_del_entry(entry);
^~~~~~~~~~~~~~~~~~~~~~~
include/linux/list.h:132:2: note: Taking false branch
if (!__list_del_entry_valid(entry))
^
include/linux/list.h:135:13: note: Use of memory after it is freed
__list_del(entry->prev, entry->next);
^~~~~~~~~~~
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
6 warnings generated.
Suppressed 6 warnings (6 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings generated.
Suppressed 2 warnings (2 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings generated.
Suppressed 2 warnings (2 in non-user code).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
5 warnings generated.
include/linux/list.h:808:10: warning: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h') [clang-analyzer-core.NullDereference]
return !h->pprev;
^
kernel/ucount.c:251:23: note: Assuming pointer value is null
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^~~~
kernel/ucount.c:251:2: note: Loop condition is false. Execution continues on line 255
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^
kernel/ucount.c:255:14: note: Passing null pointer value via 1st parameter 'ucounts'
put_ucounts(ucounts);
^~~~~~~
kernel/ucount.c:255:2: note: Calling 'put_ucounts'
put_ucounts(ucounts);
^~~~~~~~~~~~~~~~~~~~
kernel/ucount.c:204:6: note: Assuming the condition is true
if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
^
include/linux/spinlock.h:490:21: note: expanded from macro 'atomic_dec_and_lock_irqsave'
__cond_lock(lock, _atomic_dec_and_lock_irqsave(atomic, lock, &(flags)))
~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/compiler_types.h:48:28: note: expanded from macro '__cond_lock'
# define __cond_lock(x,c) (c)
^
kernel/ucount.c:204:2: note: Taking true branch
if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
^
kernel/ucount.c:205:18: note: Passing null pointer value via 1st parameter 'n'
hlist_del_init(&ucounts->node);
^~~~~~~~~~~~~~
kernel/ucount.c:205:3: note: Calling 'hlist_del_init'
hlist_del_init(&ucounts->node);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/linux/list.h:865:22: note: Passing null pointer value via 1st parameter 'h'
if (!hlist_unhashed(n)) {
^
include/linux/list.h:865:7: note: Calling 'hlist_unhashed'
if (!hlist_unhashed(n)) {
^~~~~~~~~~~~~~~~~
include/linux/list.h:808:10: note: Access to field 'pprev' results in a dereference of a null pointer (loaded from variable 'h')
return !h->pprev;
^
>> kernel/ucount.c:291:44: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
^
kernel/ucount.c:309:2: note: Loop condition is true. Entering loop body
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
^
kernel/ucount.c:310:14: note: Left side of '||' is false
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
^
kernel/ucount.c:310:14: note: Left side of '||' is false
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:290:3: note: expanded from macro '__native_word'
(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || \
^
kernel/ucount.c:310:14: note: Left side of '||' is true
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:21: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:291:28: note: expanded from macro '__native_word'
sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
^
kernel/ucount.c:310:14: note: Taking false branch
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:302:3: note: expanded from macro '__compiletime_assert'
if (!(condition)) \
^
kernel/ucount.c:310:14: note: Loop condition is false. Exiting loop
long max = READ_ONCE(iter->ns->ucount_max[type]);
^
include/asm-generic/rwonce.h:49:2: note: expanded from macro 'READ_ONCE'
compiletime_assert_rwonce_type(x); \
^
include/asm-generic/rwonce.h:36:2: note: expanded from macro 'compiletime_assert_rwonce_type'
compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
^
include/linux/compiler_types.h:322:2: note: expanded from macro 'compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
^
include/linux/compiler_types.h:310:2: note: expanded from macro '_compiletime_assert'
__compiletime_assert(condition, msg, prefix, suffix)
^
include/linux/compiler_types.h:300:2: note: expanded from macro '__compiletime_assert'
do { \
^
kernel/ucount.c:312:7: note: Assuming 'new' is >= 0
if (new < 0 || new > max)
^~~~~~~
kernel/ucount.c:312:7: note: Left side of '||' is false
kernel/ucount.c:312:18: note: Assuming 'new' is <= 'max'
if (new < 0 || new > max)
^~~~~~~~~
kernel/ucount.c:312:3: note: Taking false branch
if (new < 0 || new > max)
^
kernel/ucount.c:314:12: note: 'iter' is equal to 'ucounts'
else if (iter == ucounts)
^~~~
kernel/ucount.c:314:8: note: Taking true branch
else if (iter == ucounts)
^
kernel/ucount.c:316:8: note: Assuming 'new' is not equal to 1
if ((new == 1) && (get_ucounts(iter) != iter))
^~~~~~~~
kernel/ucount.c:316:18: note: Left side of '&&' is false
if ((new == 1) && (get_ucounts(iter) != iter))
^
kernel/ucount.c:309:2: note: Loop condition is true. Entering loop body
vim +291 kernel/ucount.c
21d1c5e386bc75 Alexey Gladkov 2021-04-22 286
e042a898defa26 Eric W. Biederman 2021-10-15 287 static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
e042a898defa26 Eric W. Biederman 2021-10-15 288 struct ucounts *last, enum ucount_type type)
e042a898defa26 Eric W. Biederman 2021-10-15 289 {
e042a898defa26 Eric W. Biederman 2021-10-15 290 struct ucounts *iter;
e042a898defa26 Eric W. Biederman 2021-10-15 @291 for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
e042a898defa26 Eric W. Biederman 2021-10-15 292 long dec = atomic_long_add_return(-1, &iter->ucount[type]);
e042a898defa26 Eric W. Biederman 2021-10-15 293 WARN_ON_ONCE(dec < 0);
e042a898defa26 Eric W. Biederman 2021-10-15 294 if (dec == 0)
e042a898defa26 Eric W. Biederman 2021-10-15 295 put_ucounts(iter);
e042a898defa26 Eric W. Biederman 2021-10-15 296 }
e042a898defa26 Eric W. Biederman 2021-10-15 297 }
e042a898defa26 Eric W. Biederman 2021-10-15 298
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33037 bytes --]
^ permalink raw reply [flat|nested] 13+ messages in thread
* linux 5.14.3: free_user_ns causes NULL pointer dereference
@ 2021-09-15 19:49 Jordan Glover
2021-09-15 21:02 ` Eric W. Biederman
0 siblings, 1 reply; 13+ messages in thread
From: Jordan Glover @ 2021-09-15 19:49 UTC (permalink / raw)
To: LKML; +Cc: linux-mm, legion, containers, Eric W . Biederman
Hi, recently I hit system freeze after I was closing few containerized apps on my system. As for now it occurred only once on linux 5.14.3. I think it maybe be related to "Count rlimits in each user namespace" patchset merged during 5.14 window
https://lore.kernel.org/all/257aa5fb1a7d81cf0f4c34f39ada2320c4284771.1619094428.git.legion@kernel.org/T/#u
Logs below:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 26546 at kernel/ucount.c:253 dec_ucount+0x43/0x50
Modules linked in: nft_ct nft_fib_ipv4 nft_fib wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ccm algif_aead des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg hid_sensor_custom_intel_hinge hid_sensor_als hid_sensor_magn_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer hid_sensor_iio_common kfifo_buf industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader nft_counter intel_ishtp_hid snd_hda_codec_hdmi intel_rapl_msr xt_mark ipt_REJECT nf_reject_ipv4 snd_ctl_led xt_LOG snd_hda_codec_conexant nf_log_syslog snd_hda_codec_generic xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv4 mei_hdcp snd_hda_intel nft_compat wmi_bmof nf_tables intel_rapl_common libcrc32c think_lmi intel_tcc_cooling snd_intel_dspcfg firmware_attributes_class nfnetlink iwlmvm
intel_wmi_thunderbolt mac80211 x86_pkg_temp_thermal snd_hda_codec intel_powerclamp coretemp libarc4 vfat fat kvm_intel rapl intel_cstate snd_hwdep intel_uncore iwlwifi snd_hda_core mousedev joydev snd_pcm psmouse cfg80211 snd_timer mei_me ucsi_acpi wacom intel_ish_ipc intel_xhci_usb_role_switch mei intel_pch_thermal typec_ucsi roles typec intel_ishtp wmi thinkpad_acpi ledtrig_audio platform_profile snd soundcore rfkill tpm_crb i2c_hid_acpi i2c_hid acpi_pad tpm_tis mac_hid tpm_tis_core pkcs8_key_parser fuse zram ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd xhci_pci rtsx_pci xhci_pci_renesas i8042 serio kvmgt mdev vfio_iommu_type1 vfio i915 i2c_algo_bit intel_gtt ttm agpgart video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec
drm kvm irqbypass
CPU: 1 PID: 26546 Comm: kworker/1:1 Not tainted 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
Workqueue: events free_user_ns
RIP: 0010:dec_ucount+0x43/0x50
Code: 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 48 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff <0f> 0b eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 f8 48
RSP: 0018:ffffa82cc2bd7e60 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffffa2f53298ee50 RCX: ffffa2f3c0061000
RDX: ffffa2f3c0061020 RSI: ffffffffffffffff RDI: ffffa2f3c0061000
RBP: ffffa2f53298ebe0 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffa2f3c0061000
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffa2f599680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000628f892be9f8 CR3: 000000002880e004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
free_user_ns+0x73/0x110
process_one_work+0x1e1/0x380
worker_thread+0x50/0x3a0
? rescuer_thread+0x360/0x360
kthread+0x127/0x150
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace eb7a8d38b64b2d3a ]---
BUG: kernel NULL pointer dereference, address: 00000000000001e8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 26546 Comm: kworker/1:1 Tainted: G W 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
Workqueue: events free_user_ns
RIP: 0010:dec_ucount+0x32/0x50
Code: 74 34 89 f6 48 89 f9 4c 8d 04 f5 20 00 00 00 4a 8d 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 <48> 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff 0f 0b eb e7 66
RSP: 0018:ffffa82cc2bd7e60 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffffa2f53298ee50 RCX: ffffa2f3c0061000
RDX: ffffa2f3c0061020 RSI: ffffffffffffffff RDI: ffffa2f3c0061000
RBP: ffffa2f53298ebe0 R08: 0000000000000020 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffffa2f3c0061000
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffa2f599680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001e8 CR3: 000000002880e004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
free_user_ns+0x73/0x110
process_one_work+0x1e1/0x380
worker_thread+0x50/0x3a0
? rescuer_thread+0x360/0x360
kthread+0x127/0x150
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-15 19:49 linux 5.14.3: free_user_ns causes NULL pointer dereference Jordan Glover
@ 2021-09-15 21:02 ` Eric W. Biederman
2021-09-15 22:42 ` Jordan Glover
0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-09-15 21:02 UTC (permalink / raw)
To: Jordan Glover; +Cc: LKML, linux-mm, legion, containers
Jordan Glover <Golden_Miller83@protonmail.ch> writes:
> Hi, recently I hit system freeze after I was closing few containerized apps on my system. As for now it occurred only once on linux 5.14.3. I think it maybe be related to "Count rlimits in each user namespace" patchset merged during 5.14 window
>
> https://lore.kernel.org/all/257aa5fb1a7d81cf0f4c34f39ada2320c4284771.1619094428.git.legion@kernel.org/T/#u
So that warning comes from:
void dec_ucount(struct ucounts *ucounts, enum ucount_type type)
{
struct ucounts *iter;
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
long dec = atomic_long_dec_if_positive(&iter->ucount[type]);
WARN_ON_ONCE(dec < 0);
}
put_ucounts(ucounts);
}
Which certainly looks like a reference count bug. It could also be a
memory stomp somewhere close.
Do you have any idea what else was going on? This location is the
symptom but not the actual cause.
Eric
>
> Logs below:
>
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 26546 at kernel/ucount.c:253 dec_ucount+0x43/0x50
> Modules linked in: nft_ct nft_fib_ipv4 nft_fib wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ccm algif_aead des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg hid_sensor_custom_intel_hinge hid_sensor_als hid_sensor_magn_3d hid_sensor_rotation hid_sensor_accel_3d hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer hid_sensor_iio_common kfifo_buf industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader nft_counter intel_ishtp_hid snd_hda_codec_hdmi intel_rapl_msr xt_mark ipt_REJECT nf_reject_ipv4 snd_ctl_led xt_LOG snd_hda_codec_conexant nf_log_syslog snd_hda_codec_generic xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv4 mei_hdcp snd_hda_intel nft_compat wmi_bmof nf_tables intel_rapl_common libcrc32c think_lmi intel_tcc_cooling snd_intel_dspcfg firmware_attributes_class nfnetlink iwlmvm
> intel_wmi_thunderbolt mac80211 x86_pkg_temp_thermal snd_hda_codec intel_powerclamp coretemp libarc4 vfat fat kvm_intel rapl intel_cstate snd_hwdep intel_uncore iwlwifi snd_hda_core mousedev joydev snd_pcm psmouse cfg80211 snd_timer mei_me ucsi_acpi wacom intel_ish_ipc intel_xhci_usb_role_switch mei intel_pch_thermal typec_ucsi roles typec intel_ishtp wmi thinkpad_acpi ledtrig_audio platform_profile snd soundcore rfkill tpm_crb i2c_hid_acpi i2c_hid acpi_pad tpm_tis mac_hid tpm_tis_core pkcs8_key_parser fuse zram ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd xhci_pci rtsx_pci xhci_pci_renesas i8042 serio kvmgt mdev vfio_iommu_type1 vfio i915 i2c_algo_bit intel_gtt ttm agpgart video drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec
> drm kvm irqbypass
> CPU: 1 PID: 26546 Comm: kworker/1:1 Not tainted 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
> Workqueue: events free_user_ns
> RIP: 0010:dec_ucount+0x43/0x50
> Code: 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 48 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff <0f> 0b eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 f8 48
> RSP: 0018:ffffa82cc2bd7e60 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffffa2f53298ee50 RCX: ffffa2f3c0061000
> RDX: ffffa2f3c0061020 RSI: ffffffffffffffff RDI: ffffa2f3c0061000
> RBP: ffffa2f53298ebe0 R08: 0000000000000020 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffa2f3c0061000
> R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffffa2f599680000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000628f892be9f8 CR3: 000000002880e004 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> free_user_ns+0x73/0x110
> process_one_work+0x1e1/0x380
> worker_thread+0x50/0x3a0
> ? rescuer_thread+0x360/0x360
> kthread+0x127/0x150
> ? set_kthread_struct+0x40/0x40
> ret_from_fork+0x22/0x30
> ---[ end trace eb7a8d38b64b2d3a ]---
> BUG: kernel NULL pointer dereference, address: 00000000000001e8
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> Oops: 0000 [#1] SMP PTI
> CPU: 1 PID: 26546 Comm: kworker/1:1 Tainted: G W 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
> Workqueue: events free_user_ns
> RIP: 0010:dec_ucount+0x32/0x50
> Code: 74 34 89 f6 48 89 f9 4c 8d 04 f5 20 00 00 00 4a 8d 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 <48> 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff 0f 0b eb e7 66
> RSP: 0018:ffffa82cc2bd7e60 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffffa2f53298ee50 RCX: ffffa2f3c0061000
> RDX: ffffa2f3c0061020 RSI: ffffffffffffffff RDI: ffffa2f3c0061000
> RBP: ffffa2f53298ebe0 R08: 0000000000000020 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: ffffa2f3c0061000
> R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffffa2f599680000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000001e8 CR3: 000000002880e004 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> free_user_ns+0x73/0x110
> process_one_work+0x1e1/0x380
> worker_thread+0x50/0x3a0
> ? rescuer_thread+0x360/0x360
> kthread+0x127/0x150
> ? set_kthread_struct+0x40/0x40
> ret_from_fork+0x22/0x30
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-15 21:02 ` Eric W. Biederman
@ 2021-09-15 22:42 ` Jordan Glover
2021-09-15 23:47 ` Jordan Glover
0 siblings, 1 reply; 13+ messages in thread
From: Jordan Glover @ 2021-09-15 22:42 UTC (permalink / raw)
To: ebiederm
Cc: LKML, linux-mm\@kvack.org, legion\@kernel.org,
containers\@lists.linux-foundation.org
On Wednesday, September 15th, 2021 at 9:02 PM, <ebiederm@xmission.com> wrote:
> Jordan Glover Golden_Miller83@protonmail.ch writes:
>
> > Hi, recently I hit system freeze after I was closing few containerized apps on my system. As for now it occurred only once on linux 5.14.3. I think it maybe be related to "Count rlimits in each user namespace" patchset merged during 5.14 window
> >
> > https://lore.kernel.org/all/257aa5fb1a7d81cf0f4c34f39ada2320c4284771.1619094428.git.legion@kernel.org/T/#u
>
> So that warning comes from:
>
> void dec_ucount(struct ucounts *ucounts, enum ucount_type type)
>
> {
>
> struct ucounts *iter;
>
> for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>
> long dec = atomic_long_dec_if_positive(&iter->ucount[type]);
>
> WARN_ON_ONCE(dec < 0);
> }
> put_ucounts(ucounts);
>
>
> }
>
> Which certainly looks like a reference count bug. It could also be a
>
> memory stomp somewhere close.
>
> Do you have any idea what else was going on? This location is the
>
> symptom but not the actual cause.
>
> Eric
I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running . I quickly closed them with intent to shutdown the system but instead get the freeze and had to use magic sysrq to reboot. System logs end with what I posted and before there is nothing suspicious.
Maybe it's some random fluke. I'll reply if I hit it again.
Jordan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-15 22:42 ` Jordan Glover
@ 2021-09-15 23:47 ` Jordan Glover
2021-09-16 17:30 ` Eric W. Biederman
0 siblings, 1 reply; 13+ messages in thread
From: Jordan Glover @ 2021-09-15 23:47 UTC (permalink / raw)
To: ebiederm
Cc: LKML, linux-mm\@kvack.org, legion\@kernel.org,
containers\@lists.linux-foundation.org
On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover <Golden_Miller83@protonmail.ch> wrote:
>
> I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running . I quickly closed them with intent to shutdown the system but instead get the freeze and had to use magic sysrq to reboot. System logs end with what I posted and before there is nothing suspicious.
>
> Maybe it's some random fluke. I'll reply if I hit it again.
Heh, it jut happened again. This time closing firefox alone had such effect:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 351 at kernel/ucount.c:253 dec_ucount+0x43/0x50
Modules linked in: nft_ct nft_fib_ipv4 nft_fib wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ccm algif_aead des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg hid_sensor_custom_intel_hinge hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer hid_sensor_iio_common kfifo_buf industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader intel_ishtp_hid intel_rapl_msr nft_counter xt_mark ipt_REJECT nf_reject_ipv4 mei_hdcp intel_rapl_common xt_LOG nf_log_syslog intel_tcc_cooling x86_pkg_temp_thermal think_lmi wmi_bmof xt_addrtype firmware_attributes_class xt_tcpudp intel_powerclamp xt_conntrack nf_conntrack nf_defrag_ipv4 snd_hda_codec_hdmi nft_compat intel_wmi_thunderbolt nf_tables libcrc32c coretemp iwlmvm snd_ctl_led nfnetlink
snd_hda_codec_conexant mac80211 snd_hda_codec_generic libarc4 vfat snd_hda_intel kvm_intel fat iwlwifi snd_intel_dspcfg rapl intel_cstate joydev snd_hda_codec mousedev intel_uncore snd_hwdep snd_hda_core psmouse snd_pcm snd_timer cfg80211 mei_me wacom ucsi_acpi typec_ucsi mei intel_pch_thermal intel_xhci_usb_role_switch intel_ish_ipc roles intel_ishtp typec wmi thinkpad_acpi ledtrig_audio platform_profile snd soundcore tpm_crb rfkill i2c_hid_acpi tpm_tis tpm_tis_core i2c_hid mac_hid acpi_pad pkcs8_key_parser fuse zram ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rtsx_pci xhci_pci xhci_pci_renesas i8042 serio kvmgt mdev vfio_iommu_type1 vfio i915 i2c_algo_bit intel_gtt ttm agpgart video drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops cec drm kvm irqbypass
CPU: 1 PID: 351 Comm: kworker/1:3 Not tainted 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
Workqueue: events free_user_ns
RIP: 0010:dec_ucount+0x43/0x50
Code: 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 48 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff <0f> 0b eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 f8 48
RSP: 0018:ffffaa06c08bbe60 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff894ecb0c35a0 RCX: ffff894e25cdfcc0
RDX: ffff894e25cdfce0 RSI: ffffffffffffffff RDI: ffff894e25cdfcc0
RBP: ffff894ee393d380 R08: 0000000000000020 R09: ffff894ee393d5f0
R10: ffff894f617fd000 R11: 0000000000031678 R12: ffff894e25cdfcc0
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff894f59680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056ffceff6b10 CR3: 0000000147a0e005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
free_user_ns+0x73/0x110
process_one_work+0x1e1/0x380
worker_thread+0x50/0x3a0
? rescuer_thread+0x360/0x360
kthread+0x127/0x150
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
---[ end trace ff45ac39689f43c1 ]---
BUG: kernel NULL pointer dereference, address: 00000000000001e8
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 351 Comm: kworker/1:3 Tainted: G W 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
Workqueue: events free_user_ns
RIP: 0010:dec_ucount+0x32/0x50
Code: 74 34 89 f6 48 89 f9 4c 8d 04 f5 20 00 00 00 4a 8d 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 <48> 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff 0f 0b eb e7 66
RSP: 0018:ffffaa06c08bbe60 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff894ecb0c35a0 RCX: ffff894e25cdfcc0
RDX: ffff894e25cdfce0 RSI: ffffffffffffffff RDI: ffff894e25cdfcc0
RBP: ffff894ee393d380 R08: 0000000000000020 R09: ffff894ee393d5f0
R10: ffff894f617fd000 R11: 0000000000031678 R12: ffff894e25cdfcc0
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff894f59680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001e8 CR3: 0000000147a0e005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
free_user_ns+0x73/0x110
process_one_work+0x1e1/0x380
worker_thread+0x50/0x3a0
? rescuer_thread+0x360/0x360
kthread+0x127/0x150
? set_kthread_struct+0x40/0x40
ret_from_fork+0x22/0x30
Modules linked in: nft_ct nft_fib_ipv4 nft_fib wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ccm algif_aead des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg hid_sensor_custom_intel_hinge hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer hid_sensor_iio_common kfifo_buf industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader intel_ishtp_hid intel_rapl_msr nft_counter xt_mark ipt_REJECT nf_reject_ipv4 mei_hdcp intel_rapl_common xt_LOG nf_log_syslog intel_tcc_cooling x86_pkg_temp_thermal think_lmi wmi_bmof xt_addrtype firmware_attributes_class xt_tcpudp intel_powerclamp xt_conntrack nf_conntrack nf_defrag_ipv4 snd_hda_codec_hdmi nft_compat intel_wmi_thunderbolt nf_tables libcrc32c coretemp iwlmvm snd_ctl_led nfnetlink
snd_hda_codec_conexant mac80211 snd_hda_codec_generic libarc4 vfat snd_hda_intel kvm_intel fat iwlwifi snd_intel_dspcfg rapl intel_cstate joydev snd_hda_codec mousedev intel_uncore snd_hwdep snd_hda_core psmouse snd_pcm snd_timer cfg80211 mei_me wacom ucsi_acpi typec_ucsi mei intel_pch_thermal intel_xhci_usb_role_switch intel_ish_ipc roles intel_ishtp typec wmi thinkpad_acpi ledtrig_audio platform_profile snd soundcore tpm_crb rfkill i2c_hid_acpi tpm_tis tpm_tis_core i2c_hid mac_hid acpi_pad pkcs8_key_parser fuse zram ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rtsx_pci xhci_pci xhci_pci_renesas i8042 serio kvmgt mdev vfio_iommu_type1 vfio i915 i2c_algo_bit intel_gtt ttm agpgart video drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops cec drm kvm irqbypass
CR2: 00000000000001e8
---[ end trace ff45ac39689f43c2 ]---
RIP: 0010:dec_ucount+0x32/0x50
Code: 74 34 89 f6 48 89 f9 4c 8d 04 f5 20 00 00 00 4a 8d 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 <48> 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff 0f 0b eb e7 66
RSP: 0018:ffffaa06c08bbe60 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff894ecb0c35a0 RCX: ffff894e25cdfcc0
RDX: ffff894e25cdfce0 RSI: ffffffffffffffff RDI: ffff894e25cdfcc0
RBP: ffff894ee393d380 R08: 0000000000000020 R09: ffff894ee393d5f0
R10: ffff894f617fd000 R11: 0000000000031678 R12: ffff894e25cdfcc0
R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff894f59680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000001e8 CR3: 0000000147a0e005 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-15 23:47 ` Jordan Glover
@ 2021-09-16 17:30 ` Eric W. Biederman
2021-09-28 13:40 ` Jordan Glover
0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-09-16 17:30 UTC (permalink / raw)
To: Jordan Glover; +Cc: LKML, linux-mm, legion, containers, Yu Zhao
Jordan Glover <Golden_Miller83@protonmail.ch> writes:
> On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover <Golden_Miller83@protonmail.ch> wrote:
>>
>> I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running . I quickly closed them with intent to shutdown the system but instead get the freeze and had to use magic sysrq to reboot. System logs end with what I posted and before there is nothing suspicious.
>>
>> Maybe it's some random fluke. I'll reply if I hit it again.
>
> Heh, it jut happened again. This time closing firefox alone had such
> effect:
Ok. It looks like he have a couple of folks seeing issues here.
I thought we had all of the issues sorted out for the release of v5.14,
but it looks like there is still some little bug left.
If Alex doesn't beat me to it I will see if I can come up with a
debugging patch to make it easy to help track down where the reference
count is going wrong. It will be a little bit as my brain is mush at
the moment.
Eric
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 351 at kernel/ucount.c:253 dec_ucount+0x43/0x50
> Modules linked in: nft_ct nft_fib_ipv4 nft_fib wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ccm algif_aead des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg hid_sensor_custom_intel_hinge hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer hid_sensor_iio_common kfifo_buf industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader intel_ishtp_hid intel_rapl_msr nft_counter xt_mark ipt_REJECT nf_reject_ipv4 mei_hdcp intel_rapl_common xt_LOG nf_log_syslog intel_tcc_cooling x86_pkg_temp_thermal think_lmi wmi_bmof xt_addrtype firmware_attributes_class xt_tcpudp intel_powerclamp xt_conntrack nf_conntrack nf_defrag_ipv4 snd_hda_codec_hdmi nft_compat intel_wmi_thunderbolt nf_tables libcrc32c coretemp iwlmvm snd_ctl_led nfnetlink
> snd_hda_codec_conexant mac80211 snd_hda_codec_generic libarc4 vfat snd_hda_intel kvm_intel fat iwlwifi snd_intel_dspcfg rapl intel_cstate joydev snd_hda_codec mousedev intel_uncore snd_hwdep snd_hda_core psmouse snd_pcm snd_timer cfg80211 mei_me wacom ucsi_acpi typec_ucsi mei intel_pch_thermal intel_xhci_usb_role_switch intel_ish_ipc roles intel_ishtp typec wmi thinkpad_acpi ledtrig_audio platform_profile snd soundcore tpm_crb rfkill i2c_hid_acpi tpm_tis tpm_tis_core i2c_hid mac_hid acpi_pad pkcs8_key_parser fuse zram ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rtsx_pci xhci_pci xhci_pci_renesas i8042 serio kvmgt mdev vfio_iommu_type1 vfio i915 i2c_algo_bit intel_gtt ttm agpgart video drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops cec drm kvm irqbypass
> CPU: 1 PID: 351 Comm: kworker/1:3 Not tainted 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
> Workqueue: events free_user_ns
> RIP: 0010:dec_ucount+0x43/0x50
> Code: 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 48 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff <0f> 0b eb e7 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 f8 48
> RSP: 0018:ffffaa06c08bbe60 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff894ecb0c35a0 RCX: ffff894e25cdfcc0
> RDX: ffff894e25cdfce0 RSI: ffffffffffffffff RDI: ffff894e25cdfcc0
> RBP: ffff894ee393d380 R08: 0000000000000020 R09: ffff894ee393d5f0
> R10: ffff894f617fd000 R11: 0000000000031678 R12: ffff894e25cdfcc0
> R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff894f59680000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000056ffceff6b10 CR3: 0000000147a0e005 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> free_user_ns+0x73/0x110
> process_one_work+0x1e1/0x380
> worker_thread+0x50/0x3a0
> ? rescuer_thread+0x360/0x360
> kthread+0x127/0x150
> ? set_kthread_struct+0x40/0x40
> ret_from_fork+0x22/0x30
> ---[ end trace ff45ac39689f43c1 ]---
> BUG: kernel NULL pointer dereference, address: 00000000000001e8
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> Oops: 0000 [#1] SMP PTI
> CPU: 1 PID: 351 Comm: kworker/1:3 Tainted: G W 5.14.3 #1 c719caf0c6c208968387ed83e3061ac05d0faf2f
> Workqueue: events free_user_ns
> RIP: 0010:dec_ucount+0x32/0x50
> Code: 74 34 89 f6 48 89 f9 4c 8d 04 f5 20 00 00 00 4a 8d 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 <48> 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff 0f 0b eb e7 66
> RSP: 0018:ffffaa06c08bbe60 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff894ecb0c35a0 RCX: ffff894e25cdfcc0
> RDX: ffff894e25cdfce0 RSI: ffffffffffffffff RDI: ffff894e25cdfcc0
> RBP: ffff894ee393d380 R08: 0000000000000020 R09: ffff894ee393d5f0
> R10: ffff894f617fd000 R11: 0000000000031678 R12: ffff894e25cdfcc0
> R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff894f59680000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000001e8 CR3: 0000000147a0e005 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> free_user_ns+0x73/0x110
> process_one_work+0x1e1/0x380
> worker_thread+0x50/0x3a0
> ? rescuer_thread+0x360/0x360
> kthread+0x127/0x150
> ? set_kthread_struct+0x40/0x40
> ret_from_fork+0x22/0x30
> Modules linked in: nft_ct nft_fib_ipv4 nft_fib wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 udp_tunnel libblake2s blake2s_x86_64 libblake2s_generic libchacha ccm algif_aead des_generic libdes ecb algif_skcipher cmac md4 algif_hash af_alg hid_sensor_custom_intel_hinge hid_sensor_als hid_sensor_gyro_3d hid_sensor_accel_3d hid_sensor_rotation hid_sensor_magn_3d hid_sensor_trigger industrialio_triggered_buffer hid_sensor_iio_common kfifo_buf industrialio hid_sensor_custom hid_sensor_hub cros_ec_ishtp cros_ec intel_ishtp_loader intel_ishtp_hid intel_rapl_msr nft_counter xt_mark ipt_REJECT nf_reject_ipv4 mei_hdcp intel_rapl_common xt_LOG nf_log_syslog intel_tcc_cooling x86_pkg_temp_thermal think_lmi wmi_bmof xt_addrtype firmware_attributes_class xt_tcpudp intel_powerclamp xt_conntrack nf_conntrack nf_defrag_ipv4 snd_hda_codec_hdmi nft_compat intel_wmi_thunderbolt nf_tables libcrc32c coretemp iwlmvm snd_ctl_led nfnetlink
> snd_hda_codec_conexant mac80211 snd_hda_codec_generic libarc4 vfat snd_hda_intel kvm_intel fat iwlwifi snd_intel_dspcfg rapl intel_cstate joydev snd_hda_codec mousedev intel_uncore snd_hwdep snd_hda_core psmouse snd_pcm snd_timer cfg80211 mei_me wacom ucsi_acpi typec_ucsi mei intel_pch_thermal intel_xhci_usb_role_switch intel_ish_ipc roles intel_ishtp typec wmi thinkpad_acpi ledtrig_audio platform_profile snd soundcore tpm_crb rfkill i2c_hid_acpi tpm_tis tpm_tis_core i2c_hid mac_hid acpi_pad pkcs8_key_parser fuse zram ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm rng_core dm_mod rtsx_pci_sdmmc mmc_core serio_raw atkbd libps2 crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel crypto_simd cryptd rtsx_pci xhci_pci xhci_pci_renesas i8042 serio kvmgt mdev vfio_iommu_type1 vfio i915 i2c_algo_bit intel_gtt ttm agpgart video drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops cec drm kvm irqbypass
> CR2: 00000000000001e8
> ---[ end trace ff45ac39689f43c2 ]---
> RIP: 0010:dec_ucount+0x32/0x50
> Code: 74 34 89 f6 48 89 f9 4c 8d 04 f5 20 00 00 00 4a 8d 14 01 48 8b 02 48 89 c6 48 83 ee 01 78 1c f0 48 0f b1 32 75 f0 48 8b 41 10 <48> 8b 88 e8 01 00 00 48 85 c9 75 d9 e9 0d fd ff ff 0f 0b eb e7 66
> RSP: 0018:ffffaa06c08bbe60 EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff894ecb0c35a0 RCX: ffff894e25cdfcc0
> RDX: ffff894e25cdfce0 RSI: ffffffffffffffff RDI: ffff894e25cdfcc0
> RBP: ffff894ee393d380 R08: 0000000000000020 R09: ffff894ee393d5f0
> R10: ffff894f617fd000 R11: 0000000000031678 R12: ffff894e25cdfcc0
> R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000001
> FS: 0000000000000000(0000) GS:ffff894f59680000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000001e8 CR3: 0000000147a0e005 CR4: 00000000003706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-16 17:30 ` Eric W. Biederman
@ 2021-09-28 13:40 ` Jordan Glover
2021-09-29 17:36 ` Alexey Gladkov
0 siblings, 1 reply; 13+ messages in thread
From: Jordan Glover @ 2021-09-28 13:40 UTC (permalink / raw)
To: ebiederm
Cc: LKML, linux-mm\@kvack.org, legion\@kernel.org,
containers\@lists.linux-foundation.org, Yu Zhao
On Thursday, September 16th, 2021 at 5:30 PM, <ebiederm@xmission.com> wrote:
> Jordan Glover Golden_Miller83@protonmail.ch writes:
>
> > On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover Golden_Miller83@protonmail.ch wrote:
> >
> > > I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running . I quickly closed them with intent to shutdown the system but instead get the freeze and had to use magic sysrq to reboot. System logs end with what I posted and before there is nothing suspicious.
> > >
> > > Maybe it's some random fluke. I'll reply if I hit it again.
> >
> > Heh, it jut happened again. This time closing firefox alone had such
> > effect:
>
> Ok. It looks like he have a couple of folks seeing issues here.
> I thought we had all of the issues sorted out for the release of v5.14,
> but it looks like there is still some little bug left.
>
> If Alex doesn't beat me to it I will see if I can come up with a
> debugging patch to make it easy to help track down where the reference
> count is going wrong. It will be a little bit as my brain is mush at
> the moment.
>
> Eric
As the issue persist in 5.14.7 I would be very interested in such patch.
For now the thing is mostly reproducible when I close several tabs in ff then
close the browser in short period of time. When I close tabs then wait out
a bit then close the browser it doesn't happen so I guess some interrupted
cleanup triggers it.
Jordan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-28 13:40 ` Jordan Glover
@ 2021-09-29 17:36 ` Alexey Gladkov
2021-09-29 21:39 ` Jordan Glover
0 siblings, 1 reply; 13+ messages in thread
From: Alexey Gladkov @ 2021-09-29 17:36 UTC (permalink / raw)
To: Jordan Glover
Cc: ebiederm, LKML, linux-mm\@kvack.org,
containers\@lists.linux-foundation.org, Yu Zhao
On Tue, Sep 28, 2021 at 01:40:48PM +0000, Jordan Glover wrote:
> On Thursday, September 16th, 2021 at 5:30 PM, <ebiederm@xmission.com> wrote:
>
> > Jordan Glover Golden_Miller83@protonmail.ch writes:
> >
> > > On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover Golden_Miller83@protonmail.ch wrote:
> > >
> > > > I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running . I quickly closed them with intent to shutdown the system but instead get the freeze and had to use magic sysrq to reboot. System logs end with what I posted and before there is nothing suspicious.
> > > >
> > > > Maybe it's some random fluke. I'll reply if I hit it again.
> > >
> > > Heh, it jut happened again. This time closing firefox alone had such
> > > effect:
> >
> > Ok. It looks like he have a couple of folks seeing issues here.
> > I thought we had all of the issues sorted out for the release of v5.14,
> > but it looks like there is still some little bug left.
> >
> > If Alex doesn't beat me to it I will see if I can come up with a
> > debugging patch to make it easy to help track down where the reference
> > count is going wrong. It will be a little bit as my brain is mush at
> > the moment.
> >
> > Eric
>
> As the issue persist in 5.14.7 I would be very interested in such patch.
>
> For now the thing is mostly reproducible when I close several tabs in ff then
> close the browser in short period of time. When I close tabs then wait out
> a bit then close the browser it doesn't happen so I guess some interrupted
> cleanup triggers it.
I'm still investigating, but I would like to rule out one option.
Could you check out the patch?
diff --git a/kernel/ucount.c b/kernel/ucount.c
index bb51849e6375..f23f906f4f62 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -201,11 +201,14 @@ void put_ucounts(struct ucounts *ucounts)
{
unsigned long flags;
- if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
+ spin_lock_irqsave(&ucounts_lock, flags);
+ if (atomic_dec_and_test(&ucounts->count)) {
hlist_del_init(&ucounts->node);
spin_unlock_irqrestore(&ucounts_lock, flags);
kfree(ucounts);
+ return;
}
+ spin_unlock_irqrestore(&ucounts_lock, flags);
}
static inline bool atomic_long_inc_below(atomic_long_t *v, int u)
--
Rgrds, legion
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-29 17:36 ` Alexey Gladkov
@ 2021-09-29 21:39 ` Jordan Glover
2021-09-30 13:06 ` Alexey Gladkov
0 siblings, 1 reply; 13+ messages in thread
From: Jordan Glover @ 2021-09-29 21:39 UTC (permalink / raw)
To: Alexey Gladkov
Cc: ebiederm, LKML, linux-mm\\@kvack.org,
containers\\@lists.linux-foundation.org, Yu Zhao
On Wednesday, September 29th, 2021 at 5:36 PM, Alexey Gladkov <legion@kernel.org> wrote:
> On Tue, Sep 28, 2021 at 01:40:48PM +0000, Jordan Glover wrote:
>
> > On Thursday, September 16th, 2021 at 5:30 PM, ebiederm@xmission.com wrote:
> >
> > > Jordan Glover Golden_Miller83@protonmail.ch writes:
> > >
> > > > On Wednesday, September 15th, 2021 at 10:42 PM, Jordan Glover Golden_Miller83@protonmail.ch wrote:
> > > >
> > > > > I had about 2 containerized (flatpak/bubblewrap) apps (browser + music player) running . I quickly closed them with intent to shutdown the system but instead get the freeze and had to use magic sysrq to reboot. System logs end with what I posted and before there is nothing suspicious.
> > > > >
> > > > > Maybe it's some random fluke. I'll reply if I hit it again.
> > > >
> > > > Heh, it jut happened again. This time closing firefox alone had such
> > > >
> > > > effect:
> > >
> > > Ok. It looks like he have a couple of folks seeing issues here.
> > >
> > > I thought we had all of the issues sorted out for the release of v5.14,
> > >
> > > but it looks like there is still some little bug left.
> > >
> > > If Alex doesn't beat me to it I will see if I can come up with a
> > >
> > > debugging patch to make it easy to help track down where the reference
> > >
> > > count is going wrong. It will be a little bit as my brain is mush at
> > >
> > > the moment.
> > >
> > > Eric
> >
> > As the issue persist in 5.14.7 I would be very interested in such patch.
> >
> > For now the thing is mostly reproducible when I close several tabs in ff then
> >
> > close the browser in short period of time. When I close tabs then wait out
> >
> > a bit then close the browser it doesn't happen so I guess some interrupted
> >
> > cleanup triggers it.
>
> I'm still investigating, but I would like to rule out one option.
>
> Could you check out the patch?
Thx, I added it to my kernel and will report in few days.
Does this patch try to fix the issue or make it easier to track?
Jordan
> diff --git a/kernel/ucount.c b/kernel/ucount.c
>
> index bb51849e6375..f23f906f4f62 100644
>
> --- a/kernel/ucount.c
>
> +++ b/kernel/ucount.c
>
> @@ -201,11 +201,14 @@ void put_ucounts(struct ucounts *ucounts)
>
> {
>
> unsigned long flags;
>
> - if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
>
>
>
> - spin_lock_irqsave(&ucounts_lock, flags);
>
>
> - if (atomic_dec_and_test(&ucounts->count)) {
>
> hlist_del_init(&ucounts->node);
>
> spin_unlock_irqrestore(&ucounts_lock, flags);
> kfree(ucounts);
>
>
> - return;
> }
>
>
> - spin_unlock_irqrestore(&ucounts_lock, flags);
>
>
>
> }
>
> static inline bool atomic_long_inc_below(atomic_long_t *v, int u)
>
> ---------------------------------------------------------------------
>
> Rgrds, legion
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-29 21:39 ` Jordan Glover
@ 2021-09-30 13:06 ` Alexey Gladkov
2021-09-30 22:27 ` Yu Zhao
0 siblings, 1 reply; 13+ messages in thread
From: Alexey Gladkov @ 2021-09-30 13:06 UTC (permalink / raw)
To: Jordan Glover; +Cc: ebiederm, LKML, linux-mm, containers, Yu Zhao
On Wed, Sep 29, 2021 at 09:39:06PM +0000, Jordan Glover wrote:
> > I'm still investigating, but I would like to rule out one option.
> >
> > Could you check out the patch?
>
>
> Thx, I added it to my kernel and will report in few days.
> Does this patch try to fix the issue or make it easier to track?
I suspect the error is caused by a race between allow_ucounts() and
put_ucounts(). I think this patch could solve the problem.
--
Rgrds, legion
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-30 13:06 ` Alexey Gladkov
@ 2021-09-30 22:27 ` Yu Zhao
2021-10-04 17:10 ` Eric W. Biederman
0 siblings, 1 reply; 13+ messages in thread
From: Yu Zhao @ 2021-09-30 22:27 UTC (permalink / raw)
To: Alexey Gladkov; +Cc: Jordan Glover, ebiederm, LKML, linux-mm, containers
On Thu, Sep 30, 2021 at 7:06 AM Alexey Gladkov <legion@kernel.org> wrote:
>
> On Wed, Sep 29, 2021 at 09:39:06PM +0000, Jordan Glover wrote:
> > > I'm still investigating, but I would like to rule out one option.
> > >
> > > Could you check out the patch?
> >
> >
> > Thx, I added it to my kernel and will report in few days.
> > Does this patch try to fix the issue or make it easier to track?
>
> I suspect the error is caused by a race between allow_ucounts() and
> put_ucounts(). I think this patch could solve the problem.
Thanks for your help. Still can reproduce the problem with the change suggested.
[ 7761.885966] ==================================================================
[ 7761.893462] BUG: KASAN: use-after-free in dec_ucount+0x50/0xd8
[ 7761.899491] Write of size 8 at addr ffffff80c537b140 by task
kworker/u16:3/10303
[ 7761.907110]
[ 7761.908668] CPU: 0 PID: 10303 Comm: kworker/u16:3 Not tainted
5.14.0-lockdep+ #1
[ 7761.916289] Hardware name: Google Lazor (rev3+) with KB Backlight (DT)
[ 7761.923021] Workqueue: netns cleanup_net
[ 7761.927106] Call trace:
[ 7761.929648] dump_backtrace+0x0/0x42c
[ 7761.933442] show_stack+0x24/0x30
[ 7761.936878] dump_stack_lvl+0xd0/0x100
[ 7761.940763] print_address_description+0x30/0x304
[ 7761.945628] kasan_report+0x190/0x1d8
[ 7761.949418] kasan_check_range+0x1ac/0x1bc
[ 7761.953655] __kasan_check_write+0x44/0x54
[ 7761.957891] dec_ucount+0x50/0xd8
[ 7761.961334] cleanup_net+0x630/0x718
[ 7761.965036] process_one_work+0x7b4/0x10ec
[ 7761.969274] worker_thread+0x800/0xcf4
[ 7761.973152] kthread+0x2a8/0x358
[ 7761.976496] ret_from_fork+0x10/0x18
[ 7761.980201]
[ 7761.981761] Allocated by task 4840:
[ 7761.985366] kasan_save_stack+0x38/0x68
[ 7761.989342] __kasan_kmalloc+0x9c/0xb8
[ 7761.993222] kmem_cache_alloc_trace+0x2a4/0x370
[ 7761.997905] alloc_ucounts+0x150/0x374
[ 7762.001787] set_cred_ucounts+0x198/0x248
[ 7762.005935] __sys_setresuid+0x31c/0x4f8
[ 7762.009993] __arm64_sys_setresuid+0x84/0x98
[ 7762.014410] invoke_syscall+0xd4/0x2c8
[ 7762.018292] el0_svc_common+0x124/0x200
[ 7762.022265] do_el0_svc_compat+0x54/0x64
[ 7762.026325] el0_svc_compat+0x24/0x34
[ 7762.030124] el0t_32_sync_handler+0xc0/0xf0
[ 7762.034451] el0t_32_sync+0x19c/0x1a0
[ 7762.038241]
[ 7762.039799] Freed by task 0:
[ 7762.042778] kasan_save_stack+0x38/0x68
[ 7762.046747] kasan_set_track+0x28/0x3c
[ 7762.050625] kasan_set_free_info+0x28/0x4c
[ 7762.054857] ____kasan_slab_free+0x118/0x164
[ 7762.059277] __kasan_slab_free+0x18/0x28
[ 7762.063339] kfree+0x2f8/0x500
[ 7762.066505] put_ucounts+0x11c/0x134
[ 7762.070209] put_cred_rcu+0x1bc/0x35c
[ 7762.074006] rcu_core+0xa68/0x1b20
[ 7762.077538] rcu_core_si+0x1c/0x28
[ 7762.081061] __do_softirq+0x4bc/0xedc
[ 7762.084851]
[ 7762.086401] The buggy address belongs to the object at ffffff80c537b100
[ 7762.086401] which belongs to the cache kmalloc-256 of size 256
[ 7762.099267] The buggy address is located 64 bytes inside of
[ 7762.099267] 256-byte region [ffffff80c537b100, ffffff80c537b200)
[ 7762.111248] The buggy address belongs to the page:
[ 7762.116185] page:fffffffe0314de00 refcount:1 mapcount:0
mapping:0000000000000000 index:0xffffff80c537ad00 pfn:0x145378
[ 7762.127180] head:fffffffe0314de00 order:3 compound_mapcount:0
compound_pincount:0
[ 7762.134881] flags: 0x8000000000010200(slab|head|zone=2)
[ 7762.140286] raw: 8000000000010200 fffffffe02799408 fffffffe02020808
ffffff808000c980
[ 7762.148263] raw: ffffff80c537ad00 0000000000200004 00000001ffffffff
0000000000000000
[ 7762.156228] page dumped because: kasan: bad access detected
[ 7762.161974]
[ 7762.163532] Memory state around the buggy address:
[ 7762.168475] ffffff80c537b000: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 7762.175915] ffffff80c537b080: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 7762.183346] >ffffff80c537b100: fa fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 7762.190774] ^
[ 7762.196258] ffffff80c537b180: fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb fb
[ 7762.203689] ffffff80c537b200: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[ 7762.211125] ==================================================================
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-09-30 22:27 ` Yu Zhao
@ 2021-10-04 17:10 ` Eric W. Biederman
2021-10-04 17:19 ` Eric W. Biederman
0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-04 17:10 UTC (permalink / raw)
To: Yu Zhao
Cc: Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers,
Rune Kleveland
Adding Rune Kleveland to the discussion as he also seems to have
reproduced the issue.
Alex and I have been starring at the code and the reports and this
bug is hiding well. Here is what we have figured out so far.
Both the warning from free_user_ns calling dec_ucount that Jordan Glover
reported and the KASAN error that Yu Zhao has reported appear to have
the same cause. Using a ucounts structure after it has been freed and
reallocated as something else.
I have just skimmed through the recent report from Rune Kleveland
and it appears also to be a use after free. Especially since the
second failure in the log is slub complaining about trying to free
the ucounts data structure.
We looked through the users of put_ucounts and we don't see any obvious
buggy users that would be freeing the data structure early.
Alex has tried to reproduce this so far is not having any luck.
Folks can you tell what compiler versions you are using and share your
kernel config with us? That might help.
The little debug diff below is my guess of what is happening. If the
folks who can reproduce this issue can try the patch below and let me
know if the warnings fire that would be appreciated. It is still not
enough to track down the bug but at least it will confirm my current
hypothesis about how things look before there is a use of memory after
it is freed.
Thank you,
Eric
diff --git a/kernel/cred.c b/kernel/cred.c
index f784e08c2fbd..e7ffaa3cf5a6 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -120,6 +120,12 @@ static void put_cred_rcu(struct rcu_head *rcu)
if (cred->group_info)
put_group_info(cred->group_info);
free_uid(cred->user);
+#if 1
+ if ((cred->ucounts == cred->user_ns->ucounts) &&
+ (atomic_read(&cred->ucounts->count) == 1)) {
+ WARN_ONCE(1, "put_cred_rcu: ucount count 1\n");
+ }
+#endif
if (cred->ucounts)
put_ucounts(cred->ucounts);
put_user_ns(cred->user_ns);
diff --git a/kernel/exit.c b/kernel/exit.c
index 91a43e57a32e..60fd88b34c1a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -743,6 +743,13 @@ void __noreturn do_exit(long code)
if (unlikely(!tsk->pid))
panic("Attempted to kill the idle task!");
+#if 1
+ if ((tsk->cred->ucounts == tsk->cred->user_ns->ucounts) &&
+ (atomic_read(tsk->cred->ucounts->count) == 1)) {
+ WARN_ONCE(1, "do_exit: ucount count 1\n");
+ }
+#endif
+
/*
* If do_exit is called because this processes oopsed, it's possible
* that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-10-04 17:10 ` Eric W. Biederman
@ 2021-10-04 17:19 ` Eric W. Biederman
2021-10-10 8:59 ` Rune Kleveland
0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-04 17:19 UTC (permalink / raw)
To: Yu Zhao
Cc: Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers,
Rune Kleveland
ebiederm@xmission.com (Eric W. Biederman) writes:
> Adding Rune Kleveland to the discussion as he also seems to have
> reproduced the issue.
>
> Alex and I have been starring at the code and the reports and this
> bug is hiding well. Here is what we have figured out so far.
>
> Both the warning from free_user_ns calling dec_ucount that Jordan Glover
> reported and the KASAN error that Yu Zhao has reported appear to have
> the same cause. Using a ucounts structure after it has been freed and
> reallocated as something else.
>
> I have just skimmed through the recent report from Rune Kleveland
> and it appears also to be a use after free. Especially since the
> second failure in the log is slub complaining about trying to free
> the ucounts data structure.
>
> We looked through the users of put_ucounts and we don't see any obvious
> buggy users that would be freeing the data structure early.
>
> Alex has tried to reproduce this so far is not having any luck.
> Folks can you tell what compiler versions you are using and share your
> kernel config with us? That might help.
>
> The little debug diff below is my guess of what is happening. If the
> folks who can reproduce this issue can try the patch below and let me
> know if the warnings fire that would be appreciated. It is still not
> enough to track down the bug but at least it will confirm my current
> hypothesis about how things look before there is a use of memory after
> it is freed.
Bah. Scratch that test patch. I just double checked myself and
cred->ucounts and cred->user_ns->ucounts should never be equal,
as the user namespace is counted in it's parent user namespace.
That observation now tells me I have a parent user namespace that went
corrupt.
Back to the drawing board.
> Thank you,
> Eric
>
> diff --git a/kernel/cred.c b/kernel/cred.c
> index f784e08c2fbd..e7ffaa3cf5a6 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -120,6 +120,12 @@ static void put_cred_rcu(struct rcu_head *rcu)
> if (cred->group_info)
> put_group_info(cred->group_info);
> free_uid(cred->user);
> +#if 1
> + if ((cred->ucounts == cred->user_ns->ucounts) &&
> + (atomic_read(&cred->ucounts->count) == 1)) {
> + WARN_ONCE(1, "put_cred_rcu: ucount count 1\n");
> + }
> +#endif
> if (cred->ucounts)
> put_ucounts(cred->ucounts);
> put_user_ns(cred->user_ns);
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 91a43e57a32e..60fd88b34c1a 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -743,6 +743,13 @@ void __noreturn do_exit(long code)
> if (unlikely(!tsk->pid))
> panic("Attempted to kill the idle task!");
>
> +#if 1
> + if ((tsk->cred->ucounts == tsk->cred->user_ns->ucounts) &&
> + (atomic_read(tsk->cred->ucounts->count) == 1)) {
> + WARN_ONCE(1, "do_exit: ucount count 1\n");
> + }
> +#endif
> +
> /*
> * If do_exit is called because this processes oopsed, it's possible
> * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: linux 5.14.3: free_user_ns causes NULL pointer dereference
2021-10-04 17:19 ` Eric W. Biederman
@ 2021-10-10 8:59 ` Rune Kleveland
2021-10-15 22:10 ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
0 siblings, 1 reply; 13+ messages in thread
From: Rune Kleveland @ 2021-10-10 8:59 UTC (permalink / raw)
To: Eric W. Biederman, Yu Zhao
Cc: Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers
Hi!
Just wanted to let you know that I still get these on stock Fedora
kernel 5.14.10 on the IBM blades. But it took 10 hours before the first
server crashed. The other 4 still runs fine since 15 hours ago. So for
me it seems more stable now, but that could just be a coincidence.
Best regards,
Rune
------------[ cut here ]------------
kernel BUG at mm/slub.c:321!
invalid opcode: 0000 [#1] SMP PTI
CPU: 22 PID: 1838853 Comm: python3 Not tainted 5.14.10-200.fc34.x86_64 #1
Hardware name: IBM BladeCenter HS22 -[7870TKN]-/68Y8161, BIOS
-[P9E164CUS-1.28]- 04/17/2018
RIP: 0010:__slab_free+0x245/0x4a0
Code: 0f b6 5c 24 1b 44 8b 44 24 1c 48 89 44 24 08 48 8b 54 24 20 4c 8b
4c 24 28 e9 8a fe ff ff 41 f7 45 08 00 0d 21 00 75 98 eb 8d <0f> 0b 49
3b 54 24 28 0f 85 53 ff ff ff 49 8b 44 24 08 4>
RSP: 0018:ffffb71dcfd6fda0 EFLAGS: 00010246
RAX: ffff9c5480d35860 RBX: ffff9c5480d35800 RCX: ffff9c5480d35800
RDX: 00000000802a0029 RSI: ffffeb41da034d00 RDI: ffff9c4f00042800
RBP: ffffb71dcfd6fe50 R08: 0000000000000001 R09: ffffffff9210b6a5
R10: ffff9c548102e000 R11: 0000000062667658 R12: ffffeb41da034d00
R13: ffff9c4f00042800 R14: ffff9c5480d35800 R15: ffff9c5480d35800
FS: 00007f7a89765740(0000) GS:ffff9c6c1fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7a7ad7f4b0 CR3: 0000000564af4002 CR4: 00000000000206e0
Call Trace:
? filename_lookup+0x135/0x1b0
? put_ucounts+0x65/0x70
kfree+0x369/0x3c0
put_ucounts+0x65/0x70
put_cred_rcu+0x70/0xd0
do_faccessat+0x113/0x240
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f7a899cc44b
Code: 77 05 c3 0f 1f 40 00 48 8b 15 29 1a 0d 00 f7 d8 64 89 02 48 c7 c0
ff ff ff ff c3 0f 1f 40 00 f3 0f 1e fa b8 15 00 00 00 0f 05 <48> 3d 00
f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 f9 19 0>
RSP: 002b:00007ffd01fa9ce8 EFLAGS: 00000202 ORIG_RAX: 0000000000000015
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7a899cc44b
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 00007f79a96d6e10
RBP: 0000000000000001 R08: 0000000000000000 R09: 00007f7a7c1fb930
R10: 00007f79a96d6000 R11: 0000000000000202 R12: 00007ffd01fa9d00
R13: 0000000000000001 R14: 0000556841045c90 R15: 00000000ffffff9c
Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs
lockd grace sunrpc fscache netfs nft_fib_inet nft_fib_ipv4 nft_fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_rejec>
---[ end trace 0a81b150eacde1d5 ]---
RIP: 0010:__slab_free+0x245/0x4a0
Code: 0f b6 5c 24 1b 44 8b 44 24 1c 48 89 44 24 08 48 8b 54 24 20 4c 8b
4c 24 28 e9 8a fe ff ff 41 f7 45 08 00 0d 21 00 75 98 eb 8d <0f> 0b 49
3b 54 24 28 0f 85 53 ff ff ff 49 8b 44 24 08 4>
RSP: 0018:ffffb71dcfd6fda0 EFLAGS: 00010246
RAX: ffff9c5480d35860 RBX: ffff9c5480d35800 RCX: ffff9c5480d35800
RDX: 00000000802a0029 RSI: ffffeb41da034d00 RDI: ffff9c4f00042800
RBP: ffffb71dcfd6fe50 R08: 0000000000000001 R09: ffffffff9210b6a5
R10: ffff9c548102e000 R11: 0000000062667658 R12: ffffeb41da034d00
R13: ffff9c4f00042800 R14: ffff9c5480d35800 R15: ffff9c5480d35800
FS: 00007f7a89765740(0000) GS:ffff9c6c1fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f7a7ad7f4b0 CR3: 0000000564af4002 CR4: 00000000000206e0
------------[ cut here ]------------
On 04/10/2021 19:19, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Adding Rune Kleveland to the discussion as he also seems to have
>> reproduced the issue.
>>
>> Alex and I have been starring at the code and the reports and this
>> bug is hiding well. Here is what we have figured out so far.
>>
>> Both the warning from free_user_ns calling dec_ucount that Jordan Glover
>> reported and the KASAN error that Yu Zhao has reported appear to have
>> the same cause. Using a ucounts structure after it has been freed and
>> reallocated as something else.
>>
>> I have just skimmed through the recent report from Rune Kleveland
>> and it appears also to be a use after free. Especially since the
>> second failure in the log is slub complaining about trying to free
>> the ucounts data structure.
>>
>> We looked through the users of put_ucounts and we don't see any obvious
>> buggy users that would be freeing the data structure early.
>>
>> Alex has tried to reproduce this so far is not having any luck.
>> Folks can you tell what compiler versions you are using and share your
>> kernel config with us? That might help.
>>
>> The little debug diff below is my guess of what is happening. If the
>> folks who can reproduce this issue can try the patch below and let me
>> know if the warnings fire that would be appreciated. It is still not
>> enough to track down the bug but at least it will confirm my current
>> hypothesis about how things look before there is a use of memory after
>> it is freed.
> Bah. Scratch that test patch. I just double checked myself and
> cred->ucounts and cred->user_ns->ucounts should never be equal,
> as the user namespace is counted in it's parent user namespace.
>
> That observation now tells me I have a parent user namespace that went
> corrupt.
>
> Back to the drawing board.
>
>
>> Thank you,
>> Eric
>>
>> diff --git a/kernel/cred.c b/kernel/cred.c
>> index f784e08c2fbd..e7ffaa3cf5a6 100644
>> --- a/kernel/cred.c
>> +++ b/kernel/cred.c
>> @@ -120,6 +120,12 @@ static void put_cred_rcu(struct rcu_head *rcu)
>> if (cred->group_info)
>> put_group_info(cred->group_info);
>> free_uid(cred->user);
>> +#if 1
>> + if ((cred->ucounts == cred->user_ns->ucounts) &&
>> + (atomic_read(&cred->ucounts->count) == 1)) {
>> + WARN_ONCE(1, "put_cred_rcu: ucount count 1\n");
>> + }
>> +#endif
>> if (cred->ucounts)
>> put_ucounts(cred->ucounts);
>> put_user_ns(cred->user_ns);
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index 91a43e57a32e..60fd88b34c1a 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -743,6 +743,13 @@ void __noreturn do_exit(long code)
>> if (unlikely(!tsk->pid))
>> panic("Attempted to kill the idle task!");
>>
>> +#if 1
>> + if ((tsk->cred->ucounts == tsk->cred->user_ns->ucounts) &&
>> + (atomic_read(tsk->cred->ucounts->count) == 1)) {
>> + WARN_ONCE(1, "do_exit: ucount count 1\n");
>> + }
>> +#endif
>> +
>> /*
>> * If do_exit is called because this processes oopsed, it's possible
>> * that get_fs() was left as KERNEL_DS, so reset it to USER_DS before
^ permalink raw reply [flat|nested] 13+ messages in thread
* [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-10 8:59 ` Rune Kleveland
@ 2021-10-15 22:10 ` Eric W. Biederman
2021-10-15 23:09 ` Alexey Gladkov
` (2 more replies)
0 siblings, 3 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-15 22:10 UTC (permalink / raw)
To: Rune Kleveland
Cc: Yu Zhao, Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers
In commit fda31c50292a ("signal: avoid double atomic counter
increments for user accounting") Linus made a clever optimization to
how rlimits and the struct user_struct. Unfortunately that
optimization does not work in the obvious way when moved to nested
rlimits. The problem is that the last decrement of the per user
namespace per user sigpending counter might also be the last decrement
of the sigpending counter in the parent user namespace as well. Which
means that simply freeing the leaf ucount in __free_sigqueue is not
enough.
Maintain the optimization and handle the tricky cases by introducing
inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
By moving the entire optimization into functions that perform all of
the work it becomes possible to ensure that every level is handled
properly.
I wish we had a single user across all of the threads whose rlimit
could be charged so we did not need this complexity.
Cc: stable@vger.kernel.org
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
With a lot of help from Alex who found a way I could reproduce this
I believe I have found the issue.
Could people who are seeing this issue test and verify this solves the
problem for them?
include/linux/user_namespace.h | 2 ++
kernel/signal.c | 25 +++++----------------
kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
3 files changed, 49 insertions(+), 19 deletions(-)
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index eb70cabe6e7f..33a4240e6a6f 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
+void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
static inline void set_rlimit_ucount_max(struct user_namespace *ns,
diff --git a/kernel/signal.c b/kernel/signal.c
index a3229add4455..762de58c6e76 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
*/
rcu_read_lock();
ucounts = task_ucounts(t);
- sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
- switch (sigpending) {
- case 1:
- if (likely(get_ucounts(ucounts)))
- break;
- fallthrough;
- case LONG_MAX:
- /*
- * we need to decrease the ucount in the userns tree on any
- * failure to avoid counts leaking.
- */
- dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
- rcu_read_unlock();
- return NULL;
- }
+ sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
rcu_read_unlock();
+ if (sigpending == LONG_MAX)
+ return NULL;
if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
@@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
}
if (unlikely(q == NULL)) {
- if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
- put_ucounts(ucounts);
+ dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
} else {
INIT_LIST_HEAD(&q->list);
q->flags = sigqueue_flags;
@@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
{
if (q->flags & SIGQUEUE_PREALLOC)
return;
- if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
- put_ucounts(q->ucounts);
+ if (q->ucounts) {
+ dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
q->ucounts = NULL;
}
kmem_cache_free(sigqueue_cachep, q);
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 3b7e176cf7a2..687d77aa66bb 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
return (new == 0);
}
+static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
+ struct ucounts *last, enum ucount_type type)
+{
+ struct ucounts *iter;
+ for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
+ long dec = atomic_long_add_return(-1, &iter->ucount[type]);
+ WARN_ON_ONCE(dec < 0);
+ if (dec == 0)
+ put_ucounts(iter);
+ }
+}
+
+void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
+{
+ do_dec_rlimit_put_ucounts(ucounts, NULL, type);
+}
+
+long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
+{
+ struct ucounts *iter;
+ long dec, ret = 0;
+
+ for (iter = ucounts; iter; iter = iter->ns->ucounts) {
+ long max = READ_ONCE(iter->ns->ucount_max[type]);
+ long new = atomic_long_add_return(1, &iter->ucount[type]);
+ if (new < 0 || new > max)
+ goto unwind;
+ else if (iter == ucounts)
+ ret = new;
+ if ((new == 1) && (get_ucounts(iter) != iter))
+ goto dec_unwind;
+ }
+ return ret;
+dec_unwind:
+ dec = atomic_long_add_return(1, &iter->ucount[type]);
+ WARN_ON_ONCE(dec < 0);
+unwind:
+ do_dec_rlimit_put_ucounts(ucounts, iter, type);
+ return LONG_MAX;
+}
+
bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
{
struct ucounts *iter;
--
2.20.1
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-15 22:10 ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
@ 2021-10-15 23:09 ` Alexey Gladkov
2021-10-16 17:34 ` Eric W. Biederman
2021-10-16 2:08 ` Hillf Danton
2021-10-17 16:47 ` Rune Kleveland
2 siblings, 1 reply; 13+ messages in thread
From: Alexey Gladkov @ 2021-10-15 23:09 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Rune Kleveland, Yu Zhao, Jordan Glover, LKML, linux-mm, containers
On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
>
> In commit fda31c50292a ("signal: avoid double atomic counter
> increments for user accounting") Linus made a clever optimization to
> how rlimits and the struct user_struct. Unfortunately that
> optimization does not work in the obvious way when moved to nested
> rlimits. The problem is that the last decrement of the per user
> namespace per user sigpending counter might also be the last decrement
> of the sigpending counter in the parent user namespace as well. Which
> means that simply freeing the leaf ucount in __free_sigqueue is not
> enough.
>
> Maintain the optimization and handle the tricky cases by introducing
> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>
> By moving the entire optimization into functions that perform all of
> the work it becomes possible to ensure that every level is handled
> properly.
>
> I wish we had a single user across all of the threads whose rlimit
> could be charged so we did not need this complexity.
>
> Cc: stable@vger.kernel.org
> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>
> With a lot of help from Alex who found a way I could reproduce this
> I believe I have found the issue.
>
> Could people who are seeing this issue test and verify this solves the
> problem for them?
>
> include/linux/user_namespace.h | 2 ++
> kernel/signal.c | 25 +++++----------------
> kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
> 3 files changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index eb70cabe6e7f..33a4240e6a6f 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>
> long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>
> static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a3229add4455..762de58c6e76 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> */
> rcu_read_lock();
> ucounts = task_ucounts(t);
> - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> - switch (sigpending) {
> - case 1:
> - if (likely(get_ucounts(ucounts)))
> - break;
> - fallthrough;
> - case LONG_MAX:
> - /*
> - * we need to decrease the ucount in the userns tree on any
> - * failure to avoid counts leaking.
> - */
> - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> - rcu_read_unlock();
> - return NULL;
> - }
> + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> rcu_read_unlock();
> + if (sigpending == LONG_MAX)
> + return NULL;
>
> if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
> q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> }
>
> if (unlikely(q == NULL)) {
> - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> - put_ucounts(ucounts);
> + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> } else {
> INIT_LIST_HEAD(&q->list);
> q->flags = sigqueue_flags;
> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
> {
> if (q->flags & SIGQUEUE_PREALLOC)
> return;
> - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> - put_ucounts(q->ucounts);
> + if (q->ucounts) {
> + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
> q->ucounts = NULL;
> }
> kmem_cache_free(sigqueue_cachep, q);
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 3b7e176cf7a2..687d77aa66bb 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
> return (new == 0);
> }
>
> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> + struct ucounts *last, enum ucount_type type)
> +{
> + struct ucounts *iter;
> + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> + long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> + WARN_ON_ONCE(dec < 0);
> + if (dec == 0)
> + put_ucounts(iter);
> + }
> +}
> +
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> + do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> +}
> +
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> + struct ucounts *iter;
> + long dec, ret = 0;
> +
> + for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> + long max = READ_ONCE(iter->ns->ucount_max[type]);
> + long new = atomic_long_add_return(1, &iter->ucount[type]);
> + if (new < 0 || new > max)
> + goto unwind;
> + else if (iter == ucounts)
> + ret = new;
> + if ((new == 1) && (get_ucounts(iter) != iter))
get_ucounts can do put_ucounts. Are you sure it's correct to use
get_ucounts here?
> + goto dec_unwind;
> + }
> + return ret;
> +dec_unwind:
> + dec = atomic_long_add_return(1, &iter->ucount[type]);
Should be -1 ?
> + WARN_ON_ONCE(dec < 0);
> +unwind:
> + do_dec_rlimit_put_ucounts(ucounts, iter, type);
> + return LONG_MAX;
> +}
> +
> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
> {
> struct ucounts *iter;
> --
> 2.20.1
>
--
Rgrds, legion
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-15 23:09 ` Alexey Gladkov
@ 2021-10-16 17:34 ` Eric W. Biederman
2021-10-17 19:35 ` Yu Zhao
0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-16 17:34 UTC (permalink / raw)
To: Alexey Gladkov
Cc: Rune Kleveland, Yu Zhao, Jordan Glover, LKML, linux-mm, containers
Alexey Gladkov <legion@kernel.org> writes:
> On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
>>
>> In commit fda31c50292a ("signal: avoid double atomic counter
>> increments for user accounting") Linus made a clever optimization to
>> how rlimits and the struct user_struct. Unfortunately that
>> optimization does not work in the obvious way when moved to nested
>> rlimits. The problem is that the last decrement of the per user
>> namespace per user sigpending counter might also be the last decrement
>> of the sigpending counter in the parent user namespace as well. Which
>> means that simply freeing the leaf ucount in __free_sigqueue is not
>> enough.
>>
>> Maintain the optimization and handle the tricky cases by introducing
>> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>>
>> By moving the entire optimization into functions that perform all of
>> the work it becomes possible to ensure that every level is handled
>> properly.
>>
>> I wish we had a single user across all of the threads whose rlimit
>> could be charged so we did not need this complexity.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>
>> With a lot of help from Alex who found a way I could reproduce this
>> I believe I have found the issue.
>>
>> Could people who are seeing this issue test and verify this solves the
>> problem for them?
>>
>> include/linux/user_namespace.h | 2 ++
>> kernel/signal.c | 25 +++++----------------
>> kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 49 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index eb70cabe6e7f..33a4240e6a6f 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>>
>> long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>>
>> static inline void set_rlimit_ucount_max(struct user_namespace *ns,
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index a3229add4455..762de58c6e76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>> */
>> rcu_read_lock();
>> ucounts = task_ucounts(t);
>> - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> - switch (sigpending) {
>> - case 1:
>> - if (likely(get_ucounts(ucounts)))
>> - break;
>> - fallthrough;
>> - case LONG_MAX:
>> - /*
>> - * we need to decrease the ucount in the userns tree on any
>> - * failure to avoid counts leaking.
>> - */
>> - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> - rcu_read_unlock();
>> - return NULL;
>> - }
>> + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> rcu_read_unlock();
>> + if (sigpending == LONG_MAX)
>> + return NULL;
>>
>> if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>> q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
>> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>> }
>>
>> if (unlikely(q == NULL)) {
>> - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
>> - put_ucounts(ucounts);
>> + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> } else {
>> INIT_LIST_HEAD(&q->list);
>> q->flags = sigqueue_flags;
>> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>> {
>> if (q->flags & SIGQUEUE_PREALLOC)
>> return;
>> - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
>> - put_ucounts(q->ucounts);
>> + if (q->ucounts) {
>> + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> q->ucounts = NULL;
>> }
>> kmem_cache_free(sigqueue_cachep, q);
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 3b7e176cf7a2..687d77aa66bb 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>> return (new == 0);
>> }
>>
>> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
>> + struct ucounts *last, enum ucount_type type)
>> +{
>> + struct ucounts *iter;
>> + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
>> + long dec = atomic_long_add_return(-1, &iter->ucount[type]);
>> + WARN_ON_ONCE(dec < 0);
>> + if (dec == 0)
>> + put_ucounts(iter);
>> + }
>> +}
>> +
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> + do_dec_rlimit_put_ucounts(ucounts, NULL, type);
>> +}
>> +
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> + struct ucounts *iter;
>> + long dec, ret = 0;
>> +
>> + for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> + long max = READ_ONCE(iter->ns->ucount_max[type]);
>> + long new = atomic_long_add_return(1, &iter->ucount[type]);
>> + if (new < 0 || new > max)
>> + goto unwind;
>> + else if (iter == ucounts)
>> + ret = new;
>> + if ((new == 1) && (get_ucounts(iter) != iter))
>
> get_ucounts can do put_ucounts. Are you sure it's correct to use
> get_ucounts here?
My only concern would be if we could not run inc_rlimit_get_ucounts
would not be safe to call under rcu_read_lock(). I don't see anything
in get_ucounts or put_ucounts that would not be safe under
rcu_read_lock().
For get_ucounts we do need to test to see if it fails. Either by
testing for NULL or testing to see if it does not return the expected
ucount.
Does that make sense or do you have another concern?
>> + goto dec_unwind;
>> + }
>> + return ret;
>> +dec_unwind:
>> + dec = atomic_long_add_return(1, &iter->ucount[type]);
>
> Should be -1 ?
Yes it should. I will fix and resend.
>> + WARN_ON_ONCE(dec < 0);
>> +unwind:
>> + do_dec_rlimit_put_ucounts(ucounts, iter, type);
>> + return LONG_MAX;
>> +}
>> +
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>> {
>> struct ucounts *iter;
>> --
>> 2.20.1
>>
Eric
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-16 17:34 ` Eric W. Biederman
@ 2021-10-17 19:35 ` Yu Zhao
2021-10-18 15:35 ` Eric W. Biederman
0 siblings, 1 reply; 13+ messages in thread
From: Yu Zhao @ 2021-10-17 19:35 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Alexey Gladkov, Rune Kleveland, Jordan Glover, LKML, Linux-MM,
containers\@lists.linux-foundation.org
On Sat, Oct 16, 2021 at 11:35 AM Eric W. Biederman
<ebiederm@xmission.com> wrote:
>
> Alexey Gladkov <legion@kernel.org> writes:
>
> > On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
> >>
> >> In commit fda31c50292a ("signal: avoid double atomic counter
> >> increments for user accounting") Linus made a clever optimization to
> >> how rlimits and the struct user_struct. Unfortunately that
> >> optimization does not work in the obvious way when moved to nested
> >> rlimits. The problem is that the last decrement of the per user
> >> namespace per user sigpending counter might also be the last decrement
> >> of the sigpending counter in the parent user namespace as well. Which
> >> means that simply freeing the leaf ucount in __free_sigqueue is not
> >> enough.
> >>
> >> Maintain the optimization and handle the tricky cases by introducing
> >> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> >>
> >> By moving the entire optimization into functions that perform all of
> >> the work it becomes possible to ensure that every level is handled
> >> properly.
> >>
> >> I wish we had a single user across all of the threads whose rlimit
> >> could be charged so we did not need this complexity.
> >>
> >> Cc: stable@vger.kernel.org
> >> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >>
> >> With a lot of help from Alex who found a way I could reproduce this
> >> I believe I have found the issue.
> >>
> >> Could people who are seeing this issue test and verify this solves the
> >> problem for them?
> >>
> >> include/linux/user_namespace.h | 2 ++
> >> kernel/signal.c | 25 +++++----------------
> >> kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
> >> 3 files changed, 49 insertions(+), 19 deletions(-)
> >>
> >> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> >> index eb70cabe6e7f..33a4240e6a6f 100644
> >> --- a/include/linux/user_namespace.h
> >> +++ b/include/linux/user_namespace.h
> >> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
> >>
> >> long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> >> bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> >> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> >> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
> >> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
> >>
> >> static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> >> diff --git a/kernel/signal.c b/kernel/signal.c
> >> index a3229add4455..762de58c6e76 100644
> >> --- a/kernel/signal.c
> >> +++ b/kernel/signal.c
> >> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> >> */
> >> rcu_read_lock();
> >> ucounts = task_ucounts(t);
> >> - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> >> - switch (sigpending) {
> >> - case 1:
> >> - if (likely(get_ucounts(ucounts)))
> >> - break;
> >> - fallthrough;
> >> - case LONG_MAX:
> >> - /*
> >> - * we need to decrease the ucount in the userns tree on any
> >> - * failure to avoid counts leaking.
> >> - */
> >> - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> >> - rcu_read_unlock();
> >> - return NULL;
> >> - }
> >> + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >> rcu_read_unlock();
> >> + if (sigpending == LONG_MAX)
> >> + return NULL;
> >>
> >> if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
> >> q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> >> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> >> }
> >>
> >> if (unlikely(q == NULL)) {
> >> - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> >> - put_ucounts(ucounts);
> >> + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >> } else {
> >> INIT_LIST_HEAD(&q->list);
> >> q->flags = sigqueue_flags;
> >> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
> >> {
> >> if (q->flags & SIGQUEUE_PREALLOC)
> >> return;
> >> - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> >> - put_ucounts(q->ucounts);
> >> + if (q->ucounts) {
> >> + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
> >> q->ucounts = NULL;
> >> }
> >> kmem_cache_free(sigqueue_cachep, q);
> >> diff --git a/kernel/ucount.c b/kernel/ucount.c
> >> index 3b7e176cf7a2..687d77aa66bb 100644
> >> --- a/kernel/ucount.c
> >> +++ b/kernel/ucount.c
> >> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
> >> return (new == 0);
> >> }
> >>
> >> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> >> + struct ucounts *last, enum ucount_type type)
> >> +{
> >> + struct ucounts *iter;
> >> + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> >> + long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> >> + WARN_ON_ONCE(dec < 0);
> >> + if (dec == 0)
> >> + put_ucounts(iter);
> >> + }
> >> +}
> >> +
> >> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> >> +{
> >> + do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> >> +}
> >> +
> >> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> >> +{
> >> + struct ucounts *iter;
> >> + long dec, ret = 0;
> >> +
> >> + for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> >> + long max = READ_ONCE(iter->ns->ucount_max[type]);
> >> + long new = atomic_long_add_return(1, &iter->ucount[type]);
> >> + if (new < 0 || new > max)
> >> + goto unwind;
> >> + else if (iter == ucounts)
> >> + ret = new;
> >> + if ((new == 1) && (get_ucounts(iter) != iter))
> >
> > get_ucounts can do put_ucounts. Are you sure it's correct to use
> > get_ucounts here?
>
> My only concern would be if we could not run inc_rlimit_get_ucounts
> would not be safe to call under rcu_read_lock(). I don't see anything
> in get_ucounts or put_ucounts that would not be safe under
> rcu_read_lock().
>
> For get_ucounts we do need to test to see if it fails. Either by
> testing for NULL or testing to see if it does not return the expected
> ucount.
>
> Does that make sense or do you have another concern?
>
>
> >> + goto dec_unwind;
> >> + }
> >> + return ret;
> >> +dec_unwind:
> >> + dec = atomic_long_add_return(1, &iter->ucount[type]);
> >
> > Should be -1 ?
>
> Yes it should. I will fix and resend.
Or just atomic_long_dec_return().
> >> + WARN_ON_ONCE(dec < 0);
> >> +unwind:
> >> + do_dec_rlimit_put_ucounts(ucounts, iter, type);
> >> + return LONG_MAX;
> >> +}
> >> +
> >> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
> >> {
> >> struct ucounts *iter;
> >> --
> >> 2.20.1
> >>
>
> Eric
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-17 19:35 ` Yu Zhao
@ 2021-10-18 15:35 ` Eric W. Biederman
0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-18 15:35 UTC (permalink / raw)
To: Yu Zhao
Cc: Alexey Gladkov, Rune Kleveland, Jordan Glover, LKML, Linux-MM,
containers
Yu Zhao <yuzhao@google.com> writes:
> On Sat, Oct 16, 2021 at 11:35 AM Eric W. Biederman
> <ebiederm@xmission.com> wrote:
>>
>> Alexey Gladkov <legion@kernel.org> writes:
>>
>> > On Fri, Oct 15, 2021 at 05:10:58PM -0500, Eric W. Biederman wrote:
>> >> + goto dec_unwind;
>> >> + }
>> >> + return ret;
>> >> +dec_unwind:
>> >> + dec = atomic_long_add_return(1, &iter->ucount[type]);
>> >
>> > Should be -1 ?
>>
>> Yes it should. I will fix and resend.
>
> Or just atomic_long_dec_return().
It would have to be atomic_long_sub_return().
Even then I would want to change all of kernel/ucount.c to use
the same helper function so discrepancies can easily be spotted.
It is a good idea, just not I think for this particular patch.
Eric
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-15 22:10 ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
2021-10-15 23:09 ` Alexey Gladkov
@ 2021-10-16 2:08 ` Hillf Danton
2021-10-16 18:00 ` Eric W. Biederman
2021-10-17 16:47 ` Rune Kleveland
2 siblings, 1 reply; 13+ messages in thread
From: Hillf Danton @ 2021-10-16 2:08 UTC (permalink / raw)
To: Eric W . Biederman
Cc: Rune Kleveland, Yu Zhao, Alexey Gladkov, Jordan Glover, LKML,
linux-mm, containers
On Fri, 15 Oct 2021 17:10:58 -0500 Eric W. Biederman wrote:
>
> In commit fda31c50292a ("signal: avoid double atomic counter
> increments for user accounting") Linus made a clever optimization to
> how rlimits and the struct user_struct. Unfortunately that
> optimization does not work in the obvious way when moved to nested
> rlimits. The problem is that the last decrement of the per user
> namespace per user sigpending counter might also be the last decrement
> of the sigpending counter in the parent user namespace as well. Which
> means that simply freeing the leaf ucount in __free_sigqueue is not
> enough.
>
> Maintain the optimization and handle the tricky cases by introducing
> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>
> By moving the entire optimization into functions that perform all of
> the work it becomes possible to ensure that every level is handled
> properly.
>
> I wish we had a single user across all of the threads whose rlimit
> could be charged so we did not need this complexity.
>
> Cc: stable@vger.kernel.org
> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
>
> With a lot of help from Alex who found a way I could reproduce this
> I believe I have found the issue.
>
> Could people who are seeing this issue test and verify this solves the
> problem for them?
>
> include/linux/user_namespace.h | 2 ++
> kernel/signal.c | 25 +++++----------------
> kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
> 3 files changed, 49 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
> index eb70cabe6e7f..33a4240e6a6f 100644
> --- a/include/linux/user_namespace.h
> +++ b/include/linux/user_namespace.h
> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>
> long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>
> static inline void set_rlimit_ucount_max(struct user_namespace *ns,
> diff --git a/kernel/signal.c b/kernel/signal.c
> index a3229add4455..762de58c6e76 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> */
> rcu_read_lock();
> ucounts = task_ucounts(t);
> - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> - switch (sigpending) {
> - case 1:
> - if (likely(get_ucounts(ucounts)))
> - break;
> - fallthrough;
> - case LONG_MAX:
> - /*
> - * we need to decrease the ucount in the userns tree on any
> - * failure to avoid counts leaking.
> - */
> - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
> - rcu_read_unlock();
> - return NULL;
> - }
> + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> rcu_read_unlock();
> + if (sigpending == LONG_MAX)
> + return NULL;
>
> if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
> q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
> }
>
> if (unlikely(q == NULL)) {
> - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
> - put_ucounts(ucounts);
> + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
> } else {
> INIT_LIST_HEAD(&q->list);
> q->flags = sigqueue_flags;
> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
> {
> if (q->flags & SIGQUEUE_PREALLOC)
> return;
> - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
> - put_ucounts(q->ucounts);
> + if (q->ucounts) {
> + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
> q->ucounts = NULL;
> }
> kmem_cache_free(sigqueue_cachep, q);
> diff --git a/kernel/ucount.c b/kernel/ucount.c
> index 3b7e176cf7a2..687d77aa66bb 100644
> --- a/kernel/ucount.c
> +++ b/kernel/ucount.c
> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
> return (new == 0);
> }
>
> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
> + struct ucounts *last, enum ucount_type type)
> +{
> + struct ucounts *iter;
> + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
> + long dec = atomic_long_add_return(-1, &iter->ucount[type]);
> + WARN_ON_ONCE(dec < 0);
> + if (dec == 0)
> + put_ucounts(iter);
> + }
Given kfree in put_ucounts(), this has difficulty surviving tests like
kasan if the put pairs with the get in the below inc_rlimit_get_ucounts().
> +}
> +
> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> + do_dec_rlimit_put_ucounts(ucounts, NULL, type);
> +}
> +
> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
> +{
> + struct ucounts *iter;
> + long dec, ret = 0;
> +
> + for (iter = ucounts; iter; iter = iter->ns->ucounts) {
> + long max = READ_ONCE(iter->ns->ucount_max[type]);
> + long new = atomic_long_add_return(1, &iter->ucount[type]);
> + if (new < 0 || new > max)
> + goto unwind;
> + else if (iter == ucounts)
> + ret = new;
> + if ((new == 1) && (get_ucounts(iter) != iter))
> + goto dec_unwind;
Add a line of comment for get to ease readers.
Hillf
> + }
> + return ret;
> +dec_unwind:
> + dec = atomic_long_add_return(1, &iter->ucount[type]);
> + WARN_ON_ONCE(dec < 0);
> +unwind:
> + do_dec_rlimit_put_ucounts(ucounts, iter, type);
> + return LONG_MAX;
> +}
> +
> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
> {
> struct ucounts *iter;
> --
> 2.20.1
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-16 2:08 ` Hillf Danton
@ 2021-10-16 18:00 ` Eric W. Biederman
0 siblings, 0 replies; 13+ messages in thread
From: Eric W. Biederman @ 2021-10-16 18:00 UTC (permalink / raw)
To: Hillf Danton
Cc: Rune Kleveland, Yu Zhao, Alexey Gladkov, Jordan Glover, LKML,
linux-mm, containers
Hillf Danton <hdanton@sina.com> writes:
> On Fri, 15 Oct 2021 17:10:58 -0500 Eric W. Biederman wrote:
>>
>> In commit fda31c50292a ("signal: avoid double atomic counter
>> increments for user accounting") Linus made a clever optimization to
>> how rlimits and the struct user_struct. Unfortunately that
>> optimization does not work in the obvious way when moved to nested
>> rlimits. The problem is that the last decrement of the per user
>> namespace per user sigpending counter might also be the last decrement
>> of the sigpending counter in the parent user namespace as well. Which
>> means that simply freeing the leaf ucount in __free_sigqueue is not
>> enough.
>>
>> Maintain the optimization and handle the tricky cases by introducing
>> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>>
>> By moving the entire optimization into functions that perform all of
>> the work it becomes possible to ensure that every level is handled
>> properly.
>>
>> I wish we had a single user across all of the threads whose rlimit
>> could be charged so we did not need this complexity.
>>
>> Cc: stable@vger.kernel.org
>> Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
>> ---
>>
>> With a lot of help from Alex who found a way I could reproduce this
>> I believe I have found the issue.
>>
>> Could people who are seeing this issue test and verify this solves the
>> problem for them?
>>
>> include/linux/user_namespace.h | 2 ++
>> kernel/signal.c | 25 +++++----------------
>> kernel/ucount.c | 41 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 49 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
>> index eb70cabe6e7f..33a4240e6a6f 100644
>> --- a/include/linux/user_namespace.h
>> +++ b/include/linux/user_namespace.h
>> @@ -127,6 +127,8 @@ static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type t
>>
>> long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v);
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type);
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max);
>>
>> static inline void set_rlimit_ucount_max(struct user_namespace *ns,
>> diff --git a/kernel/signal.c b/kernel/signal.c
>> index a3229add4455..762de58c6e76 100644
>> --- a/kernel/signal.c
>> +++ b/kernel/signal.c
>> @@ -425,22 +425,10 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>> */
>> rcu_read_lock();
>> ucounts = task_ucounts(t);
>> - sigpending = inc_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> - switch (sigpending) {
>> - case 1:
>> - if (likely(get_ucounts(ucounts)))
>> - break;
>> - fallthrough;
>> - case LONG_MAX:
>> - /*
>> - * we need to decrease the ucount in the userns tree on any
>> - * failure to avoid counts leaking.
>> - */
>> - dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1);
>> - rcu_read_unlock();
>> - return NULL;
>> - }
>> + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> rcu_read_unlock();
>> + if (sigpending == LONG_MAX)
>> + return NULL;
>>
>> if (override_rlimit || likely(sigpending <= task_rlimit(t, RLIMIT_SIGPENDING))) {
>> q = kmem_cache_alloc(sigqueue_cachep, gfp_flags);
>> @@ -449,8 +437,7 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags,
>> }
>>
>> if (unlikely(q == NULL)) {
>> - if (dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, 1))
>> - put_ucounts(ucounts);
>> + dec_rlimit_put_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> } else {
>> INIT_LIST_HEAD(&q->list);
>> q->flags = sigqueue_flags;
>> @@ -463,8 +450,8 @@ static void __sigqueue_free(struct sigqueue *q)
>> {
>> if (q->flags & SIGQUEUE_PREALLOC)
>> return;
>> - if (q->ucounts && dec_rlimit_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING, 1)) {
>> - put_ucounts(q->ucounts);
>> + if (q->ucounts) {
>> + dec_rlimit_put_ucounts(q->ucounts, UCOUNT_RLIMIT_SIGPENDING);
>> q->ucounts = NULL;
>> }
>> kmem_cache_free(sigqueue_cachep, q);
>> diff --git a/kernel/ucount.c b/kernel/ucount.c
>> index 3b7e176cf7a2..687d77aa66bb 100644
>> --- a/kernel/ucount.c
>> +++ b/kernel/ucount.c
>> @@ -285,6 +285,47 @@ bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v)
>> return (new == 0);
>> }
>>
>> +static void do_dec_rlimit_put_ucounts(struct ucounts *ucounts,
>> + struct ucounts *last, enum ucount_type type)
>> +{
>> + struct ucounts *iter;
>> + for (iter = ucounts; iter != last; iter = iter->ns->ucounts) {
>> + long dec = atomic_long_add_return(-1, &iter->ucount[type]);
>> + WARN_ON_ONCE(dec < 0);
>> + if (dec == 0)
>> + put_ucounts(iter);
>> + }
>
> Given kfree in put_ucounts(), this has difficulty surviving tests like
> kasan if the put pairs with the get in the below
> inc_rlimit_get_ucounts().
I don't know if this is what you are thinking about but there is indeed
a bug in that loop caused by kfree.
The problem is that iter->ns->ucounts is read after put_ucounts. It
just needs to be read before hand.
>> +}
>> +
>> +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> + do_dec_rlimit_put_ucounts(ucounts, NULL, type);
>> +}
>> +
>> +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type)
>> +{
>> + struct ucounts *iter;
>> + long dec, ret = 0;
>> +
>> + for (iter = ucounts; iter; iter = iter->ns->ucounts) {
>> + long max = READ_ONCE(iter->ns->ucount_max[type]);
>> + long new = atomic_long_add_return(1, &iter->ucount[type]);
>> + if (new < 0 || new > max)
>> + goto unwind;
>> + else if (iter == ucounts)
>> + ret = new;
>> + if ((new == 1) && (get_ucounts(iter) != iter))
>> + goto dec_unwind;
>
> Add a line of comment for get to ease readers.
/* you are not expected to understand this */
I think that is the classic comment from unix source. Seriously I can't
think of any comment that will make the situation more comprehensible.
> Hillf
>
>> + }
>> + return ret;
>> +dec_unwind:
>> + dec = atomic_long_add_return(1, &iter->ucount[type]);
>> + WARN_ON_ONCE(dec < 0);
>> +unwind:
>> + do_dec_rlimit_put_ucounts(ucounts, iter, type);
>> + return LONG_MAX;
>> +}
>> +
>> bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max)
>> {
>> struct ucounts *iter;
>> --
>> 2.20.1
Eric
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-15 22:10 ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
2021-10-15 23:09 ` Alexey Gladkov
2021-10-16 2:08 ` Hillf Danton
@ 2021-10-17 16:47 ` Rune Kleveland
2021-10-18 6:25 ` Yu Zhao
2 siblings, 1 reply; 13+ messages in thread
From: Rune Kleveland @ 2021-10-17 16:47 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Yu Zhao, Alexey Gladkov, Jordan Glover, LKML, linux-mm, containers
Hi!
After applying the below patch, the 5 most problematic servers have run
without any issues for 23 hours. That never happened before the patch on
5.14, so the patch seems to have fixed the issue for me.
On Monday there will be more load on the servers, which caused them to
crash faster without the patch. I will let you know if it happens again.
Best regards,
Rune
On 16/10/2021 00:10, Eric W. Biederman wrote:
>
> In commit fda31c50292a ("signal: avoid double atomic counter
> increments for user accounting") Linus made a clever optimization to
> how rlimits and the struct user_struct. Unfortunately that
> optimization does not work in the obvious way when moved to nested
> rlimits. The problem is that the last decrement of the per user
> namespace per user sigpending counter might also be the last decrement
> of the sigpending counter in the parent user namespace as well. Which
> means that simply freeing the leaf ucount in __free_sigqueue is not
> enough.
>
> Maintain the optimization and handle the tricky cases by introducing
> inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
>
> By moving the entire optimization into functions that perform all of
> the work it becomes possible to ensure that every level is handled
> properly.
>
> I wish we had a single user across all of the threads whose rlimit
> could be charged so we did not need this complexity.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-17 16:47 ` Rune Kleveland
@ 2021-10-18 6:25 ` Yu Zhao
2021-10-18 10:31 ` Jordan Glover
0 siblings, 1 reply; 13+ messages in thread
From: Yu Zhao @ 2021-10-18 6:25 UTC (permalink / raw)
To: Rune Kleveland, Eric W. Biederman
Cc: Alexey Gladkov, Jordan Glover, LKML, Linux-MM,
containers\@lists.linux-foundation.org
On Sun, Oct 17, 2021 at 10:47 AM Rune Kleveland
<rune.kleveland@infomedia.dk> wrote:
>
> Hi!
>
> After applying the below patch, the 5 most problematic servers have run
> without any issues for 23 hours. That never happened before the patch on
> 5.14, so the patch seems to have fixed the issue for me.
Confirm. I couldn't reproduce the problem on 5.14 either.
> On Monday there will be more load on the servers, which caused them to
> crash faster without the patch. I will let you know if it happens again.
>
> Best regards,
> Rune
>
> On 16/10/2021 00:10, Eric W. Biederman wrote:
> >
> > In commit fda31c50292a ("signal: avoid double atomic counter
> > increments for user accounting") Linus made a clever optimization to
> > how rlimits and the struct user_struct. Unfortunately that
> > optimization does not work in the obvious way when moved to nested
> > rlimits. The problem is that the last decrement of the per user
> > namespace per user sigpending counter might also be the last decrement
> > of the sigpending counter in the parent user namespace as well. Which
> > means that simply freeing the leaf ucount in __free_sigqueue is not
> > enough.
> >
> > Maintain the optimization and handle the tricky cases by introducing
> > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> >
> > By moving the entire optimization into functions that perform all of
> > the work it becomes possible to ensure that every level is handled
> > properly.
> >
> > I wish we had a single user across all of the threads whose rlimit
> > could be charged so we did not need this complexity.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [CFT][PATCH] ucounts: Fix signal ucount refcounting
2021-10-18 6:25 ` Yu Zhao
@ 2021-10-18 10:31 ` Jordan Glover
0 siblings, 0 replies; 13+ messages in thread
From: Jordan Glover @ 2021-10-18 10:31 UTC (permalink / raw)
To: Yu Zhao
Cc: Rune Kleveland, Eric W. Biederman, Alexey Gladkov, LKML,
Linux-MM, containers\\@lists.linux-foundation.org
On Monday, October 18th, 2021 at 6:25 AM, Yu Zhao <yuzhao@google.com> wrote:
> On Sun, Oct 17, 2021 at 10:47 AM Rune Kleveland
>
> rune.kleveland@infomedia.dk wrote:
>
> > Hi!
> >
> > After applying the below patch, the 5 most problematic servers have run
> >
> > without any issues for 23 hours. That never happened before the patch on
> >
> > 5.14, so the patch seems to have fixed the issue for me.
>
> Confirm. I couldn't reproduce the problem on 5.14 either.
>
I'm also unable to reproduce the crash as for now. Thx for the patch.
Jordan
> > On Monday there will be more load on the servers, which caused them to
> >
> > crash faster without the patch. I will let you know if it happens again.
> >
> > Best regards,
> >
> > Rune
> >
> > On 16/10/2021 00:10, Eric W. Biederman wrote:
> >
> > > In commit fda31c50292a ("signal: avoid double atomic counter
> > >
> > > increments for user accounting") Linus made a clever optimization to
> > >
> > > how rlimits and the struct user_struct. Unfortunately that
> > >
> > > optimization does not work in the obvious way when moved to nested
> > >
> > > rlimits. The problem is that the last decrement of the per user
> > >
> > > namespace per user sigpending counter might also be the last decrement
> > >
> > > of the sigpending counter in the parent user namespace as well. Which
> > >
> > > means that simply freeing the leaf ucount in __free_sigqueue is not
> > >
> > > enough.
> > >
> > > Maintain the optimization and handle the tricky cases by introducing
> > >
> > > inc_rlimit_get_ucounts and dec_rlimit_put_ucounts.
> > >
> > > By moving the entire optimization into functions that perform all of
> > >
> > > the work it becomes possible to ensure that every level is handled
> > >
> > > properly.
> > >
> > > I wish we had a single user across all of the threads whose rlimit
> > >
> > > could be charged so we did not need this complexity.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2021-11-27 1:35 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-27 1:35 [CFT][PATCH] ucounts: Fix signal ucount refcounting kernel test robot
-- strict thread matches above, loose matches on Subject: below --
2021-11-26 15:09 kernel test robot
2021-10-17 13:36 kernel test robot
2021-09-15 19:49 linux 5.14.3: free_user_ns causes NULL pointer dereference Jordan Glover
2021-09-15 21:02 ` Eric W. Biederman
2021-09-15 22:42 ` Jordan Glover
2021-09-15 23:47 ` Jordan Glover
2021-09-16 17:30 ` Eric W. Biederman
2021-09-28 13:40 ` Jordan Glover
2021-09-29 17:36 ` Alexey Gladkov
2021-09-29 21:39 ` Jordan Glover
2021-09-30 13:06 ` Alexey Gladkov
2021-09-30 22:27 ` Yu Zhao
2021-10-04 17:10 ` Eric W. Biederman
2021-10-04 17:19 ` Eric W. Biederman
2021-10-10 8:59 ` Rune Kleveland
2021-10-15 22:10 ` [CFT][PATCH] ucounts: Fix signal ucount refcounting Eric W. Biederman
2021-10-15 23:09 ` Alexey Gladkov
2021-10-16 17:34 ` Eric W. Biederman
2021-10-17 19:35 ` Yu Zhao
2021-10-18 15:35 ` Eric W. Biederman
2021-10-16 2:08 ` Hillf Danton
2021-10-16 18:00 ` Eric W. Biederman
2021-10-17 16:47 ` Rune Kleveland
2021-10-18 6:25 ` Yu Zhao
2021-10-18 10:31 ` Jordan Glover
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.