linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] ucount fix for v5.14-rc
@ 2021-08-05 17:15 Eric W. Biederman
  2021-08-05 19:26 ` pr-tracker-bot
       [not found] ` <20210806021052.3013-1-hdanton@sina.com>
  0 siblings, 2 replies; 9+ messages in thread
From: Eric W. Biederman @ 2021-08-05 17:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Alexey Gladkov


Please pull the for-v5.14 branch from the git tree:

  git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-v5.14

  HEAD: 345daff2e994ee844d6a609c37f085695fbb4c4d ucounts: Fix race condition between alloc_ucounts and put_ucounts

This is just one patch on top of v5.14-rc3 that fixes a subtle locking
versus reference counting bug in the ucount changes, found by syzbot
after you merged this code.

The ucount sysctls have a issue on big endian architectures that I hope
to get in before v5.14 is released but a patch is not ready yet.

Eric


From 345daff2e994ee844d6a609c37f085695fbb4c4d Mon Sep 17 00:00:00 2001
From: Alexey Gladkov <legion@kernel.org>
Date: Tue, 27 Jul 2021 17:24:18 +0200
Subject: [PATCH] ucounts: Fix race condition between alloc_ucounts and put_ucounts

The race happens because put_ucounts() doesn't use spinlock and
get_ucounts is not under spinlock:

CPU0                    CPU1
----                    ----
alloc_ucounts()         put_ucounts()

spin_lock_irq(&ucounts_lock);
ucounts = find_ucounts(ns, uid, hashent);

                        atomic_dec_and_test(&ucounts->count))

spin_unlock_irq(&ucounts_lock);

                        spin_lock_irqsave(&ucounts_lock, flags);
                        hlist_del_init(&ucounts->node);
                        spin_unlock_irqrestore(&ucounts_lock, flags);
                        kfree(ucounts);

ucounts = get_ucounts(ucounts);

==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: use-after-free in atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:152 [inline]
BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:150 [inline]
BUG: KASAN: use-after-free in alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
Write of size 4 at addr ffff88802821e41c by task syz-executor.4/16785

CPU: 1 PID: 16785 Comm: syz-executor.4 Not tainted 5.14.0-rc1-next-20210712-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
 print_address_description.constprop.0.cold+0x6c/0x309 mm/kasan/report.c:233
 __kasan_report mm/kasan/report.c:419 [inline]
 kasan_report.cold+0x83/0xdf mm/kasan/report.c:436
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
 get_ucounts kernel/ucount.c:152 [inline]
 get_ucounts kernel/ucount.c:150 [inline]
 alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
 set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
 __sys_setuid+0x285/0x400 kernel/sys.c:623
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fde54097188 EFLAGS: 00000246 ORIG_RAX: 0000000000000069
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000ff
RBP: 00000000004bfcb9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffc8655740f R14: 00007fde54097300 R15: 0000000000022000

Allocated by task 16784:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 ____kasan_kmalloc mm/kasan/common.c:513 [inline]
 ____kasan_kmalloc mm/kasan/common.c:472 [inline]
 __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:522
 kmalloc include/linux/slab.h:591 [inline]
 kzalloc include/linux/slab.h:721 [inline]
 alloc_ucounts+0x23d/0x5b0 kernel/ucount.c:169
 set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
 __sys_setuid+0x285/0x400 kernel/sys.c:623
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Freed by task 16785:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:229 [inline]
 slab_free_hook mm/slub.c:1650 [inline]
 slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1675
 slab_free mm/slub.c:3235 [inline]
 kfree+0xeb/0x650 mm/slub.c:4295
 put_ucounts kernel/ucount.c:200 [inline]
 put_ucounts+0x117/0x150 kernel/ucount.c:192
 put_cred_rcu+0x27a/0x520 kernel/cred.c:124
 rcu_do_batch kernel/rcu/tree.c:2550 [inline]
 rcu_core+0x7ab/0x1380 kernel/rcu/tree.c:2785
 __do_softirq+0x29b/0x9c2 kernel/softirq.c:558

Last potentially related work creation:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
 insert_work+0x48/0x370 kernel/workqueue.c:1332
 __queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
 queue_work_on+0xee/0x110 kernel/workqueue.c:1525
 queue_work include/linux/workqueue.h:507 [inline]
 call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
 kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
 netdev_queue_add_kobject net/core/net-sysfs.c:1621 [inline]
 netdev_queue_update_kobjects+0x374/0x450 net/core/net-sysfs.c:1655
 register_queue_kobjects net/core/net-sysfs.c:1716 [inline]
 netdev_register_kobject+0x35a/0x430 net/core/net-sysfs.c:1959
 register_netdevice+0xd33/0x1500 net/core/dev.c:10331
 nsim_init_netdevsim drivers/net/netdevsim/netdev.c:317 [inline]
 nsim_create+0x381/0x4d0 drivers/net/netdevsim/netdev.c:364
 __nsim_dev_port_add+0x32e/0x830 drivers/net/netdevsim/dev.c:1295
 nsim_dev_port_add_all+0x53/0x150 drivers/net/netdevsim/dev.c:1355
 nsim_dev_probe+0xcb5/0x1190 drivers/net/netdevsim/dev.c:1496
 call_driver_probe drivers/base/dd.c:517 [inline]
 really_probe+0x23c/0xcd0 drivers/base/dd.c:595
 __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747
 driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777
 __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894
 bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
 __device_attach+0x228/0x4a0 drivers/base/dd.c:965
 bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
 device_add+0xc2f/0x2180 drivers/base/core.c:3356
 nsim_bus_dev_new drivers/net/netdevsim/bus.c:431 [inline]
 new_device_store+0x436/0x710 drivers/net/netdevsim/bus.c:298
 bus_attr_store+0x72/0xa0 drivers/base/bus.c:122
 sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
 kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2152 [inline]
 new_sync_write+0x426/0x650 fs/read_write.c:518
 vfs_write+0x75a/0xa40 fs/read_write.c:605
 ksys_write+0x12d/0x250 fs/read_write.c:658
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Second to last potentially related work creation:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
 insert_work+0x48/0x370 kernel/workqueue.c:1332
 __queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
 queue_work_on+0xee/0x110 kernel/workqueue.c:1525
 queue_work include/linux/workqueue.h:507 [inline]
 call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
 kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
 kobject_synth_uevent+0x701/0x850 lib/kobject_uevent.c:208
 uevent_store+0x20/0x50 drivers/base/core.c:2371
 dev_attr_store+0x50/0x80 drivers/base/core.c:2072
 sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
 kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
 call_write_iter include/linux/fs.h:2152 [inline]
 new_sync_write+0x426/0x650 fs/read_write.c:518
 vfs_write+0x75a/0xa40 fs/read_write.c:605
 ksys_write+0x12d/0x250 fs/read_write.c:658
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae

The buggy address belongs to the object at ffff88802821e400
 which belongs to the cache kmalloc-192 of size 192
The buggy address is located 28 bytes inside of
 192-byte region [ffff88802821e400, ffff88802821e4c0)
The buggy address belongs to the page:
page:ffffea0000a08780 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2821e
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff888010841a00
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 12874702440, free_ts 12637793385
 prep_new_page mm/page_alloc.c:2433 [inline]
 get_page_from_freelist+0xa72/0x2f80 mm/page_alloc.c:4166
 __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5374
 alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2119
 alloc_pages+0x238/0x2a0 mm/mempolicy.c:2242
 alloc_slab_page mm/slub.c:1713 [inline]
 allocate_slab+0x32b/0x4c0 mm/slub.c:1853
 new_slab mm/slub.c:1916 [inline]
 new_slab_objects mm/slub.c:2662 [inline]
 ___slab_alloc+0x4ba/0x820 mm/slub.c:2825
 __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2865
 slab_alloc_node mm/slub.c:2947 [inline]
 slab_alloc mm/slub.c:2989 [inline]
 __kmalloc+0x312/0x330 mm/slub.c:4133
 kmalloc include/linux/slab.h:596 [inline]
 kzalloc include/linux/slab.h:721 [inline]
 __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
 rds_tcp_init_net+0x1db/0x4f0 net/rds/tcp.c:551
 ops_init+0xaf/0x470 net/core/net_namespace.c:140
 __register_pernet_operations net/core/net_namespace.c:1137 [inline]
 register_pernet_operations+0x35a/0x850 net/core/net_namespace.c:1214
 register_pernet_device+0x26/0x70 net/core/net_namespace.c:1301
 rds_tcp_init+0x77/0xe0 net/rds/tcp.c:717
 do_one_initcall+0x103/0x650 init/main.c:1285
 do_initcall_level init/main.c:1360 [inline]
 do_initcalls init/main.c:1376 [inline]
 do_basic_setup init/main.c:1396 [inline]
 kernel_init_freeable+0x6b8/0x741 init/main.c:1598
page last free stack trace:
 reset_page_owner include/linux/page_owner.h:24 [inline]
 free_pages_prepare mm/page_alloc.c:1343 [inline]
 free_pcp_prepare+0x312/0x7d0 mm/page_alloc.c:1394
 free_unref_page_prepare mm/page_alloc.c:3329 [inline]
 free_unref_page+0x19/0x690 mm/page_alloc.c:3408
 __vunmap+0x783/0xb70 mm/vmalloc.c:2587
 free_work+0x58/0x70 mm/vmalloc.c:82
 process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
 kthread+0x3e5/0x4d0 kernel/kthread.c:319
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Memory state around the buggy address:
 ffff88802821e300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff88802821e380: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff88802821e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                            ^
 ffff88802821e480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
 ffff88802821e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================

- The race fix has two parts.
  * Changing the code to guarantee that ucounts->count is only decremented
    when ucounts_lock is held.  This guarantees that find_ucounts
    will never find a structure with a zero reference count.
  * Changing alloc_ucounts to increment ucounts->count while
    ucounts_lock is held.  This guarantees the reference count on the
    found data structure will not be decremented to zero (and the data
    structure freed) before the reference count is incremented.
  -- Eric Biederman

Reported-by: syzbot+01985d7909f9468f013c@syzkaller.appspotmail.com
Reported-by: syzbot+59dd63761094a80ad06d@syzkaller.appspotmail.com
Reported-by: syzbot+6cd79f45bb8fa1c9eeae@syzkaller.appspotmail.com
Reported-by: syzbot+b6e65bd125a05f803d6b@syzkaller.appspotmail.com
Fixes: b6c336528926 ("Use atomic_t for ucounts reference counting")
Cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Link: https://lkml.kernel.org/r/7b2ace1759b281cdd2d66101d6b305deef722efb.1627397820.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
---
 kernel/ucount.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/ucount.c b/kernel/ucount.c
index 87799e2379bd..77be3bbe3cc4 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -160,6 +160,7 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid)
 {
 	struct hlist_head *hashent = ucounts_hashentry(ns, uid);
 	struct ucounts *ucounts, *new;
+	long overflow;
 
 	spin_lock_irq(&ucounts_lock);
 	ucounts = find_ucounts(ns, uid, hashent);
@@ -184,8 +185,12 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid)
 			return new;
 		}
 	}
+	overflow = atomic_add_negative(1, &ucounts->count);
 	spin_unlock_irq(&ucounts_lock);
-	ucounts = get_ucounts(ucounts);
+	if (overflow) {
+		put_ucounts(ucounts);
+		return NULL;
+	}
 	return ucounts;
 }
 
@@ -193,8 +198,7 @@ void put_ucounts(struct ucounts *ucounts)
 {
 	unsigned long flags;
 
-	if (atomic_dec_and_test(&ucounts->count)) {
-		spin_lock_irqsave(&ucounts_lock, flags);
+	if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
 		hlist_del_init(&ucounts->node);
 		spin_unlock_irqrestore(&ucounts_lock, flags);
 		kfree(ucounts);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
  2021-08-05 17:15 [GIT PULL] ucount fix for v5.14-rc Eric W. Biederman
@ 2021-08-05 19:26 ` pr-tracker-bot
       [not found] ` <20210806021052.3013-1-hdanton@sina.com>
  1 sibling, 0 replies; 9+ messages in thread
From: pr-tracker-bot @ 2021-08-05 19:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Linus Torvalds, linux-kernel, Alexey Gladkov

The pull request you sent on Thu, 05 Aug 2021 12:15:24 -0500:

> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-v5.14

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/6209049ecfc1894453d1fc850e60c58d4eccaf2a

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
       [not found] ` <20210806021052.3013-1-hdanton@sina.com>
@ 2021-08-06  3:38   ` Eric W. Biederman
       [not found]     ` <20210806061458.3075-1-hdanton@sina.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Eric W. Biederman @ 2021-08-06  3:38 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Linus Torvalds, linux-kernel, Alexey Gladkov

Hillf Danton <hdanton@sina.com> writes:

> On Thu, 05 Aug 2021 12:15:24 -0500 Eric W. Biederman wrote:
>>
>>Please pull the for-v5.14 branch from the git tree:
>>
>>  git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-v5.14
>>
>>  HEAD: 345daff2e994ee844d6a609c37f085695fbb4c4d ucounts: Fix race condition between alloc_ucounts and put_ucounts
>>
>>This is just one patch on top of v5.14-rc3 that fixes a subtle locking
>>versus reference counting bug in the ucount changes, found by syzbot
>>after you merged this code.
>>
>>The ucount sysctls have a issue on big endian architectures that I hope
>>to get in before v5.14 is released but a patch is not ready yet.
>>
>>Eric
>>
>>
>>>From 345daff2e994ee844d6a609c37f085695fbb4c4d Mon Sep 17 00:00:00 2001
>>From: Alexey Gladkov <legion@kernel.org>
>>Date: Tue, 27 Jul 2021 17:24:18 +0200
>>Subject: [PATCH] ucounts: Fix race condition between alloc_ucounts and put_ucounts
>>
>>The race happens because put_ucounts() doesn't use spinlock and
>>get_ucounts is not under spinlock:
>>
>>CPU0                    CPU1
>>----                    ----
>>alloc_ucounts()         put_ucounts()
>>
>>spin_lock_irq(&ucounts_lock);
>>ucounts = find_ucounts(ns, uid, hashent);
>>
>>                        atomic_dec_and_test(&ucounts->count))
>>
>>spin_unlock_irq(&ucounts_lock);
>>
>>                        spin_lock_irqsave(&ucounts_lock, flags);
>>                        hlist_del_init(&ucounts->node);
>>                        spin_unlock_irqrestore(&ucounts_lock, flags);
>>                        kfree(ucounts);
>>
>>ucounts = get_ucounts(ucounts);
>>
>>==================================================================
>>BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>>BUG: KASAN: use-after-free in atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
>>BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:152 [inline]
>>BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:150 [inline]
>>BUG: KASAN: use-after-free in alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
>>Write of size 4 at addr ffff88802821e41c by task syz-executor.4/16785
>>
>>CPU: 1 PID: 16785 Comm: syz-executor.4 Not tainted 5.14.0-rc1-next-20210712-syzkaller #0
>>Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
>>Call Trace:
>> __dump_stack lib/dump_stack.c:88 [inline]
>> dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
>> print_address_description.constprop.0.cold+0x6c/0x309 mm/kasan/report.c:233
>> __kasan_report mm/kasan/report.c:419 [inline]
>> kasan_report.cold+0x83/0xdf mm/kasan/report.c:436
>> check_region_inline mm/kasan/generic.c:183 [inline]
>> kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
>> instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
>> atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
>> get_ucounts kernel/ucount.c:152 [inline]
>> get_ucounts kernel/ucount.c:150 [inline]
>> alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
>> set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
>> __sys_setuid+0x285/0x400 kernel/sys.c:623
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>RIP: 0033:0x4665d9
>>Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
>>RSP: 002b:00007fde54097188 EFLAGS: 00000246 ORIG_RAX: 0000000000000069
>>RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
>>RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000ff
>>RBP: 00000000004bfcb9 R08: 0000000000000000 R09: 0000000000000000
>>R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
>>R13: 00007ffc8655740f R14: 00007fde54097300 R15: 0000000000022000
>>
>>Allocated by task 16784:
>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>> kasan_set_track mm/kasan/common.c:46 [inline]
>> set_alloc_info mm/kasan/common.c:434 [inline]
>> ____kasan_kmalloc mm/kasan/common.c:513 [inline]
>> ____kasan_kmalloc mm/kasan/common.c:472 [inline]
>> __kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:522
>> kmalloc include/linux/slab.h:591 [inline]
>> kzalloc include/linux/slab.h:721 [inline]
>> alloc_ucounts+0x23d/0x5b0 kernel/ucount.c:169
>> set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
>> __sys_setuid+0x285/0x400 kernel/sys.c:623
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>>Freed by task 16785:
>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>> kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
>> kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
>> ____kasan_slab_free mm/kasan/common.c:366 [inline]
>> ____kasan_slab_free mm/kasan/common.c:328 [inline]
>> __kasan_slab_free+0xfb/0x130 mm/kasan/common.c:374
>> kasan_slab_free include/linux/kasan.h:229 [inline]
>> slab_free_hook mm/slub.c:1650 [inline]
>> slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1675
>> slab_free mm/slub.c:3235 [inline]
>> kfree+0xeb/0x650 mm/slub.c:4295
>> put_ucounts kernel/ucount.c:200 [inline]
>> put_ucounts+0x117/0x150 kernel/ucount.c:192
>> put_cred_rcu+0x27a/0x520 kernel/cred.c:124
>> rcu_do_batch kernel/rcu/tree.c:2550 [inline]
>> rcu_core+0x7ab/0x1380 kernel/rcu/tree.c:2785
>> __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
>>
>>Last potentially related work creation:
>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>> kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
>> insert_work+0x48/0x370 kernel/workqueue.c:1332
>> __queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
>> queue_work_on+0xee/0x110 kernel/workqueue.c:1525
>> queue_work include/linux/workqueue.h:507 [inline]
>> call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
>> kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
>> netdev_queue_add_kobject net/core/net-sysfs.c:1621 [inline]
>> netdev_queue_update_kobjects+0x374/0x450 net/core/net-sysfs.c:1655
>> register_queue_kobjects net/core/net-sysfs.c:1716 [inline]
>> netdev_register_kobject+0x35a/0x430 net/core/net-sysfs.c:1959
>> register_netdevice+0xd33/0x1500 net/core/dev.c:10331
>> nsim_init_netdevsim drivers/net/netdevsim/netdev.c:317 [inline]
>> nsim_create+0x381/0x4d0 drivers/net/netdevsim/netdev.c:364
>> __nsim_dev_port_add+0x32e/0x830 drivers/net/netdevsim/dev.c:1295
>> nsim_dev_port_add_all+0x53/0x150 drivers/net/netdevsim/dev.c:1355
>> nsim_dev_probe+0xcb5/0x1190 drivers/net/netdevsim/dev.c:1496
>> call_driver_probe drivers/base/dd.c:517 [inline]
>> really_probe+0x23c/0xcd0 drivers/base/dd.c:595
>> __driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747
>> driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777
>> __device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894
>> bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
>> __device_attach+0x228/0x4a0 drivers/base/dd.c:965
>> bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
>> device_add+0xc2f/0x2180 drivers/base/core.c:3356
>> nsim_bus_dev_new drivers/net/netdevsim/bus.c:431 [inline]
>> new_device_store+0x436/0x710 drivers/net/netdevsim/bus.c:298
>> bus_attr_store+0x72/0xa0 drivers/base/bus.c:122
>> sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
>> kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
>> call_write_iter include/linux/fs.h:2152 [inline]
>> new_sync_write+0x426/0x650 fs/read_write.c:518
>> vfs_write+0x75a/0xa40 fs/read_write.c:605
>> ksys_write+0x12d/0x250 fs/read_write.c:658
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>>Second to last potentially related work creation:
>> kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
>> kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
>> insert_work+0x48/0x370 kernel/workqueue.c:1332
>> __queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
>> queue_work_on+0xee/0x110 kernel/workqueue.c:1525
>> queue_work include/linux/workqueue.h:507 [inline]
>> call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
>> kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
>> kobject_synth_uevent+0x701/0x850 lib/kobject_uevent.c:208
>> uevent_store+0x20/0x50 drivers/base/core.c:2371
>> dev_attr_store+0x50/0x80 drivers/base/core.c:2072
>> sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
>> kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
>> call_write_iter include/linux/fs.h:2152 [inline]
>> new_sync_write+0x426/0x650 fs/read_write.c:518
>> vfs_write+0x75a/0xa40 fs/read_write.c:605
>> ksys_write+0x12d/0x250 fs/read_write.c:658
>> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>> do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>> entry_SYSCALL_64_after_hwframe+0x44/0xae
>>
>>The buggy address belongs to the object at ffff88802821e400
>> which belongs to the cache kmalloc-192 of size 192
>>The buggy address is located 28 bytes inside of
>> 192-byte region [ffff88802821e400, ffff88802821e4c0)
>>The buggy address belongs to the page:
>>page:ffffea0000a08780 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2821e
>>flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
>>raw: 00fff00000000200 dead000000000100 dead000000000122 ffff888010841a00
>>raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
>>page dumped because: kasan: bad access detected
>>page_owner tracks the page as allocated
>>page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 12874702440, free_ts 12637793385
>> prep_new_page mm/page_alloc.c:2433 [inline]
>> get_page_from_freelist+0xa72/0x2f80 mm/page_alloc.c:4166
>> __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5374
>> alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2119
>> alloc_pages+0x238/0x2a0 mm/mempolicy.c:2242
>> alloc_slab_page mm/slub.c:1713 [inline]
>> allocate_slab+0x32b/0x4c0 mm/slub.c:1853
>> new_slab mm/slub.c:1916 [inline]
>> new_slab_objects mm/slub.c:2662 [inline]
>> ___slab_alloc+0x4ba/0x820 mm/slub.c:2825
>> __slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2865
>> slab_alloc_node mm/slub.c:2947 [inline]
>> slab_alloc mm/slub.c:2989 [inline]
>> __kmalloc+0x312/0x330 mm/slub.c:4133
>> kmalloc include/linux/slab.h:596 [inline]
>> kzalloc include/linux/slab.h:721 [inline]
>> __register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
>> rds_tcp_init_net+0x1db/0x4f0 net/rds/tcp.c:551
>> ops_init+0xaf/0x470 net/core/net_namespace.c:140
>> __register_pernet_operations net/core/net_namespace.c:1137 [inline]
>> register_pernet_operations+0x35a/0x850 net/core/net_namespace.c:1214
>> register_pernet_device+0x26/0x70 net/core/net_namespace.c:1301
>> rds_tcp_init+0x77/0xe0 net/rds/tcp.c:717
>> do_one_initcall+0x103/0x650 init/main.c:1285
>> do_initcall_level init/main.c:1360 [inline]
>> do_initcalls init/main.c:1376 [inline]
>> do_basic_setup init/main.c:1396 [inline]
>> kernel_init_freeable+0x6b8/0x741 init/main.c:1598
>>page last free stack trace:
>> reset_page_owner include/linux/page_owner.h:24 [inline]
>> free_pages_prepare mm/page_alloc.c:1343 [inline]
>> free_pcp_prepare+0x312/0x7d0 mm/page_alloc.c:1394
>> free_unref_page_prepare mm/page_alloc.c:3329 [inline]
>> free_unref_page+0x19/0x690 mm/page_alloc.c:3408
>> __vunmap+0x783/0xb70 mm/vmalloc.c:2587
>> free_work+0x58/0x70 mm/vmalloc.c:82
>> process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>> worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>> kthread+0x3e5/0x4d0 kernel/kthread.c:319
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>>
>>Memory state around the buggy address:
>> ffff88802821e300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>> ffff88802821e380: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>>>ffff88802821e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>>                            ^
>> ffff88802821e480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>> ffff88802821e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>==================================================================
>>
>>- The race fix has two parts.
>>  * Changing the code to guarantee that ucounts->count is only decremented
>>    when ucounts_lock is held.  This guarantees that find_ucounts
>>    will never find a structure with a zero reference count.
>>  * Changing alloc_ucounts to increment ucounts->count while
>>    ucounts_lock is held.  This guarantees the reference count on the
>>    found data structure will not be decremented to zero (and the data
>>    structure freed) before the reference count is incremented.
>>  -- Eric Biederman
>>
>>Reported-by: syzbot+01985d7909f9468f013c@syzkaller.appspotmail.com
>>Reported-by: syzbot+59dd63761094a80ad06d@syzkaller.appspotmail.com
>>Reported-by: syzbot+6cd79f45bb8fa1c9eeae@syzkaller.appspotmail.com
>>Reported-by: syzbot+b6e65bd125a05f803d6b@syzkaller.appspotmail.com
>>Fixes: b6c336528926 ("Use atomic_t for ucounts reference counting")
>>Cc: Hillf Danton <hdanton@sina.com>
>>Signed-off-by: Alexey Gladkov <legion@kernel.org>
>>Link: https://lkml.kernel.org/r/7b2ace1759b281cdd2d66101d6b305deef722efb.1627397820.git.legion@kernel.org
>>Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
>>---
>> kernel/ucount.c | 10 +++++++---
>> 1 file changed, 7 insertions(+), 3 deletions(-)
>>
>>diff --git a/kernel/ucount.c b/kernel/ucount.c
>>index 87799e2379bd..77be3bbe3cc4 100644
>>--- a/kernel/ucount.c
>>+++ b/kernel/ucount.c
>>@@ -160,6 +160,7 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid)
>> {
>> 	struct hlist_head *hashent = ucounts_hashentry(ns, uid);
>> 	struct ucounts *ucounts, *new;
>>+	long overflow;
>> 
>> 	spin_lock_irq(&ucounts_lock);
>> 	ucounts = find_ucounts(ns, uid, hashent);
>>@@ -184,8 +185,12 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid)
>> 			return new;
>> 		}
>> 	}
>>+	overflow = atomic_add_negative(1, &ucounts->count);
>> 	spin_unlock_irq(&ucounts_lock);
>>-	ucounts = get_ucounts(ucounts);
>>+	if (overflow) {
>>+		put_ucounts(ucounts);
>>+		return NULL;
>>+	}
>> 	return ucounts;
>> }
>> 
>>@@ -193,8 +198,7 @@ void put_ucounts(struct ucounts *ucounts)
>> {
>> 	unsigned long flags;
>> 
>>-	if (atomic_dec_and_test(&ucounts->count)) {
>>-		spin_lock_irqsave(&ucounts_lock, flags);
>>+	if (atomic_dec_and_lock_irqsave(&ucounts->count, &ucounts_lock, flags)) {
>
> If get_ucounts() runs on another CPU after locking ucounts_lock on the
> current CPU,
>
> 	if (ucounts && atomic_add_negative(1, &ucounts->count))
> 		return ucounts;
>
> UAF will happen again.

I think you are saying if someone calls get_ucounts without knowing the
reference count is at least one a user after free will happen.  It is a
bug to call get_ucounts unless it is known the reference count is at
least one.

> Is it too late to update the fix, Eric?

It would require a separate fix.  Probably that fix needs to be finding
fixing the logic where something calls get_ucounts when the reference
count is zero.

Eric

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
       [not found]     ` <20210806061458.3075-1-hdanton@sina.com>
@ 2021-08-06 17:37       ` Linus Torvalds
       [not found]         ` <20210807050314.1807-1-hdanton@sina.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2021-08-06 17:37 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Eric W. Biederman, Linux Kernel Mailing List, Alexey Gladkov

On Thu, Aug 5, 2021 at 11:15 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 05 Aug 2021 22:38:05 -0500 Eric W. Biederman wrote:
> >
> >I think you are saying if someone calls get_ucounts without knowing the
> >reference count is at least one a user after free will happen.  It is a
> >bug to call get_ucounts unless it is known the reference count is at
> >least one.
>
> Doubt it works because no atomicity is ensured by two atomic operations
> in tow.
>
>         if (atomic_read(&ucounts->count) >= 1)
>                 if (ucounts && atomic_add_negative(1, &ucounts->count))

Note that the first atomic_read() check is purely a debug check.

Eric's point is that you can't do "get_ucounts()" on an ucount that
you don't already have a reference to - that would be a bug even
*without* the "get_ucounts()", because you're clearly dealing with an
ucount that could be released at any time.

So you have only a couple of operations:

 (a) find_ucounts() looks up the ucount for a user that doesn't have a ref yet.

 (b) get_ucounts() increments a ref for somebody who already has a
ucount but is giving it away to somebody else too. We know the ref
can't go down to zero, because we are ourselves holding a ref to it.

 (c) put_ucounts() decrements a ref (and frees it when the refs go
down to zero).

Of these, (a) needs to be called under the lock, and needs to
increment the ref-count before the lock is released.

But (b) does *not* need a lock, exactly because the current getter
must already hold a ref, so it can't go away.

And (c) needs to hold a lock only for the "goes to zero" case, so you
can avoid the lock if the count started out higher than 1 (which is
why we have that atomic_dec_and_lock_irqsave() pattern)

The bug was in (a) and (c), but (b) is fine.

Side note: other data structures - not ucounts - can have slightly
more complex behavior, and in particular you can do the find/put
operations without locking if you

 - use RCU to free it

 - do the "find" in a RCU read-locked section

 - after finding it, you use "get_ref_not_zero()" to do an atomic "did
somebody free the last ref, if not increment it"

and the above pattern allows you to do all of a-c without any actual
locking, just local atomics.

That's what all the really critical kernel data structures tend to do.

(And the *really* critical data structures with complex topology - ie
dentries - have a local lock too, and use the lockref data structure,
but that's basically just dentries and the gfs2 gl/qd_lock - and I
have a sneaking suspicion that the gfs2 use is more of a "because I
can" rather than a "because I need to")

                 Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
       [not found]         ` <20210807050314.1807-1-hdanton@sina.com>
@ 2021-08-07  8:23           ` Linus Torvalds
       [not found]             ` <20210807091128.1862-1-hdanton@sina.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2021-08-07  8:23 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Eric W. Biederman, Linux Kernel Mailing List, Alexey Gladkov

On Fri, Aug 6, 2021 at 10:03 PM Hillf Danton <hdanton@sina.com> wrote:
>
> Then the current atomic_add_negative() in consideration over the "risk"
> of count overflow in real workloads can be replaced with the not_zero
> version.

What? No.

The atomic_add_negative() has absolutely nothing to do with not_zero.

The "negative" comes not at all from the count ever being zero, and as
I explained, that isn't even an issue here.

The "negative" is from a large _positive_ count growing so much that
the sign bit gets set. It's basically a "31-bit overflow" thing.

So:

 - not_zero makes no sense for get_ucounts(), because it can't be
zero, because we hold a reference to it

 - atomic_add_negative() is about not letting the counts become too
large, and when they do, we undo the reference (ie the pattern is
"increment ref - but if it then overflows into bit #31, decrement it
again"

and the two have *NOTHING* to do with each other. So your statement
about replacing one with the other makes no sense.

I was trying to explain that in _other_ situations, the
"atomic_inc_not_zero()" kind of pattern is used as a way to allow the
find-vs-last-drop race to be done without locking, but that's not what
the ucounts code does.

ucounts uses the ucounts_lock, and that one is entirely immaterial for
the atomic_add_negative() case, because the "negative" test is
literally about the value being as far away from zero as is _possible_
(and at that point, the lock is most definitely not needed - it's
needed only for the cases where the refcount goes to zero, and to make
sure that a "find" cannot race with that).

                Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
       [not found]             ` <20210807091128.1862-1-hdanton@sina.com>
@ 2021-08-07 15:10               ` Linus Torvalds
       [not found]                 ` <20210808004243.2007-1-hdanton@sina.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2021-08-07 15:10 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Eric W. Biederman, Linux Kernel Mailing List, Alexey Gladkov

On Sat, Aug 7, 2021 at 2:11 AM Hillf Danton <hdanton@sina.com> wrote:
>
>
>         CPU0                    CPU1    CPU2
>         ----                    ----    ----
>         given count == 2
>                                         put uc
>                                 put uc
>         get uc
>         UAF

No.

The thread on CPU0 must have had a reference to the ucount.

So if CPU1 and CPU2 are doing a put at the same time (and they held a
ref to it too), then the starting count must have been at least 3.

In other words, the above would be a bug - and not because of the UAF,
but because somebody had screwed up their reference counts.

You might as well had said "given count = 2, do 99x put_ucounts() ->
UAF". True, but immaterial from the standpoint of "put_ucounts()"
correctness.

Get it? You can't do a "get_ucounts()" on something that you don't
already have a reference to (and similarly, you obviously cannot do a
"put_ucounts()" on something you don't hold a reference to).

You *can* do a "find_ucount()" to find a ucount that you don't yet
have a reference to, but that's why you have to hold the lock (and why
anybody who does that has to increment the reference to it after
finding it before they drop the lock).

           Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
       [not found]                 ` <20210808004243.2007-1-hdanton@sina.com>
@ 2021-08-08  1:00                   ` Linus Torvalds
       [not found]                     ` <20210808012056.2067-1-hdanton@sina.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2021-08-08  1:00 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Eric W. Biederman, Linux Kernel Mailing List, Alexey Gladkov

On Sat, Aug 7, 2021 at 5:42 PM Hillf Danton <hdanton@sina.com> wrote:
>
> Given the syzbot report, I doubt 3 is correct.

I doubt your whole scenario.

> If 3 is actually correct, however, the fix in this pull request is
> incorrect.

Why do you not accept the fact that the old code was buggy, and the
bug was that the alloc->find didn't increment the count from 0
correctly under the lock?

The fact is, the commit in question is ObviouslyCorrect(tm), and I
don't understand any of your arguments against it.

The old code would look up a uncounts entry, but then drop the lock,
before incrementing it.

That explains *everything*. It means that you have this basic race:

Thread (a) on CPU1: starting out _without_ a reference to the
uncounts, look up entry under the lock, but don't increment the count,
release lock.

Thread (b) on CPU2: have a reference, do a put_ucounts(). Count goes
to zero, take the lock, unhash it, free the entry

Thread (a) continues, increments the count on a UAF entry, triggers KASAN.

Look, the fix in question _fixes_ exactly the above. The KASAN traces
clearly show that alloc_ucounts() was involved. Now it does the right
thing, and it does the count increment under the lock, and the
put_ucounts() thing atomic_dec_and_lock_irqsave().

And this isn't even an interesting case. This was not a subtle bug.
The ucounts code had an _obvious_ and unquestionable bug, and handled
this wrong. The ucounts refcount code wasn't even doing anything
unusual, it was just doing it BADLY and WRONG.

This situation is _literally_ why atomic_dec_and_lock exists in the
first place. The fact that the ucount code had missed this all was
just a sad and pitiful bug, and it was just embarrassing that we
hadn't noticed the obvious problem with commit b6c336528926 ("Use
atomic_t for ucounts reference counting") earlier.

What it is you claim happens that _isn't_ just due to this stupid and
trivial bug? Because the scenario you outlined did not make sense, and
I've pointed out _why_ it did not.

          Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
       [not found]                     ` <20210808012056.2067-1-hdanton@sina.com>
@ 2021-08-08  1:45                       ` Linus Torvalds
  2021-08-08  2:05                         ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2021-08-08  1:45 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Eric W. Biederman, Linux Kernel Mailing List, Alexey Gladkov

On Sat, Aug 7, 2021 at 6:21 PM Hillf Danton <hdanton@sina.com> wrote:
>
> The smart syzbot reported it, Sir.

Where?

Do you have something _after_ the fix that got merged and that this
whole thread is all about. Commit 345daff2e994 ("ucounts: Fix race
condition between alloc_ucounts and put_ucounts")

Because that fix matches the syzbot reports I saw. It explains them
100%, and the fix is clearly the right thing.

> If it is so ObviouslyCorrect(tm) then where is the count screwup coming from?

The syzbot reports I see are the ones from _before_ the fix.

And the bug was the one I have now exhaustively explained to you
several times. That got fixed.

So where's the report after the fix?

Point to it. Christ, Hilf, I've been asking you to explain yourself
too many times already.

I'm going to stop wasting my time with your emails unless you can
point to an actual report. No more  made-up examples that have nothing
to back them up and where you don't react to my answers to them except
with more verbiage without any technical details.

              Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [GIT PULL] ucount fix for v5.14-rc
  2021-08-08  1:45                       ` Linus Torvalds
@ 2021-08-08  2:05                         ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2021-08-08  2:05 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Eric W. Biederman, Linux Kernel Mailing List, Alexey Gladkov

On Sat, Aug 7, 2021 at 6:45 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Because that fix matches the syzbot reports I saw. It explains them
> 100%, and the fix is clearly the right thing.

Just to clarify: the exact symptoms of the race before that fix can
vary. The simplest form was the one I described:

> Thread (a) on CPU1: starting out _without_ a reference to the
> uncounts, look up entry under the lock, but don't increment the count,
> release lock.
>
> Thread (b) on CPU2: have a reference, do a put_ucounts(). Count goes
> to zero, take the lock, unhash it, free the entry
>
> Thread (a) continues, increments the count on a UAF entry, triggers KASAN.

because that one happens immediately. But a more likely one is
actually that the "Thread (a) continues" thing happens _before_ the
entry is free'd (but after thread (b) has decremented the count to
zero), and then what happens is that thread (a) doesn't actually
trigger UAF and a KASAN report at that point yet.

Instead, thread (a) will install the pointer to the - free'd - ucounts
somewhere, and the UAF will trigger only rather much later.

Which makes the KASAN reports much harder to read, of course, because
by the time you actually access the invalid uncounts pointer, the
original *cause* of the problem is gone.

So I described the "easy" case to see, not the only one that that the
bug in b6c336528926 ("Use atomic_t for ucounts reference counting")
would trigger.

But commit 345daff2e994 ("ucounts: Fix race condition between
alloc_ucounts and put_ucounts") really is the obvious fix for an
obvious bug.

Could there be _another_ bug? Sure. Fixing one bug - no matter how
obvious the fix - doesn't guarantee there isn't something else lurking
too. But if so, I haven't seen the syzbot reports for it.

            Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-08  2:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-05 17:15 [GIT PULL] ucount fix for v5.14-rc Eric W. Biederman
2021-08-05 19:26 ` pr-tracker-bot
     [not found] ` <20210806021052.3013-1-hdanton@sina.com>
2021-08-06  3:38   ` Eric W. Biederman
     [not found]     ` <20210806061458.3075-1-hdanton@sina.com>
2021-08-06 17:37       ` Linus Torvalds
     [not found]         ` <20210807050314.1807-1-hdanton@sina.com>
2021-08-07  8:23           ` Linus Torvalds
     [not found]             ` <20210807091128.1862-1-hdanton@sina.com>
2021-08-07 15:10               ` Linus Torvalds
     [not found]                 ` <20210808004243.2007-1-hdanton@sina.com>
2021-08-08  1:00                   ` Linus Torvalds
     [not found]                     ` <20210808012056.2067-1-hdanton@sina.com>
2021-08-08  1:45                       ` Linus Torvalds
2021-08-08  2:05                         ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).