[heads-up] buggered refcounting logics in cgroup1_mount()

* [heads-up] buggered refcounting logics in cgroup1_mount()
@ 2019-01-11  7:23 Al Viro
  2019-01-11 20:54 ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-01-11  7:23 UTC (permalink / raw)
  To: Zefan Li; +Cc: Tejun Heo, Linus Torvalds, linux-fsdevel

	There'd been a long string of kludges in that area, and unfortunately
even the latest variant of solution is broken.  You rely upon the trick with
percpu_ref_reinit() being done only after cgroup_do_mount().  That prevents
parallel mount picking cgroup_root of superblock in the middle of setting
up, then losing CPU, original mount succeeding and umount marking our
cgroup_root killed; then the second mount regains CPU and proceeds to set
a new superblock.  Umount of _that_ would try to kill cgroup_root one more
time, with obvious nastiness.

	Delaying the moment when cgroup_root can be grabbed solves that,
but it does nothing for another similar case:
	* cgroup1 is mounted and populated
	* umount doesn't kill cgroup_root since it's non-empty
	* mount attempt picks cgroup_root, tries to do kernfs_pin_sb()
(which finds no superblocks) and successfully bumps the refcount.  Then
it drops cgroup_mutex and proceeds to lose CPU (preemption, blocking
allocation in kernfs, whatever).
	* another mount attempt picks the same cgroup_root, again
finds no superblock, bumps the refcount and proceeds to mount the
sucker.  Then, before the first one regains CPU, it manages to
empty the tree and umount it, marking cgroup_root killed.
	* now the first attempt regains CPU and we are in the same
situation as with the original bug.

	Another problem with that is that (without any races) failing
allocation in d_make_root() within kernfs_fill_super() will end up
in cgroup_kill_sb(), with cgroup_root _still_ marked killed (your
kludge creates it that way and we hadn't reached the reinit yet).

	AFAICS, a cleaner solution would be this:
* to hell with kernfs_pin_sb(); just try to grab a reference to
cgroup_root on reuse.
* have cgroup_kill_sb() treat "it's already marked killed" as
"just drop the reference, then".
* after cgroup_do_mount() check if cgroup_root got marked killed and
do deactivate_locked_super() in such case (with the same
restart_syscall() failure exit).

	Objections?  I would love to kill kernfs_pin_sb() as
a followup (it's a fundamentally broken API), but that's not
a stable fodder; some fix of refcounting bugs, OTOH, should be.

^ permalink raw reply	[flat|nested] 6+ messages in thread