linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [heads-up] buggered refcounting logics in cgroup1_mount()
@ 2019-01-11  7:23 Al Viro
  2019-01-11 20:54 ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-01-11  7:23 UTC (permalink / raw)
  To: Zefan Li; +Cc: Tejun Heo, Linus Torvalds, linux-fsdevel

	There'd been a long string of kludges in that area, and unfortunately
even the latest variant of solution is broken.  You rely upon the trick with
percpu_ref_reinit() being done only after cgroup_do_mount().  That prevents
parallel mount picking cgroup_root of superblock in the middle of setting
up, then losing CPU, original mount succeeding and umount marking our
cgroup_root killed; then the second mount regains CPU and proceeds to set
a new superblock.  Umount of _that_ would try to kill cgroup_root one more
time, with obvious nastiness.

	Delaying the moment when cgroup_root can be grabbed solves that,
but it does nothing for another similar case:
	* cgroup1 is mounted and populated
	* umount doesn't kill cgroup_root since it's non-empty
	* mount attempt picks cgroup_root, tries to do kernfs_pin_sb()
(which finds no superblocks) and successfully bumps the refcount.  Then
it drops cgroup_mutex and proceeds to lose CPU (preemption, blocking
allocation in kernfs, whatever).
	* another mount attempt picks the same cgroup_root, again
finds no superblock, bumps the refcount and proceeds to mount the
sucker.  Then, before the first one regains CPU, it manages to
empty the tree and umount it, marking cgroup_root killed.
	* now the first attempt regains CPU and we are in the same
situation as with the original bug.

	Another problem with that is that (without any races) failing
allocation in d_make_root() within kernfs_fill_super() will end up
in cgroup_kill_sb(), with cgroup_root _still_ marked killed (your
kludge creates it that way and we hadn't reached the reinit yet).

	AFAICS, a cleaner solution would be this:
* to hell with kernfs_pin_sb(); just try to grab a reference to
cgroup_root on reuse.
* have cgroup_kill_sb() treat "it's already marked killed" as
"just drop the reference, then".
* after cgroup_do_mount() check if cgroup_root got marked killed and
do deactivate_locked_super() in such case (with the same
restart_syscall() failure exit).

	Objections?  I would love to kill kernfs_pin_sb() as
a followup (it's a fundamentally broken API), but that's not
a stable fodder; some fix of refcounting bugs, OTOH, should be.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [heads-up] buggered refcounting logics in cgroup1_mount()
  2019-01-11  7:23 [heads-up] buggered refcounting logics in cgroup1_mount() Al Viro
@ 2019-01-11 20:54 ` Tejun Heo
  2019-01-12  5:38   ` Al Viro
  2019-01-14  1:31   ` Zefan Li
  0 siblings, 2 replies; 6+ messages in thread
From: Tejun Heo @ 2019-01-11 20:54 UTC (permalink / raw)
  To: Al Viro; +Cc: Zefan Li, Linus Torvalds, linux-fsdevel

Hello, Al.

On Fri, Jan 11, 2019 at 07:23:09AM +0000, Al Viro wrote:
> 	AFAICS, a cleaner solution would be this:
> * to hell with kernfs_pin_sb(); just try to grab a reference to
> cgroup_root on reuse.
> * have cgroup_kill_sb() treat "it's already marked killed" as
> "just drop the reference, then".
> * after cgroup_do_mount() check if cgroup_root got marked killed and
> do deactivate_locked_super() in such case (with the same
> restart_syscall() failure exit).
> 
> 	Objections?  I would love to kill kernfs_pin_sb() as
> a followup (it's a fundamentally broken API), but that's not
> a stable fodder; some fix of refcounting bugs, OTOH, should be.

cgroup1 hierarchies have really weird set of requirements and the
implementation has always been somewhat broken.  I thought I fixed it
but obviously not.  I have no objection whatsoever and would much
appreciate the work.

Thanks a lot and happy new year!

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [heads-up] buggered refcounting logics in cgroup1_mount()
  2019-01-11 20:54 ` Tejun Heo
@ 2019-01-12  5:38   ` Al Viro
  2019-01-12 21:52     ` Al Viro
  2019-01-14  1:31   ` Zefan Li
  1 sibling, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-01-12  5:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Zefan Li, Linus Torvalds, linux-fsdevel, Andrey Vagin

On Fri, Jan 11, 2019 at 12:54:51PM -0800, Tejun Heo wrote:
> Hello, Al.
> 
> On Fri, Jan 11, 2019 at 07:23:09AM +0000, Al Viro wrote:
> > 	AFAICS, a cleaner solution would be this:
> > * to hell with kernfs_pin_sb(); just try to grab a reference to
> > cgroup_root on reuse.
> > * have cgroup_kill_sb() treat "it's already marked killed" as
> > "just drop the reference, then".
> > * after cgroup_do_mount() check if cgroup_root got marked killed and
> > do deactivate_locked_super() in such case (with the same
> > restart_syscall() failure exit).
> > 
> > 	Objections?  I would love to kill kernfs_pin_sb() as
> > a followup (it's a fundamentally broken API), but that's not
> > a stable fodder; some fix of refcounting bugs, OTOH, should be.
> 
> cgroup1 hierarchies have really weird set of requirements and the
> implementation has always been somewhat broken.  I thought I fixed it
> but obviously not.  I have no objection whatsoever and would much
> appreciate the work.

See vfs.git #fixes (the last two commits in there).  It seems to
work here, but I don't have the CRIU regression tests, etc., so
I would really appreciate if that thing got a real beating.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [heads-up] buggered refcounting logics in cgroup1_mount()
  2019-01-12  5:38   ` Al Viro
@ 2019-01-12 21:52     ` Al Viro
  2019-01-15 15:26       ` Tejun Heo
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-01-12 21:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Zefan Li, Linus Torvalds, linux-fsdevel, Andrey Vagin

On Sat, Jan 12, 2019 at 05:38:25AM +0000, Al Viro wrote:
> On Fri, Jan 11, 2019 at 12:54:51PM -0800, Tejun Heo wrote:
> > Hello, Al.
> > 
> > On Fri, Jan 11, 2019 at 07:23:09AM +0000, Al Viro wrote:
> > > 	AFAICS, a cleaner solution would be this:
> > > * to hell with kernfs_pin_sb(); just try to grab a reference to
> > > cgroup_root on reuse.
> > > * have cgroup_kill_sb() treat "it's already marked killed" as
> > > "just drop the reference, then".
> > > * after cgroup_do_mount() check if cgroup_root got marked killed and
> > > do deactivate_locked_super() in such case (with the same
> > > restart_syscall() failure exit).
> > > 
> > > 	Objections?  I would love to kill kernfs_pin_sb() as
> > > a followup (it's a fundamentally broken API), but that's not
> > > a stable fodder; some fix of refcounting bugs, OTOH, should be.
> > 
> > cgroup1 hierarchies have really weird set of requirements and the
> > implementation has always been somewhat broken.  I thought I fixed it
> > but obviously not.  I have no objection whatsoever and would much
> > appreciate the work.
> 
> See vfs.git #fixes (the last two commits in there).  It seems to
> work here, but I don't have the CRIU regression tests, etc., so
> I would really appreciate if that thing got a real beating.

BTW, while kernfs_pin_sb() and struct kernfs_root ->supers definitely
ought to die, there's something else that might be killable with that
approach.  percpu_ref_reinit() has no users left after that, and,
as David Howells has observed, PERCPU_REF_INIT_DEAD is also unused.

Do we want to keep these two?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [heads-up] buggered refcounting logics in cgroup1_mount()
  2019-01-11 20:54 ` Tejun Heo
  2019-01-12  5:38   ` Al Viro
@ 2019-01-14  1:31   ` Zefan Li
  1 sibling, 0 replies; 6+ messages in thread
From: Zefan Li @ 2019-01-14  1:31 UTC (permalink / raw)
  To: Tejun Heo, Al Viro; +Cc: Linus Torvalds, linux-fsdevel

On 2019/1/12 4:54, Tejun Heo wrote:
> Hello, Al.
> 
> On Fri, Jan 11, 2019 at 07:23:09AM +0000, Al Viro wrote:
>> 	AFAICS, a cleaner solution would be this:
>> * to hell with kernfs_pin_sb(); just try to grab a reference to
>> cgroup_root on reuse.
>> * have cgroup_kill_sb() treat "it's already marked killed" as
>> "just drop the reference, then".
>> * after cgroup_do_mount() check if cgroup_root got marked killed and
>> do deactivate_locked_super() in such case (with the same
>> restart_syscall() failure exit).
>>
>> 	Objections?  I would love to kill kernfs_pin_sb() as
>> a followup (it's a fundamentally broken API), but that's not
>> a stable fodder; some fix of refcounting bugs, OTOH, should be.
> 
> cgroup1 hierarchies have really weird set of requirements and the
> implementation has always been somewhat broken.  I thought I fixed it
> but obviously not.  I have no objection whatsoever and would much
> appreciate the work.
> 
> Thanks a lot and happy new year!
> 

+1.

We've fixed bugs there quite a few times. Would be great to see it's fixed
once and for all. Thnaks for the work!


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [heads-up] buggered refcounting logics in cgroup1_mount()
  2019-01-12 21:52     ` Al Viro
@ 2019-01-15 15:26       ` Tejun Heo
  0 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2019-01-15 15:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Zefan Li, Linus Torvalds, linux-fsdevel, Andrey Vagin

Hello, Al.

On Sat, Jan 12, 2019 at 09:52:15PM +0000, Al Viro wrote:
> > See vfs.git #fixes (the last two commits in there).  It seems to
> > work here, but I don't have the CRIU regression tests, etc., so
> > I would really appreciate if that thing got a real beating.
> 
> BTW, while kernfs_pin_sb() and struct kernfs_root ->supers definitely
> ought to die, there's something else that might be killable with that
> approach.  percpu_ref_reinit() has no users left after that, and,
> as David Howells has observed, PERCPU_REF_INIT_DEAD is also unused.
> 
> Do we want to keep these two?

Sorry about the delay.

* The patches look good to me.  That said, of the two patches, I think
  it'd be probably better to send only the first one to -stable.  As
  there haven't been any bug reports against the mount behavior for
  some years now, I don't think there's a need to backport the bigger
  change.  Any change carries some risk after all.

* Please feel free to remove percpu_ref_reinit() and unused parts.
  It's easy enough to add it back later if necessary.

* How do you wanna route the patches?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-01-15 15:26 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-11  7:23 [heads-up] buggered refcounting logics in cgroup1_mount() Al Viro
2019-01-11 20:54 ` Tejun Heo
2019-01-12  5:38   ` Al Viro
2019-01-12 21:52     ` Al Viro
2019-01-15 15:26       ` Tejun Heo
2019-01-14  1:31   ` Zefan Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).