Re: livepatch: allow removal of a disabled patch

From: Miroslav Benes <mbenes@suse.cz>
To: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>,
	Jiri Kosina <jikos@kernel.org>,
	pmladek@suse.com, jslaby@suse.cz, live-patching@vger.kernel.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	huawei.libin@huawei.com, minfei.huang@yahoo.com
Subject: Re: livepatch: allow removal of a disabled patch
Date: Fri, 6 May 2016 09:51:43 +0200 (CEST)	[thread overview]
Message-ID: <alpine.LNX.2.00.1605060942470.11165@pobox.suse.cz> (raw)
In-Reply-To: <20160506004222.GA31986@packer-debian-8-amd64.digitalocean.com>

On Thu, 5 May 2016, Jessica Yu wrote:

> +++ Josh Poimboeuf [05/05/16 10:04 -0500]:
> > On Thu, May 05, 2016 at 04:25:48PM +0200, Miroslav Benes wrote:
> > > On Thu, 5 May 2016, Josh Poimboeuf wrote:
> > > 
> > > > On Thu, May 05, 2016 at 10:28:12AM +0200, Miroslav Benes wrote:
> > > > > I think it boils down to the following problem.
> > > > >
> > > > > 1. CONFIG_DEBUG_KOBJECT_RELEASE=y
> > > > >
> > > > > 2. we have dynamic kobjects, so there is a pointer in klp_patch to
> > > struct
> > > > > kobject
> > > > >
> > > > > 3. it is allocated during klp_init_patch() and all is fine
> > > > >
> > > > > 4. now we want to remove the patch module. It is disabled and
> > > module_put()
> > > > > is called. User calls rmmod on the module.
> > > > >
> > > > > 5. klp_unregister_patch() is called in __exit method.
> > > > >
> > > > > 6. klp_free_patch() is called.
> > > > >
> > > > > 7. kobject_put(patch->kobj) is called.
> > > > >
> > > > > ...now it gets interesting...
> > > > >
> > > > > 8. among others kobject_cleanup() is scheduled as a delayed work (this
> > > is
> > > > > important).
> > > > >
> > > > > 9. there is no completion, so kobject_put returns and the module goes
> > > > > away.
> > > > >
> > > > > 10. someone calls patch enabled_store attribute (for example). They
> > > can
> > > > > because kobject_cleanup() has not been called yet. It is delayed
> > > > > scheduled.
> > > > >
> > > > > ...crash...
> > > >
> > > > But what exactly causes the crash?  In enabled_store() we can see that
> > > > the patch isn't in the list, so we can return -EINVAL.
> > > 
> > > Ok, bad example. Take enabled_show() instead. It could be fixed in the
> > > same way, but I am not sure it is the right thing to do. It does not scale
> > > because the problem is elsewhere.
> > > 
> > > Anyway, it is (even if theoretically) there in my opinion and we
> > > have two options.
> > > 
> > > 1. We could forget about CONFIG_DEBUG_KOBJECT_RELEASE and all is ok
> > > without completion and regardless of dynamic/static kobject allocation.
> > > 
> > > 2. We introduce completion and we are ok even with
> > > CONFIG_DEBUG_KOBJECT_RELEASE=y and again regardless of dynamic/static
> > > kobject allocation.
> > 
> > I would disagree with the statement that the dynamic kobject doesn't
> > scale.  We would just need a helper function to get from a kobject to
> > its klp_patch.
> > 
> > In fact, to me it seems like the right way to do it.  It doesn't make
> > sense for the code which creates the kobject to be different from the
> > code which initializes it.  It's slightly out of context, but
> > kobject.txt does say:
> > 
> >  "Code which creates a kobject must, of course, initialize that object."
> > 
> > I view the completion as a hack to compensate for the fact that we're
> > abusing the kobject interface.  And so it makes sense to me that
> > CONFIG_DEBUG_KOBJECT_RELEASE would cause problems, because we're using
> > kobjects in the wrong way.
> > 
> > So in my view, the two options are:
> > 
> > 1. Convert the kobject to dynamic as I described.
> > 
> > 2. Change the klp_register() interface so that klp_patch gets allocated
> >   in livepatch code.
> > 
> > I'd be curious to hear what others think.
> 
> So, I think both of these solutions would enable us to get rid of
> the completion. Let me try to summarize..

Whoa, thanks for a good summary. That's exactly what was needed :)

> For solution #1, if we dynamically allocate the kobject, i.e. we have a
> pointer now instead of having it embedded in the klp_patch struct, we no
> longer need to worry if the corresponding klp_patch gets deallocated under
> our nose. Since the kobject_cleanup function is delayed w/
> CONFIG_DEBUG_KOBJECT_RELEASE, it is possible to have sysfs entries that
> refer to a klp_patch that no longer exists. Thus if any of the sysfs
> functions get called, we would have to take care to ensure that the
> klp_patch struct corresponding to the kobject in question actually still
> exists. In this case, all sysfs functions would require an extra check to
> make sure the matching klp_patch is still on the patches list and return an
> error if it isn't found. The "pro" is that this change would be simple, the
> "con" is that now kobjects are decoupled and managed completely separately
> from the object (klp_patch) with which they are associated, which doesn't
> feel 100% right.

Yes, I've thought a lot about it during the night (I've even dreams about 
kobjects now) and this doesn't seem as a good solution. We would need to 
make sure that we have appropriate checks everytime we change something in 
the code. Moreover I don't like walking through the list of patches each 
time. It is not critical path, but it is not nice anyway. This is what I 
meant with scaling problem previously.

I agree that while compaction makes all the problems go away, it is more 
like 'after a fashion' solution. Which leads me to... 

> For solution #2, we could have livepatch manage the (de)allocation of
> klp_patch objects internally. Maybe in this scenario the caller would need
> to request a klp_patch object be allocated and the caller would fill out the
> returned klp_patch struct as appropriate. In this case, we would be able
> leave the kobject embedded in the klp_patch struct (and dynamic kobjects
> wouldn't be needed), as livepatch would now have control of both structures.

I think this is the way to go.

> Then during the patch module exit path, when kobject_put is called, the
> klp_patch struct would be freed in its kobject's release function. We
> wouldn't have to hold up rmmod, and delayed execution of kobject_cleanup
> wouldn't break anything, because a klp_patch would then have the same
> "lifespan"
> as its corresponding kobject, and therefore it would be safe to invoke
> enabled_store & co. up until kobject_cleanup is finally executed. We'd be
> able to use container_of in this case as well. In addition, we wouldn't
> have to force all sysfs functions to support an awkward edge-case (i.e.
> checking if the corresponding klp_patch still exists). I think this
> solution also matches better with the typical use-case of the kobject
> release method, as described in kobject.txt (replacing 'my_object' with
> klp_patch):
> ---
> (...)
> This notification is done through a kobject's release() method.
> Usually such a method has a form like:
> 
>   void my_object_release(struct kobject *kobj)
>    {
>        struct my_object *mine = container_of(kobj, struct my_object, kobj);
>        /* Perform any additional cleanup on this object, then... */
>        kfree(mine);
>    }
> ---
> Apologies for the giant wall of text. In any case I feel like solution #2
> is actually closer in line with how kobjects are normally used, embedded in
> the structures they refer to, which get deallocated once their refcount
> hits 0. What do people think?

I agree. Josh and jikos proposed this as well, so I think we have an 
agreement. I'm gonna prepare a patch independent of the consistency model 
as this is a separate issue.

Thanks a lot,
Miroslav