Re: [PATCH v2 2/3] livepatch: add atomic replace

From: Petr Mladek <pmladek@suse.com>
To: Jason Baron <jbaron@akamai.com>
Cc: linux-kernel@vger.kernel.org, live-patching@vger.kernel.org,
	jpoimboe@redhat.com, jeyu@kernel.org, jikos@kernel.org,
	mbenes@suse.cz, gregkh@linuxfoundation.org
Subject: Re: [PATCH v2 2/3] livepatch: add atomic replace
Date: Fri, 15 Sep 2017 17:46:58 +0200	[thread overview]
Message-ID: <20170915154658.GM2908@pathway.suse.cz> (raw)
In-Reply-To: <b3961ef5-9f42-d692-07ab-18a4639d3aa2@akamai.com>

On Thu 2017-09-14 08:31:24, Jason Baron wrote:
> 
> 
> On 09/14/2017 06:32 AM, Petr Mladek wrote:
> > On Tue 2017-09-12 23:47:32, Jason Baron wrote:
> >>
> >>
> >> On 09/12/2017 01:35 AM, Petr Mladek wrote:
> >>> On Mon 2017-09-11 23:46:28, Jason Baron wrote:
> >>>> On 09/11/2017 09:53 AM, Petr Mladek wrote:
> >>>>> On Wed 2017-08-30 17:38:44, Jason Baron wrote:
> >>>>>> diff --git a/kernel/livepatch/core.c b/kernel/livepatch/core.c
> >>>>>> index 6004be3..21cecc5 100644
> >>>>>> --- a/kernel/livepatch/core.c
> >>>>>> +++ b/kernel/livepatch/core.c
> >>>>>> @@ -600,13 +603,38 @@ static void klp_free_patch(struct klp_patch *patch)
> >>>>>>  		list_del(&patch->list);
> >>>>>>  }
> >>>>>>  
> >>>>>> -static int klp_init_func(struct klp_object *obj, struct klp_func *func)
> >>>>>> +void klp_patch_free_no_ops(struct klp_patch *patch)
> >>>>>> +{
> >>>>>> +	struct klp_object *obj, *tmp_obj;
> >>>>>> +	struct klp_func *func, *tmp_func;
> >>>>>> +
> >>>>>> +	klp_for_each_object(patch, obj) {
> >>>>>> +		list_for_each_entry_safe(func, tmp_func, &obj->func_list,
> >>>>>> +					 func_entry) {
> >>>>>> +			list_del_init(&func->func_entry);
> >>>>>> +			kobject_put(&func->kobj);
> >>>>>> +			kfree(func->old_name);
> >>>>>> +			kfree(func);
> >>>>>
> >>>>> kobject_put() is asynchronous. The structure should be freed using
> >>>>> the kobject release method.
> >>>>>
> >>>>> The question is how secure this should be. We might want to block
> >>>>> other operations with the patch until all the release methods
> >>>>> are called. But this might be tricky as there is not a single
> >>>>> root kobject that would get destroyed at the end.
> >>>>>
> >>>>> A crazy solution would be to define a global atomic counter.
> >>>>> It might get increased with each kobject_put() call and
> >>>>> descreased in each release method. Then we could wait
> >>>>> until it gets zero.
> >>>>>
> >>>>> It should be safe to wait with klp_mutex taken. Note that this
> >>>>> is not possible with patch->kobj() where the "the enable"
> >>>>> attribute takes the mutex as well, see
> >>>>> enabled_store() in kernel/livepatch/core.c.
> >>>>
> >>>> Thanks for pointing this out - it looks like the livepatch code uses
> >>>> wait_for_completion() with special kobj callbacks. Perhaps, there could
> >>>> be a nop callback, but I'd have to look at this more closely...
> >>>
> >>> The completion is usable for the root kobject but you do not free
> >>> it here. It might be pretty complicated if you need separate
> >>> completion for all the freed kobjects.
> >>>
> >>> A solution might be a global atomic counter and a waitqueue.
> >>> Feel free to ask for more details.
> >>>
> >>
> >> So the issue is that the user might access some of the klp_* data
> >> structures via /sysfs after we have already freed them?
> > 
> > yes
> > 
> >> if so, this seems to be a common kernel pattern:
> >>
> >> kobject_del(kobj);
> >> kobject_put(kobj);
> > 
> > IMHO, this is used for other reason.
> > 
> > kobject_del() removes the object from the hierachy. Therefore
> > it prevents new operations. But it does not wait for the exiting
> > operations to finish. Therefore there still might be users that
> > access the data even after this function finishes.
> 
> I looked into this further - and it does appear to wait until all
> operations finish. In kernfs_drain() the code does:
> 
> wait_event(root->deactivate_waitq, atomic_read(&kn->active) ==
> KN_DEACTIVATED_BIAS);
> 
> The call stack is:
> 
> kobject_del()
>  sysfs_remove_dir()
>   kernfs_remove()
>    __kernfs_remove()
>     kernfs_drain()
> 
> And __kernfs_remove() has already done a 'deactivate' prior:
> 
> /* prevent any new usage under @kn by deactivating all nodes */
> 
> pos = NULL;
> 
> while ((pos = kernfs_next_descendant_post(pos, kn)))
> 
>        if (kernfs_active(pos))
> 
>                atomic_add(KN_DEACTIVATED_BIAS, &pos->active);
> 
> 
> So I *think* doing the kobject_del() first is sufficient. It may be
> worth some better documented if that is the case...

Sigh, I am confused. Your arguments look convincing. But I still
feel a fear. As I said, we had had a long discussion about this long
time ago. Unfortunately, I do not remember all the details.

In each case, there is the following mentioned in
Documentation/kobject.txt:

=== cut ===
Once you registered your kobject via kobject_add(), you must never use
kfree() to free it directly. The only safe way is to use kobject_put(). It
is good practice to always use kobject_put() after kobject_init() to avoid
errors creeping in.

This notification is done through a kobject's release() method. Usually
such a method has a form like::

    void my_object_release(struct kobject *kobj)
    {
            struct my_object *mine = container_of(kobj, struct my_object, kobj);

            /* Perform any additional cleanup on this object, then... */
            kfree(mine);
    }

One important point cannot be overstated: every kobject must have a
release() method, and the kobject must persist (in a consistent state)
until that method is called. If these constraints are not met, the code is
flawed.  Note that the kernel will warn you if you forget to provide a
release() method.  Do not try to get rid of this warning by providing an
"empty" release function; you will be mocked mercilessly by the kobject
maintainer if you attempt this.

Note, the name of the kobject is available in the release function, but it
must NOT be changed within this callback.  Otherwise there will be a memory
leak in the kobject core, which makes people unhappy.
=== cut ===

The fact is that if you enable CONFIG_DEBUG_KOBJECT_RELEASE, it will
make a deferred access to the struct kobject by intention.
See schedule_delayed_work() in kobject_release.

The struct kobject is part of both struct klp_func and
struct klp_object. The delayed call will cause an access
to an already freed memory if you free the structures
immediately after calling kobject_put().

I am not sure why this is. Either we missed something and
kernfs_drain() is not enough to avoid extra references
of the kobject. Or kobject authors are over paranoid and
push people to the only supported and "right design"
a hard way.

Anyway, we either need to "fix" kobject implementation
or we need to free the structures in the respective
release callbacks.

Best Regards,
Petr