From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756302AbcEEI2R (ORCPT ); Thu, 5 May 2016 04:28:17 -0400 Received: from mx2.suse.de ([195.135.220.15]:34821 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755909AbcEEI2N (ORCPT ); Thu, 5 May 2016 04:28:13 -0400 Date: Thu, 5 May 2016 10:28:12 +0200 (CEST) From: Miroslav Benes To: Josh Poimboeuf cc: Jiri Kosina , jeyu@redhat.com, pmladek@suse.com, jslaby@suse.cz, live-patching@vger.kernel.org, linux-kernel@vger.kernel.org, huawei.libin@huawei.com, minfei.huang@yahoo.com Subject: Re: [RFC PATCH] livepatch: allow removal of a disabled patch In-Reply-To: <20160504161423.pvupipravfxuyktz@treble> Message-ID: References: <1462190242-24731-1-git-send-email-mbenes@suse.cz> <20160503213709.g66xr624pwn7oguu@treble> <20160504023948.ttb3ko2wnicwruwy@treble> <20160504033619.5osteklgal3ixcbo@treble> <20160504131423.5yqvie2zy67jspak@treble> <20160504161423.pvupipravfxuyktz@treble> User-Agent: Alpine 2.00 (LNX 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 4 May 2016, Josh Poimboeuf wrote: > On Wed, May 04, 2016 at 04:35:28PM +0200, Miroslav Benes wrote: > > On Wed, 4 May 2016, Josh Poimboeuf wrote: > > > > > On Wed, May 04, 2016 at 01:58:47PM +0200, Miroslav Benes wrote: > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote: > > > > > > > > > On Tue, May 03, 2016 at 09:39:48PM -0500, Josh Poimboeuf wrote: > > > > > > On Wed, May 04, 2016 at 12:31:12AM +0200, Jiri Kosina wrote: > > > > > > > On Tue, 3 May 2016, Josh Poimboeuf wrote: > > > > > > > > > > > > > > > > 1. Do we really need a completion? If I am not missing something > > > > > > > > > kobject_del() always waits for sysfs callers to leave thanks to kernfs > > > > > > > > > active protection. > > > > > > > > > > > > > > > > What do you mean by "kernfs active protection"? I see that > > > > > > > > kernfs_remove() gets the kernfs_mutex lock, but I can't find anywhere > > > > > > > > that a write to a sysfs file uses that lock. > > > > > > > > > > > > > > > > I'm probably missing something... > > > > > > > > > > > > > > I don't want to speak on Miroslav's behalf, but I'm pretty sure that what > > > > > > > he has on mind is per-kernfs_node active refcounting kernfs does (see > > > > > > > kernfs_node->active, and especially it's usage in __kernfs_remove()). > > > > > > > > > > > > > > More specifically, execution of store() and show() sysfs callbacks is > > > > > > > guaranteed (by kernfs) to happen with that particular attribute's active > > > > > > > reference held for reading (and that makes it impossible for that > > > > > > > attribute to vanish prematurely). > > > > > > > > > > > > Thanks, that makes sense. > > > > > > > > > > > > So what exactly is the problem the completion is trying to solve? Is it > > > > > > to ensure that the kobject has been cleaned up before it returns to the > > > > > > caller, in case the user wants to call klp_register() again after > > > > > > unregistering? > > > > > > > > > > > > If so, that's quite an unusual use case which I think we should just > > > > > > consider unsupported. In fact, if you try to do it, kobject_init() > > > > > > complains loudly because kobj->state_initialized is still 1 because > > > > > > kobjects aren't meant to be reused like that. > > > > > > > > > > ... and now I realize the point is actually to prevent the caller from > > > > > freeing klp_patch before kobject_cleanup() runs. > > > > > > > > Exactly. Sorry I was so brief. > > > > > > > > > So yeah, it looks like we need the completion in case > > > > > CONFIG_DEBUG_KOBJECT_RELEASE is enabled. > > > > > > > > > > Or alternatively we could convert patch->kobj to be dynamically > > > > > allocated instead of embedded in klp_patch. > > > > > > > > But that wouldn't help, would it? Dynamic kobjects registers generic > > > > release function dynamic_kobj_release() and that's it. We're in the same > > > > situation. I have got a feeling that dynamic kobjects are only for trivial > > > > cases. > > > > > > But the patch release doesn't need to do anything, right? > > > > That is correct. I wanted to point out that dynamic_kobj_release() did not > > really solve our "completion" issue. If there is a problem in our code and > > we need completion, dynamic kobjects would not help. If we don't need a > > completion at all we could move to dynamic kobjects. > > Why would we need a completion if we had a dynamic kobject? The > lifetime of the kobject no longer matters if it's separated from the > klp_patch struct. The user can go ahead and free the klp_patch before > the kobject gets freed, no problem. See below. > > There is still container_of() though. > > Yes, but we could still find the corresponding klp_patch by searching > through the klp_patches list. It's not ideal, but IMO it would be > better than needing the completion to account for our "special" > situation. That would be a solution. > > > > Moreover we use container_of() several times in the code and that does not > > > > work with dynamically allocated kobjects. > > > > > > > > Anyway I am really confused now. When I read changelog of c817a67ecba7 > > > > ("kobject: delayed kobject release: help find buggy drivers") all makes > > > > perfect sense. But isn't our situation somewhat special, because we have > > > > refcounts completely under control? So we know that once we call > > > > kobject_put() we can let a patch go... I must be missing something. > > > > > > > > It does not make sense to introduce completion just to satisfy a feature > > > > which was introduced to debug general cases. > > > > > > I think our situation is "special" because klp_patch and its embedded > > > kobject have separate lifetimes. We have a kobject, which we own, which > > > is embedded in a klp_patch struct, which we don't own. > > > > > > If I understand correctly, normally when you release a kobject, the > > > containing struct gets freed. But we can't do that here because the > > > caller allocated the klp_patch. > > > > Normally only struct kobject is freed, no? > > > > From Documentation/kobject.txt: > > > > "One important point cannot be overstated: every kobject must have a > > release() method, and the kobject must persist (in a consistent state) > > until that method is called. If these constraints are not met, the code is > > flawed." > > > > So we need to only make sure that klp_patch does not disappear before > > calling release() method. In our case that cannot happen even without > > completion, because we call kobject_put() in klp_unregister_patch() when > > the patch module is going and kobject_put() calls our release (potentially > > empty) method in a "sync" way. If I read that code correctly. > > > > This does not hold only if CONFIG_DEBUG_KOBJECT_RELEASE=y. > > Hm, maybe I'm missing your point, or maybe you're missing mine. Maybe > both :-) > > I understand why your patch has a completion. And I understand that > it's needed because of CONFIG_DEBUG_KOBJECT_RELEASE. But even then I > think it's only needed *if* the kobject is embedded in the klp_patch > struct. > > In my experience, usually the point of having a kobject release function > is so you can kfree the kobject's containing struct. That doesn't > conflict with your quote from kobject.txt. > > But in our "special" code, klp_patch (and thus the kobject itself) is > allocated by the caller. But its embedded kobject is initialized by > livepatch code. In my experience, it's highly unusual to have the > kobject allocated by one party and initialized by another. > > That's why I think it would be much more logical to turn patch->obj into > a pointer. Then we can allocate it *and* initialize it, so that we can > 100% control its lifetime and not have to worry about the low-level > details of how kobject_del() works, completions, > CONFIG_DEBUG_KOBJECT_RELEASE, etc. And we would be more in line with > the principles of kobject, as I understand them ;-) I think it boils down to the following problem. 1. CONFIG_DEBUG_KOBJECT_RELEASE=y 2. we have dynamic kobjects, so there is a pointer in klp_patch to struct kobject 3. it is allocated during klp_init_patch() and all is fine 4. now we want to remove the patch module. It is disabled and module_put() is called. User calls rmmod on the module. 5. klp_unregister_patch() is called in __exit method. 6. klp_free_patch() is called. 7. kobject_put(patch->kobj) is called. ...now it gets interesting... 8. among others kobject_cleanup() is scheduled as a delayed work (this is important). 9. there is no completion, so kobject_put returns and the module goes away. 10. someone calls patch enabled_store attribute (for example). They can because kobject_cleanup() has not been called yet. It is delayed scheduled. ...crash... Is this scenario possible? I think it is. And I think it is possible regardless dynamic/static allocation of the kobject in klp_patch (static allocation scenario was my concern at the beginning of this thread). I think it is not possible when CONFIG_DEBUG_KOBJECT_RELEASE=n. Again regardless of dynamic/static situation. So if it is possible we need the completion just because of CONFIG_DEBUG_KOBJECT_RELEASE. Which is unfortunate. Does all this make sense now? Miroslav