From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1762229AbdJQN0T (ORCPT ); Tue, 17 Oct 2017 09:26:19 -0400 Received: from mail.linuxfoundation.org ([140.211.169.12]:45024 "EHLO mail.linuxfoundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1762186AbdJQN0P (ORCPT ); Tue, 17 Oct 2017 09:26:15 -0400 Date: Tue, 17 Oct 2017 15:26:22 +0200 From: Greg KH To: Peter Rajnoha Cc: linux-kernel@vger.kernel.org Subject: Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents Message-ID: <20171017132622.GC25683@kroah.com> References: <20171016123544.10113-1-prajnoha@redhat.com> <20171016135522.GA32638@kroah.com> <20171016144903.GA8410@kroah.com> <6784e348-f800-ff72-763f-11c4eff98fff@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <6784e348-f800-ff72-763f-11c4eff98fff@redhat.com> User-Agent: Mutt/1.9.1 (2017-09-22) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Oct 17, 2017 at 12:04:41PM +0200, Peter Rajnoha wrote: > On 10/16/2017 04:49 PM, Greg KH wrote: > > On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote: > >> On 10/16/2017 03:55 PM, Greg KH wrote: > >>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote: > >>>> Record last uevent seqnum that was used with kobject's uevent and send > >>>> it with next uevent as PREV_SEQNUM= variable. > >>>> > >>>> This enhances the way we are able to track and handle uevents in userspace > >>>> if we are trying to catch up with all the uevents that were generated > >>>> while we were not listening or the uevents which were not delivered. > >>>> This may happen if the userspace listener is not running yet when the > >>>> uevent is triggered or there's a period of time when it's not listening > >>>> (e.g. because the userspace listener is being restarted while a new > >>>> uevent fires). > >>>> > >>>> A userspace listener can compare the last uevent seqnum it knows about > >>>> with the last uevent seqnum the kernel reports through uevents delivered > >>>> to userspace, both genuine and synthetic ones (the ones generated by > >>>> writing uevent action name to /sys/.../uevent file). Then, if needed, > >>>> userspace may execute appropriate actions based on that. We don't need > >>>> to reevaluate and recheck all items, instead, we can concentrate only > >>>> on items for which we really and actually need to reflect changed state > >>>> in kernel while the userspace listener was not listening for uevents. > >>>> > >>>> Signed-off-by: Peter Rajnoha > >>>> --- > >>>> include/linux/kobject.h | 1 + > >>>> lib/kobject.c | 1 + > >>>> lib/kobject_uevent.c | 15 +++++++++++++-- > >>>> 3 files changed, 15 insertions(+), 2 deletions(-) > >>>> > >>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h > >>>> index e0a6205caa71..ee6db28b6186 100644 > >>>> --- a/include/linux/kobject.h > >>>> +++ b/include/linux/kobject.h > >>>> @@ -73,6 +73,7 @@ struct kobject { > >>>> #ifdef CONFIG_DEBUG_KOBJECT_RELEASE > >>>> struct delayed_work release; > >>>> #endif > >>>> + u64 last_uevent_seqnum; > >>>> unsigned int state_initialized:1; > >>>> unsigned int state_in_sysfs:1; > >>>> unsigned int state_add_uevent_sent:1; > >>>> diff --git a/lib/kobject.c b/lib/kobject.c > >>>> index 763d70a18941..2cc809f10ae7 100644 > >>>> --- a/lib/kobject.c > >>>> +++ b/lib/kobject.c > >>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj) > >>>> return; > >>>> kref_init(&kobj->kref); > >>>> INIT_LIST_HEAD(&kobj->entry); > >>>> + kobj->last_uevent_seqnum = 0; > >>>> kobj->state_in_sysfs = 0; > >>>> kobj->state_add_uevent_sent = 0; > >>>> kobj->state_remove_uevent_sent = 0; > >>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c > >>>> index f237a09a5862..f1cbed06a395 100644 > >>>> --- a/lib/kobject_uevent.c > >>>> +++ b/lib/kobject_uevent.c > >>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, > >>>> } > >>>> > >>>> mutex_lock(&uevent_sock_mutex); > >>>> - /* we will send an event, so request a new sequence number */ > >>>> - retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum); > >>>> + /* > >>>> + * We will send an event, so request a new sequence number. > >>>> + * Also, record the number for next uevent's PREV_SEQNUM. > >>>> + */ > >>>> + kobj->last_uevent_seqnum = ++uevent_seqnum; > >>>> + retval = add_uevent_var(env, "SEQNUM=%llu", > >>>> + (unsigned long long)uevent_seqnum); > >>>> + if (retval) { > >>>> + mutext_unlock(&uevent_sock_mutex); > >>>> + goto exit; > >>>> + } > >>>> + retval = add_uevent_var(env, "PREV_SEQNUM=%llu", > >>>> + (unsigned long long)kobj->last_uevent_seqnum); > >>> > >>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it > >>> is only just the current one-1? What else do you do with that value? > >> I'm sorry, I've sent incorrect patch by mistake. This is the correct one: > > > > Ok, this is a bit better (but it's not in a format I could apply it in, > > if I wanted to...) > > > >> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001 > >> From: Peter Rajnoha > >> Date: Thu, 12 Oct 2017 11:33:48 +0200 > >> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents > >> > >> Record last uevent seqnum that was used with kobject's uevent and send > >> it with next uevent as PREV_SEQNUM= variable. > >> > >> This enhances the way we are able to track and handle uevents in userspace > >> if we are trying to catch up with all the uevents that were generated > >> while we were not listening or the uevents which were not delivered. > >> This may happen if the userspace listener is not running yet when the > >> uevent is triggered or there's a period of time when it's not listening > >> (e.g. because the userspace listener is being restarted while a new > >> uevent fires). > >> > >> A userspace listener can compare the last uevent seqnum it knows about > >> with the last uevent seqnum the kernel reports through uevents delivered > >> to userspace, both genuine and synthetic ones (the ones generated by > >> writing uevent action name to /sys/.../uevent file). Then, if needed, > >> userspace may execute appropriate actions based on that. We don't need > >> to reevaluate and recheck all items, instead, we can concentrate only > >> on items for which we really and actually need to reflect changed state > >> in kernel while the userspace listener was not listening for uevents. > >> > >> Signed-off-by: Peter Rajnoha > > > > I don't really understand what userspace can do with this. Why does it > > matter what the sequence number was for a specific kobject? What can > > happen because of this? > > > > Do you have any example libudev code changes to show this in action so I > > can maybe understand the need? > > > > Not a libudev code change for this, at least I don't have this in plan > at this moment. > > It's more related to the way the udev rules are processed and, in > general, the way any other uevent listener can react to uevents. So the > change is in how "end consumer" of the uevent does its processing. > > At the moment, the sequence number is used globally, we don't have > per-kobject sequence numbers. So we can't use current SEQNUM that is > attached to each uevent to tell whether we have missed previous > uevent(s) for concrete device or not, we just know that some uevents for > some devices were not processed. Therefore, when we're triggering > uevents from userspace (udevadm trigger), we need to reevaluate all the > rules for all devices, no matter if it's really needed or not. > > For example, when we're switching from initramfs to root fs, we stop the > first udevd instance which is in initramfs, we keep the database (at the > moment, only for selected devices where we know that it's safe to do so > - same rules used in initramfs as in root fs). Then we switch over to > root fs and start the udevd again and we trigger uevents to fully > refresh the state, taking also the existing database into account. > > Now, it would help if we knew whether there were any uevents issued > after we stopped udevd in initramfs and before we started a new one from > root fs. If there are no uevents in between, we don't need to reevaluate > all the rules, we can just keep the existing database content. This can > become prominent if we have more devices in the system. It helps to > minimize the number of execution during boot, the overall impact of > triggering uevents and related processing, hence also making it more > straightforward to watch the state of the device and possibly debug > related problems. > > The same principle applies to any uevent listener in userspace, not just > udevd: > > - recording last seen SEQNUM > > - when new instance of the listener is running > > - trigger synthetic uevents to refresh state > > - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM: Why can't you just compare SEQNUM with the "last seen" SEQNUM, they should just be 1 apart, right? Isn't that what is done today? How will this differ from looking at PREV_SEQNUM? > - If they differ, refresh the state fully, considering the state > we know of as obsolete as there were already other uevents before we > haven't processed. This may include deeper scans which may take longer, > executing more helper tools to get the state of the device. But that is what "last seen" also shows. > - If the are same, reuse the state we already know about is > up-to-date (there were no uevents we haven't seen), don't reevaluate > everything. Testing for +1 should be the same. > With the new synthetic uevent interface (the extention to > /sys/.../uevent) that accepts arguments (UUID and key=value pairs), we > can make it possible to specify the nature of the triggered uevent, we > can mark it through the key=value pairs which the uevent listener in > userspace can check for, so if we can also: but that doesn't matter for PREV_SEQNUM. > - either refresh only the state for devices where we missed uevents > (that could be used during bootup, only to refresh the state for devices > where we have no current state yet), comparing current PREV_SEQNUM and > stored last SEQNUM in userspace > > - or to force full refresh if that's needed for a reason. > > Also, some devices may have more complex activation scheme than usual > "device addition" to the system. There can be a sequence of uevents > which are expected for a successful change of device state, so we have a > state machine that tracks uevents (as a confirmation from kernel for > userspace that we can hook on in udev rules and uevent listeners). And > here, it's very helpful to know that we have missed uevent(s) before or > not, either they were not delivered or the uevent listener itself has > been restarted and during this period, it may have missed uevents. > > In summary, it's adding more robustness to the uevent interface where we > can directly and quickly detect we missed the uevent or not for a > specific device and hence we can optimize udev rules/uevent listeners > based on that so we don't dully refresh all the state and reexecute all > rules and code for any triggered uevents. It's really not needed all the > time. I'd like to see real, working code that uses this new interface, showing how it is not possible to do what is needed without the PREV_SEQNUM, before considering adding this. Otherwise it's just stuff we have to drag around for forever. thanks, greg k-h