All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] kobject: record and send PREV_SEQNUM with uevents
@ 2017-10-16 12:35 Peter Rajnoha
  2017-10-16 13:55 ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Rajnoha @ 2017-10-16 12:35 UTC (permalink / raw)
  To: gregkh; +Cc: linux-kernel, Peter Rajnoha

Record last uevent seqnum that was used with kobject's uevent and send
it with next uevent as PREV_SEQNUM=<num> variable.

This enhances the way we are able to track and handle uevents in userspace
if we are trying to catch up with all the uevents that were generated
while we were not listening or the uevents which were not delivered.
This may happen if the userspace listener is not running yet when the
uevent is triggered or there's a period of time when it's not listening
(e.g. because the userspace listener is being restarted while a new
uevent fires).

A userspace listener can compare the last uevent seqnum it knows about
with the last uevent seqnum the kernel reports through uevents delivered
to userspace, both genuine and synthetic ones (the ones generated by
writing uevent action name to /sys/.../uevent file). Then, if needed,
userspace may execute appropriate actions based on that. We don't need
to reevaluate and recheck all items, instead, we can concentrate only
on items for which we really and actually need to reflect changed state
in kernel while the userspace listener was not listening for uevents.

Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
---
 include/linux/kobject.h |  1 +
 lib/kobject.c           |  1 +
 lib/kobject_uevent.c    | 15 +++++++++++++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index e0a6205caa71..ee6db28b6186 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -73,6 +73,7 @@ struct kobject {
 #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
 	struct delayed_work	release;
 #endif
+	u64			last_uevent_seqnum;
 	unsigned int state_initialized:1;
 	unsigned int state_in_sysfs:1;
 	unsigned int state_add_uevent_sent:1;
diff --git a/lib/kobject.c b/lib/kobject.c
index 763d70a18941..2cc809f10ae7 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
 		return;
 	kref_init(&kobj->kref);
 	INIT_LIST_HEAD(&kobj->entry);
+	kobj->last_uevent_seqnum = 0;
 	kobj->state_in_sysfs = 0;
 	kobj->state_add_uevent_sent = 0;
 	kobj->state_remove_uevent_sent = 0;
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index f237a09a5862..f1cbed06a395 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	}
 
 	mutex_lock(&uevent_sock_mutex);
-	/* we will send an event, so request a new sequence number */
-	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
+	/*
+	 * We will send an event, so request a new sequence number.
+	 * Also, record the number for next uevent's PREV_SEQNUM.
+	 */
+	kobj->last_uevent_seqnum = ++uevent_seqnum;
+	retval = add_uevent_var(env, "SEQNUM=%llu",
+				(unsigned long long)uevent_seqnum);
+	if (retval) {
+		mutext_unlock(&uevent_sock_mutex);
+		goto exit;
+	}
+	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
+				(unsigned long long)kobj->last_uevent_seqnum);
 	if (retval) {
 		mutex_unlock(&uevent_sock_mutex);
 		goto exit;
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-16 12:35 [PATCH] kobject: record and send PREV_SEQNUM with uevents Peter Rajnoha
@ 2017-10-16 13:55 ` Greg KH
  2017-10-16 14:00   ` Peter Rajnoha
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2017-10-16 13:55 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: linux-kernel

On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
> Record last uevent seqnum that was used with kobject's uevent and send
> it with next uevent as PREV_SEQNUM=<num> variable.
> 
> This enhances the way we are able to track and handle uevents in userspace
> if we are trying to catch up with all the uevents that were generated
> while we were not listening or the uevents which were not delivered.
> This may happen if the userspace listener is not running yet when the
> uevent is triggered or there's a period of time when it's not listening
> (e.g. because the userspace listener is being restarted while a new
> uevent fires).
> 
> A userspace listener can compare the last uevent seqnum it knows about
> with the last uevent seqnum the kernel reports through uevents delivered
> to userspace, both genuine and synthetic ones (the ones generated by
> writing uevent action name to /sys/.../uevent file). Then, if needed,
> userspace may execute appropriate actions based on that. We don't need
> to reevaluate and recheck all items, instead, we can concentrate only
> on items for which we really and actually need to reflect changed state
> in kernel while the userspace listener was not listening for uevents.
> 
> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
> ---
>  include/linux/kobject.h |  1 +
>  lib/kobject.c           |  1 +
>  lib/kobject_uevent.c    | 15 +++++++++++++--
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> index e0a6205caa71..ee6db28b6186 100644
> --- a/include/linux/kobject.h
> +++ b/include/linux/kobject.h
> @@ -73,6 +73,7 @@ struct kobject {
>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>  	struct delayed_work	release;
>  #endif
> +	u64			last_uevent_seqnum;
>  	unsigned int state_initialized:1;
>  	unsigned int state_in_sysfs:1;
>  	unsigned int state_add_uevent_sent:1;
> diff --git a/lib/kobject.c b/lib/kobject.c
> index 763d70a18941..2cc809f10ae7 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
>  		return;
>  	kref_init(&kobj->kref);
>  	INIT_LIST_HEAD(&kobj->entry);
> +	kobj->last_uevent_seqnum = 0;
>  	kobj->state_in_sysfs = 0;
>  	kobj->state_add_uevent_sent = 0;
>  	kobj->state_remove_uevent_sent = 0;
> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> index f237a09a5862..f1cbed06a395 100644
> --- a/lib/kobject_uevent.c
> +++ b/lib/kobject_uevent.c
> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>  	}
>  
>  	mutex_lock(&uevent_sock_mutex);
> -	/* we will send an event, so request a new sequence number */
> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
> +	/*
> +	 * We will send an event, so request a new sequence number.
> +	 * Also, record the number for next uevent's PREV_SEQNUM.
> +	 */
> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
> +	retval = add_uevent_var(env, "SEQNUM=%llu",
> +				(unsigned long long)uevent_seqnum);
> +	if (retval) {
> +		mutext_unlock(&uevent_sock_mutex);
> +		goto exit;
> +	}
> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
> +				(unsigned long long)kobj->last_uevent_seqnum);

I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
is only just the current one-1?  What else do you do with that value?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-16 13:55 ` Greg KH
@ 2017-10-16 14:00   ` Peter Rajnoha
  2017-10-16 14:49     ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Rajnoha @ 2017-10-16 14:00 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On 10/16/2017 03:55 PM, Greg KH wrote:
> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
>> Record last uevent seqnum that was used with kobject's uevent and send
>> it with next uevent as PREV_SEQNUM=<num> variable.
>>
>> This enhances the way we are able to track and handle uevents in userspace
>> if we are trying to catch up with all the uevents that were generated
>> while we were not listening or the uevents which were not delivered.
>> This may happen if the userspace listener is not running yet when the
>> uevent is triggered or there's a period of time when it's not listening
>> (e.g. because the userspace listener is being restarted while a new
>> uevent fires).
>>
>> A userspace listener can compare the last uevent seqnum it knows about
>> with the last uevent seqnum the kernel reports through uevents delivered
>> to userspace, both genuine and synthetic ones (the ones generated by
>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>> userspace may execute appropriate actions based on that. We don't need
>> to reevaluate and recheck all items, instead, we can concentrate only
>> on items for which we really and actually need to reflect changed state
>> in kernel while the userspace listener was not listening for uevents.
>>
>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
>> ---
>>  include/linux/kobject.h |  1 +
>>  lib/kobject.c           |  1 +
>>  lib/kobject_uevent.c    | 15 +++++++++++++--
>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>> index e0a6205caa71..ee6db28b6186 100644
>> --- a/include/linux/kobject.h
>> +++ b/include/linux/kobject.h
>> @@ -73,6 +73,7 @@ struct kobject {
>>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>>  	struct delayed_work	release;
>>  #endif
>> +	u64			last_uevent_seqnum;
>>  	unsigned int state_initialized:1;
>>  	unsigned int state_in_sysfs:1;
>>  	unsigned int state_add_uevent_sent:1;
>> diff --git a/lib/kobject.c b/lib/kobject.c
>> index 763d70a18941..2cc809f10ae7 100644
>> --- a/lib/kobject.c
>> +++ b/lib/kobject.c
>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
>>  		return;
>>  	kref_init(&kobj->kref);
>>  	INIT_LIST_HEAD(&kobj->entry);
>> +	kobj->last_uevent_seqnum = 0;
>>  	kobj->state_in_sysfs = 0;
>>  	kobj->state_add_uevent_sent = 0;
>>  	kobj->state_remove_uevent_sent = 0;
>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>> index f237a09a5862..f1cbed06a395 100644
>> --- a/lib/kobject_uevent.c
>> +++ b/lib/kobject_uevent.c
>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>>  	}
>>  
>>  	mutex_lock(&uevent_sock_mutex);
>> -	/* we will send an event, so request a new sequence number */
>> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
>> +	/*
>> +	 * We will send an event, so request a new sequence number.
>> +	 * Also, record the number for next uevent's PREV_SEQNUM.
>> +	 */
>> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
>> +	retval = add_uevent_var(env, "SEQNUM=%llu",
>> +				(unsigned long long)uevent_seqnum);
>> +	if (retval) {
>> +		mutext_unlock(&uevent_sock_mutex);
>> +		goto exit;
>> +	}
>> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
>> +				(unsigned long long)kobj->last_uevent_seqnum);
> 
> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
> is only just the current one-1?  What else do you do with that value?
I'm sorry, I've sent incorrect patch by mistake. This is the correct one:

>From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
From: Peter Rajnoha <prajnoha@redhat.com>
Date: Thu, 12 Oct 2017 11:33:48 +0200
Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents

Record last uevent seqnum that was used with kobject's uevent and send
it with next uevent as PREV_SEQNUM=<num> variable.

This enhances the way we are able to track and handle uevents in userspace
if we are trying to catch up with all the uevents that were generated
while we were not listening or the uevents which were not delivered.
This may happen if the userspace listener is not running yet when the
uevent is triggered or there's a period of time when it's not listening
(e.g. because the userspace listener is being restarted while a new
uevent fires).

A userspace listener can compare the last uevent seqnum it knows about
with the last uevent seqnum the kernel reports through uevents delivered
to userspace, both genuine and synthetic ones (the ones generated by
writing uevent action name to /sys/.../uevent file). Then, if needed,
userspace may execute appropriate actions based on that. We don't need
to reevaluate and recheck all items, instead, we can concentrate only
on items for which we really and actually need to reflect changed state
in kernel while the userspace listener was not listening for uevents.

Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
---
 include/linux/kobject.h |  1 +
 lib/kobject.c           |  1 +
 lib/kobject_uevent.c    | 12 +++++++++++-
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/linux/kobject.h b/include/linux/kobject.h
index e0a6205caa71..ee6db28b6186 100644
--- a/include/linux/kobject.h
+++ b/include/linux/kobject.h
@@ -73,6 +73,7 @@ struct kobject {
 #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
 	struct delayed_work	release;
 #endif
+	u64			last_uevent_seqnum;
 	unsigned int state_initialized:1;
 	unsigned int state_in_sysfs:1;
 	unsigned int state_add_uevent_sent:1;
diff --git a/lib/kobject.c b/lib/kobject.c
index 763d70a18941..2cc809f10ae7 100644
--- a/lib/kobject.c
+++ b/lib/kobject.c
@@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
 		return;
 	kref_init(&kobj->kref);
 	INIT_LIST_HEAD(&kobj->entry);
+	kobj->last_uevent_seqnum = 0;
 	kobj->state_in_sysfs = 0;
 	kobj->state_add_uevent_sent = 0;
 	kobj->state_remove_uevent_sent = 0;
diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
index f237a09a5862..b1d301bab3e6 100644
--- a/lib/kobject_uevent.c
+++ b/lib/kobject_uevent.c
@@ -454,12 +454,22 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
 	}
 
 	mutex_lock(&uevent_sock_mutex);
-	/* we will send an event, so request a new sequence number */
+	/*
+	 * We will send an event, so request a new sequence number.
+	 * Also, record the number for next uevent's PREV_SEQNUM.
+	 */
 	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
 	if (retval) {
 		mutex_unlock(&uevent_sock_mutex);
 		goto exit;
 	}
+	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
+				(unsigned long long)kobj->last_uevent_seqnum);
+	kobj->last_uevent_seqnum = uevent_seqnum;
+	if (retval) {
+		mutex_unlock(&uevent_sock_mutex);
+		goto exit;
+	}
 
 #if defined(CONFIG_NET)
 	/* send netlink message */
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-16 14:00   ` Peter Rajnoha
@ 2017-10-16 14:49     ` Greg KH
  2017-10-17 10:04       ` Peter Rajnoha
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2017-10-16 14:49 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: linux-kernel

On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
> On 10/16/2017 03:55 PM, Greg KH wrote:
> > On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
> >> Record last uevent seqnum that was used with kobject's uevent and send
> >> it with next uevent as PREV_SEQNUM=<num> variable.
> >>
> >> This enhances the way we are able to track and handle uevents in userspace
> >> if we are trying to catch up with all the uevents that were generated
> >> while we were not listening or the uevents which were not delivered.
> >> This may happen if the userspace listener is not running yet when the
> >> uevent is triggered or there's a period of time when it's not listening
> >> (e.g. because the userspace listener is being restarted while a new
> >> uevent fires).
> >>
> >> A userspace listener can compare the last uevent seqnum it knows about
> >> with the last uevent seqnum the kernel reports through uevents delivered
> >> to userspace, both genuine and synthetic ones (the ones generated by
> >> writing uevent action name to /sys/.../uevent file). Then, if needed,
> >> userspace may execute appropriate actions based on that. We don't need
> >> to reevaluate and recheck all items, instead, we can concentrate only
> >> on items for which we really and actually need to reflect changed state
> >> in kernel while the userspace listener was not listening for uevents.
> >>
> >> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
> >> ---
> >>  include/linux/kobject.h |  1 +
> >>  lib/kobject.c           |  1 +
> >>  lib/kobject_uevent.c    | 15 +++++++++++++--
> >>  3 files changed, 15 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> >> index e0a6205caa71..ee6db28b6186 100644
> >> --- a/include/linux/kobject.h
> >> +++ b/include/linux/kobject.h
> >> @@ -73,6 +73,7 @@ struct kobject {
> >>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> >>  	struct delayed_work	release;
> >>  #endif
> >> +	u64			last_uevent_seqnum;
> >>  	unsigned int state_initialized:1;
> >>  	unsigned int state_in_sysfs:1;
> >>  	unsigned int state_add_uevent_sent:1;
> >> diff --git a/lib/kobject.c b/lib/kobject.c
> >> index 763d70a18941..2cc809f10ae7 100644
> >> --- a/lib/kobject.c
> >> +++ b/lib/kobject.c
> >> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
> >>  		return;
> >>  	kref_init(&kobj->kref);
> >>  	INIT_LIST_HEAD(&kobj->entry);
> >> +	kobj->last_uevent_seqnum = 0;
> >>  	kobj->state_in_sysfs = 0;
> >>  	kobj->state_add_uevent_sent = 0;
> >>  	kobj->state_remove_uevent_sent = 0;
> >> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> index f237a09a5862..f1cbed06a395 100644
> >> --- a/lib/kobject_uevent.c
> >> +++ b/lib/kobject_uevent.c
> >> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
> >>  	}
> >>  
> >>  	mutex_lock(&uevent_sock_mutex);
> >> -	/* we will send an event, so request a new sequence number */
> >> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
> >> +	/*
> >> +	 * We will send an event, so request a new sequence number.
> >> +	 * Also, record the number for next uevent's PREV_SEQNUM.
> >> +	 */
> >> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
> >> +	retval = add_uevent_var(env, "SEQNUM=%llu",
> >> +				(unsigned long long)uevent_seqnum);
> >> +	if (retval) {
> >> +		mutext_unlock(&uevent_sock_mutex);
> >> +		goto exit;
> >> +	}
> >> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
> >> +				(unsigned long long)kobj->last_uevent_seqnum);
> > 
> > I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
> > is only just the current one-1?  What else do you do with that value?
> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:

Ok, this is a bit better (but it's not in a format I could apply it in,
if I wanted to...)

> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
> From: Peter Rajnoha <prajnoha@redhat.com>
> Date: Thu, 12 Oct 2017 11:33:48 +0200
> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
> 
> Record last uevent seqnum that was used with kobject's uevent and send
> it with next uevent as PREV_SEQNUM=<num> variable.
> 
> This enhances the way we are able to track and handle uevents in userspace
> if we are trying to catch up with all the uevents that were generated
> while we were not listening or the uevents which were not delivered.
> This may happen if the userspace listener is not running yet when the
> uevent is triggered or there's a period of time when it's not listening
> (e.g. because the userspace listener is being restarted while a new
> uevent fires).
> 
> A userspace listener can compare the last uevent seqnum it knows about
> with the last uevent seqnum the kernel reports through uevents delivered
> to userspace, both genuine and synthetic ones (the ones generated by
> writing uevent action name to /sys/.../uevent file). Then, if needed,
> userspace may execute appropriate actions based on that. We don't need
> to reevaluate and recheck all items, instead, we can concentrate only
> on items for which we really and actually need to reflect changed state
> in kernel while the userspace listener was not listening for uevents.
> 
> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>

I don't really understand what userspace can do with this.  Why does it
matter what the sequence number was for a specific kobject?  What can
happen because of this?

Do you have any example libudev code changes to show this in action so I
can maybe understand the need?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-16 14:49     ` Greg KH
@ 2017-10-17 10:04       ` Peter Rajnoha
  2017-10-17 13:26         ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Rajnoha @ 2017-10-17 10:04 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On 10/16/2017 04:49 PM, Greg KH wrote:
> On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
>> On 10/16/2017 03:55 PM, Greg KH wrote:
>>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>
>>>> This enhances the way we are able to track and handle uevents in userspace
>>>> if we are trying to catch up with all the uevents that were generated
>>>> while we were not listening or the uevents which were not delivered.
>>>> This may happen if the userspace listener is not running yet when the
>>>> uevent is triggered or there's a period of time when it's not listening
>>>> (e.g. because the userspace listener is being restarted while a new
>>>> uevent fires).
>>>>
>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>> userspace may execute appropriate actions based on that. We don't need
>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>> on items for which we really and actually need to reflect changed state
>>>> in kernel while the userspace listener was not listening for uevents.
>>>>
>>>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
>>>> ---
>>>>  include/linux/kobject.h |  1 +
>>>>  lib/kobject.c           |  1 +
>>>>  lib/kobject_uevent.c    | 15 +++++++++++++--
>>>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>>>> index e0a6205caa71..ee6db28b6186 100644
>>>> --- a/include/linux/kobject.h
>>>> +++ b/include/linux/kobject.h
>>>> @@ -73,6 +73,7 @@ struct kobject {
>>>>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>>>>  	struct delayed_work	release;
>>>>  #endif
>>>> +	u64			last_uevent_seqnum;
>>>>  	unsigned int state_initialized:1;
>>>>  	unsigned int state_in_sysfs:1;
>>>>  	unsigned int state_add_uevent_sent:1;
>>>> diff --git a/lib/kobject.c b/lib/kobject.c
>>>> index 763d70a18941..2cc809f10ae7 100644
>>>> --- a/lib/kobject.c
>>>> +++ b/lib/kobject.c
>>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
>>>>  		return;
>>>>  	kref_init(&kobj->kref);
>>>>  	INIT_LIST_HEAD(&kobj->entry);
>>>> +	kobj->last_uevent_seqnum = 0;
>>>>  	kobj->state_in_sysfs = 0;
>>>>  	kobj->state_add_uevent_sent = 0;
>>>>  	kobj->state_remove_uevent_sent = 0;
>>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>>>> index f237a09a5862..f1cbed06a395 100644
>>>> --- a/lib/kobject_uevent.c
>>>> +++ b/lib/kobject_uevent.c
>>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>>>>  	}
>>>>  
>>>>  	mutex_lock(&uevent_sock_mutex);
>>>> -	/* we will send an event, so request a new sequence number */
>>>> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
>>>> +	/*
>>>> +	 * We will send an event, so request a new sequence number.
>>>> +	 * Also, record the number for next uevent's PREV_SEQNUM.
>>>> +	 */
>>>> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
>>>> +	retval = add_uevent_var(env, "SEQNUM=%llu",
>>>> +				(unsigned long long)uevent_seqnum);
>>>> +	if (retval) {
>>>> +		mutext_unlock(&uevent_sock_mutex);
>>>> +		goto exit;
>>>> +	}
>>>> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
>>>> +				(unsigned long long)kobj->last_uevent_seqnum);
>>>
>>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
>>> is only just the current one-1?  What else do you do with that value?
>> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:
> 
> Ok, this is a bit better (but it's not in a format I could apply it in,
> if I wanted to...)
> 
>> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
>> From: Peter Rajnoha <prajnoha@redhat.com>
>> Date: Thu, 12 Oct 2017 11:33:48 +0200
>> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
>>
>> Record last uevent seqnum that was used with kobject's uevent and send
>> it with next uevent as PREV_SEQNUM=<num> variable.
>>
>> This enhances the way we are able to track and handle uevents in userspace
>> if we are trying to catch up with all the uevents that were generated
>> while we were not listening or the uevents which were not delivered.
>> This may happen if the userspace listener is not running yet when the
>> uevent is triggered or there's a period of time when it's not listening
>> (e.g. because the userspace listener is being restarted while a new
>> uevent fires).
>>
>> A userspace listener can compare the last uevent seqnum it knows about
>> with the last uevent seqnum the kernel reports through uevents delivered
>> to userspace, both genuine and synthetic ones (the ones generated by
>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>> userspace may execute appropriate actions based on that. We don't need
>> to reevaluate and recheck all items, instead, we can concentrate only
>> on items for which we really and actually need to reflect changed state
>> in kernel while the userspace listener was not listening for uevents.
>>
>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
> 
> I don't really understand what userspace can do with this.  Why does it
> matter what the sequence number was for a specific kobject?  What can
> happen because of this?
> 
> Do you have any example libudev code changes to show this in action so I
> can maybe understand the need?
> 

Not a libudev code change for this, at least I don't have this in plan
at this moment.

It's more related to the way the udev rules are processed and, in
general, the way any other uevent listener can react to uevents. So the
change is in how "end consumer" of the uevent does its processing.

At the moment, the sequence number is used globally, we don't have
per-kobject sequence numbers. So we can't use current SEQNUM that is
attached to each uevent to tell whether we have missed previous
uevent(s) for concrete device or not, we just know that some uevents for
some devices were not processed. Therefore, when we're triggering
uevents from userspace (udevadm trigger), we need to reevaluate all the
rules for all devices, no matter if it's really needed or not.

For example, when we're switching from initramfs to root fs, we stop the
first udevd instance which is in initramfs, we keep the database (at the
moment, only for selected devices where we know that it's safe to do so
- same rules used in initramfs as in root fs). Then we switch over to
root fs and start the udevd again and we trigger uevents to fully
refresh the state, taking also the existing database into account.

Now, it would help if we knew whether there were any uevents issued
after we stopped udevd in initramfs and before we started a new one from
root fs. If there are no uevents in between, we don't need to reevaluate
all the rules, we can just keep the existing database content. This can
become prominent if we have more devices in the system. It helps to
minimize the number of execution during boot, the overall impact of
triggering uevents and related processing, hence also making it more
straightforward to watch the state of the device and possibly debug
related problems.

The same principle applies to any uevent listener in userspace, not just
udevd:

  - recording last seen SEQNUM

  - when new instance of the listener is running

    - trigger synthetic uevents to refresh state

    - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM:

      - If they differ, refresh the state fully, considering the state
we know of as obsolete as there were already other uevents before we
haven't processed. This may include deeper scans which may take longer,
executing more helper tools to get the state of the device.

      - If the are same, reuse the state we already know about is
up-to-date (there were no uevents we haven't seen), don't reevaluate
everything.

With the new synthetic uevent interface (the extention to
/sys/.../uevent) that accepts arguments (UUID and key=value pairs), we
can make it possible to specify the nature of the triggered uevent, we
can mark it through the key=value pairs which the uevent listener in
userspace can check for, so if we can also:

   - either refresh only the state for devices where we missed uevents
(that could be used during bootup, only to refresh the state for devices
where we have no current state yet), comparing current PREV_SEQNUM and
stored last SEQNUM in userspace

   - or to force full refresh if that's needed for a reason.

Also, some devices may have more complex activation scheme than usual
"device addition" to the system. There can be a sequence of uevents
which are expected for a successful change of device state, so we have a
state machine that tracks uevents (as a confirmation from kernel for
userspace that we can hook on in udev rules and uevent listeners). And
here, it's very helpful to know that we have missed uevent(s) before or
not, either they were not delivered or the uevent listener itself has
been restarted and during this period, it may have missed uevents.

In summary, it's adding more robustness to the uevent interface where we
can directly and quickly detect we missed the uevent or not for a
specific device and hence we can optimize udev rules/uevent listeners
based on that so we don't dully refresh all the state and reexecute all
rules and code for any triggered uevents. It's really not needed all the
time.

-- 
Peter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-17 10:04       ` Peter Rajnoha
@ 2017-10-17 13:26         ` Greg KH
  2017-10-18  7:21           ` Peter Rajnoha
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2017-10-17 13:26 UTC (permalink / raw)
  To: Peter Rajnoha; +Cc: linux-kernel

On Tue, Oct 17, 2017 at 12:04:41PM +0200, Peter Rajnoha wrote:
> On 10/16/2017 04:49 PM, Greg KH wrote:
> > On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
> >> On 10/16/2017 03:55 PM, Greg KH wrote:
> >>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
> >>>> Record last uevent seqnum that was used with kobject's uevent and send
> >>>> it with next uevent as PREV_SEQNUM=<num> variable.
> >>>>
> >>>> This enhances the way we are able to track and handle uevents in userspace
> >>>> if we are trying to catch up with all the uevents that were generated
> >>>> while we were not listening or the uevents which were not delivered.
> >>>> This may happen if the userspace listener is not running yet when the
> >>>> uevent is triggered or there's a period of time when it's not listening
> >>>> (e.g. because the userspace listener is being restarted while a new
> >>>> uevent fires).
> >>>>
> >>>> A userspace listener can compare the last uevent seqnum it knows about
> >>>> with the last uevent seqnum the kernel reports through uevents delivered
> >>>> to userspace, both genuine and synthetic ones (the ones generated by
> >>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
> >>>> userspace may execute appropriate actions based on that. We don't need
> >>>> to reevaluate and recheck all items, instead, we can concentrate only
> >>>> on items for which we really and actually need to reflect changed state
> >>>> in kernel while the userspace listener was not listening for uevents.
> >>>>
> >>>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
> >>>> ---
> >>>>  include/linux/kobject.h |  1 +
> >>>>  lib/kobject.c           |  1 +
> >>>>  lib/kobject_uevent.c    | 15 +++++++++++++--
> >>>>  3 files changed, 15 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> >>>> index e0a6205caa71..ee6db28b6186 100644
> >>>> --- a/include/linux/kobject.h
> >>>> +++ b/include/linux/kobject.h
> >>>> @@ -73,6 +73,7 @@ struct kobject {
> >>>>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> >>>>  	struct delayed_work	release;
> >>>>  #endif
> >>>> +	u64			last_uevent_seqnum;
> >>>>  	unsigned int state_initialized:1;
> >>>>  	unsigned int state_in_sysfs:1;
> >>>>  	unsigned int state_add_uevent_sent:1;
> >>>> diff --git a/lib/kobject.c b/lib/kobject.c
> >>>> index 763d70a18941..2cc809f10ae7 100644
> >>>> --- a/lib/kobject.c
> >>>> +++ b/lib/kobject.c
> >>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
> >>>>  		return;
> >>>>  	kref_init(&kobj->kref);
> >>>>  	INIT_LIST_HEAD(&kobj->entry);
> >>>> +	kobj->last_uevent_seqnum = 0;
> >>>>  	kobj->state_in_sysfs = 0;
> >>>>  	kobj->state_add_uevent_sent = 0;
> >>>>  	kobj->state_remove_uevent_sent = 0;
> >>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >>>> index f237a09a5862..f1cbed06a395 100644
> >>>> --- a/lib/kobject_uevent.c
> >>>> +++ b/lib/kobject_uevent.c
> >>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
> >>>>  	}
> >>>>  
> >>>>  	mutex_lock(&uevent_sock_mutex);
> >>>> -	/* we will send an event, so request a new sequence number */
> >>>> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
> >>>> +	/*
> >>>> +	 * We will send an event, so request a new sequence number.
> >>>> +	 * Also, record the number for next uevent's PREV_SEQNUM.
> >>>> +	 */
> >>>> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
> >>>> +	retval = add_uevent_var(env, "SEQNUM=%llu",
> >>>> +				(unsigned long long)uevent_seqnum);
> >>>> +	if (retval) {
> >>>> +		mutext_unlock(&uevent_sock_mutex);
> >>>> +		goto exit;
> >>>> +	}
> >>>> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
> >>>> +				(unsigned long long)kobj->last_uevent_seqnum);
> >>>
> >>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
> >>> is only just the current one-1?  What else do you do with that value?
> >> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:
> > 
> > Ok, this is a bit better (but it's not in a format I could apply it in,
> > if I wanted to...)
> > 
> >> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
> >> From: Peter Rajnoha <prajnoha@redhat.com>
> >> Date: Thu, 12 Oct 2017 11:33:48 +0200
> >> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
> >>
> >> Record last uevent seqnum that was used with kobject's uevent and send
> >> it with next uevent as PREV_SEQNUM=<num> variable.
> >>
> >> This enhances the way we are able to track and handle uevents in userspace
> >> if we are trying to catch up with all the uevents that were generated
> >> while we were not listening or the uevents which were not delivered.
> >> This may happen if the userspace listener is not running yet when the
> >> uevent is triggered or there's a period of time when it's not listening
> >> (e.g. because the userspace listener is being restarted while a new
> >> uevent fires).
> >>
> >> A userspace listener can compare the last uevent seqnum it knows about
> >> with the last uevent seqnum the kernel reports through uevents delivered
> >> to userspace, both genuine and synthetic ones (the ones generated by
> >> writing uevent action name to /sys/.../uevent file). Then, if needed,
> >> userspace may execute appropriate actions based on that. We don't need
> >> to reevaluate and recheck all items, instead, we can concentrate only
> >> on items for which we really and actually need to reflect changed state
> >> in kernel while the userspace listener was not listening for uevents.
> >>
> >> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
> > 
> > I don't really understand what userspace can do with this.  Why does it
> > matter what the sequence number was for a specific kobject?  What can
> > happen because of this?
> > 
> > Do you have any example libudev code changes to show this in action so I
> > can maybe understand the need?
> > 
> 
> Not a libudev code change for this, at least I don't have this in plan
> at this moment.
> 
> It's more related to the way the udev rules are processed and, in
> general, the way any other uevent listener can react to uevents. So the
> change is in how "end consumer" of the uevent does its processing.
> 
> At the moment, the sequence number is used globally, we don't have
> per-kobject sequence numbers. So we can't use current SEQNUM that is
> attached to each uevent to tell whether we have missed previous
> uevent(s) for concrete device or not, we just know that some uevents for
> some devices were not processed. Therefore, when we're triggering
> uevents from userspace (udevadm trigger), we need to reevaluate all the
> rules for all devices, no matter if it's really needed or not.
> 
> For example, when we're switching from initramfs to root fs, we stop the
> first udevd instance which is in initramfs, we keep the database (at the
> moment, only for selected devices where we know that it's safe to do so
> - same rules used in initramfs as in root fs). Then we switch over to
> root fs and start the udevd again and we trigger uevents to fully
> refresh the state, taking also the existing database into account.
> 
> Now, it would help if we knew whether there were any uevents issued
> after we stopped udevd in initramfs and before we started a new one from
> root fs. If there are no uevents in between, we don't need to reevaluate
> all the rules, we can just keep the existing database content. This can
> become prominent if we have more devices in the system. It helps to
> minimize the number of execution during boot, the overall impact of
> triggering uevents and related processing, hence also making it more
> straightforward to watch the state of the device and possibly debug
> related problems.
> 
> The same principle applies to any uevent listener in userspace, not just
> udevd:
> 
>   - recording last seen SEQNUM
> 
>   - when new instance of the listener is running
> 
>     - trigger synthetic uevents to refresh state
> 
>     - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM:

Why can't you just compare SEQNUM with the "last seen" SEQNUM, they
should just be 1 apart, right?  Isn't that what is done today?  How will
this differ from looking at PREV_SEQNUM?

>       - If they differ, refresh the state fully, considering the state
> we know of as obsolete as there were already other uevents before we
> haven't processed. This may include deeper scans which may take longer,
> executing more helper tools to get the state of the device.

But that is what "last seen" also shows.

>       - If the are same, reuse the state we already know about is
> up-to-date (there were no uevents we haven't seen), don't reevaluate
> everything.

Testing for +1 should be the same.

> With the new synthetic uevent interface (the extention to
> /sys/.../uevent) that accepts arguments (UUID and key=value pairs), we
> can make it possible to specify the nature of the triggered uevent, we
> can mark it through the key=value pairs which the uevent listener in
> userspace can check for, so if we can also:

but that doesn't matter for PREV_SEQNUM.

>    - either refresh only the state for devices where we missed uevents
> (that could be used during bootup, only to refresh the state for devices
> where we have no current state yet), comparing current PREV_SEQNUM and
> stored last SEQNUM in userspace
> 
>    - or to force full refresh if that's needed for a reason.
> 
> Also, some devices may have more complex activation scheme than usual
> "device addition" to the system. There can be a sequence of uevents
> which are expected for a successful change of device state, so we have a
> state machine that tracks uevents (as a confirmation from kernel for
> userspace that we can hook on in udev rules and uevent listeners). And
> here, it's very helpful to know that we have missed uevent(s) before or
> not, either they were not delivered or the uevent listener itself has
> been restarted and during this period, it may have missed uevents.
> 
> In summary, it's adding more robustness to the uevent interface where we
> can directly and quickly detect we missed the uevent or not for a
> specific device and hence we can optimize udev rules/uevent listeners
> based on that so we don't dully refresh all the state and reexecute all
> rules and code for any triggered uevents. It's really not needed all the
> time.

I'd like to see real, working code that uses this new interface, showing
how it is not possible to do what is needed without the PREV_SEQNUM,
before considering adding this.  Otherwise it's just stuff we have to
drag around for forever.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-17 13:26         ` Greg KH
@ 2017-10-18  7:21           ` Peter Rajnoha
  2017-10-18  9:42             ` Peter Rajnoha
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Rajnoha @ 2017-10-18  7:21 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On 10/17/2017 03:26 PM, Greg KH wrote:
> On Tue, Oct 17, 2017 at 12:04:41PM +0200, Peter Rajnoha wrote:
>> On 10/16/2017 04:49 PM, Greg KH wrote:
>>> On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
>>>> On 10/16/2017 03:55 PM, Greg KH wrote:
>>>>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
>>>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>>>
>>>>>> This enhances the way we are able to track and handle uevents in userspace
>>>>>> if we are trying to catch up with all the uevents that were generated
>>>>>> while we were not listening or the uevents which were not delivered.
>>>>>> This may happen if the userspace listener is not running yet when the
>>>>>> uevent is triggered or there's a period of time when it's not listening
>>>>>> (e.g. because the userspace listener is being restarted while a new
>>>>>> uevent fires).
>>>>>>
>>>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>>>> userspace may execute appropriate actions based on that. We don't need
>>>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>>>> on items for which we really and actually need to reflect changed state
>>>>>> in kernel while the userspace listener was not listening for uevents.
>>>>>>
>>>>>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
>>>>>> ---
>>>>>>  include/linux/kobject.h |  1 +
>>>>>>  lib/kobject.c           |  1 +
>>>>>>  lib/kobject_uevent.c    | 15 +++++++++++++--
>>>>>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>>>>>> index e0a6205caa71..ee6db28b6186 100644
>>>>>> --- a/include/linux/kobject.h
>>>>>> +++ b/include/linux/kobject.h
>>>>>> @@ -73,6 +73,7 @@ struct kobject {
>>>>>>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>>>>>>  	struct delayed_work	release;
>>>>>>  #endif
>>>>>> +	u64			last_uevent_seqnum;
>>>>>>  	unsigned int state_initialized:1;
>>>>>>  	unsigned int state_in_sysfs:1;
>>>>>>  	unsigned int state_add_uevent_sent:1;
>>>>>> diff --git a/lib/kobject.c b/lib/kobject.c
>>>>>> index 763d70a18941..2cc809f10ae7 100644
>>>>>> --- a/lib/kobject.c
>>>>>> +++ b/lib/kobject.c
>>>>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
>>>>>>  		return;
>>>>>>  	kref_init(&kobj->kref);
>>>>>>  	INIT_LIST_HEAD(&kobj->entry);
>>>>>> +	kobj->last_uevent_seqnum = 0;
>>>>>>  	kobj->state_in_sysfs = 0;
>>>>>>  	kobj->state_add_uevent_sent = 0;
>>>>>>  	kobj->state_remove_uevent_sent = 0;
>>>>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>>>>>> index f237a09a5862..f1cbed06a395 100644
>>>>>> --- a/lib/kobject_uevent.c
>>>>>> +++ b/lib/kobject_uevent.c
>>>>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>>>>>>  	}
>>>>>>  
>>>>>>  	mutex_lock(&uevent_sock_mutex);
>>>>>> -	/* we will send an event, so request a new sequence number */
>>>>>> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
>>>>>> +	/*
>>>>>> +	 * We will send an event, so request a new sequence number.
>>>>>> +	 * Also, record the number for next uevent's PREV_SEQNUM.
>>>>>> +	 */
>>>>>> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
>>>>>> +	retval = add_uevent_var(env, "SEQNUM=%llu",
>>>>>> +				(unsigned long long)uevent_seqnum);
>>>>>> +	if (retval) {
>>>>>> +		mutext_unlock(&uevent_sock_mutex);
>>>>>> +		goto exit;
>>>>>> +	}
>>>>>> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
>>>>>> +				(unsigned long long)kobj->last_uevent_seqnum);
>>>>>
>>>>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
>>>>> is only just the current one-1?  What else do you do with that value?
>>>> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:
>>>
>>> Ok, this is a bit better (but it's not in a format I could apply it in,
>>> if I wanted to...)
>>>
>>>> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
>>>> From: Peter Rajnoha <prajnoha@redhat.com>
>>>> Date: Thu, 12 Oct 2017 11:33:48 +0200
>>>> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
>>>>
>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>
>>>> This enhances the way we are able to track and handle uevents in userspace
>>>> if we are trying to catch up with all the uevents that were generated
>>>> while we were not listening or the uevents which were not delivered.
>>>> This may happen if the userspace listener is not running yet when the
>>>> uevent is triggered or there's a period of time when it's not listening
>>>> (e.g. because the userspace listener is being restarted while a new
>>>> uevent fires).
>>>>
>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>> userspace may execute appropriate actions based on that. We don't need
>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>> on items for which we really and actually need to reflect changed state
>>>> in kernel while the userspace listener was not listening for uevents.
>>>>
>>>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
>>>
>>> I don't really understand what userspace can do with this.  Why does it
>>> matter what the sequence number was for a specific kobject?  What can
>>> happen because of this?
>>>
>>> Do you have any example libudev code changes to show this in action so I
>>> can maybe understand the need?
>>>
>>
>> Not a libudev code change for this, at least I don't have this in plan
>> at this moment.
>>
>> It's more related to the way the udev rules are processed and, in
>> general, the way any other uevent listener can react to uevents. So the
>> change is in how "end consumer" of the uevent does its processing.
>>
>> At the moment, the sequence number is used globally, we don't have
>> per-kobject sequence numbers. So we can't use current SEQNUM that is
>> attached to each uevent to tell whether we have missed previous
>> uevent(s) for concrete device or not, we just know that some uevents for
>> some devices were not processed. Therefore, when we're triggering
>> uevents from userspace (udevadm trigger), we need to reevaluate all the
>> rules for all devices, no matter if it's really needed or not.
>>
>> For example, when we're switching from initramfs to root fs, we stop the
>> first udevd instance which is in initramfs, we keep the database (at the
>> moment, only for selected devices where we know that it's safe to do so
>> - same rules used in initramfs as in root fs). Then we switch over to
>> root fs and start the udevd again and we trigger uevents to fully
>> refresh the state, taking also the existing database into account.
>>
>> Now, it would help if we knew whether there were any uevents issued
>> after we stopped udevd in initramfs and before we started a new one from
>> root fs. If there are no uevents in between, we don't need to reevaluate
>> all the rules, we can just keep the existing database content. This can
>> become prominent if we have more devices in the system. It helps to
>> minimize the number of execution during boot, the overall impact of
>> triggering uevents and related processing, hence also making it more
>> straightforward to watch the state of the device and possibly debug
>> related problems.
>>
>> The same principle applies to any uevent listener in userspace, not just
>> udevd:
>>
>>   - recording last seen SEQNUM
>>
>>   - when new instance of the listener is running
>>
>>     - trigger synthetic uevents to refresh state
>>
>>     - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM:
> 
> Why can't you just compare SEQNUM with the "last seen" SEQNUM, they
> should just be 1 apart, right?  Isn't that what is done today?  How will
> this differ from looking at PREV_SEQNUM?

The SEQNUM is global, not per device (per kobject). It's 1 apart if you
have only 1 device, but that doesn't apply if you have more devices with
possible uevents interleaved.

For example (just provoking uevents here for clear example):

  echo add > /sys/block/sda/uevent
  echo add > /sys/block/sda/uevent
  echo add > /sys/block/sdb/uevent
  echo add > /sys/block/sda/uevent

Generates these uevents (I've removed uninteresting parts):

  KERNEL[45.739557] add
  ACTION=add
  DEVNAME=/dev/sda
  PREV_SEQNUM=2096
  SEQNUM=2112

  KERNEL[55.131767] add
  ACTION=add
  DEVNAME=/dev/sda
  PREV_SEQNUM=2112
  SEQNUM=2113
  SUBSYSTEM=block

  KERNEL[59.236131] add
  ACTION=add
  DEVNAME=/dev/sdb
  PREV_SEQNUM=2097
  SEQNUM=2114

  KERNEL[61.708551] add
  ACTION=add
  DEVNAME=/dev/sda
  PREV_SEQNUM=2113
  SEQNUM=2115

Here, the PREV_SEQNUM is 1 apart with SEQNUM for the 2nd uevent (sda),
because there was no other uevent (for a different device). But if you
check the 4th uevent (sda) where previous one was generated for another
device (sdb), the PREV_SEQNUM is not 1 apart. So in general, it may be
"n" apart from previous SEQNUM for a *particular device*. Globally, for
all devices, it is 1 apart, yes, but we don't care about that as that is
not quite useful.

So without PREV_SEQNUM, we can detect we have missed uevents, but we
don't know for which devices exactly, so we need to refresh everything,
not just the items we actually need. With PREV_SEQNUM we can identify
exact devices.

> 
>>       - If they differ, refresh the state fully, considering the state
>> we know of as obsolete as there were already other uevents before we
>> haven't processed. This may include deeper scans which may take longer,
>> executing more helper tools to get the state of the device.
> 
> But that is what "last seen" also shows.

Not per device, but only globally. So without PREV_SEQNUM it's either
"refresh all or nothing".

> 
>>       - If the are same, reuse the state we already know about is
>> up-to-date (there were no uevents we haven't seen), don't reevaluate
>> everything.
> 
> Testing for +1 should be the same.
> 
>> With the new synthetic uevent interface (the extention to
>> /sys/.../uevent) that accepts arguments (UUID and key=value pairs), we
>> can make it possible to specify the nature of the triggered uevent, we
>> can mark it through the key=value pairs which the uevent listener in
>> userspace can check for, so if we can also:
> 
> but that doesn't matter for PREV_SEQNUM.

I've mentioned that only for completeness and in advance, for the ones
who would argue that there can be a need to force full refresh for all
devices even though it may not be needed based on PREV_SEQNUM which will
cause selective refresh. So there's still a way to fallback to original
behavior, if that's needed.

> 
>>    - either refresh only the state for devices where we missed uevents
>> (that could be used during bootup, only to refresh the state for devices
>> where we have no current state yet), comparing current PREV_SEQNUM and
>> stored last SEQNUM in userspace
>>
>>    - or to force full refresh if that's needed for a reason.
>>
>> Also, some devices may have more complex activation scheme than usual
>> "device addition" to the system. There can be a sequence of uevents
>> which are expected for a successful change of device state, so we have a
>> state machine that tracks uevents (as a confirmation from kernel for
>> userspace that we can hook on in udev rules and uevent listeners). And
>> here, it's very helpful to know that we have missed uevent(s) before or
>> not, either they were not delivered or the uevent listener itself has
>> been restarted and during this period, it may have missed uevents.
>>
>> In summary, it's adding more robustness to the uevent interface where we
>> can directly and quickly detect we missed the uevent or not for a
>> specific device and hence we can optimize udev rules/uevent listeners
>> based on that so we don't dully refresh all the state and reexecute all
>> rules and code for any triggered uevents. It's really not needed all the
>> time.
> 
> I'd like to see real, working code that uses this new interface, showing
> how it is not possible to do what is needed without the PREV_SEQNUM,
> before considering adding this.  Otherwise it's just stuff we have to
> drag around for forever.
> 

OK... well, it's about changing concrete rules to skip various calls,
scans (e.g. blkid in case of block devices) and handling which is not
necessary if we detect that the uevent is synthetic and it's just used
to catch up with any missed uevents. In that case, we'll reuse existing
records. The same applies to any uevent listener besides udevd and its
records which are managed based on uevents.

-- 
Peter

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] kobject: record and send PREV_SEQNUM with uevents
  2017-10-18  7:21           ` Peter Rajnoha
@ 2017-10-18  9:42             ` Peter Rajnoha
  0 siblings, 0 replies; 8+ messages in thread
From: Peter Rajnoha @ 2017-10-18  9:42 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel

On 10/18/2017 09:21 AM, Peter Rajnoha wrote:
> On 10/17/2017 03:26 PM, Greg KH wrote:
>> On Tue, Oct 17, 2017 at 12:04:41PM +0200, Peter Rajnoha wrote:
>>> On 10/16/2017 04:49 PM, Greg KH wrote:
>>>> On Mon, Oct 16, 2017 at 04:00:26PM +0200, Peter Rajnoha wrote:
>>>>> On 10/16/2017 03:55 PM, Greg KH wrote:
>>>>>> On Mon, Oct 16, 2017 at 02:35:44PM +0200, Peter Rajnoha wrote:
>>>>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>>>>
>>>>>>> This enhances the way we are able to track and handle uevents in userspace
>>>>>>> if we are trying to catch up with all the uevents that were generated
>>>>>>> while we were not listening or the uevents which were not delivered.
>>>>>>> This may happen if the userspace listener is not running yet when the
>>>>>>> uevent is triggered or there's a period of time when it's not listening
>>>>>>> (e.g. because the userspace listener is being restarted while a new
>>>>>>> uevent fires).
>>>>>>>
>>>>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>>>>> userspace may execute appropriate actions based on that. We don't need
>>>>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>>>>> on items for which we really and actually need to reflect changed state
>>>>>>> in kernel while the userspace listener was not listening for uevents.
>>>>>>>
>>>>>>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
>>>>>>> ---
>>>>>>>  include/linux/kobject.h |  1 +
>>>>>>>  lib/kobject.c           |  1 +
>>>>>>>  lib/kobject_uevent.c    | 15 +++++++++++++--
>>>>>>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
>>>>>>> index e0a6205caa71..ee6db28b6186 100644
>>>>>>> --- a/include/linux/kobject.h
>>>>>>> +++ b/include/linux/kobject.h
>>>>>>> @@ -73,6 +73,7 @@ struct kobject {
>>>>>>>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
>>>>>>>  	struct delayed_work	release;
>>>>>>>  #endif
>>>>>>> +	u64			last_uevent_seqnum;
>>>>>>>  	unsigned int state_initialized:1;
>>>>>>>  	unsigned int state_in_sysfs:1;
>>>>>>>  	unsigned int state_add_uevent_sent:1;
>>>>>>> diff --git a/lib/kobject.c b/lib/kobject.c
>>>>>>> index 763d70a18941..2cc809f10ae7 100644
>>>>>>> --- a/lib/kobject.c
>>>>>>> +++ b/lib/kobject.c
>>>>>>> @@ -190,6 +190,7 @@ static void kobject_init_internal(struct kobject *kobj)
>>>>>>>  		return;
>>>>>>>  	kref_init(&kobj->kref);
>>>>>>>  	INIT_LIST_HEAD(&kobj->entry);
>>>>>>> +	kobj->last_uevent_seqnum = 0;
>>>>>>>  	kobj->state_in_sysfs = 0;
>>>>>>>  	kobj->state_add_uevent_sent = 0;
>>>>>>>  	kobj->state_remove_uevent_sent = 0;
>>>>>>> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
>>>>>>> index f237a09a5862..f1cbed06a395 100644
>>>>>>> --- a/lib/kobject_uevent.c
>>>>>>> +++ b/lib/kobject_uevent.c
>>>>>>> @@ -454,8 +454,19 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
>>>>>>>  	}
>>>>>>>  
>>>>>>>  	mutex_lock(&uevent_sock_mutex);
>>>>>>> -	/* we will send an event, so request a new sequence number */
>>>>>>> -	retval = add_uevent_var(env, "SEQNUM=%llu", (unsigned long long)++uevent_seqnum);
>>>>>>> +	/*
>>>>>>> +	 * We will send an event, so request a new sequence number.
>>>>>>> +	 * Also, record the number for next uevent's PREV_SEQNUM.
>>>>>>> +	 */
>>>>>>> +	kobj->last_uevent_seqnum = ++uevent_seqnum;
>>>>>>> +	retval = add_uevent_var(env, "SEQNUM=%llu",
>>>>>>> +				(unsigned long long)uevent_seqnum);
>>>>>>> +	if (retval) {
>>>>>>> +		mutext_unlock(&uevent_sock_mutex);
>>>>>>> +		goto exit;
>>>>>>> +	}
>>>>>>> +	retval = add_uevent_var(env, "PREV_SEQNUM=%llu",
>>>>>>> +				(unsigned long long)kobj->last_uevent_seqnum);
>>>>>>
>>>>>> I'm confused, why are we storing "last_uevent_seqnum" here at all, if it
>>>>>> is only just the current one-1?  What else do you do with that value?
>>>>> I'm sorry, I've sent incorrect patch by mistake. This is the correct one:
>>>>
>>>> Ok, this is a bit better (but it's not in a format I could apply it in,
>>>> if I wanted to...)
>>>>
>>>>> From 91fa0d0f5c52b959a5ca4cc79d1a4f5970db75e2 Mon Sep 17 00:00:00 2001
>>>>> From: Peter Rajnoha <prajnoha@redhat.com>
>>>>> Date: Thu, 12 Oct 2017 11:33:48 +0200
>>>>> Subject: [PATCH] kobject: record and send PREV_SEQNUM with uevents
>>>>>
>>>>> Record last uevent seqnum that was used with kobject's uevent and send
>>>>> it with next uevent as PREV_SEQNUM=<num> variable.
>>>>>
>>>>> This enhances the way we are able to track and handle uevents in userspace
>>>>> if we are trying to catch up with all the uevents that were generated
>>>>> while we were not listening or the uevents which were not delivered.
>>>>> This may happen if the userspace listener is not running yet when the
>>>>> uevent is triggered or there's a period of time when it's not listening
>>>>> (e.g. because the userspace listener is being restarted while a new
>>>>> uevent fires).
>>>>>
>>>>> A userspace listener can compare the last uevent seqnum it knows about
>>>>> with the last uevent seqnum the kernel reports through uevents delivered
>>>>> to userspace, both genuine and synthetic ones (the ones generated by
>>>>> writing uevent action name to /sys/.../uevent file). Then, if needed,
>>>>> userspace may execute appropriate actions based on that. We don't need
>>>>> to reevaluate and recheck all items, instead, we can concentrate only
>>>>> on items for which we really and actually need to reflect changed state
>>>>> in kernel while the userspace listener was not listening for uevents.
>>>>>
>>>>> Signed-off-by: Peter Rajnoha <prajnoha@redhat.com>
>>>>
>>>> I don't really understand what userspace can do with this.  Why does it
>>>> matter what the sequence number was for a specific kobject?  What can
>>>> happen because of this?
>>>>
>>>> Do you have any example libudev code changes to show this in action so I
>>>> can maybe understand the need?
>>>>
>>>
>>> Not a libudev code change for this, at least I don't have this in plan
>>> at this moment.
>>>
>>> It's more related to the way the udev rules are processed and, in
>>> general, the way any other uevent listener can react to uevents. So the
>>> change is in how "end consumer" of the uevent does its processing.
>>>
>>> At the moment, the sequence number is used globally, we don't have
>>> per-kobject sequence numbers. So we can't use current SEQNUM that is
>>> attached to each uevent to tell whether we have missed previous
>>> uevent(s) for concrete device or not, we just know that some uevents for
>>> some devices were not processed. Therefore, when we're triggering
>>> uevents from userspace (udevadm trigger), we need to reevaluate all the
>>> rules for all devices, no matter if it's really needed or not.
>>>
>>> For example, when we're switching from initramfs to root fs, we stop the
>>> first udevd instance which is in initramfs, we keep the database (at the
>>> moment, only for selected devices where we know that it's safe to do so
>>> - same rules used in initramfs as in root fs). Then we switch over to
>>> root fs and start the udevd again and we trigger uevents to fully
>>> refresh the state, taking also the existing database into account.
>>>
>>> Now, it would help if we knew whether there were any uevents issued
>>> after we stopped udevd in initramfs and before we started a new one from
>>> root fs. If there are no uevents in between, we don't need to reevaluate
>>> all the rules, we can just keep the existing database content. This can
>>> become prominent if we have more devices in the system. It helps to
>>> minimize the number of execution during boot, the overall impact of
>>> triggering uevents and related processing, hence also making it more
>>> straightforward to watch the state of the device and possibly debug
>>> related problems.
>>>
>>> The same principle applies to any uevent listener in userspace, not just
>>> udevd:
>>>
>>>   - recording last seen SEQNUM
>>>
>>>   - when new instance of the listener is running
>>>
>>>     - trigger synthetic uevents to refresh state
>>>
>>>     - comparing the PREV_SEQNUM from uevent generated with stored SEQNUM:
>>
>> Why can't you just compare SEQNUM with the "last seen" SEQNUM, they
>> should just be 1 apart, right?  Isn't that what is done today?  How will
>> this differ from looking at PREV_SEQNUM?
> 
> The SEQNUM is global, not per device (per kobject). It's 1 apart if you
> have only 1 device, but that doesn't apply if you have more devices with
> possible uevents interleaved.
> 
> For example (just provoking uevents here for clear example):
> 
>   echo add > /sys/block/sda/uevent
>   echo add > /sys/block/sda/uevent
>   echo add > /sys/block/sdb/uevent
>   echo add > /sys/block/sda/uevent
> 
> Generates these uevents (I've removed uninteresting parts):
> 
>   KERNEL[45.739557] add
>   ACTION=add
>   DEVNAME=/dev/sda
>   PREV_SEQNUM=2096
>   SEQNUM=2112
> 
>   KERNEL[55.131767] add
>   ACTION=add
>   DEVNAME=/dev/sda
>   PREV_SEQNUM=2112
>   SEQNUM=2113
>   SUBSYSTEM=block
> 
>   KERNEL[59.236131] add
>   ACTION=add
>   DEVNAME=/dev/sdb
>   PREV_SEQNUM=2097
>   SEQNUM=2114
> 
>   KERNEL[61.708551] add
>   ACTION=add
>   DEVNAME=/dev/sda
>   PREV_SEQNUM=2113
>   SEQNUM=2115
> 
> Here, the PREV_SEQNUM is 1 apart with SEQNUM for the 2nd uevent (sda),
> because there was no other uevent (for a different device). But if you
> check the 4th uevent (sda) where previous one was generated for another
> device (sdb), the PREV_SEQNUM is not 1 apart. So in general, it may be
> "n" apart from previous SEQNUM for a *particular device*. Globally, for
> all devices, it is 1 apart, yes, but we don't care about that as that is
> not quite useful.
> 
> So without PREV_SEQNUM, we can detect we have missed uevents, but we
> don't know for which devices exactly, so we need to refresh everything,
> not just the items we actually need. With PREV_SEQNUM we can identify
> exact devices.
Also, without PREV_SEQNUM, to tell at least whether I have missed some
uevents, I need to check /sys/kernel/uevent_seqnum and then compare with
what I have in userspace as "last seen seqnum". But that is a race -
after reading the uevent_seqnum file, more uevents may trigger...

I wouldn't need to read the uevent_seqnum file though if I just read the
SEQNUM directly from the uevent. But then, if I detected the SEQNUM is
not good (it's not the "last know seqnum" + 1), I'd need to trigger the
full refresh for all devices while processing that one particular
uevent, and that's becoming messy.

While with the PREV_SEQNUM, we don't need to care, we just trigger the
synthetic uevents whenever we need and we let the check to be done *per
each device* while processing those triggered uevents:

  - if PREV_SEQNUM matches with what we have in records as "last seen"
for that particular device, we skip the uevent (we keep the old records
as they don't need any refresh)

 - if PREV_SEQNUM doesn't match, we refresh ONLY that one device

So it's also cleaner way to deal with refreshes, besides the fact it's
done per-device, not the "all devices or nothing" like it is with having
only the global seqnum view.

-- 
Peter

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-10-18  9:42 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-16 12:35 [PATCH] kobject: record and send PREV_SEQNUM with uevents Peter Rajnoha
2017-10-16 13:55 ` Greg KH
2017-10-16 14:00   ` Peter Rajnoha
2017-10-16 14:49     ` Greg KH
2017-10-17 10:04       ` Peter Rajnoha
2017-10-17 13:26         ` Greg KH
2017-10-18  7:21           ` Peter Rajnoha
2017-10-18  9:42             ` Peter Rajnoha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.