* [GIT] Experimental threaded udev
@ 2009-05-28 14:35 Alan Jenkins
  2009-05-28 15:09 ` Kay Sievers
                   ` (26 more replies)
  0 siblings, 27 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-05-28 14:35 UTC (permalink / raw)
  To: linux-hotplug

Now available for your delight and/or horror.

<http://github.com/sourcejedi/udev/commits/threading-v0.3>

For now, I'm still treating this as a patch series.  That is, I may
publish future versions with a rewritten history.  I'll preserve the old
branches though.

It turns out the MADV_DONTFORK hack I was so proud of is
implementation-dependent, i.e. a dirty hack.  However, I'm confident
that glibc can and should be modified to do it for all programs.  And it
is so worth it.  On my test machine, threading alone takes boot-time
coldplug from 2s to about 1.3s.  MADV_DONTFORK takes it down to about 0.7s.
The hack is contained in the last patch, "when forking a program, only
copy the stack of the _current_ thread".
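
For readers unfamiliar with the flag, here is a minimal sketch of what MADV_DONTFORK does. It is not the code from the patch: it uses a plain anonymous mapping rather than a thread stack, and the helper name region_absent_in_child() is made up for illustration. The child checks the range with msync(), which fails with ENOMEM on an unmapped range.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the udev patch): map `len` bytes, mark them
 * MADV_DONTFORK, fork, and let the child report whether it still has the
 * mapping.  Returns 1 if the range is absent in the child, 0 if it is
 * present, -1 on error.
 */
int region_absent_in_child(size_t len)
{
	void *region;
	pid_t pid;
	int status;

	region = mmap(NULL, len, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return -1;

	/* ask the kernel not to copy this range into children on fork() */
	if (madvise(region, len, MADV_DONTFORK) != 0) {
		munmap(region, len);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(region, len);
		return -1;
	}
	if (pid == 0) {
		/* child: msync() fails with ENOMEM when the range is not mapped */
		int absent = (msync(region, len, MS_ASYNC) == -1 && errno == ENOMEM);
		_exit(absent ? 0 : 1);
	}

	if (waitpid(pid, &status, 0) != pid) {
		munmap(region, len);
		return -1;
	}
	munmap(region, len);

	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Calling region_absent_in_child(4 * 4096) on a Linux kernel that supports MADV_DONTFORK should report the range as absent in the child, which is why fewer pages have to be set up for copy-on-write at fork time.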

Thanks for your time and encouragement
Alan



      udevd: don't use alarm() for timeouts
      Build udevd with pthreads
      Protect selinux context against concurrent modification
      Add close-on-exec wrappers for open(), fopen(), pipe() and socket()
      Convert udevd and libudev to use close-on-exec wrapper functions
      Add abstraction layer for udev event tasks
      Run udev event tasks in threads
      udevd: when forking a program, only copy the stack of the _current_ thread

 configure.ac                       |   15 ++
 extras/ata_id/Makefile.am          |    3 +-
 extras/cdrom_id/Makefile.am        |    3 +-
 extras/collect/Makefile.am         |    3 +-
 extras/edd_id/Makefile.am          |    3 +-
 extras/floppy/Makefile.am          |    1 +
 extras/fstab_import/Makefile.am    |    3 +-
 extras/scsi_id/Makefile.am         |    3 +-
 extras/usb_id/Makefile.am          |    3 +-
 m4/acx_pthread.m4                  |  280 ++++++++++++++++++++++++++++++
 udev/Makefile.am                   |    8 +
 udev/lib/Makefile.am               |    1 +
 udev/lib/libudev-cloexec.c         |  226 ++++++++++++++++++++++++
 udev/lib/libudev-ctrl.c            |    2 +-
 udev/lib/libudev-device-db-write.c |    2 +-
 udev/lib/libudev-device.c          |    6 +-
 udev/lib/libudev-monitor.c         |    4 +-
 udev/lib/libudev-private.h         |   13 ++
 udev/lib/libudev-queue.c           |    2 +-
 udev/lib/libudev-sysdeps.h         |   96 ++++++++++
 udev/lib/libudev.c                 |   18 ++-
 udev/test-udev.c                   |    7 -
 udev/udev-event.c                  |  335 +++++++++++++++++++++++++++++++++++-
 udev/udev-node.c                   |    2 +-
 udev/udev-rules.c                  |   53 +++---
 udev/udev-selinux.c                |    9 +-
 udev/udev-task.c                   |  295 +++++++++++++++++++++++++++++++
 udev/udev-util.c                   |  211 -----------------------
 udev/udev.h                        |   23 +++-
 udev/udevd.c                       |  323 ++++++++++++++++++-----------------
 30 files changed, 1527 insertions(+), 426 deletions(-)
 create mode 100644 m4/acx_pthread.m4
 create mode 100644 udev/lib/libudev-cloexec.c
 create mode 100644 udev/lib/libudev-sysdeps.h
 create mode 100644 udev/udev-task.c



* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
@ 2009-05-28 15:09 ` Kay Sievers
  2009-05-28 15:39 ` Alan Jenkins
                   ` (25 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-05-28 15:09 UTC (permalink / raw)
  To: linux-hotplug

On Thu, May 28, 2009 at 16:35, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
> Now available for your delight and/or horror.
>
> <http://github.com/sourcejedi/udev/commits/threading-v0.3>
>
> For now, I'm still treating this as a patch series.  That is, I may
> publish future versions with a rewritten history.  I'll preserve the old
> branches though.
>
> It turns out the MADV_DONTFORK hack I was so proud of is
> implementation-dependant, i.e. a dirty hack.  However, I'm confident
> that glibc can and should be modified to do it for all programs.  And it
> is so worth it.  On my test machine, threading alone goes from 2s
> boot-time coldplug to 1.3-ish.  MADV_DONTFORK takes it down to 0.7-ish.
> The hack is contained in the last patch, "when forking a program, only
> copy the stack of the _current_ thread".

Is that a single or dual CPU box?

With the threaded version, it's 0.18 (1.51 -> 1.33) seconds faster
here for a full coldplug run on a:
Intel(R) Core(TM)2 Duo CPU U9400  @ 1.40GHz.

It might be that the threaded version will only behave that much
better on a single CPU machine?

Thanks,
Kay


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
  2009-05-28 15:09 ` Kay Sievers
@ 2009-05-28 15:39 ` Alan Jenkins
  2009-05-29 17:53 ` Alan Jenkins
                   ` (24 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-05-28 15:39 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Thu, May 28, 2009 at 16:35, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>   
>> Now available for your delight and/or horror.
>>
>> <http://github.com/sourcejedi/udev/commits/threading-v0.3>
>>
>> For now, I'm still treating this as a patch series.  That is, I may
>> publish future versions with a rewritten history.  I'll preserve the old
>> branches though.
>>
>> It turns out the MADV_DONTFORK hack I was so proud of is
>> implementation-dependant, i.e. a dirty hack.  However, I'm confident
>> that glibc can and should be modified to do it for all programs.  And it
>> is so worth it.  On my test machine, threading alone goes from 2s
>> boot-time coldplug to 1.3-ish.  MADV_DONTFORK takes it down to 0.7-ish.
>> The hack is contained in the last patch, "when forking a program, only
>> copy the stack of the _current_ thread".
>>     
>
> Is that a single or dual CPU box?
>   

Single CPU; it's my Celeron 630MHz netbook.

> With the threaded version, it's 0.18 (1.51 -> 1.33) seconds faster
> here for a full coldplug run on a:
> Intel(R) Core(TM)2 Duo CPU U9400  @ 1.40GHz.
>
> It might be that the threaded version will only behave that much
> better on a single CPU machine?
>
> Thanks,
> Kay
>   

I'll have a look on my Core2Duo desktop.  My netbook core is much slower
than a C2D desktop, but the difference does require explanation.

There are a couple of global locks.  The selinux context is per-process,
so that's locked.  If you don't have full close-on-exec support, it has
to fall back to locking.  I haven't tested the full close-on-exec support
yet.
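
As an illustration of the close-on-exec point, here is a rough sketch of what such a wrapper might look like. It is not the code from libudev-cloexec.c; open_cloexec() and fd_is_cloexec() are hypothetical names, and the wrapper only handles flags that do not need a mode argument (no O_CREAT). With O_CLOEXEC the flag is set atomically by open(); without it, the open()/fcntl() pair has to be serialized against whatever code path forks, which is assumed here to take the same lock.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#ifndef O_CLOEXEC
#include <pthread.h>
/* fallback only: the forking code path is assumed to take this lock too */
static pthread_mutex_t cloexec_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Hypothetical wrapper: open `path` with close-on-exec set on the new fd. */
int open_cloexec(const char *path, int flags)
{
#ifdef O_CLOEXEC
	/* atomic: no window in which another thread can fork and leak the fd */
	return open(path, flags | O_CLOEXEC);
#else
	int fd;

	/* non-atomic fallback: hold the lock across open() and fcntl() */
	pthread_mutex_lock(&cloexec_lock);
	fd = open(path, flags);
	if (fd >= 0)
		fcntl(fd, F_SETFD, FD_CLOEXEC);
	pthread_mutex_unlock(&cloexec_lock);
	return fd;
#endif
}

/* Returns 1 if FD_CLOEXEC is set on `fd`, 0 if not, -1 on error. */
int fd_is_cloexec(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);

	if (fdflags < 0)
		return -1;
	return (fdflags & FD_CLOEXEC) != 0;
}
```

The fallback branch is where the global lock comes from: every open() in every thread has to queue behind it, which may account for part of the lost parallelism.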

My memory says my desktop machine steams through coldplug in a fraction
of a second, but I've not measured it recently.

Thanks
Alan


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
  2009-05-28 15:09 ` Kay Sievers
  2009-05-28 15:39 ` Alan Jenkins
@ 2009-05-29 17:53 ` Alan Jenkins
  2009-05-29 18:11 ` Kay Sievers
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-05-29 17:53 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Thu, May 28, 2009 at 16:35, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>   
>> Now available for your delight and/or horror.
>>
>> <http://github.com/sourcejedi/udev/commits/threading-v0.3>
>>
>> For now, I'm still treating this as a patch series.  That is, I may
>> publish future versions with a rewritten history.  I'll preserve the old
>> branches though.
>>
>> It turns out the MADV_DONTFORK hack I was so proud of is
>> implementation-dependant, i.e. a dirty hack.  However, I'm confident
>> that glibc can and should be modified to do it for all programs.  And it
>> is so worth it.  On my test machine, threading alone goes from 2s
>> boot-time coldplug to 1.3-ish.  MADV_DONTFORK takes it down to 0.7-ish.
>> The hack is contained in the last patch, "when forking a program, only
>> copy the stack of the _current_ thread".
>>     
>
> Is that a single or dual CPU box?
>
> With the threaded version, it's 0.18 (1.51 -> 1.33) seconds faster
> here for a full coldplug run on a:
> Intel(R) Core(TM)2 Duo CPU U9400  @ 1.40GHz.
>
> It might be that the threaded version will only behave that much
> better on a single CPU machine?
>   

No, it's not that.  I'm afraid the ~0.7 second result was an
accident; the bootchart looks suspicious and I couldn't reproduce it
when I retested.  The "fixed" version of the last patch doesn't work,
and I don't think it can.

The version that crashed on boot may be useful for comparison purposes 
though.  I will try and see how much I can reduce the page fault 
overhead without using threads.  Maybe just recycling the event 
processes would bring similar gains, with less of the risks of threads.

Alan


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (2 preceding siblings ...)
  2009-05-29 17:53 ` Alan Jenkins
@ 2009-05-29 18:11 ` Kay Sievers
  2009-06-01  2:41 ` Kay Sievers
                   ` (22 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-05-29 18:11 UTC (permalink / raw)
  To: linux-hotplug

On Fri, May 29, 2009 at 19:53, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:

>  Maybe just recycling the event processes would bring
> similar gains, with less of the risks of threads.

Yeah, I thought that too, though without having tested anything.  It
could be that we just want to keep a pipe to the event process, and let
the event process signal the main daemon when it has handled the event
and then go to sleep.  The main daemon can recycle a sleeping event
process and push a new event over the pipe to it.  If no events are
queued anymore, the main daemon just closes the pipe, and the event
process exits.

With that model we might be able to reduce the number of fork()s
significantly. And we would still have the process separation, its
robustness, and the lock-free behavior for malloc, cloexec and all
these issues.
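
A minimal sketch of the pipe-driven worker described above, under simplifying assumptions: events are plain ints, "handling" an event just counts it, and the signal back to the daemon is left out. The function name run_pipe_worker() is made up for illustration. The point it shows is the lifecycle: the worker blocks in read() between events and exits once the daemon closes its end of the pipe.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: fork one worker, feed it `n` events over a pipe,
 * then close the pipe so the worker exits.  Returns the number of events
 * the worker handled (taken from its exit status, so at most 255),
 * or -1 on error.
 */
int run_pipe_worker(const int *events, size_t n)
{
	int fds[2];
	pid_t pid;
	int status;
	size_t i;

	if (pipe(fds) != 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		int event;
		int handled = 0;

		close(fds[1]);
		/* sleep in read() between events; EOF means the daemon is done */
		while (read(fds[0], &event, sizeof(event)) == (ssize_t)sizeof(event))
			handled++;
		_exit(handled);
	}

	close(fds[0]);
	for (i = 0; i < n; i++) {
		if (write(fds[1], &events[i], sizeof(events[i])) != (ssize_t)sizeof(events[i]))
			break;
	}
	/* no more events queued: closing the pipe tells the worker to exit */
	close(fds[1]);

	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

One fork() here serves any number of events, which is where the reduction in fork() calls would come from.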

Thanks,
Kay


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (3 preceding siblings ...)
  2009-05-29 18:11 ` Kay Sievers
@ 2009-06-01  2:41 ` Kay Sievers
  2009-06-01  9:29 ` Alan Jenkins
                   ` (21 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01  2:41 UTC (permalink / raw)
  To: linux-hotplug

On Fri, 2009-05-29 at 20:11 +0200, Kay Sievers wrote:
> On Fri, May 29, 2009 at 19:53, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
> 
> >  Maybe just recycling the event processes would bring
> > similar gains, with less of the risks of threads.
> 
> Yeah, I thought that too, without having tested anything, it could be,
> that we just want to keep a pipe to the event process, and let the
> event process send a signal back to the main daemon, that it has
> handled the event, and it goes to sleep after that. The main daemon
> can recycle a sleeping event process and push a new event over the
> pipe to it. If no events are queued anymore, the main daemon just
> closes the pipe, and the event process will exit.
> 
> With that model we might be able to reduce the number of fork()s
> significantly. And we would still have the process separation, it's
> robustness, and the lock-free behavior for malloc, cloexec and all
> these issues.

Here is a rough hack to check how it behaves. It boots my box; I
haven't really checked anything else.

It clones the event processes as the current version does, but the event
process stays around to get re-used as a worker for later events.
Further messages are sent over netlink to the worker processes, and the
worker process signals its state back with sigqueue() rt-signals. When
the events have settled, the workers get killed after a few seconds of
idle time.
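
To illustrate the sigqueue() part in isolation, here is a small sketch of a child passing an integer result back to its parent with a queued real-time signal. It is an assumption-laden simplification: the patch installs an SA_SIGINFO handler in udevd, while this sketch blocks the signal and collects it synchronously with sigtimedwait(). The function name receive_worker_result() is hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch: fork a child that reports `result` to the parent
 * via sigqueue().  Returns the value the parent received, or -1 on error.
 */
int receive_worker_result(int result)
{
	const int signum = SIGRTMIN + 1;
	struct timespec timeout = { 5, 0 };
	sigset_t mask;
	siginfo_t info;
	pid_t parent = getpid();
	pid_t pid;
	int got;

	/* block the signal before forking so it stays queued until we wait */
	sigemptyset(&mask);
	sigaddset(&mask, signum);
	if (sigprocmask(SIG_BLOCK, &mask, NULL) != 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		union sigval value;

		/* the payload travels with the signal in si_value */
		value.sival_int = result;
		_exit(sigqueue(parent, signum, value) == 0 ? 0 : 1);
	}

	/* wait up to 5 seconds for the queued signal and its payload */
	got = sigtimedwait(&mask, &info, &timeout);
	waitpid(pid, NULL, 0);

	if (got != signum || info.si_pid != pid)
		return -1;
	return info.si_value.sival_int;
}
```

Unlike ordinary signals, real-time signals are queued rather than merged, so several workers finishing at once should each deliver their own si_pid and si_value.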

The current git version:
  $ time (udevadm trigger; udevadm settle)
  real	0m1.566s
  ...

  $ time /sbin/udevd.orig
  []
  user	0m0.420s
  sys	0m1.412s


The thread-version:
  $ time (udevadm trigger -Snet; udevadm settle)
  real 0m1.336s

  $ time udev/udevd
  []
  user	0m0.310s
  sys	0m0.679s


The worker-version:
  $ time (udevadm trigger; udevadm settle)
  real	0m1.171s
  ...

  $ time udev/udevd
  []
  user	0m0.057s
  sys	0m0.095s


The thread- and worker-versions do not create as many COW page-faults in
the daemon after every cloned event-process, and therefore need much
less CPU.

At least on the dual-core laptop here, the pool of workers seems to be
faster than the threads.

Thanks,
Kay



 lib/libudev-monitor.c |   41 +---
 lib/libudev-private.h |    2 
 udev-event.c          |    7 
 udev.h                |    6 
 udevd.c               |  500 +++++++++++++++++++++++++++++++-------------------
 5 files changed, 333 insertions(+), 223 deletions(-)

diff --git a/udev/lib/libudev-monitor.c b/udev/lib/libudev-monitor.c
index 395a4d2..54c9576 100644
--- a/udev/lib/libudev-monitor.c
+++ b/udev/lib/libudev-monitor.c
@@ -32,7 +32,6 @@ struct udev_monitor {
 	int refcount;
 	int sock;
 	struct sockaddr_nl snl;
-	struct sockaddr_nl snl_peer;
 	struct sockaddr_un sun;
 	socklen_t addrlen;
 	struct udev_list_node filter_subsystem_list;
@@ -171,8 +170,8 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 		return NULL;
 
 	if (name == NULL)
-		return NULL;
-	if (strcmp(name, "kernel") == 0)
+		group = 0;
+	else if (strcmp(name, "kernel") == 0)
 		group = UDEV_MONITOR_KERNEL;
 	else if (strcmp(name, "udev") == 0)
 		group = UDEV_MONITOR_UDEV;
@@ -193,8 +192,6 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 
 	udev_monitor->snl.nl_family = AF_NETLINK;
 	udev_monitor->snl.nl_groups = group;
-	udev_monitor->snl_peer.nl_family = AF_NETLINK;
-	udev_monitor->snl_peer.nl_groups = UDEV_MONITOR_UDEV;
 
 	dbg(udev, "monitor %p created with NETLINK_KOBJECT_UEVENT (%u)\n", udev_monitor, group);
 	return udev_monitor;
@@ -434,7 +431,6 @@ struct udev_device *udev_monitor_receive_device(struct udev_monitor *udev_monito
 	struct iovec iov;
 	char cred_msg[CMSG_SPACE(sizeof(struct ucred))];
 	struct cmsghdr *cmsg;
-	struct sockaddr_nl snl;
 	struct ucred *cred;
 	char buf[8192];
 	ssize_t buflen;
@@ -459,11 +455,6 @@ retry:
 	smsg.msg_control = cred_msg;
 	smsg.msg_controllen = sizeof(cred_msg);
 
-	if (udev_monitor->snl.nl_family != 0) {
-		smsg.msg_name = &snl;
-		smsg.msg_namelen = sizeof(snl);
-	}
-
 	buflen = recvmsg(udev_monitor->sock, &smsg, 0);
 	if (buflen < 0) {
 		if (errno != EINTR)
@@ -476,20 +467,6 @@ retry:
 		return NULL;
 	}
 
-	if (udev_monitor->snl.nl_family != 0) {
-		if (snl.nl_groups == 0) {
-			info(udev_monitor->udev, "unicast netlink message ignored\n");
-			return NULL;
-		}
-		if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
-			if (snl.nl_pid > 0) {
-				info(udev_monitor->udev, "multicast kernel netlink message from pid %d ignored\n", snl.nl_pid);
-				return NULL;
-			}
-			is_kernel = 1;
-		}
-	}
-
 	cmsg = CMSG_FIRSTHDR(&smsg);
 	if (cmsg == NULL || cmsg->cmsg_type != SCM_CREDENTIALS) {
 		info(udev_monitor->udev, "no sender credentials received, message ignored\n");
@@ -621,7 +598,7 @@ retry:
 	return udev_device;
 }
 
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device)
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid)
 {
 	struct msghdr smsg;
 	struct iovec iov[2];
@@ -660,6 +637,7 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 	} else if (udev_monitor->snl.nl_family != 0) {
 		const char *val;
 		struct udev_monitor_netlink_header nlh;
+		struct sockaddr_nl snl_peer;
 
 
 		/* add versioned header */
@@ -680,11 +658,18 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 		iov[1].iov_base = (char *)buf;
 		iov[1].iov_len = blen;
 
+		/* we will always get ECONNREFUSED when sending to the muticast group */
+		memset(&snl_peer, 0x00, sizeof(struct sockaddr_nl));
+		snl_peer.nl_family = AF_NETLINK;
+		if (pid > 0)
+			snl_peer.nl_pid = pid;
+		else
+			snl_peer.nl_groups = UDEV_MONITOR_UDEV;
+
 		memset(&smsg, 0x00, sizeof(struct msghdr));
 		smsg.msg_iov = iov;
 		smsg.msg_iovlen = 2;
-		/* no destination besides the muticast group, we will always get ECONNREFUSED */
-		smsg.msg_name = &udev_monitor->snl_peer;
+		smsg.msg_name = &snl_peer;
 		smsg.msg_namelen = sizeof(struct sockaddr_nl);
 	} else {
 		return -1;
diff --git a/udev/lib/libudev-private.h b/udev/lib/libudev-private.h
index 3eb3d79..3019920 100644
--- a/udev/lib/libudev-private.h
+++ b/udev/lib/libudev-private.h
@@ -86,7 +86,7 @@ int udev_device_delete_db(struct udev_device *udev_device);
 int udev_device_rename_db(struct udev_device *udev_device, const char *devpath);
 
 /* libudev-monitor - netlink/unix socket communication  */
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device);
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid);
 int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size);
 
 /* libudev-ctrl - daemon runtime setup */
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d521251..8ab262a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -734,18 +734,13 @@ int udev_event_execute_run(struct udev_event *event)
 			monitor = udev_monitor_new_from_socket(event->udev, &cmd[strlen("socket:")]);
 			if (monitor == NULL)
 				continue;
-			udev_monitor_send_device(monitor, event->dev);
+			udev_monitor_send_device(monitor, event->dev, 0);
 			udev_monitor_unref(monitor);
 		} else {
 			char program[UTIL_PATH_SIZE];
 			char **envp;
 
 			udev_event_apply_format(event, cmd, program, sizeof(program));
-			if (event->trace)
-				fprintf(stderr, "run  %s (%llu) '%s'\n",
-				       udev_device_get_syspath(event->dev),
-				       udev_device_get_seqnum(event->dev),
-				       program);
 			envp = udev_device_get_properties_envp(event->dev);
 			if (util_run_program(event->udev, program, envp, NULL, 0, NULL) != 0) {
 				if (!udev_list_entry_get_flag(list_entry))
diff --git a/udev/udev.h b/udev/udev.h
index 8f2c1c6..ed29c4b 100644
--- a/udev/udev.h
+++ b/udev/udev.h
@@ -53,7 +53,6 @@ static inline void logging_close(void)
 }
 
 struct udev_event {
-	struct udev_list_node node;
 	struct udev *udev;
 	struct udev_device *dev;
 	struct udev_device *dev_parent;
@@ -64,10 +63,6 @@ struct udev_event {
 	uid_t uid;
 	gid_t gid;
 	struct udev_list_node run_list;
-	pid_t pid;
-	int exitstatus;
-	time_t queue_time;
-	unsigned long long int delaying_seqnum;
 	unsigned int group_final:1;
 	unsigned int owner_final:1;
 	unsigned int mode_final:1;
@@ -76,7 +71,6 @@ struct udev_event {
 	unsigned int run_final:1;
 	unsigned int ignore_device:1;
 	unsigned int inotify_watch:1;
-	unsigned int trace:1;
 };
 
 struct udev_watch {
diff --git a/udev/udevd.c b/udev/udevd.c
index 37b547a..6c41e5d 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -44,8 +44,7 @@
 #define UDEVD_PRIORITY			-4
 #define UDEV_PRIORITY			-2
 
-/* maximum limit of forked childs */
-#define UDEVD_MAX_CHILDS		256
+#define SIGRT_WORKER			SIGRTMIN+1
 
 static int debug;
 
@@ -61,34 +60,75 @@ static void log_fn(struct udev *udev, int priority,
 	}
 }
 
-static void reap_sigchilds(void);
-
 static int debug_trace;
 static struct udev_rules *rules;
 static struct udev_queue_export *udev_queue_export;
 static struct udev_ctrl *udev_ctrl;
-static struct udev_monitor *kernel_monitor;
-static volatile sig_atomic_t sigchilds_waiting;
+static struct udev_monitor *monitor;
+static pid_t main_pid;
+static volatile sig_atomic_t event_finished;
+static volatile sig_atomic_t worker_dead;
 static volatile sig_atomic_t udev_exit;
 static volatile sig_atomic_t reload_config;
 static volatile sig_atomic_t signal_received;
-static volatile pid_t settle_pid;
-static int run_exec_q;
+static pid_t settle_pid;
 static int stop_exec_q;
 static int max_childs;
-static int childs;
+static volatile int childs;
 static struct udev_list_node event_list;
+static struct udev_list_node worker_list;
+static volatile sig_atomic_t worker_exit;
+
+enum event_state {
+	EVENT_UNDEF,
+	EVENT_QUEUED,
+	EVENT_RUNNING,
+	EVENT_FINISHED,
+};
+
+struct event {
+	struct udev_list_node node;
+	struct udev *udev;
+	struct udev_device *dev;
+	enum event_state state;
+	int exitstatus;
+	unsigned long long int delaying_seqnum;
+};
 
-static struct udev_event *node_to_event(struct udev_list_node *node)
+static struct event *node_to_event(struct udev_list_node *node)
 {
 	char *event;
 
 	event = (char *)node;
-	event -= offsetof(struct udev_event, node);
-	return (struct udev_event *)event;
+	event -= offsetof(struct event, node);
+	return (struct event *)event;
 }
 
-static void event_queue_delete(struct udev_event *event)
+enum worker_state {
+	WORKER_UNDEF,
+	WORKER_RUNNING,
+	WORKER_IDLE,
+	WORKER_KILLED,
+	WORKER_DEAD,
+};
+
+struct worker {
+	struct udev_list_node node;
+	pid_t pid;
+	enum worker_state state;
+	struct event *event;
+};
+
+static struct worker *node_to_worker(struct udev_list_node *node)
+{
+	char *worker;
+
+	worker = (char *)node;
+	worker -= offsetof(struct worker, node);
+	return (struct worker *)worker;
+}
+
+static void event_queue_delete(struct event *event)
 {
 	udev_list_node_remove(&event->node);
 
@@ -99,48 +139,57 @@ static void event_queue_delete(struct udev_event *event)
 		udev_queue_export_device_finished(udev_queue_export, event->dev);
 
 	udev_device_unref(event->dev);
-	udev_event_unref(event);
+	free(event);
 }
 
 static void event_sig_handler(int signum)
 {
-	if (signum == SIGALRM)
+	switch (signum) {
+	case SIGALRM:
 		_exit(1);
+		break;
+	case SIGTERM:
+		worker_exit = 1;
+		break;
+	}
 }
 
-static void event_fork(struct udev_event *event)
+static void worker_new(struct event *event)
 {
+	struct worker *worker;
 	pid_t pid;
 	struct sigaction act;
-	int err;
-
-#if 0
-	/* single process, no forking, just for testing/profiling */
-	err = udev_event_execute_rules(event, rules);
-	if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-		udev_event_execute_run(event);
-	info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
-	event_queue_delete(event);
-	return;
-#endif
+	sigset_t mask;
 
-	if (debug_trace) {
-		event->trace = 1;
-		fprintf(stderr, "fork %s (%llu)\n",
-		       udev_device_get_syspath(event->dev),
-		       udev_device_get_seqnum(event->dev));
-	}
+	worker = calloc(1, sizeof(struct worker));
+	if (worker == NULL)
+		return;
+
+	/* block WORKER signals, until we joined the list with our new pid */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGRT_WORKER);
+	sigprocmask(SIG_BLOCK, &mask, NULL);
+
+	event->state = EVENT_RUNNING;
 
 	pid = fork();
 	switch (pid) {
-	case 0:
-		/* child */
+	case 0: {
+		struct udev_device *dev;
+
 		udev_queue_export_unref(udev_queue_export);
 		udev_ctrl_unref(udev_ctrl);
+		udev_monitor_unref(monitor);
 		logging_close();
 		logging_init("udevd-event");
 		setpriority(PRIO_PROCESS, 0, UDEV_PRIORITY);
 
+		/* re-open socket to listen to udevd only, and send back libudev events */
+		monitor = udev_monitor_new_from_netlink(event->udev, NULL);
+		if (monitor == NULL)
+			_exit(2);
+		udev_monitor_enable_receiving(monitor);
+
 		/* set signal handlers */
 		memset(&act, 0x00, sizeof(act));
 		act.sa_handler = event_sig_handler;
@@ -154,66 +203,135 @@ static void event_fork(struct udev_event *event)
 		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGCHLD, &act, NULL);
 		sigaction(SIGHUP, &act, NULL);
+		sigaction(SIGRT_WORKER, &act, NULL);
 
-		/* set timeout to prevent hanging processes */
-		alarm(UDEV_EVENT_TIMEOUT);
+		/* initial device */
+		dev = event->dev;
 
-		/* apply rules, create node, symlinks */
-		err = udev_event_execute_rules(event, rules);
+		while (!worker_exit) {
+			struct udev_event *udev_event;
+			union sigval sigval;
+			int err;
 
-		/* rules may change/disable the timeout */
-		if (udev_device_get_event_timeout(event->dev) >= 0)
-			alarm(udev_device_get_event_timeout(event->dev));
+			udev_event = udev_event_new(dev);
+			if (udev_event == NULL)
+				_exit(3);
 
-		/* execute RUN= */
-		if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-			udev_event_execute_run(event);
+			/* set timeout to prevent hanging processes */
+			alarm(UDEV_EVENT_TIMEOUT);
 
-		/* apply/restore inotify watch */
-		if (err == 0 && event->inotify_watch) {
-			udev_watch_begin(event->udev, event->dev);
-			udev_device_update_db(event->dev);
-		}
+			/* apply rules, create node, symlinks */
+			err = udev_event_execute_rules(udev_event, rules);
+
+			/* rules may change/disable the timeout */
+			if (udev_device_get_event_timeout(dev) >= 0)
+				alarm(udev_device_get_event_timeout(dev));
+
+			/* execute RUN= */
+			if (err == 0 && !udev_event->ignore_device && udev_get_run(udev_event->udev))
+				udev_event_execute_run(udev_event);
+
+			/* reset alarm */
+			alarm(0);
+
+			/* apply/restore inotify watch */
+			if (err == 0 && udev_event->inotify_watch) {
+				udev_watch_begin(udev_event->udev, dev);
+				udev_device_update_db(dev);
+			}
+
+			/* send processed event back to libudev listeners */
+			udev_monitor_send_device(monitor, dev, 0);
+
+			info(event->udev, "seq %llu finished with %i\n", udev_device_get_seqnum(dev), err);
+			udev_device_unref(dev);
+			udev_event_unref(udev_event);
+
+			/* send back the result of the event execution */
+			sigval.sival_int = err;
+			sigqueue(main_pid, SIGRT_WORKER, sigval);
 
-		/* send processed event back to the kernel netlink socket */
-		udev_monitor_send_device(kernel_monitor, event->dev);
+			/* wait for more device messages from udevd */
+			do
+				dev = udev_monitor_receive_device(monitor);
+			while (dev == NULL);
+		}
 
-		info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
+		udev_monitor_unref(monitor);
 		logging_close();
-		if (err != 0)
-			exit(1);
 		exit(0);
+	}
 	case -1:
+		event->state = EVENT_QUEUED;
+		free(worker);
 		err(event->udev, "fork of child failed: %m\n");
-		event_queue_delete(event);
 		break;
 	default:
-		/* get SIGCHLD in main loop */
-		info(event->udev, "seq %llu forked, pid [%d], '%s' '%s', %ld seconds old\n",
-		     udev_device_get_seqnum(event->dev),
-		     pid,
-		     udev_device_get_action(event->dev),
-		     udev_device_get_subsystem(event->dev),
-		     time(NULL) - event->queue_time);
-		event->pid = pid;
+		worker->pid = pid;
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		udev_list_node_append(&worker->node, &worker_list);
 		childs++;
+		break;
+	}
+
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+}
+
+static void event_run(struct event *event)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+		ssize_t count;
+
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		event->state = EVENT_RUNNING;
+		count = udev_monitor_send_device(monitor, event->dev, worker->pid);
+		if (count < 0) {
+			err(event->udev, "worker [%u] did not accept message, kill it\n", worker->pid);
+			event->state = EVENT_QUEUED;
+			worker->state = WORKER_KILLED;
+			kill(worker->pid, SIGKILL);
+			continue;
+		}
+		return;
+	}
+
+	if (childs >= max_childs) {
+		info(event->udev, "maximum number (%i) of childs reached\n", childs);
+		return;
 	}
+
+	/* start new worker and pass initial device */
+	worker_new(event);
 }
 
-static void event_queue_insert(struct udev_event *event)
+static void event_queue_insert(struct udev_device *dev)
 {
-	event->queue_time = time(NULL);
+	struct event *event;
 
-	udev_queue_export_device_queued(udev_queue_export, event->dev);
-	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(event->dev),
-	     udev_device_get_action(event->dev), udev_device_get_subsystem(event->dev));
+	event = calloc(1, sizeof(struct event));
+	if (event == NULL)
+		return;
 
+	event->udev = udev_device_get_udev(dev);
+	event->dev = dev;
+	udev_queue_export_device_queued(udev_queue_export, dev);
+	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(dev),
+	     udev_device_get_action(dev), udev_device_get_subsystem(dev));
+
+	event->state = EVENT_QUEUED;
 	udev_list_node_append(&event->node, &event_list);
-	run_exec_q = 1;
 
 	/* run all events with a timeout set immediately */
-	if (udev_device_get_timeout(event->dev) > 0) {
-		event_fork(event);
+	if (udev_device_get_timeout(dev) > 0) {
+		worker_new(event);
 		return;
 	}
 }
@@ -265,13 +383,13 @@ static int compare_devpath(const char *running, const char *waiting)
 }
 
 /* lookup event for identical, parent, child device */
-static int devpath_busy(struct udev_event *event)
+static int devpath_busy(struct event *event)
 {
 	struct udev_list_node *loop;
 
 	/* check if queue contains events we depend on */
 	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *loop_event = node_to_event(loop);
 
 		/* we already found a later event, earlier can not block us, no need to check again */
 		if (udev_device_get_seqnum(loop_event->dev) < event->delaying_seqnum)
@@ -312,42 +430,37 @@ static void event_queue_manager(struct udev *udev)
 	struct udev_list_node *tmp;
 
 start_over:
-	if (udev_list_is_empty(&event_list)) {
-		if (childs > 0) {
-			err(udev, "event list empty, but childs count is %i", childs);
-			childs = 0;
-		}
-		return;
-	}
-
 	udev_list_node_foreach_safe(loop, tmp, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *event = node_to_event(loop);
 
-		if (childs >= max_childs) {
-			info(udev, "maximum number (%i) of childs reached\n", childs);
-			break;
+		/* cleanup finished events */
+		if (event->state == EVENT_FINISHED) {
+			event_queue_delete(event);
+			continue;
 		}
 
-		if (loop_event->pid != 0)
+		if (stop_exec_q)
+			continue;
+
+		/* skip running events */
+		if (event->state != EVENT_QUEUED)
 			continue;
 
 		/* do not start event if parent or child event is still running */
-		if (devpath_busy(loop_event) != 0) {
+		if (devpath_busy(event) != 0) {
 			dbg(udev, "delay seq %llu (%s)\n",
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
+			    udev_device_get_seqnum(event->dev),
+			    udev_device_get_devpath(event->dev));
 			continue;
 		}
 
-		event_fork(loop_event);
-		dbg(udev, "moved seq %llu to running list\n", udev_device_get_seqnum(loop_event->dev));
+		event_run(event);
+	}
 
-		/* retry if events finished in the meantime */
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-			goto start_over;
-		}
+	/* keep the incoming queue small, retry if events finished in the meantime */
+	if (event_finished) {
+		event_finished = 0;
+		goto start_over;
 	}
 }
 
@@ -480,69 +593,64 @@ static int handle_inotify(struct udev *udev)
 	return 0;
 }
 
-static void sig_handler(int signum)
+static void sig_handler(int signum, siginfo_t *info, void *ucontext)
 {
 	switch (signum) {
 		case SIGINT:
 		case SIGTERM:
 			udev_exit = 1;
-			break;
+			return;
 		case SIGCHLD:
-			/* set flag, then write to pipe if needed */
-			sigchilds_waiting = 1;
-			break;
-		case SIGHUP:
-			reload_config = 1;
-			break;
-	}
+			while (1) {
+				pid_t pid;
+				struct udev_list_node *loop;
 
-	signal_received = 1;
-}
+				pid = waitpid(-1, NULL, WNOHANG);
+				if (pid <= 0)
+					break;
 
-static void udev_done(int pid, int exitstatus)
-{
-	struct udev_list_node *loop;
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
 
-	/* find event associated with pid and delete it */
-	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (loop_event->pid == pid) {
-			info(loop_event->udev, "seq %llu cleanup, pid [%d], status %i, %ld seconds old\n",
-			     udev_device_get_seqnum(loop_event->dev), loop_event->pid,
-			     exitstatus, time(NULL) - loop_event->queue_time);
-			loop_event->exitstatus = exitstatus;
-			if (debug_trace)
-				fprintf(stderr, "exit %s (%llu)\n",
-				       udev_device_get_syspath(loop_event->dev),
-				       udev_device_get_seqnum(loop_event->dev));
-			event_queue_delete(loop_event);
-			childs--;
-
-			/* there may be dependent events waiting */
-			run_exec_q = 1;
+					if (worker->pid != info->si_pid)
+						continue;
+
+					worker->state = WORKER_DEAD;
+					childs--;
+					break;
+				}
+			}
+			worker_dead = 1;
 			return;
-		}
+		case SIGHUP:
+			signal_received = 1;
+			reload_config = 1;
+			return;
+		default:
+			if (signum == SIGRT_WORKER) {
+				struct udev_list_node *loop;
+
+				/* lookup worker who sent the signal */
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
+
+					if (worker->pid != info->si_pid)
+						continue;
+
+					/* worker returned */
+					worker->event->exitstatus = info->si_value.sival_int;
+					worker->event->state = EVENT_FINISHED;
+					worker->event = NULL;
+					worker->state = WORKER_IDLE;
+					event_finished = 1;
+					break;
+				}
+				return;
+			}
+		break;
 	}
-}
-
-static void reap_sigchilds(void)
-{
-	pid_t pid;
-	int status;
 
-	while (1) {
-		pid = waitpid(-1, &status, WNOHANG);
-		if (pid <= 0)
-			break;
-		if (WIFEXITED(status))
-			status = WEXITSTATUS(status);
-		else if (WIFSIGNALED(status))
-			status = WTERMSIG(status) + 128;
-		else
-			status = 0;
-		udev_done(pid, status);
-	}
+	signal_received = 1;
 }
 
 static void startup_log(struct udev *udev)
@@ -677,21 +785,24 @@ int main(int argc, char *argv[])
 		goto exit;
 	}
 
-	kernel_monitor = udev_monitor_new_from_netlink(udev, "kernel");
-	if (kernel_monitor == NULL || udev_monitor_enable_receiving(kernel_monitor) < 0) {
+	monitor = udev_monitor_new_from_netlink(udev, "kernel");
+	if (monitor == NULL || udev_monitor_enable_receiving(monitor) < 0) {
 		fprintf(stderr, "error initializing netlink socket\n");
 		err(udev, "error initializing netlink socket\n");
 		rc = 3;
 		goto exit;
 	}
-	udev_monitor_set_receive_buffer_size(kernel_monitor, 128*1024*1024);
+	udev_monitor_set_receive_buffer_size(monitor, 128*1024*1024);
 
 	rules = udev_rules_new(udev, resolve_names);
 	if (rules == NULL) {
 		err(udev, "error reading rules\n");
 		goto exit;
 	}
+
 	udev_list_init(&event_list);
+	udev_list_init(&worker_list);
+
 	udev_queue_export = udev_queue_export_new(udev);
 	if (udev_queue_export == NULL) {
 		err(udev, "error creating queue file\n");
@@ -704,19 +815,19 @@ int main(int argc, char *argv[])
 		pid = fork();
 		switch (pid) {
 		case 0:
-			dbg(udev, "daemonized fork running\n");
 			break;
 		case -1:
 			err(udev, "fork of daemon failed: %m\n");
 			rc = 4;
 			goto exit;
 		default:
-			dbg(udev, "child [%u] running, parent exits\n", pid);
 			rc = 0;
 			goto exit;
 		}
 	}
 
+	main_pid = getpid();
+
 	/* redirect std{out,err} */
 	if (!debug && !debug_trace) {
 		dup2(fd, STDIN_FILENO);
@@ -746,13 +857,14 @@ int main(int argc, char *argv[])
 
 	/* set signal handlers */
 	memset(&act, 0x00, sizeof(struct sigaction));
-	act.sa_handler = sig_handler;
+	act.sa_sigaction = sig_handler;
 	sigemptyset(&act.sa_mask);
-	act.sa_flags = SA_RESTART;
+	act.sa_flags = SA_RESTART | SA_SIGINFO;
 	sigaction(SIGINT, &act, NULL);
 	sigaction(SIGTERM, &act, NULL);
 	sigaction(SIGCHLD, &act, NULL);
 	sigaction(SIGHUP, &act, NULL);
+	sigaction(SIGRT_WORKER, &act, NULL);
 
 	/* watch rules directory */
 	udev_watch_init(udev);
@@ -782,10 +894,11 @@ int main(int argc, char *argv[])
 		max_childs = 1;
 	} else {
 		int memsize = mem_size_mb();
+
 		if (memsize > 0)
 			max_childs = 128 + (memsize / 4);
 		else
-			max_childs = UDEVD_MAX_CHILDS;
+			max_childs = 256;
 	}
 	/* possibly overwrite maximum limit of executed events */
 	value = getenv("UDEVD_MAX_CHILDS");
@@ -797,12 +910,15 @@ int main(int argc, char *argv[])
 		sigset_t blocked_mask, orig_mask;
 		struct pollfd pfd[4];
 		struct pollfd *ctrl_poll, *monitor_poll, *inotify_poll = NULL;
+		const struct timespec ts = { 3, 0 };
+		const struct timespec *timeout = NULL;
 		int nfds = 0;
 		int fdcount;
 
 		sigfillset(&blocked_mask);
 		sigprocmask(SIG_SETMASK, &blocked_mask, &orig_mask);
 		if (signal_received) {
+			signal_received = 0;
 			sigprocmask(SIG_SETMASK, &orig_mask, NULL);
 			goto handle_signals;
 		}
@@ -812,7 +928,7 @@ int main(int argc, char *argv[])
 		ctrl_poll->events = POLLIN;
 
 		monitor_poll = &pfd[nfds++];
-		monitor_poll->fd = udev_monitor_get_fd(kernel_monitor);
+		monitor_poll->fd = udev_monitor_get_fd(monitor);
 		monitor_poll->events = POLLIN;
 
 		if (inotify_fd >= 0) {
@@ -821,15 +937,33 @@ int main(int argc, char *argv[])
 			inotify_poll->events = POLLIN;
 		}
 
-		fdcount = ppoll(pfd, nfds, NULL, &orig_mask);
+		/* set timeout to kill idle workers */
+		if (udev_list_is_empty(&event_list) && !udev_list_is_empty(&worker_list))
+			timeout = &ts;
+
+		fdcount = ppoll(pfd, nfds, timeout, &orig_mask);
 		sigprocmask(SIG_SETMASK, &orig_mask, NULL);
 		if (fdcount < 0) {
 			if (errno == EINTR)
 				goto handle_signals;
-			err(udev, "error in select: %m\n");
 			continue;
 		}
 
+		/* timeout, no events - kill idle workers */
+		if (fdcount == 0) {
+			struct udev_list_node *loop;
+
+			udev_list_node_foreach(loop, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
+
+				if (worker->state != WORKER_IDLE)
+					continue;
+
+				worker->state = WORKER_KILLED;
+				kill(worker->pid, SIGKILL);
+			}
+		}
+
 		/* get control message */
 		if (ctrl_poll->revents & POLLIN)
 			handle_ctrl_msg(udev_ctrl);
@@ -838,16 +972,11 @@ int main(int argc, char *argv[])
 		if (monitor_poll->revents & POLLIN) {
 			struct udev_device *dev;
 
-			dev = udev_monitor_receive_device(kernel_monitor);
-			if (dev != NULL) {
-				struct udev_event *event;
-
-				event = udev_event_new(dev);
-				if (event != NULL)
-					event_queue_insert(event);
-				else
-					udev_device_unref(dev);
-			}
+			dev = udev_monitor_receive_device(monitor);
+			if (dev != NULL)
+				event_queue_insert(dev);
+			else
+				udev_device_unref(dev);
 		}
 
 		/* rules directory inotify watch */
@@ -855,8 +984,6 @@ int main(int argc, char *argv[])
 			handle_inotify(udev);
 
 handle_signals:
-		signal_received = 0;
-
 		/* rules changed, set by inotify or a HUP signal */
 		if (reload_config) {
 			struct udev_rules *rules_new;
@@ -869,32 +996,41 @@ handle_signals:
 			}
 		}
 
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-		}
+		/* cleanup killed worker */
+		if (worker_dead) {
+			struct udev_list_node *loop, *tmp;
+
+			udev_list_node_foreach_safe(loop, tmp, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
 
-		if (run_exec_q) {
-			run_exec_q = 0;
-			if (!stop_exec_q)
-				event_queue_manager(udev);
+				if (worker->state != WORKER_DEAD)
+					continue;
+
+				udev_list_node_remove(&worker->node);
+				free(worker);
+			}
+			worker_dead = 0;
 		}
 
 		if (settle_pid > 0) {
 			kill(settle_pid, SIGUSR1);
 			settle_pid = 0;
 		}
+
+		if (!udev_list_is_empty(&event_list))
+			event_queue_manager(udev);
 	}
+
 	udev_queue_export_cleanup(udev_queue_export);
 	rc = 0;
+	killpg(0, SIGTERM);
 exit:
-
 	udev_queue_export_unref(udev_queue_export);
 	udev_rules_unref(rules);
 	udev_ctrl_unref(udev_ctrl);
 	if (inotify_fd >= 0)
 		close(inotify_fd);
-	udev_monitor_unref(kernel_monitor);
+	udev_monitor_unref(monitor);
 	udev_selinux_exit(udev);
 	udev_unref(udev);
 	logging_close();




* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (4 preceding siblings ...)
  2009-06-01  2:41 ` Kay Sievers
@ 2009-06-01  9:29 ` Alan Jenkins
  2009-06-01 11:32 ` Kay Sievers
                   ` (20 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-06-01  9:29 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Fri, 2009-05-29 at 20:11 +0200, Kay Sievers wrote:
>   
>> On Fri, May 29, 2009 at 19:53, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>>
>>     
>>>  Maybe just recycling the event processes would bring
>>> similar gains, with less of the risks of threads.
>>>       
>> Yeah, I thought that too, without having tested anything, it could be,
>> that we just want to keep a pipe to the event process, and let the
>> event process send a signal back to the main daemon, that it has
>> handled the event, and it goes to sleep after that. The main daemon
>> can recycle a sleeping event process and push a new event over the
>> pipe to it. If no events are queued anymore, the main daemon just
>> closes the pipe, and the event process will exit.
>>
>> With that model we might be able to reduce the number of fork()s
>> significantly. And we would still have the process separation, its
>> robustness, and the lock-free behavior for malloc, cloexec and all
>> these issues.
>>     
>
> Here is a rough hack to check how it behaves. It boots my box, nothing
> else I really checked.
>
> It clones the event processes as the current version does, but the event
> process stays around to get re-used as a worker for later events.
> Further messages are sent over netlink to the worker processes, and the
> worker process signals its state back with sigqueue() rt-signals. When
> the events have settled, the workers get killed after a few seconds of
> idle time.
>
> The current git version:
>   $ time (udevadm trigger; udevadm settle)
>   real	0m1.566s
>   ...
>
>   $ time /sbin/udevd.orig
>   []
>   user	0m0.420s
>   sys	0m1.412s
>
>
> The thread-version:
>   $ time (udevadm trigger -Snet; udevadm settle)
>   real 0m1.336s
>
>   $ time udev/udevd
>   []
>   user	0m0.310s
>   sys	0m0.679s
>
>
> The worker-version:
>   $ time (udevadm trigger; udevadm settle)
>   real	0m1.171s
>   ...
>
>   $ time udev/udevd
>   []
>   user	0m0.057s
>   sys	0m0.095s
>
>
> The thread- and worker-versions do not create as many COW page-faults in
> the daemon after every cloned event-process, and therefore need much
> less CPU.
>
> At least on the dual-core laptop here, the pool of workers seems to be
> faster than the threads.
>   

Great, you beat me to it.  It makes sense that this would be a _bit_
faster than threads.  As I say, I tried to avoid the page faults caused
(somehow) by forking the extra thread-stack mappings, but it didn't
really work out.  I'm surprised by quite how much faster it is though!

I have some thoughts which may help in bringing the code up from "rough 
hack" quality :-).

Aren't signal queues unreliable?  If you exceed the maximum number of
queued signals, sigqueue() will fail with EAGAIN, and I don't think
there's a blocking version :-).  I think a pipe would be better, or
maybe you can do something with netlink.
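
For illustration only (this is not code from either patch): a minimal
sketch of the pipe alternative, assuming one pipe shared by all workers
and a small fixed-size record.  The struct and both function names are
hypothetical.  A blocking write() simply waits when the pipe is full,
which is the "blocking version" that sigqueue() lacks.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical result record, not taken from the patch */
struct worker_result {
	pid_t pid;
	int exitstatus;
};

/* worker side: a blocking write() waits for room in the pipe instead of
 * failing the way sigqueue() does when the signal queue is full */
static int result_send(int fd, pid_t pid, int exitstatus)
{
	struct worker_result res = { .pid = pid, .exitstatus = exitstatus };
	ssize_t n;

	assert(fd >= 0);
	do
		n = write(fd, &res, sizeof(res));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(res) ? 0 : -1;
}

/* daemon side: read exactly one record */
static int result_receive(int fd, struct worker_result *res)
{
	ssize_t n;

	assert(fd >= 0 && res != NULL);
	do
		n = read(fd, res, sizeof(*res));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(*res) ? 0 : -1;
}
```

Because the record is far smaller than PIPE_BUF, each write() is atomic,
so records from several workers sharing the pipe cannot interleave.  The
cost is one more file descriptor for the daemon to poll.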


+                       /* wait for more device messages from udevd */
+                       do
+                               dev = udev_monitor_receive_device(monitor);
+                       while (dev == NULL);

I think this loop should finish when the signal handler sets 
worker_exit?  But maybe you didn't actually install a handler for 
SIGTERM and it's still being reset to the default action:

               /* set signal handlers */
                memset(&act, 0x00, sizeof(act));
                act.sa_handler = event_sig_handler;
 @@ -154,66 +203,135 @@ static void event_fork(struct udev_event *event)
                sigaction(SIGTERM, &act, NULL);



I'm not sure you're handling worker processes crashing; it looks like
you might leave the event in limbo if you get SIGCHLD for a worker in
state WORKER_RUNNING.  I'm sure you'll test that though.

Talking of unexpected crashes.  I anticipated using a 
connection-oriented socketpair for passing events.  Using netlink for 
this means the worker threads don't get a nice notification if the 
daemon dies without killing them.  Unless this is covered by udevd being 
the session leader, or something like that?  I'll test this empirically 
- maybe it's not important, but I think it's good to know what will happen.
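
To illustrate the notification a socketpair gives (a sketch under
assumptions, not code from either patch): with an AF_UNIX socketpair,
a worker blocked in recv() gets a return value of 0 once the daemon's
end is closed, including when the daemon dies.  The function name is
hypothetical, and SOCK_SEQPACKET is assumed so that message boundaries
are preserved.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of bytes received, 0 if the peer closed its end
 * (for example because the daemon died), or -1 on error. */
static ssize_t worker_wait_message(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(fd >= 0);
	do
		n = recv(fd, buf, len, 0);
	while (n < 0 && errno == EINTR);
	return n;
}
```

The pair would be created with socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv)
before fork().  Each side has to close the end it does not use, otherwise
the worker itself keeps the daemon's end open and never sees the 0.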

BTW I had a go at this too, but a couple of my workers fail in about 1 
run out of 20, apparently because calloc() returns NULL without setting 
errno.  I'm sure it's my fault but I'll try to track it down.  Obviously 
I'll let you know if your patch could be affected by the same problem.

Thanks
Alan


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (5 preceding siblings ...)
  2009-06-01  9:29 ` Alan Jenkins
@ 2009-06-01 11:32 ` Kay Sievers
  2009-06-01 12:33 ` Kay Sievers
                   ` (19 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01 11:32 UTC (permalink / raw)
  To: linux-hotplug

On Mon, Jun 1, 2009 at 11:29, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>> The thread- and worker-versions do not create as many COW page-faults in
>> the daemon after every cloned event-process, and therefore need much
>> less CPU.
>>
>> At least on the dual-core laptop here, the pool of workers seems to be
>> faster than the threads.
>>
>
> Great, you beat me to it.  It makes sense that this would be a _bit_ faster
> than threads.  As I say I tried to avoid the page faults caused (somehow) by
> forking the extra thread-stack mappings, but it didn't really work out.  I'm
> surprised by quite how much faster it is though!

Yeah, the 1.2 sec system time looked a bit too scary.

> I have some thoughts which may help in bringing the code up from "rough
> hack" quality :-).

Cool. :)

> Aren't signal queues unreliable?  If you exceed the maximum number of queued
> signals, sigqueue will fail with EAGAIN, and I don't think there's a
> blocking version :-).

Right, it's an rlimit, and I think I checked and remember 40,000+
signals here. The workers could detect this, and re-send a non-queued
signal, if needed.

> I think a  pipe would be better

Maybe. I was lazy and tried to avoid file descriptors, in case we get
a really large number of workers to maintain. :)

> or maybe you can do something with netlink.

Right, but then, I guess, we would need to do MSG_PEEK, or something,
to find out if the main daemon received a kernel message or a worker
message, before trying to do a receive_device().

> +                       /* wait for more device messages from udevd */
> +                       do
> +                               dev = udev_monitor_receive_device(monitor);
> +                       while (dev == NULL);
>
> I think this loop should finish when the signal handler sets worker_exit?
>  But maybe you didn't actually install a handler for SIGTERM and it's still
> being reset to the default action:

Yeah, it's not handled with worker_exit. For now, it's not reliable to
kill event processes from anything other than the main daemon. The
worker_exit might be nice for valgrind tests though.

> I'm not sure you're handling worker processes crashing, it looks like you
> might leave the event in limbo if you get SIGCHLD for a worker in state
> WORKER_RUNNING.  I'm sure you'll test that though.

Maybe we can set the event to QUEUED, when a worker dies with an event attached.
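
A minimal sketch of that idea, assuming cut-down stand-ins for the
patch's struct event and struct worker (the real ones carry more
fields) and a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

enum event_state { EVENT_UNDEF, EVENT_QUEUED, EVENT_RUNNING, EVENT_FINISHED };
enum worker_state { WORKER_UNDEF, WORKER_RUNNING, WORKER_IDLE, WORKER_KILLED, WORKER_DEAD };

/* reduced stand-ins for the structs in the patch */
struct event {
	enum event_state state;
};

struct worker {
	enum worker_state state;
	struct event *event;
};

/* hypothetical helper: called once a worker has been reaped */
static void worker_mark_dead(struct worker *worker)
{
	assert(worker != NULL);
	if (worker->event != NULL) {
		/* the worker died mid-event: hand the event back to the queue */
		worker->event->state = EVENT_QUEUED;
		worker->event = NULL;
	}
	worker->state = WORKER_DEAD;
}
```

An event that crashes every worker it is handed to would be re-queued
forever with this alone, so some retry limit per event would probably be
needed as well.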

> Talking of unexpected crashes.  I anticipated using a connection-oriented
> socketpair for passing events.  Using netlink for this means the worker
> threads don't get a nice notification if the daemon dies without killing
> them.

Right, that's not too nice. I guess we have pretty much lost if the
main daemon dies unexpectedly. We need to find out, if we want netlink
and signals, or socketpair()s here, I guess.

> Unless this is covered by udevd being the session leader, or
> something like that?  I'll test this empirically - maybe it's not important,
> but I think it's good to know what will happen.

If the main daemon exits normally, it sends TERM to the entire process group.

> BTW I had a go at this too, but a couple of my workers fail in about 1 run
> out of 20, apparently because calloc() returns NULL without setting errno.
>  I'm sure it's my fault but I'll try to track it down.  Obviously I'll let
> you know if your patch could be affected by the same problem.

Oh, strange. Would be good to know what causes this.

Thanks,
Kay


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (6 preceding siblings ...)
  2009-06-01 11:32 ` Kay Sievers
@ 2009-06-01 12:33 ` Kay Sievers
  2009-06-01 13:30 ` Kay Sievers
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01 12:33 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 635 bytes --]

On Mon, Jun 1, 2009 at 13:32, Kay Sievers <kay.sievers@vrfy.org> wrote:
> On Mon, Jun 1, 2009 at 11:29, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:

>> I think this loop should finish when the signal handler sets worker_exit?
>>  But maybe you didn't actually install a handler for SIGTERM and it's still
>> being reset to the default action:
>
> Yeah, it's not handled with worker_exit. Now, it's not reliable to
> kill event processes from something else than the main daemon. The
> worker_exit might be nice for valgrind tests though.

Here is version 2 of the patch, with a few things corrected.

Thanks,
Kay

[-- Attachment #2: worker2.patch --]
[-- Type: text/x-patch, Size: 27274 bytes --]

diff --git a/udev/lib/libudev-monitor.c b/udev/lib/libudev-monitor.c
index 395a4d2..54c9576 100644
--- a/udev/lib/libudev-monitor.c
+++ b/udev/lib/libudev-monitor.c
@@ -32,7 +32,6 @@ struct udev_monitor {
 	int refcount;
 	int sock;
 	struct sockaddr_nl snl;
-	struct sockaddr_nl snl_peer;
 	struct sockaddr_un sun;
 	socklen_t addrlen;
 	struct udev_list_node filter_subsystem_list;
@@ -171,8 +170,8 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 		return NULL;
 
 	if (name == NULL)
-		return NULL;
-	if (strcmp(name, "kernel") == 0)
+		group = 0;
+	else if (strcmp(name, "kernel") == 0)
 		group = UDEV_MONITOR_KERNEL;
 	else if (strcmp(name, "udev") == 0)
 		group = UDEV_MONITOR_UDEV;
@@ -193,8 +192,6 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 
 	udev_monitor->snl.nl_family = AF_NETLINK;
 	udev_monitor->snl.nl_groups = group;
-	udev_monitor->snl_peer.nl_family = AF_NETLINK;
-	udev_monitor->snl_peer.nl_groups = UDEV_MONITOR_UDEV;
 
 	dbg(udev, "monitor %p created with NETLINK_KOBJECT_UEVENT (%u)\n", udev_monitor, group);
 	return udev_monitor;
@@ -434,7 +431,6 @@ struct udev_device *udev_monitor_receive_device(struct udev_monitor *udev_monito
 	struct iovec iov;
 	char cred_msg[CMSG_SPACE(sizeof(struct ucred))];
 	struct cmsghdr *cmsg;
-	struct sockaddr_nl snl;
 	struct ucred *cred;
 	char buf[8192];
 	ssize_t buflen;
@@ -459,11 +455,6 @@ retry:
 	smsg.msg_control = cred_msg;
 	smsg.msg_controllen = sizeof(cred_msg);
 
-	if (udev_monitor->snl.nl_family != 0) {
-		smsg.msg_name = &snl;
-		smsg.msg_namelen = sizeof(snl);
-	}
-
 	buflen = recvmsg(udev_monitor->sock, &smsg, 0);
 	if (buflen < 0) {
 		if (errno != EINTR)
@@ -476,20 +467,6 @@ retry:
 		return NULL;
 	}
 
-	if (udev_monitor->snl.nl_family != 0) {
-		if (snl.nl_groups == 0) {
-			info(udev_monitor->udev, "unicast netlink message ignored\n");
-			return NULL;
-		}
-		if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
-			if (snl.nl_pid > 0) {
-				info(udev_monitor->udev, "multicast kernel netlink message from pid %d ignored\n", snl.nl_pid);
-				return NULL;
-			}
-			is_kernel = 1;
-		}
-	}
-
 	cmsg = CMSG_FIRSTHDR(&smsg);
 	if (cmsg == NULL || cmsg->cmsg_type != SCM_CREDENTIALS) {
 		info(udev_monitor->udev, "no sender credentials received, message ignored\n");
@@ -621,7 +598,7 @@ retry:
 	return udev_device;
 }
 
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device)
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid)
 {
 	struct msghdr smsg;
 	struct iovec iov[2];
@@ -660,6 +637,7 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 	} else if (udev_monitor->snl.nl_family != 0) {
 		const char *val;
 		struct udev_monitor_netlink_header nlh;
+		struct sockaddr_nl snl_peer;
 
 
 		/* add versioned header */
@@ -680,11 +658,18 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 		iov[1].iov_base = (char *)buf;
 		iov[1].iov_len = blen;
 
+		/* we will always get ECONNREFUSED when sending to the multicast group */
+		memset(&snl_peer, 0x00, sizeof(struct sockaddr_nl));
+		snl_peer.nl_family = AF_NETLINK;
+		if (pid > 0)
+			snl_peer.nl_pid = pid;
+		else
+			snl_peer.nl_groups = UDEV_MONITOR_UDEV;
+
 		memset(&smsg, 0x00, sizeof(struct msghdr));
 		smsg.msg_iov = iov;
 		smsg.msg_iovlen = 2;
-		/* no destination besides the muticast group, we will always get ECONNREFUSED */
-		smsg.msg_name = &udev_monitor->snl_peer;
+		smsg.msg_name = &snl_peer;
 		smsg.msg_namelen = sizeof(struct sockaddr_nl);
 	} else {
 		return -1;
diff --git a/udev/lib/libudev-private.h b/udev/lib/libudev-private.h
index 3eb3d79..3019920 100644
--- a/udev/lib/libudev-private.h
+++ b/udev/lib/libudev-private.h
@@ -86,7 +86,7 @@ int udev_device_delete_db(struct udev_device *udev_device);
 int udev_device_rename_db(struct udev_device *udev_device, const char *devpath);
 
 /* libudev-monitor - netlink/unix socket communication  */
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device);
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid);
 int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size);
 
 /* libudev-ctrl - daemon runtime setup */
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d521251..8ab262a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -734,18 +734,13 @@ int udev_event_execute_run(struct udev_event *event)
 			monitor = udev_monitor_new_from_socket(event->udev, &cmd[strlen("socket:")]);
 			if (monitor == NULL)
 				continue;
-			udev_monitor_send_device(monitor, event->dev);
+			udev_monitor_send_device(monitor, event->dev, 0);
 			udev_monitor_unref(monitor);
 		} else {
 			char program[UTIL_PATH_SIZE];
 			char **envp;
 
 			udev_event_apply_format(event, cmd, program, sizeof(program));
-			if (event->trace)
-				fprintf(stderr, "run  %s (%llu) '%s'\n",
-				       udev_device_get_syspath(event->dev),
-				       udev_device_get_seqnum(event->dev),
-				       program);
 			envp = udev_device_get_properties_envp(event->dev);
 			if (util_run_program(event->udev, program, envp, NULL, 0, NULL) != 0) {
 				if (!udev_list_entry_get_flag(list_entry))
diff --git a/udev/udev.h b/udev/udev.h
index 8f2c1c6..ed29c4b 100644
--- a/udev/udev.h
+++ b/udev/udev.h
@@ -53,7 +53,6 @@ static inline void logging_close(void)
 }
 
 struct udev_event {
-	struct udev_list_node node;
 	struct udev *udev;
 	struct udev_device *dev;
 	struct udev_device *dev_parent;
@@ -64,10 +63,6 @@ struct udev_event {
 	uid_t uid;
 	gid_t gid;
 	struct udev_list_node run_list;
-	pid_t pid;
-	int exitstatus;
-	time_t queue_time;
-	unsigned long long int delaying_seqnum;
 	unsigned int group_final:1;
 	unsigned int owner_final:1;
 	unsigned int mode_final:1;
@@ -76,7 +71,6 @@ struct udev_event {
 	unsigned int run_final:1;
 	unsigned int ignore_device:1;
 	unsigned int inotify_watch:1;
-	unsigned int trace:1;
 };
 
 struct udev_watch {
diff --git a/udev/udevd.c b/udev/udevd.c
index 37b547a..a1acb2e 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -44,8 +44,7 @@
 #define UDEVD_PRIORITY			-4
 #define UDEV_PRIORITY			-2
 
-/* maximum limit of forked childs */
-#define UDEVD_MAX_CHILDS		256
+#define SIGRT_WORKER			SIGRTMIN+1
 
 static int debug;
 
@@ -61,34 +60,75 @@ static void log_fn(struct udev *udev, int priority,
 	}
 }
 
-static void reap_sigchilds(void);
-
 static int debug_trace;
 static struct udev_rules *rules;
 static struct udev_queue_export *udev_queue_export;
 static struct udev_ctrl *udev_ctrl;
-static struct udev_monitor *kernel_monitor;
-static volatile sig_atomic_t sigchilds_waiting;
+static struct udev_monitor *monitor;
+static pid_t main_pid;
+static volatile sig_atomic_t event_finished;
+static volatile sig_atomic_t worker_dead;
 static volatile sig_atomic_t udev_exit;
 static volatile sig_atomic_t reload_config;
 static volatile sig_atomic_t signal_received;
-static volatile pid_t settle_pid;
-static int run_exec_q;
+static pid_t settle_pid;
 static int stop_exec_q;
 static int max_childs;
-static int childs;
+static volatile int childs;
 static struct udev_list_node event_list;
+static struct udev_list_node worker_list;
+static volatile sig_atomic_t worker_exit;
+
+enum event_state {
+	EVENT_UNDEF,
+	EVENT_QUEUED,
+	EVENT_RUNNING,
+	EVENT_FINISHED,
+};
+
+struct event {
+	struct udev_list_node node;
+	struct udev *udev;
+	struct udev_device *dev;
+	enum event_state state;
+	int exitstatus;
+	unsigned long long int delaying_seqnum;
+};
 
-static struct udev_event *node_to_event(struct udev_list_node *node)
+static struct event *node_to_event(struct udev_list_node *node)
 {
 	char *event;
 
 	event = (char *)node;
-	event -= offsetof(struct udev_event, node);
-	return (struct udev_event *)event;
+	event -= offsetof(struct event, node);
+	return (struct event *)event;
+}
+
+enum worker_state {
+	WORKER_UNDEF,
+	WORKER_RUNNING,
+	WORKER_IDLE,
+	WORKER_KILLED,
+	WORKER_DEAD,
+};
+
+struct worker {
+	struct udev_list_node node;
+	pid_t pid;
+	enum worker_state state;
+	struct event *event;
+};
+
+static struct worker *node_to_worker(struct udev_list_node *node)
+{
+	char *worker;
+
+	worker = (char *)node;
+	worker -= offsetof(struct worker, node);
+	return (struct worker *)worker;
 }
 
-static void event_queue_delete(struct udev_event *event)
+static void event_queue_delete(struct event *event)
 {
 	udev_list_node_remove(&event->node);
 
@@ -99,121 +139,200 @@ static void event_queue_delete(struct udev_event *event)
 		udev_queue_export_device_finished(udev_queue_export, event->dev);
 
 	udev_device_unref(event->dev);
-	udev_event_unref(event);
+	free(event);
 }
 
 static void event_sig_handler(int signum)
 {
-	if (signum == SIGALRM)
+	switch (signum) {
+	case SIGALRM:
 		_exit(1);
+		break;
+	case SIGTERM:
+		worker_exit = 1;
+		break;
+	}
 }
 
-static void event_fork(struct udev_event *event)
+static void worker_new(struct event *event)
 {
+	struct worker *worker;
 	pid_t pid;
 	struct sigaction act;
-	int err;
-
-#if 0
-	/* single process, no forking, just for testing/profiling */
-	err = udev_event_execute_rules(event, rules);
-	if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-		udev_event_execute_run(event);
-	info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
-	event_queue_delete(event);
-	return;
-#endif
+	sigset_t mask;
 
-	if (debug_trace) {
-		event->trace = 1;
-		fprintf(stderr, "fork %s (%llu)\n",
-		       udev_device_get_syspath(event->dev),
-		       udev_device_get_seqnum(event->dev));
-	}
+	worker = calloc(1, sizeof(struct worker));
+	if (worker == NULL)
+		return;
+
+	/* block WORKER signals, until we joined the list with our new pid */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGRT_WORKER);
+	sigprocmask(SIG_BLOCK, &mask, NULL);
+
+	event->state = EVENT_RUNNING;
 
 	pid = fork();
 	switch (pid) {
-	case 0:
-		/* child */
+	case 0: {
+		struct udev_device *dev;
+
 		udev_queue_export_unref(udev_queue_export);
 		udev_ctrl_unref(udev_ctrl);
+		udev_monitor_unref(monitor);
 		logging_close();
 		logging_init("udevd-event");
 		setpriority(PRIO_PROCESS, 0, UDEV_PRIORITY);
 
+		/* re-open socket to listen to udevd only, and send back libudev events */
+		monitor = udev_monitor_new_from_netlink(event->udev, NULL);
+		if (monitor == NULL)
+			_exit(2);
+		udev_monitor_enable_receiving(monitor);
+
 		/* set signal handlers */
 		memset(&act, 0x00, sizeof(act));
 		act.sa_handler = event_sig_handler;
 		sigemptyset (&act.sa_mask);
 		act.sa_flags = 0;
+		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGALRM, &act, NULL);
 
 		/* reset to default */
 		act.sa_handler = SIG_DFL;
 		sigaction(SIGINT, &act, NULL);
-		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGCHLD, &act, NULL);
 		sigaction(SIGHUP, &act, NULL);
+		sigaction(SIGRT_WORKER, &act, NULL);
 
-		/* set timeout to prevent hanging processes */
-		alarm(UDEV_EVENT_TIMEOUT);
+		/* initial device */
+		dev = event->dev;
 
-		/* apply rules, create node, symlinks */
-		err = udev_event_execute_rules(event, rules);
+		while (!worker_exit) {
+			struct udev_event *udev_event;
+			union sigval sigval;
+			int err;
 
-		/* rules may change/disable the timeout */
-		if (udev_device_get_event_timeout(event->dev) >= 0)
-			alarm(udev_device_get_event_timeout(event->dev));
+			udev_event = udev_event_new(dev);
+			if (udev_event == NULL)
+				_exit(3);
 
-		/* execute RUN= */
-		if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-			udev_event_execute_run(event);
+			/* set timeout to prevent hanging processes */
+			alarm(UDEV_EVENT_TIMEOUT);
 
-		/* apply/restore inotify watch */
-		if (err == 0 && event->inotify_watch) {
-			udev_watch_begin(event->udev, event->dev);
-			udev_device_update_db(event->dev);
-		}
+			/* apply rules, create node, symlinks */
+			err = udev_event_execute_rules(udev_event, rules);
+
+			/* rules may change/disable the timeout */
+			if (udev_device_get_event_timeout(dev) >= 0)
+				alarm(udev_device_get_event_timeout(dev));
+
+			/* execute RUN= */
+			if (err == 0 && !udev_event->ignore_device && udev_get_run(udev_event->udev))
+				udev_event_execute_run(udev_event);
+
+			/* reset alarm */
+			alarm(0);
+
+			/* apply/restore inotify watch */
+			if (err == 0 && udev_event->inotify_watch) {
+				udev_watch_begin(udev_event->udev, dev);
+				udev_device_update_db(dev);
+			}
+
+			/* send processed event back to libudev listeners */
+			udev_monitor_send_device(monitor, dev, 0);
 
-		/* send processed event back to the kernel netlink socket */
-		udev_monitor_send_device(kernel_monitor, event->dev);
+			info(event->udev, "seq %llu finished with %i\n", udev_device_get_seqnum(dev), err);
+			udev_device_unref(dev);
+			udev_event_unref(udev_event);
+
+			/* send back the result of the event execution */
+			sigval.sival_int = err;
+			/* FIXME: handle EAGAIN */
+			sigqueue(main_pid, SIGRT_WORKER, sigval);
+
+			/* wait for more device messages from udevd */
+			do
+				dev = udev_monitor_receive_device(monitor);
+			while (!worker_exit && dev == NULL);
+		}
 
-		info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
+		udev_monitor_unref(monitor);
 		logging_close();
-		if (err != 0)
-			exit(1);
 		exit(0);
+	}
 	case -1:
+		event->state = EVENT_QUEUED;
+		free(worker);
 		err(event->udev, "fork of child failed: %m\n");
-		event_queue_delete(event);
 		break;
 	default:
-		/* get SIGCHLD in main loop */
-		info(event->udev, "seq %llu forked, pid [%d], '%s' '%s', %ld seconds old\n",
-		     udev_device_get_seqnum(event->dev),
-		     pid,
-		     udev_device_get_action(event->dev),
-		     udev_device_get_subsystem(event->dev),
-		     time(NULL) - event->queue_time);
-		event->pid = pid;
+		worker->pid = pid;
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		udev_list_node_append(&worker->node, &worker_list);
 		childs++;
+		break;
 	}
+
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
 }
 
-static void event_queue_insert(struct udev_event *event)
+static void event_run(struct event *event)
 {
-	event->queue_time = time(NULL);
+	struct udev_list_node *loop;
 
-	udev_queue_export_device_queued(udev_queue_export, event->dev);
-	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(event->dev),
-	     udev_device_get_action(event->dev), udev_device_get_subsystem(event->dev));
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+		ssize_t count;
 
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		event->state = EVENT_RUNNING;
+		count = udev_monitor_send_device(monitor, event->dev, worker->pid);
+		if (count < 0) {
+			err(event->udev, "worker [%u] did not accept message, kill it\n", worker->pid);
+			event->state = EVENT_QUEUED;
+			worker->state = WORKER_KILLED;
+			kill(worker->pid, SIGKILL);
+			continue;
+		}
+		return;
+	}
+
+	if (childs >= max_childs) {
+		info(event->udev, "maximum number (%i) of childs reached\n", childs);
+		return;
+	}
+
+	/* start new worker and pass initial device */
+	worker_new(event);
+}
+
+static void event_queue_insert(struct udev_device *dev)
+{
+	struct event *event;
+
+	event = calloc(1, sizeof(struct event));
+	if (event == NULL)
+		return;
+
+	event->udev = udev_device_get_udev(dev);
+	event->dev = dev;
+	udev_queue_export_device_queued(udev_queue_export, dev);
+	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(dev),
+	     udev_device_get_action(dev), udev_device_get_subsystem(dev));
+
+	event->state = EVENT_QUEUED;
 	udev_list_node_append(&event->node, &event_list);
-	run_exec_q = 1;
 
 	/* run all events with a timeout set immediately */
-	if (udev_device_get_timeout(event->dev) > 0) {
-		event_fork(event);
+	if (udev_device_get_timeout(dev) > 0) {
+		worker_new(event);
 		return;
 	}
 }
@@ -265,13 +384,13 @@ static int compare_devpath(const char *running, const char *waiting)
 }
 
 /* lookup event for identical, parent, child device */
-static int devpath_busy(struct udev_event *event)
+static int devpath_busy(struct event *event)
 {
 	struct udev_list_node *loop;
 
 	/* check if queue contains events we depend on */
 	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *loop_event = node_to_event(loop);
 
 		/* we already found a later event, earlier can not block us, no need to check again */
 		if (udev_device_get_seqnum(loop_event->dev) < event->delaying_seqnum)
@@ -312,42 +431,37 @@ static void event_queue_manager(struct udev *udev)
 	struct udev_list_node *tmp;
 
 start_over:
-	if (udev_list_is_empty(&event_list)) {
-		if (childs > 0) {
-			err(udev, "event list empty, but childs count is %i", childs);
-			childs = 0;
-		}
-		return;
-	}
-
 	udev_list_node_foreach_safe(loop, tmp, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *event = node_to_event(loop);
 
-		if (childs >= max_childs) {
-			info(udev, "maximum number (%i) of childs reached\n", childs);
-			break;
+		/* cleanup finished events */
+		if (event->state == EVENT_FINISHED) {
+			event_queue_delete(event);
+			continue;
 		}
 
-		if (loop_event->pid != 0)
+		if (stop_exec_q)
+			continue;
+
+		/* skip running events */
+		if (event->state != EVENT_QUEUED)
 			continue;
 
 		/* do not start event if parent or child event is still running */
-		if (devpath_busy(loop_event) != 0) {
+		if (devpath_busy(event) != 0) {
 			dbg(udev, "delay seq %llu (%s)\n",
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
+			    udev_device_get_seqnum(event->dev),
+			    udev_device_get_devpath(event->dev));
 			continue;
 		}
 
-		event_fork(loop_event);
-		dbg(udev, "moved seq %llu to running list\n", udev_device_get_seqnum(loop_event->dev));
+		event_run(event);
+	}
 
-		/* retry if events finished in the meantime */
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-			goto start_over;
-		}
+	/* keep the incoming queue small, retry if events finished in the meantime */
+	if (event_finished) {
+		event_finished = 0;
+		goto start_over;
 	}
 }
 
@@ -480,69 +594,64 @@ static int handle_inotify(struct udev *udev)
 	return 0;
 }
 
-static void sig_handler(int signum)
+static void sig_handler(int signum, siginfo_t *info, void *ucontext)
 {
 	switch (signum) {
 		case SIGINT:
 		case SIGTERM:
 			udev_exit = 1;
-			break;
+			return;
 		case SIGCHLD:
-			/* set flag, then write to pipe if needed */
-			sigchilds_waiting = 1;
-			break;
-		case SIGHUP:
-			reload_config = 1;
-			break;
-	}
+			while (1) {
+				pid_t pid;
+				struct udev_list_node *loop;
 
-	signal_received = 1;
-}
+				pid = waitpid(-1, NULL, WNOHANG);
+				if (pid <= 0)
+					break;
 
-static void udev_done(int pid, int exitstatus)
-{
-	struct udev_list_node *loop;
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
 
-	/* find event associated with pid and delete it */
-	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (loop_event->pid == pid) {
-			info(loop_event->udev, "seq %llu cleanup, pid [%d], status %i, %ld seconds old\n",
-			     udev_device_get_seqnum(loop_event->dev), loop_event->pid,
-			     exitstatus, time(NULL) - loop_event->queue_time);
-			loop_event->exitstatus = exitstatus;
-			if (debug_trace)
-				fprintf(stderr, "exit %s (%llu)\n",
-				       udev_device_get_syspath(loop_event->dev),
-				       udev_device_get_seqnum(loop_event->dev));
-			event_queue_delete(loop_event);
-			childs--;
-
-			/* there may be dependent events waiting */
-			run_exec_q = 1;
+					if (worker->pid != info->si_pid)
+						continue;
+
+					worker->state = WORKER_DEAD;
+					childs--;
+					break;
+				}
+			}
+			worker_dead = 1;
 			return;
-		}
+		case SIGHUP:
+			signal_received = 1;
+			reload_config = 1;
+			return;
+		default:
+			if (signum == SIGRT_WORKER) {
+				struct udev_list_node *loop;
+
+				/* lookup worker who sent the signal */
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
+
+					if (worker->pid != info->si_pid)
+						continue;
+
+					/* worker returned */
+					worker->event->exitstatus = info->si_value.sival_int;
+					worker->event->state = EVENT_FINISHED;
+					worker->event = NULL;
+					worker->state = WORKER_IDLE;
+					event_finished = 1;
+					break;
+				}
+				return;
+			}
+		break;
 	}
-}
 
-static void reap_sigchilds(void)
-{
-	pid_t pid;
-	int status;
-
-	while (1) {
-		pid = waitpid(-1, &status, WNOHANG);
-		if (pid <= 0)
-			break;
-		if (WIFEXITED(status))
-			status = WEXITSTATUS(status);
-		else if (WIFSIGNALED(status))
-			status = WTERMSIG(status) + 128;
-		else
-			status = 0;
-		udev_done(pid, status);
-	}
+	signal_received = 1;
 }
 
 static void startup_log(struct udev *udev)
@@ -677,21 +786,24 @@ int main(int argc, char *argv[])
 		goto exit;
 	}
 
-	kernel_monitor = udev_monitor_new_from_netlink(udev, "kernel");
-	if (kernel_monitor == NULL || udev_monitor_enable_receiving(kernel_monitor) < 0) {
+	monitor = udev_monitor_new_from_netlink(udev, "kernel");
+	if (monitor == NULL || udev_monitor_enable_receiving(monitor) < 0) {
 		fprintf(stderr, "error initializing netlink socket\n");
 		err(udev, "error initializing netlink socket\n");
 		rc = 3;
 		goto exit;
 	}
-	udev_monitor_set_receive_buffer_size(kernel_monitor, 128*1024*1024);
+	udev_monitor_set_receive_buffer_size(monitor, 128*1024*1024);
 
 	rules = udev_rules_new(udev, resolve_names);
 	if (rules == NULL) {
 		err(udev, "error reading rules\n");
 		goto exit;
 	}
+
 	udev_list_init(&event_list);
+	udev_list_init(&worker_list);
+
 	udev_queue_export = udev_queue_export_new(udev);
 	if (udev_queue_export == NULL) {
 		err(udev, "error creating queue file\n");
@@ -704,19 +816,19 @@ int main(int argc, char *argv[])
 		pid = fork();
 		switch (pid) {
 		case 0:
-			dbg(udev, "daemonized fork running\n");
 			break;
 		case -1:
 			err(udev, "fork of daemon failed: %m\n");
 			rc = 4;
 			goto exit;
 		default:
-			dbg(udev, "child [%u] running, parent exits\n", pid);
 			rc = 0;
 			goto exit;
 		}
 	}
 
+	main_pid = getpid();
+
 	/* redirect std{out,err} */
 	if (!debug && !debug_trace) {
 		dup2(fd, STDIN_FILENO);
@@ -746,13 +858,14 @@ int main(int argc, char *argv[])
 
 	/* set signal handlers */
 	memset(&act, 0x00, sizeof(struct sigaction));
-	act.sa_handler = sig_handler;
+	act.sa_sigaction = sig_handler;
 	sigemptyset(&act.sa_mask);
-	act.sa_flags = SA_RESTART;
+	act.sa_flags = SA_RESTART | SA_SIGINFO;
 	sigaction(SIGINT, &act, NULL);
 	sigaction(SIGTERM, &act, NULL);
 	sigaction(SIGCHLD, &act, NULL);
 	sigaction(SIGHUP, &act, NULL);
+	sigaction(SIGRT_WORKER, &act, NULL);
 
 	/* watch rules directory */
 	udev_watch_init(udev);
@@ -782,10 +895,11 @@ int main(int argc, char *argv[])
 		max_childs = 1;
 	} else {
 		int memsize = mem_size_mb();
+
 		if (memsize > 0)
 			max_childs = 128 + (memsize / 4);
 		else
-			max_childs = UDEVD_MAX_CHILDS;
+			max_childs = 256;
 	}
 	/* possibly overwrite maximum limit of executed events */
 	value = getenv("UDEVD_MAX_CHILDS");
@@ -797,12 +911,15 @@ int main(int argc, char *argv[])
 		sigset_t blocked_mask, orig_mask;
 		struct pollfd pfd[4];
 		struct pollfd *ctrl_poll, *monitor_poll, *inotify_poll = NULL;
+		const struct timespec ts = { 10, 0 };
+		const struct timespec *timeout = NULL;
 		int nfds = 0;
 		int fdcount;
 
 		sigfillset(&blocked_mask);
 		sigprocmask(SIG_SETMASK, &blocked_mask, &orig_mask);
 		if (signal_received) {
+			signal_received = 0;
 			sigprocmask(SIG_SETMASK, &orig_mask, NULL);
 			goto handle_signals;
 		}
@@ -812,7 +929,7 @@ int main(int argc, char *argv[])
 		ctrl_poll->events = POLLIN;
 
 		monitor_poll = &pfd[nfds++];
-		monitor_poll->fd = udev_monitor_get_fd(kernel_monitor);
+		monitor_poll->fd = udev_monitor_get_fd(monitor);
 		monitor_poll->events = POLLIN;
 
 		if (inotify_fd >= 0) {
@@ -821,15 +938,33 @@ int main(int argc, char *argv[])
 			inotify_poll->events = POLLIN;
 		}
 
-		fdcount = ppoll(pfd, nfds, NULL, &orig_mask);
+		/* set timeout to kill idle workers */
+		if (udev_list_is_empty(&event_list) && !udev_list_is_empty(&worker_list))
+			timeout = &ts;
+
+		fdcount = ppoll(pfd, nfds, timeout, &orig_mask);
 		sigprocmask(SIG_SETMASK, &orig_mask, NULL);
 		if (fdcount < 0) {
 			if (errno == EINTR)
 				goto handle_signals;
-			err(udev, "error in select: %m\n");
 			continue;
 		}
 
+		/* timeout, no events - kill idle workers */
+		if (fdcount == 0) {
+			struct udev_list_node *loop;
+
+			udev_list_node_foreach(loop, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
+
+				if (worker->state != WORKER_IDLE)
+					continue;
+
+				worker->state = WORKER_KILLED;
+				kill(worker->pid, SIGTERM);
+			}
+		}
+
 		/* get control message */
 		if (ctrl_poll->revents & POLLIN)
 			handle_ctrl_msg(udev_ctrl);
@@ -838,16 +973,11 @@ int main(int argc, char *argv[])
 		if (monitor_poll->revents & POLLIN) {
 			struct udev_device *dev;
 
-			dev = udev_monitor_receive_device(kernel_monitor);
-			if (dev != NULL) {
-				struct udev_event *event;
-
-				event = udev_event_new(dev);
-				if (event != NULL)
-					event_queue_insert(event);
-				else
-					udev_device_unref(dev);
-			}
+			dev = udev_monitor_receive_device(monitor);
+			if (dev != NULL)
+				event_queue_insert(dev);
+			else
+				udev_device_unref(dev);
 		}
 
 		/* rules directory inotify watch */
@@ -855,8 +985,6 @@ int main(int argc, char *argv[])
 			handle_inotify(udev);
 
 handle_signals:
-		signal_received = 0;
-
 		/* rules changed, set by inotify or a HUP signal */
 		if (reload_config) {
 			struct udev_rules *rules_new;
@@ -869,32 +997,45 @@ handle_signals:
 			}
 		}
 
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-		}
+		/* cleanup killed worker */
+		if (worker_dead) {
+			struct udev_list_node *loop, *tmp;
 
-		if (run_exec_q) {
-			run_exec_q = 0;
-			if (!stop_exec_q)
-				event_queue_manager(udev);
+			udev_list_node_foreach_safe(loop, tmp, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
+
+				if (worker->state != WORKER_DEAD)
+					continue;
+
+				/* recycle event, if worker died unexpectedly */
+				if (worker->event != NULL)
+					worker->event->state = EVENT_QUEUED;
+
+				udev_list_node_remove(&worker->node);
+				free(worker);
+			}
+			worker_dead = 0;
 		}
 
 		if (settle_pid > 0) {
 			kill(settle_pid, SIGUSR1);
 			settle_pid = 0;
 		}
+
+		if (!udev_list_is_empty(&event_list))
+			event_queue_manager(udev);
 	}
+
 	udev_queue_export_cleanup(udev_queue_export);
 	rc = 0;
+	killpg(0, SIGTERM);
 exit:
-
 	udev_queue_export_unref(udev_queue_export);
 	udev_rules_unref(rules);
 	udev_ctrl_unref(udev_ctrl);
 	if (inotify_fd >= 0)
 		close(inotify_fd);
-	udev_monitor_unref(kernel_monitor);
+	udev_monitor_unref(monitor);
 	udev_selinux_exit(udev);
 	udev_unref(udev);
 	logging_close();


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (7 preceding siblings ...)
  2009-06-01 12:33 ` Kay Sievers
@ 2009-06-01 13:30 ` Kay Sievers
  2009-06-01 13:46 ` Alan Jenkins
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01 13:30 UTC (permalink / raw)
  To: linux-hotplug

On Mon, 2009-06-01 at 14:33 +0200, Kay Sievers wrote:

> Here is version 2 of the patch, with a few things corrected.

Pretty weird test, but it seems to survive on my Dual Core 1.5 GHz
laptop:

Create lots of devices:
  $ time (modprobe scsi_debug add_host=16 max_luns=32 num_tgts=16 num_parts=2; udevadm settle)
  real	11m28.193s

23.000 block devices:
  $ find /sys/class/block/ | wc -l
  23824

72.000 sysfs devices:
  $ find /sys/ -name uevent | wc -l
  72224

We created 262 workers:
  $ ps afx | grep udevd | wc -l
  262

And 10 seconds later, all workers are gone:
  $ ps afx | grep udevd | wc -l
  2

Kay



* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (8 preceding siblings ...)
  2009-06-01 13:30 ` Kay Sievers
@ 2009-06-01 13:46 ` Alan Jenkins
  2009-06-01 13:57 ` Kay Sievers
                   ` (16 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-06-01 13:46 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Mon, Jun 1, 2009 at 11:29, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>   
>>> The thread- and worker-versions do not create as many COW page-faults in
>>> the daemon after every cloned event-process, and therefore need much
>>> less CPU.
>>>
>>> At least on the dual-core laptop here, the pool of workers seems to be
>>> faster than the threads.
>>>
>>>       
>> Great, you beat me to it.  It makes sense that this would be a _bit_ faster
>> than threads.  As I say I tried to avoid the page faults caused (somehow) by
>> forking the extra thread-stack mappings, but it didn't really work out.  I'm
>> surprised by quite how much faster it is though!
>>     
>
> Yeah, the 1.2 sec system time looked a bit too scary.
>
>   
>> I have some thoughts which may help in bringing the code up from "rough
>> hack" quality :-).
>>     
>
> Cool. :)
>
>   
>> Aren't signal queues unreliable?  If you exceed the maximum number of queued
>> signals, sigqueue will fail with EAGAIN, and I don't think there's a
>> blocking version :-).
>>     
>
> Right, it's a rlimit, and I think I checked and remember 40.000+
> signals here. The workers could detect, and re-send a non-queued
> signal, if needed.
>   

Non-queued signal transmission isn't blocking either.  The signal just 
gets dropped if there's already one in-flight.
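To make the EAGAIN case concrete, here is a minimal sketch of a bounded retry around sigqueue(). The helper name send_result(), the retry count and the 10 ms back-off are assumptions for illustration, not something from the patch. It only covers the queued (rt-signal) path, where a full queue is reported to the sender.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: queue a result signal and
 * retry a bounded number of times while the queued-signal limit is hit.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int send_result(pid_t pid, int signo, int result, int max_retries)
{
	const struct timespec delay = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	union sigval value;
	int i;

	assert(max_retries >= 0);
	value.sival_int = result;
	for (i = 0; i <= max_retries; i++) {
		if (sigqueue(pid, signo, value) == 0)
			return 0;
		if (errno != EAGAIN)
			return -1;
		/* limit on queued signals reached, back off and try again */
		nanosleep(&delay, NULL);
	}
	errno = EAGAIN;
	return -1;
}
```

A worker would call something like send_result(main_pid, SIGRT_WORKER, err, 10) in place of the bare sigqueue(). What to do when the retries run out is still an open question.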

>   
>> I think a  pipe would be better
>>     
>
> Maybe. I was lazy and tried to avoid file descriptors, in case we get
> a really large number of workers to maintain. :)
>
>   
>> or maybe you can do something with netlink.
>>     
>
> Right, but then, I guess, we would need to do MSG_PEEK, or something,
> to find out if the main daemon received a kernel message or a worker
> message, before trying to do a receive_device().
>   

Ok.  I meant we could send all the completions down the same pipe, which 
wouldn't require lots of FDs.  Before this patch, we were happily 
sharing kernel_monitor->sock between all the child processes.
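As a rough sketch of the single-pipe idea: every worker inherits the write end of one pipe and writes a small fixed-size record, and the daemon polls the read end. The struct completion layout and both function names are hypothetical. The sketch relies on the POSIX guarantee that a write() of at most PIPE_BUF bytes (at least 512) to a pipe is atomic.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical record a worker writes when it has finished an event */
struct completion {
	pid_t pid;	/* worker that handled the event */
	int err;	/* result of the event execution */
};

/*
 * worker side: one write() per record; the record is far smaller than
 * PIPE_BUF, so records from concurrent workers cannot interleave
 */
static int completion_send(int fd, int err)
{
	struct completion c;
	ssize_t n;

	c.pid = getpid();
	c.err = err;
	do
		n = write(fd, &c, sizeof(c));
	while (n < 0 && errno == EINTR);
	return n == (ssize_t)sizeof(c) ? 0 : -1;
}

/* daemon side: returns 1 if a record was read, 0 on EOF, -1 on error */
static int completion_recv(int fd, struct completion *c)
{
	ssize_t n;

	assert(c != NULL);
	do
		n = read(fd, c, sizeof(*c));
	while (n < 0 && errno == EINTR);
	if (n == 0)
		return 0;
	return n == (ssize_t)sizeof(*c) ? 1 : -1;
}
```

With a blocking write end, a full pipe makes the worker wait instead of losing the completion, and the daemon needs only one extra descriptor in its poll set, however many workers exist.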



>> +                       /* wait for more device messages from udevd */
>> +                       do
>> +                               dev = udev_monitor_receive_device(monitor);
>> +                       while (dev == NULL);
>>
>> I think this loop should finish when the signal handler sets worker_exit?
>>  But maybe you didn't actually install a handler for SIGTERM and it's still
>> being reset to the default action:
>>     
>
> Yeah, it's not handled with worker_exit. Now, it's not reliable to
> kill event processes from something else than the main daemon. The
> worker_exit might be nice for valgrind tests though.
>
>   
>> I'm not sure you're handling worker processes crashing, it looks like you
>> might leave the event in limbo if you get SIGCHLD for a worker in state
>> WORKER_RUNNING.  I'm sure you'll test that though.
>>     
>
> Maybe we can set the event to QUEUED, when a worker dies with an event attached.
>   

Retry the event, you mean?  How many times? :-).

The code looked like you freed the worker, but left the event RUNNING, 
and it would never be released.  I would delete the event instead, just 
like the old system.

I haven't read V2 yet though, maybe you fixed it.

>   
>> Talking of unexpected crashes.  I anticipated using a connection-oriented
>> socketpair for passing events.  Using netlink for this means the worker
>> threads don't get a nice notification if the daemon dies without killing
>> them.
>>     
>
> Right, that's not too nice. I guess we have pretty much lost if the
> main daemon dies unexpectedly. We need to find out, if we want netlink
> and signals, or socketpair()s here, I guess.
>
>   
>> Unless this is covered by udevd being the session leader, or
>> something like that?  I'll test this empirically - maybe it's not important,
>> but I think it's good to know what will happen.
>>     
>
> If the main daemon exits normally, it sends TERM to the entire process group
>
>   
>> BTW I had a go at this too, but a couple of my workers fail in about 1 run
>> out of 20, apparently because calloc() returns NULL without setting errno.
>>  I'm sure it's my fault but I'll try to track it down.  Obviously I'll let
>> you know if your patch could be affected by the same problem.
>>     
>
> Oh, strange. Would be good to know what causes this.
>   
Don't worry about it, it was just my buggy code :-).  I used event->udev 
after udev_event_unref(event).

Thanks
Alan



* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (9 preceding siblings ...)
  2009-06-01 13:46 ` Alan Jenkins
@ 2009-06-01 13:57 ` Kay Sievers
  2009-06-01 16:22 ` Kay Sievers
                   ` (15 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01 13:57 UTC (permalink / raw)
  To: linux-hotplug

On Mon, Jun 1, 2009 at 15:46, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:

>> Right, it's a rlimit, and I think I checked and remember 40.000+
>> signals here. The workers could detect, and re-send a non-queued
>> signal, if needed.
>
> Non-queued signal transmission isn't blocking either.  The signal just gets
> dropped if there's already one in-flight.

Oh, I meant we get an error back, if we can not queue the signal, so
we can re-send it. The rt-signals should be fully reliable regarding
that.

> Ok.  I meant we could send all the completions down the same pipe, which
> wouldn't require lots of FDs.  Before this patch, we were happily sharing
> kernel_monitor->sock between all the child processes.

Ah, I see. Yeah, we share the inotify fd still, I think. We could have
a pipe, sure, if it's better for some reason. Scott mentioned signalfd,
which we could look at too. Rt-signals might be a bit more efficient
than a pipe, we should find that out.
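For comparison, a minimal sketch of the signalfd variant, assuming Linux with signalfd(2) and SFD_CLOEXEC available. The two helper names are hypothetical. The worker side stays as it is (sigqueue() with the result in sival_int); only the daemon changes, reading the queued signals from a descriptor in the poll loop instead of from an asynchronous handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* block signo and return a descriptor that delivers it; -1 on error */
static int worker_signal_fd(int signo)
{
	sigset_t mask;

	sigemptyset(&mask);
	sigaddset(&mask, signo);
	/* the signal must be blocked, or it is still delivered the normal way */
	if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
		return -1;
	return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* read one pending signal, store sender pid and sival_int; 0 on success */
static int worker_signal_read(int fd, pid_t *pid, int *result)
{
	struct signalfd_siginfo si;

	assert(pid != NULL && result != NULL);
	if (read(fd, &si, sizeof(si)) != (ssize_t)sizeof(si))
		return -1;
	*pid = (pid_t)si.ssi_pid;
	*result = si.ssi_int;
	return 0;
}
```

This does not lift the limit on queued signals, so the sender still has to handle EAGAIN from sigqueue(). What it removes is the need to walk worker_list from inside a signal handler.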

>> Maybe we can set the event to QUEUED, when a worker dies with an event
>> attached.
>>
>
> Retry the event, you mean?  How many times? :-).

Not sure. :)

> The code looked like you freed the worker, but left the event RUNNING, and
> it would never be released.  I would delete the event instead, just like the
> old system.
>
> I haven't read V2 yet though, maybe you fixed it.

I just set it back to QUEUED for now. Not sure if dropping it or
re-trying it a few times would be better.
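One way to get something between the two options is a per-event retry counter. The sketch below is only an illustration: struct event_retry, its retries field, the -1 exit status and the function name are hypothetical and stand in for the real struct event in udevd.c.

```c
#include <assert.h>
#include <stddef.h>

enum event_state {
	EVENT_QUEUED,
	EVENT_RUNNING,
	EVENT_FINISHED,
};

/* hypothetical, cut-down stand-in for struct event in udevd.c */
struct event_retry {
	enum event_state state;
	int exitstatus;
	unsigned int retries;	/* how often a worker died on this event */
};

/*
 * Called when a worker died with this event attached: requeue the event
 * until max_retries is used up, then mark it finished with an error so
 * the queue manager can delete it.
 */
static void event_worker_died(struct event_retry *event, unsigned int max_retries)
{
	assert(event != NULL);
	if (event->state != EVENT_RUNNING)
		return;
	if (event->retries < max_retries) {
		event->retries++;
		event->state = EVENT_QUEUED;
	} else {
		event->exitstatus = -1;
		event->state = EVENT_FINISHED;
	}
}
```

With max_retries set to 0 this behaves like the old system (the event is dropped at once), so the limit could be tuned without changing the state machine.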

Thanks,
Kay


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (10 preceding siblings ...)
  2009-06-01 13:57 ` Kay Sievers
@ 2009-06-01 16:22 ` Kay Sievers
  2009-06-01 16:24 ` Alan Jenkins
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01 16:22 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 875 bytes --]

On Mon, Jun 1, 2009 at 15:57, Kay Sievers <kay.sievers@vrfy.org> wrote:

>> The code looked like you freed the worker, but left the event RUNNING, and
>> it would never be released.  I would delete the event instead, just like the
>> old system.
>>
>> I haven't read V2 yet though, maybe you fixed it.
>
> I just set it back to QUEUED for now. Not sure if dropping it or
> re-trying it a few times would be better.

Version 3, which should clean up events with a worker that died. Also
kills all workers if the config has changed.

I guess we should find out if we go for rtsignals, a pipe and/or
signalfd. Any ideas?

Would be good to start with something in a repository, instead of
sending huge patches, I guess.

It's so much faster than the current clone per event, seems we want to
have something like that in a released version.

Thanks,
Kay

[-- Attachment #2: worker3.patch --]
[-- Type: text/x-patch, Size: 27616 bytes --]

diff --git a/udev/lib/libudev-monitor.c b/udev/lib/libudev-monitor.c
index 395a4d2..54c9576 100644
--- a/udev/lib/libudev-monitor.c
+++ b/udev/lib/libudev-monitor.c
@@ -32,7 +32,6 @@ struct udev_monitor {
 	int refcount;
 	int sock;
 	struct sockaddr_nl snl;
-	struct sockaddr_nl snl_peer;
 	struct sockaddr_un sun;
 	socklen_t addrlen;
 	struct udev_list_node filter_subsystem_list;
@@ -171,8 +170,8 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 		return NULL;
 
 	if (name == NULL)
-		return NULL;
-	if (strcmp(name, "kernel") == 0)
+		group = 0;
+	else if (strcmp(name, "kernel") == 0)
 		group = UDEV_MONITOR_KERNEL;
 	else if (strcmp(name, "udev") == 0)
 		group = UDEV_MONITOR_UDEV;
@@ -193,8 +192,6 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 
 	udev_monitor->snl.nl_family = AF_NETLINK;
 	udev_monitor->snl.nl_groups = group;
-	udev_monitor->snl_peer.nl_family = AF_NETLINK;
-	udev_monitor->snl_peer.nl_groups = UDEV_MONITOR_UDEV;
 
 	dbg(udev, "monitor %p created with NETLINK_KOBJECT_UEVENT (%u)\n", udev_monitor, group);
 	return udev_monitor;
@@ -434,7 +431,6 @@ struct udev_device *udev_monitor_receive_device(struct udev_monitor *udev_monito
 	struct iovec iov;
 	char cred_msg[CMSG_SPACE(sizeof(struct ucred))];
 	struct cmsghdr *cmsg;
-	struct sockaddr_nl snl;
 	struct ucred *cred;
 	char buf[8192];
 	ssize_t buflen;
@@ -459,11 +455,6 @@ retry:
 	smsg.msg_control = cred_msg;
 	smsg.msg_controllen = sizeof(cred_msg);
 
-	if (udev_monitor->snl.nl_family != 0) {
-		smsg.msg_name = &snl;
-		smsg.msg_namelen = sizeof(snl);
-	}
-
 	buflen = recvmsg(udev_monitor->sock, &smsg, 0);
 	if (buflen < 0) {
 		if (errno != EINTR)
@@ -476,20 +467,6 @@ retry:
 		return NULL;
 	}
 
-	if (udev_monitor->snl.nl_family != 0) {
-		if (snl.nl_groups == 0) {
-			info(udev_monitor->udev, "unicast netlink message ignored\n");
-			return NULL;
-		}
-		if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
-			if (snl.nl_pid > 0) {
-				info(udev_monitor->udev, "multicast kernel netlink message from pid %d ignored\n", snl.nl_pid);
-				return NULL;
-			}
-			is_kernel = 1;
-		}
-	}
-
 	cmsg = CMSG_FIRSTHDR(&smsg);
 	if (cmsg == NULL || cmsg->cmsg_type != SCM_CREDENTIALS) {
 		info(udev_monitor->udev, "no sender credentials received, message ignored\n");
@@ -621,7 +598,7 @@ retry:
 	return udev_device;
 }
 
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device)
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid)
 {
 	struct msghdr smsg;
 	struct iovec iov[2];
@@ -660,6 +637,7 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 	} else if (udev_monitor->snl.nl_family != 0) {
 		const char *val;
 		struct udev_monitor_netlink_header nlh;
+		struct sockaddr_nl snl_peer;
 
 
 		/* add versioned header */
@@ -680,11 +658,18 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 		iov[1].iov_base = (char *)buf;
 		iov[1].iov_len = blen;
 
+		/* we will always get ECONNREFUSED when sending to the muticast group */
+		memset(&snl_peer, 0x00, sizeof(struct sockaddr_nl));
+		snl_peer.nl_family = AF_NETLINK;
+		if (pid > 0)
+			snl_peer.nl_pid = pid;
+		else
+			snl_peer.nl_groups = UDEV_MONITOR_UDEV;
+
 		memset(&smsg, 0x00, sizeof(struct msghdr));
 		smsg.msg_iov = iov;
 		smsg.msg_iovlen = 2;
-		/* no destination besides the muticast group, we will always get ECONNREFUSED */
-		smsg.msg_name = &udev_monitor->snl_peer;
+		smsg.msg_name = &snl_peer;
 		smsg.msg_namelen = sizeof(struct sockaddr_nl);
 	} else {
 		return -1;
diff --git a/udev/lib/libudev-private.h b/udev/lib/libudev-private.h
index 3eb3d79..3019920 100644
--- a/udev/lib/libudev-private.h
+++ b/udev/lib/libudev-private.h
@@ -86,7 +86,7 @@ int udev_device_delete_db(struct udev_device *udev_device);
 int udev_device_rename_db(struct udev_device *udev_device, const char *devpath);
 
 /* libudev-monitor - netlink/unix socket communication  */
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device);
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid);
 int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size);
 
 /* libudev-ctrl - daemon runtime setup */
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d521251..8ab262a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -734,18 +734,13 @@ int udev_event_execute_run(struct udev_event *event)
 			monitor = udev_monitor_new_from_socket(event->udev, &cmd[strlen("socket:")]);
 			if (monitor == NULL)
 				continue;
-			udev_monitor_send_device(monitor, event->dev);
+			udev_monitor_send_device(monitor, event->dev, 0);
 			udev_monitor_unref(monitor);
 		} else {
 			char program[UTIL_PATH_SIZE];
 			char **envp;
 
 			udev_event_apply_format(event, cmd, program, sizeof(program));
-			if (event->trace)
-				fprintf(stderr, "run  %s (%llu) '%s'\n",
-				       udev_device_get_syspath(event->dev),
-				       udev_device_get_seqnum(event->dev),
-				       program);
 			envp = udev_device_get_properties_envp(event->dev);
 			if (util_run_program(event->udev, program, envp, NULL, 0, NULL) != 0) {
 				if (!udev_list_entry_get_flag(list_entry))
diff --git a/udev/udev.h b/udev/udev.h
index 8f2c1c6..ed29c4b 100644
--- a/udev/udev.h
+++ b/udev/udev.h
@@ -53,7 +53,6 @@ static inline void logging_close(void)
 }
 
 struct udev_event {
-	struct udev_list_node node;
 	struct udev *udev;
 	struct udev_device *dev;
 	struct udev_device *dev_parent;
@@ -64,10 +63,6 @@ struct udev_event {
 	uid_t uid;
 	gid_t gid;
 	struct udev_list_node run_list;
-	pid_t pid;
-	int exitstatus;
-	time_t queue_time;
-	unsigned long long int delaying_seqnum;
 	unsigned int group_final:1;
 	unsigned int owner_final:1;
 	unsigned int mode_final:1;
@@ -76,7 +71,6 @@ struct udev_event {
 	unsigned int run_final:1;
 	unsigned int ignore_device:1;
 	unsigned int inotify_watch:1;
-	unsigned int trace:1;
 };
 
 struct udev_watch {
diff --git a/udev/udevd.c b/udev/udevd.c
index 37b547a..8890ee0 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -44,8 +44,7 @@
 #define UDEVD_PRIORITY			-4
 #define UDEV_PRIORITY			-2
 
-/* maximum limit of forked childs */
-#define UDEVD_MAX_CHILDS		256
+#define SIGRT_WORKER			SIGRTMIN+1
 
 static int debug;
 
@@ -61,34 +60,75 @@ static void log_fn(struct udev *udev, int priority,
 	}
 }
 
-static void reap_sigchilds(void);
-
 static int debug_trace;
 static struct udev_rules *rules;
 static struct udev_queue_export *udev_queue_export;
 static struct udev_ctrl *udev_ctrl;
-static struct udev_monitor *kernel_monitor;
-static volatile sig_atomic_t sigchilds_waiting;
+static struct udev_monitor *monitor;
+static pid_t main_pid;
+static volatile sig_atomic_t event_finished;
+static volatile sig_atomic_t worker_dead;
 static volatile sig_atomic_t udev_exit;
 static volatile sig_atomic_t reload_config;
 static volatile sig_atomic_t signal_received;
-static volatile pid_t settle_pid;
-static int run_exec_q;
+static pid_t settle_pid;
 static int stop_exec_q;
 static int max_childs;
-static int childs;
+static volatile int childs;
 static struct udev_list_node event_list;
+static struct udev_list_node worker_list;
+static volatile sig_atomic_t worker_exit;
+
+enum event_state {
+	EVENT_UNDEF,
+	EVENT_QUEUED,
+	EVENT_RUNNING,
+	EVENT_FINISHED,
+};
+
+struct event {
+	struct udev_list_node node;
+	struct udev *udev;
+	struct udev_device *dev;
+	enum event_state state;
+	int exitstatus;
+	unsigned long long int delaying_seqnum;
+};
 
-static struct udev_event *node_to_event(struct udev_list_node *node)
+static struct event *node_to_event(struct udev_list_node *node)
 {
 	char *event;
 
 	event = (char *)node;
-	event -= offsetof(struct udev_event, node);
-	return (struct udev_event *)event;
+	event -= offsetof(struct event, node);
+	return (struct event *)event;
+}
+
+enum worker_state {
+	WORKER_UNDEF,
+	WORKER_RUNNING,
+	WORKER_IDLE,
+	WORKER_KILLED,
+	WORKER_DEAD,
+};
+
+struct worker {
+	struct udev_list_node node;
+	pid_t pid;
+	enum worker_state state;
+	struct event *event;
+};
+
+static struct worker *node_to_worker(struct udev_list_node *node)
+{
+	char *worker;
+
+	worker = (char *)node;
+	worker -= offsetof(struct worker, node);
+	return (struct worker *)worker;
 }
 
-static void event_queue_delete(struct udev_event *event)
+static void event_queue_delete(struct event *event)
 {
 	udev_list_node_remove(&event->node);
 
@@ -99,125 +139,219 @@ static void event_queue_delete(struct udev_event *event)
 		udev_queue_export_device_finished(udev_queue_export, event->dev);
 
 	udev_device_unref(event->dev);
-	udev_event_unref(event);
+	free(event);
 }
 
 static void event_sig_handler(int signum)
 {
-	if (signum == SIGALRM)
+	switch (signum) {
+	case SIGALRM:
 		_exit(1);
+		break;
+	case SIGTERM:
+		worker_exit = 1;
+		break;
+	}
 }
 
-static void event_fork(struct udev_event *event)
+static void worker_new(struct event *event)
 {
+	struct worker *worker;
 	pid_t pid;
 	struct sigaction act;
-	int err;
-
-#if 0
-	/* single process, no forking, just for testing/profiling */
-	err = udev_event_execute_rules(event, rules);
-	if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-		udev_event_execute_run(event);
-	info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
-	event_queue_delete(event);
-	return;
-#endif
+	sigset_t mask;
 
-	if (debug_trace) {
-		event->trace = 1;
-		fprintf(stderr, "fork %s (%llu)\n",
-		       udev_device_get_syspath(event->dev),
-		       udev_device_get_seqnum(event->dev));
-	}
+	worker = calloc(1, sizeof(struct worker));
+	if (worker == NULL)
+		return;
+
+	/* block WORKER signals, until we joined the list with our new pid */
+	sigemptyset(&mask);
+	sigaddset(&mask, SIGRT_WORKER);
+	sigprocmask(SIG_BLOCK, &mask, NULL);
+
+	event->state = EVENT_RUNNING;
 
 	pid = fork();
 	switch (pid) {
-	case 0:
-		/* child */
+	case 0: {
+		struct udev_device *dev;
+
 		udev_queue_export_unref(udev_queue_export);
 		udev_ctrl_unref(udev_ctrl);
+		udev_monitor_unref(monitor);
 		logging_close();
 		logging_init("udevd-event");
 		setpriority(PRIO_PROCESS, 0, UDEV_PRIORITY);
 
+		/* re-open socket to listen to udevd only, and send back libudev events */
+		monitor = udev_monitor_new_from_netlink(event->udev, NULL);
+		if (monitor == NULL)
+			_exit(2);
+		udev_monitor_enable_receiving(monitor);
+
 		/* set signal handlers */
 		memset(&act, 0x00, sizeof(act));
 		act.sa_handler = event_sig_handler;
 		sigemptyset (&act.sa_mask);
 		act.sa_flags = 0;
+		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGALRM, &act, NULL);
 
 		/* reset to default */
 		act.sa_handler = SIG_DFL;
 		sigaction(SIGINT, &act, NULL);
-		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGCHLD, &act, NULL);
 		sigaction(SIGHUP, &act, NULL);
+		sigaction(SIGRT_WORKER, &act, NULL);
 
-		/* set timeout to prevent hanging processes */
-		alarm(UDEV_EVENT_TIMEOUT);
+		/* initial device */
+		dev = event->dev;
 
-		/* apply rules, create node, symlinks */
-		err = udev_event_execute_rules(event, rules);
+		while (!worker_exit) {
+			struct udev_event *udev_event;
+			union sigval sigval;
+			int err;
 
-		/* rules may change/disable the timeout */
-		if (udev_device_get_event_timeout(event->dev) >= 0)
-			alarm(udev_device_get_event_timeout(event->dev));
+			udev_event = udev_event_new(dev);
+			if (udev_event == NULL)
+				_exit(3);
 
-		/* execute RUN= */
-		if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-			udev_event_execute_run(event);
+			/* set timeout to prevent hanging processes */
+			alarm(UDEV_EVENT_TIMEOUT);
 
-		/* apply/restore inotify watch */
-		if (err == 0 && event->inotify_watch) {
-			udev_watch_begin(event->udev, event->dev);
-			udev_device_update_db(event->dev);
-		}
+			/* apply rules, create node, symlinks */
+			err = udev_event_execute_rules(udev_event, rules);
+
+			/* rules may change/disable the timeout */
+			if (udev_device_get_event_timeout(dev) >= 0)
+				alarm(udev_device_get_event_timeout(dev));
+
+			/* execute RUN= */
+			if (err == 0 && !udev_event->ignore_device && udev_get_run(udev_event->udev))
+				udev_event_execute_run(udev_event);
+
+			/* reset alarm */
+			alarm(0);
+
+			/* apply/restore inotify watch */
+			if (err == 0 && udev_event->inotify_watch) {
+				udev_watch_begin(udev_event->udev, dev);
+				udev_device_update_db(dev);
+			}
+
+			/* send processed event back to libudev listeners */
+			udev_monitor_send_device(monitor, dev, 0);
+
+			info(event->udev, "seq %llu finished with %i\n", udev_device_get_seqnum(dev), err);
+			udev_device_unref(dev);
+			udev_event_unref(udev_event);
 
-		/* send processed event back to the kernel netlink socket */
-		udev_monitor_send_device(kernel_monitor, event->dev);
+			/* send back the result of the event execution */
+			sigval.sival_int = err;
+			/* FIXME: handle EAGAIN */
+			sigqueue(main_pid, SIGRT_WORKER, sigval);
 
-		info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
+			/* wait for more device messages from udevd */
+			do
+				dev = udev_monitor_receive_device(monitor);
+			while (!worker_exit && dev == NULL);
+		}
+
+		udev_monitor_unref(monitor);
 		logging_close();
-		if (err != 0)
-			exit(1);
 		exit(0);
+	}
 	case -1:
+		event->state = EVENT_QUEUED;
+		free(worker);
 		err(event->udev, "fork of child failed: %m\n");
-		event_queue_delete(event);
 		break;
 	default:
-		/* get SIGCHLD in main loop */
-		info(event->udev, "seq %llu forked, pid [%d], '%s' '%s', %ld seconds old\n",
-		     udev_device_get_seqnum(event->dev),
-		     pid,
-		     udev_device_get_action(event->dev),
-		     udev_device_get_subsystem(event->dev),
-		     time(NULL) - event->queue_time);
-		event->pid = pid;
+		worker->pid = pid;
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		udev_list_node_append(&worker->node, &worker_list);
 		childs++;
+		break;
+	}
+
+	sigprocmask(SIG_UNBLOCK, &mask, NULL);
+}
+
+static void event_run(struct event *event)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+		ssize_t count;
+
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		event->state = EVENT_RUNNING;
+		count = udev_monitor_send_device(monitor, event->dev, worker->pid);
+		if (count < 0) {
+			err(event->udev, "worker [%u] did not accept message, kill it\n", worker->pid);
+			event->state = EVENT_QUEUED;
+			worker->state = WORKER_KILLED;
+			kill(worker->pid, SIGKILL);
+			continue;
+		}
+		return;
+	}
+
+	if (childs >= max_childs) {
+		info(event->udev, "maximum number (%i) of childs reached\n", childs);
+		return;
 	}
+
+	/* start new worker and pass initial device */
+	worker_new(event);
 }
 
-static void event_queue_insert(struct udev_event *event)
+static void event_queue_insert(struct udev_device *dev)
 {
-	event->queue_time = time(NULL);
+	struct event *event;
+
+	event = calloc(1, sizeof(struct event));
+	if (event == NULL)
+		return;
 
-	udev_queue_export_device_queued(udev_queue_export, event->dev);
-	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(event->dev),
-	     udev_device_get_action(event->dev), udev_device_get_subsystem(event->dev));
+	event->udev = udev_device_get_udev(dev);
+	event->dev = dev;
+	udev_queue_export_device_queued(udev_queue_export, dev);
+	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(dev),
+	     udev_device_get_action(dev), udev_device_get_subsystem(dev));
 
+	event->state = EVENT_QUEUED;
 	udev_list_node_append(&event->node, &event_list);
-	run_exec_q = 1;
 
 	/* run all events with a timeout set immediately */
-	if (udev_device_get_timeout(event->dev) > 0) {
-		event_fork(event);
+	if (udev_device_get_timeout(dev) > 0) {
+		worker_new(event);
 		return;
 	}
 }
 
+static void worker_kill_idle(void)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->state = WORKER_KILLED;
+		kill(worker->pid, SIGTERM);
+	}
+}
+
 static int mem_size_mb(void)
 {
 	FILE *f;
@@ -265,13 +399,13 @@ static int compare_devpath(const char *running, const char *waiting)
 }
 
 /* lookup event for identical, parent, child device */
-static int devpath_busy(struct udev_event *event)
+static int devpath_busy(struct event *event)
 {
 	struct udev_list_node *loop;
 
 	/* check if queue contains events we depend on */
 	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *loop_event = node_to_event(loop);
 
 		/* we already found a later event, earlier can not block us, no need to check again */
 		if (udev_device_get_seqnum(loop_event->dev) < event->delaying_seqnum)
@@ -312,42 +446,37 @@ static void event_queue_manager(struct udev *udev)
 	struct udev_list_node *tmp;
 
 start_over:
-	if (udev_list_is_empty(&event_list)) {
-		if (childs > 0) {
-			err(udev, "event list empty, but childs count is %i", childs);
-			childs = 0;
-		}
-		return;
-	}
-
 	udev_list_node_foreach_safe(loop, tmp, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *event = node_to_event(loop);
 
-		if (childs >= max_childs) {
-			info(udev, "maximum number (%i) of childs reached\n", childs);
-			break;
+		/* cleanup finished events */
+		if (event->state == EVENT_FINISHED) {
+			event_queue_delete(event);
+			continue;
 		}
 
-		if (loop_event->pid != 0)
+		if (stop_exec_q)
+			continue;
+
+		/* skip running events */
+		if (event->state != EVENT_QUEUED)
 			continue;
 
 		/* do not start event if parent or child event is still running */
-		if (devpath_busy(loop_event) != 0) {
+		if (devpath_busy(event) != 0) {
 			dbg(udev, "delay seq %llu (%s)\n",
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
+			    udev_device_get_seqnum(event->dev),
+			    udev_device_get_devpath(event->dev));
 			continue;
 		}
 
-		event_fork(loop_event);
-		dbg(udev, "moved seq %llu to running list\n", udev_device_get_seqnum(loop_event->dev));
+		event_run(event);
+	}
 
-		/* retry if events finished in the meantime */
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-			goto start_over;
-		}
+	/* keep the incoming queue small, retry if events finished in the meantime */
+	if (event_finished) {
+		event_finished = 0;
+		goto start_over;
 	}
 }
 
@@ -480,69 +609,70 @@ static int handle_inotify(struct udev *udev)
 	return 0;
 }
 
-static void sig_handler(int signum)
+static void sig_handler(int signum, siginfo_t *info, void *ucontext)
 {
 	switch (signum) {
 		case SIGINT:
 		case SIGTERM:
 			udev_exit = 1;
-			break;
+			return;
 		case SIGCHLD:
-			/* set flag, then write to pipe if needed */
-			sigchilds_waiting = 1;
-			break;
-		case SIGHUP:
-			reload_config = 1;
-			break;
-	}
+			while (1) {
+				pid_t pid;
+				struct udev_list_node *loop;
 
-	signal_received = 1;
-}
+				pid = waitpid(-1, NULL, WNOHANG);
+				if (pid <= 0)
+					break;
 
-static void udev_done(int pid, int exitstatus)
-{
-	struct udev_list_node *loop;
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
 
-	/* find event associated with pid and delete it */
-	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (loop_event->pid == pid) {
-			info(loop_event->udev, "seq %llu cleanup, pid [%d], status %i, %ld seconds old\n",
-			     udev_device_get_seqnum(loop_event->dev), loop_event->pid,
-			     exitstatus, time(NULL) - loop_event->queue_time);
-			loop_event->exitstatus = exitstatus;
-			if (debug_trace)
-				fprintf(stderr, "exit %s (%llu)\n",
-				       udev_device_get_syspath(loop_event->dev),
-				       udev_device_get_seqnum(loop_event->dev));
-			event_queue_delete(loop_event);
-			childs--;
-
-			/* there may be dependent events waiting */
-			run_exec_q = 1;
-			return;
-		}
-	}
-}
+					if (worker->pid != pid)
+						continue;
 
-static void reap_sigchilds(void)
-{
-	pid_t pid;
-	int status;
+					/* finish event, if worker died unexpectedly */
+					if (worker->event != NULL) {
+						worker->event->exitstatus = 1;
+						worker->event->state = EVENT_FINISHED;
+					}
 
-	while (1) {
-		pid = waitpid(-1, &status, WNOHANG);
-		if (pid <= 0)
-			break;
-		if (WIFEXITED(status))
-			status = WEXITSTATUS(status);
-		else if (WIFSIGNALED(status))
-			status = WTERMSIG(status) + 128;
-		else
-			status = 0;
-		udev_done(pid, status);
+					worker->state = WORKER_DEAD;
+					childs--;
+					break;
+				}
+			}
+			worker_dead = 1;
+			return;
+		case SIGHUP:
+			signal_received = 1;
+			reload_config = 1;
+			return;
+		default:
+			if (signum == SIGRT_WORKER) {
+				struct udev_list_node *loop;
+
+				/* lookup worker who sent the signal */
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
+
+					if (worker->pid != info->si_pid)
+						continue;
+
+					/* worker returned */
+					worker->event->exitstatus = info->si_value.sival_int;
+					worker->event->state = EVENT_FINISHED;
+					worker->event = NULL;
+					worker->state = WORKER_IDLE;
+					event_finished = 1;
+					break;
+				}
+				return;
+			}
+		break;
 	}
+
+	signal_received = 1;
 }
 
 static void startup_log(struct udev *udev)
@@ -677,21 +807,24 @@ int main(int argc, char *argv[])
 		goto exit;
 	}
 
-	kernel_monitor = udev_monitor_new_from_netlink(udev, "kernel");
-	if (kernel_monitor == NULL || udev_monitor_enable_receiving(kernel_monitor) < 0) {
+	monitor = udev_monitor_new_from_netlink(udev, "kernel");
+	if (monitor == NULL || udev_monitor_enable_receiving(monitor) < 0) {
 		fprintf(stderr, "error initializing netlink socket\n");
 		err(udev, "error initializing netlink socket\n");
 		rc = 3;
 		goto exit;
 	}
-	udev_monitor_set_receive_buffer_size(kernel_monitor, 128*1024*1024);
+	udev_monitor_set_receive_buffer_size(monitor, 128*1024*1024);
 
 	rules = udev_rules_new(udev, resolve_names);
 	if (rules == NULL) {
 		err(udev, "error reading rules\n");
 		goto exit;
 	}
+
 	udev_list_init(&event_list);
+	udev_list_init(&worker_list);
+
 	udev_queue_export = udev_queue_export_new(udev);
 	if (udev_queue_export == NULL) {
 		err(udev, "error creating queue file\n");
@@ -704,19 +837,19 @@ int main(int argc, char *argv[])
 		pid = fork();
 		switch (pid) {
 		case 0:
-			dbg(udev, "daemonized fork running\n");
 			break;
 		case -1:
 			err(udev, "fork of daemon failed: %m\n");
 			rc = 4;
 			goto exit;
 		default:
-			dbg(udev, "child [%u] running, parent exits\n", pid);
 			rc = 0;
 			goto exit;
 		}
 	}
 
+	main_pid = getpid();
+
 	/* redirect std{out,err} */
 	if (!debug && !debug_trace) {
 		dup2(fd, STDIN_FILENO);
@@ -746,13 +879,14 @@ int main(int argc, char *argv[])
 
 	/* set signal handlers */
 	memset(&act, 0x00, sizeof(struct sigaction));
-	act.sa_handler = sig_handler;
+	act.sa_sigaction = sig_handler;
 	sigemptyset(&act.sa_mask);
-	act.sa_flags = SA_RESTART;
+	act.sa_flags = SA_RESTART | SA_SIGINFO;
 	sigaction(SIGINT, &act, NULL);
 	sigaction(SIGTERM, &act, NULL);
 	sigaction(SIGCHLD, &act, NULL);
 	sigaction(SIGHUP, &act, NULL);
+	sigaction(SIGRT_WORKER, &act, NULL);
 
 	/* watch rules directory */
 	udev_watch_init(udev);
@@ -782,10 +916,11 @@ int main(int argc, char *argv[])
 		max_childs = 1;
 	} else {
 		int memsize = mem_size_mb();
+
 		if (memsize > 0)
 			max_childs = 128 + (memsize / 4);
 		else
-			max_childs = UDEVD_MAX_CHILDS;
+			max_childs = 256;
 	}
 	/* possibly overwrite maximum limit of executed events */
 	value = getenv("UDEVD_MAX_CHILDS");
@@ -797,12 +932,15 @@ int main(int argc, char *argv[])
 		sigset_t blocked_mask, orig_mask;
 		struct pollfd pfd[4];
 		struct pollfd *ctrl_poll, *monitor_poll, *inotify_poll = NULL;
+		const struct timespec ts = { 10, 0 };
+		const struct timespec *timeout;
 		int nfds = 0;
 		int fdcount;
 
 		sigfillset(&blocked_mask);
 		sigprocmask(SIG_SETMASK, &blocked_mask, &orig_mask);
 		if (signal_received) {
+			signal_received = 0;
 			sigprocmask(SIG_SETMASK, &orig_mask, NULL);
 			goto handle_signals;
 		}
@@ -812,7 +950,7 @@ int main(int argc, char *argv[])
 		ctrl_poll->events = POLLIN;
 
 		monitor_poll = &pfd[nfds++];
-		monitor_poll->fd = udev_monitor_get_fd(kernel_monitor);
+		monitor_poll->fd = udev_monitor_get_fd(monitor);
 		monitor_poll->events = POLLIN;
 
 		if (inotify_fd >= 0) {
@@ -821,14 +959,20 @@ int main(int argc, char *argv[])
 			inotify_poll->events = POLLIN;
 		}
 
-		fdcount = ppoll(pfd, nfds, NULL, &orig_mask);
+		/* set timeout to kill idle workers */
+		if (udev_list_is_empty(&event_list) && !udev_list_is_empty(&worker_list))
+			timeout = &ts;
+		else
+			timeout = NULL;
+
+		fdcount = ppoll(pfd, nfds, timeout, &orig_mask);
 		sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-		if (fdcount < 0) {
-			if (errno == EINTR)
-				goto handle_signals;
-			err(udev, "error in select: %m\n");
+		if (fdcount < 0 && errno != EINTR)
 			continue;
-		}
+
+		/* timeout or config changed - kill idle workers */
+		if (fdcount == 0)
+			worker_kill_idle();
 
 		/* get control message */
 		if (ctrl_poll->revents & POLLIN)
@@ -838,16 +982,11 @@ int main(int argc, char *argv[])
 		if (monitor_poll->revents & POLLIN) {
 			struct udev_device *dev;
 
-			dev = udev_monitor_receive_device(kernel_monitor);
-			if (dev != NULL) {
-				struct udev_event *event;
-
-				event = udev_event_new(dev);
-				if (event != NULL)
-					event_queue_insert(event);
-				else
-					udev_device_unref(dev);
-			}
+			dev = udev_monitor_receive_device(monitor);
+			if (dev != NULL)
+				event_queue_insert(dev);
+			else
+				udev_device_unref(dev);
 		}
 
 		/* rules directory inotify watch */
@@ -855,13 +994,12 @@ int main(int argc, char *argv[])
 			handle_inotify(udev);
 
 handle_signals:
-		signal_received = 0;
-
 		/* rules changed, set by inotify or a HUP signal */
 		if (reload_config) {
 			struct udev_rules *rules_new;
 
 			reload_config = 0;
+			worker_kill_idle();
 			rules_new = udev_rules_new(udev, resolve_names);
 			if (rules_new != NULL) {
 				udev_rules_unref(rules);
@@ -869,32 +1007,41 @@ handle_signals:
 			}
 		}
 
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-		}
+		/* cleanup killed worker */
+		if (worker_dead) {
+			struct udev_list_node *loop, *tmp;
+
+			udev_list_node_foreach_safe(loop, tmp, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
+
+				if (worker->state != WORKER_DEAD)
+					continue;
 
-		if (run_exec_q) {
-			run_exec_q = 0;
-			if (!stop_exec_q)
-				event_queue_manager(udev);
+				udev_list_node_remove(&worker->node);
+				free(worker);
+			}
+			worker_dead = 0;
 		}
 
 		if (settle_pid > 0) {
 			kill(settle_pid, SIGUSR1);
 			settle_pid = 0;
 		}
+
+		if (!udev_list_is_empty(&event_list))
+			event_queue_manager(udev);
 	}
+
 	udev_queue_export_cleanup(udev_queue_export);
 	rc = 0;
+	killpg(0, SIGTERM);
 exit:
-
 	udev_queue_export_unref(udev_queue_export);
 	udev_rules_unref(rules);
 	udev_ctrl_unref(udev_ctrl);
 	if (inotify_fd >= 0)
 		close(inotify_fd);
-	udev_monitor_unref(kernel_monitor);
+	udev_monitor_unref(monitor);
 	udev_selinux_exit(udev);
 	udev_unref(udev);
 	logging_close();

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (11 preceding siblings ...)
  2009-06-01 16:22 ` Kay Sievers
@ 2009-06-01 16:24 ` Alan Jenkins
  2009-06-01 19:39 ` Kay Sievers
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-06-01 16:24 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Mon, Jun 1, 2009 at 13:32, Kay Sievers <kay.sievers@vrfy.org> wrote:
>> On Mon, Jun 1, 2009 at 11:29, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>>> I think this loop should finish when the signal handler sets worker_exit?
>>> But maybe you didn't actually install a handler for SIGTERM and it's still
>>> being reset to the default action:
>>
>> Yeah, it's not handled with worker_exit. Now, it's not reliable to
>> kill event processes from something else than the main daemon. The
>> worker_exit might be nice for valgrind tests though.
>
> Here is version 2 of the patch, with a few things corrected.
>
> Thanks,
> Kay

Ok.

I don't think the signal handler should be doing list traversal, in case 
it interrupts list manipulation in the main loop.  Or perhaps the main 
loop should only allow signals during ppoll().
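
As a rough illustration of the second option (signals allowed only during ppoll()), here is a minimal sketch. It is not taken from the patch: note_signal(), block_all_signals(), wait_with_signals() and demo_ppoll_window() are hypothetical names, and SIGUSR1 stands in for whatever signals the daemon handles.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static volatile sig_atomic_t handler_ran;

static void note_signal(int signum)
{
	(void)signum;
	handler_ran = 1;
}

/*
 * Block all signals and return the previous mask in *orig. From here on,
 * handlers cannot interrupt list manipulation in the loop body.
 */
static int block_all_signals(sigset_t *orig)
{
	sigset_t all;

	sigfillset(&all);
	return sigprocmask(SIG_SETMASK, &all, orig);
}

/* ppoll() installs *orig atomically, and only for the duration of the wait */
static int wait_with_signals(struct pollfd *pfd, nfds_t nfds,
			     const struct timespec *timeout, const sigset_t *orig)
{
	return ppoll(pfd, nfds, timeout, orig);
}

static int demo_ppoll_window(void)
{
	const struct timespec timeout = { 1, 0 };
	struct sigaction act;
	sigset_t orig;
	int rc;

	memset(&act, 0, sizeof(act));
	act.sa_handler = note_signal;
	sigemptyset(&act.sa_mask);
	if (sigaction(SIGUSR1, &act, NULL) < 0)
		return -1;

	if (block_all_signals(&orig) < 0)
		return -1;
	/* make sure the demo signal is open inside ppoll() */
	sigdelset(&orig, SIGUSR1);

	/* the signal stays pending: the handler must not run yet */
	raise(SIGUSR1);
	assert(handler_ran == 0);

	/* the pending signal is delivered inside ppoll(), which then fails with EINTR */
	rc = wait_with_signals(NULL, 0, &timeout, &orig);
	assert(rc == -1 && errno == EINTR);

	sigprocmask(SIG_SETMASK, &orig, NULL);
	return handler_ran;
}
```

With this arrangement the handler can still only run at one known point per loop iteration, so it cannot observe a half-updated list.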

Thanks
Alan

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (12 preceding siblings ...)
  2009-06-01 16:24 ` Alan Jenkins
@ 2009-06-01 19:39 ` Kay Sievers
  2009-06-02  4:58 ` Kay Sievers
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-01 19:39 UTC (permalink / raw)
  To: linux-hotplug

On Mon, Jun 1, 2009 at 18:24, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:

> I don't think the signal handler should be doing list traversal, in case it
> interrupts list manipulation in the main loop.  Or perhaps the main loop
> should only allow signals during ppoll().

Makes sense. We can block them during list mangling, or just go for
signalfd, and convert all the signal handling to it.

Thanks,
Kay

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (13 preceding siblings ...)
  2009-06-01 19:39 ` Kay Sievers
@ 2009-06-02  4:58 ` Kay Sievers
  2009-06-02  9:13 ` Alan Jenkins
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-02  4:58 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

On Mon, Jun 1, 2009 at 21:39, Kay Sievers <kay.sievers@vrfy.org> wrote:
> On Mon, Jun 1, 2009 at 18:24, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>
>> I don't think the signal handler should be doing list traversal, in case it
>> interrupts list manipulation in the main loop.  Or perhaps the main loop
>> should only allow signals during ppoll().
>
> Makes sense. We can block them during list mangling, or just go for
> signalfd, and convert all the signal handling to it.

Here is a version that uses signalfd(), and a socketpair() from the
workers to pass data back. Same numbers so far, but the code looks a
lot simpler.

Kay

[-- Attachment #2: worker4.patch --]
[-- Type: text/x-patch, Size: 37857 bytes --]

diff --git a/configure.ac b/configure.ac
index f1d008e..f126146 100644
--- a/configure.ac
+++ b/configure.ac
@@ -23,10 +23,6 @@ AC_SUBST(LIBUDEV_LT_AGE)
 
 AC_PATH_PROG([XSLTPROC], [xsltproc])
 
-AC_CHECK_LIB(c, inotify_init,
-	[AC_DEFINE([HAVE_INOTIFY], 1, [inotify available])],
-	[AC_MSG_WARN([inotify support disabled])])
-
 AC_ARG_WITH(udev-prefix,
 	AS_HELP_STRING([--with-udev-prefix=DIR], [add prefix to internal udev path names]),
 	[], [with_udev_prefix='${exec_prefix}'])
diff --git a/udev/Makefile.am b/udev/Makefile.am
index 6cd2f23..94989e6 100644
--- a/udev/Makefile.am
+++ b/udev/Makefile.am
@@ -14,7 +14,6 @@ common_ldadd =
 
 common_files = \
 	udev.h \
-	udev-sysdeps.h \
 	udev-event.c \
 	udev-watch.c \
 	udev-node.c \
diff --git a/udev/lib/libudev-monitor.c b/udev/lib/libudev-monitor.c
index 395a4d2..54c9576 100644
--- a/udev/lib/libudev-monitor.c
+++ b/udev/lib/libudev-monitor.c
@@ -32,7 +32,6 @@ struct udev_monitor {
 	int refcount;
 	int sock;
 	struct sockaddr_nl snl;
-	struct sockaddr_nl snl_peer;
 	struct sockaddr_un sun;
 	socklen_t addrlen;
 	struct udev_list_node filter_subsystem_list;
@@ -171,8 +170,8 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 		return NULL;
 
 	if (name == NULL)
-		return NULL;
-	if (strcmp(name, "kernel") == 0)
+		group = 0;
+	else if (strcmp(name, "kernel") == 0)
 		group = UDEV_MONITOR_KERNEL;
 	else if (strcmp(name, "udev") == 0)
 		group = UDEV_MONITOR_UDEV;
@@ -193,8 +192,6 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 
 	udev_monitor->snl.nl_family = AF_NETLINK;
 	udev_monitor->snl.nl_groups = group;
-	udev_monitor->snl_peer.nl_family = AF_NETLINK;
-	udev_monitor->snl_peer.nl_groups = UDEV_MONITOR_UDEV;
 
 	dbg(udev, "monitor %p created with NETLINK_KOBJECT_UEVENT (%u)\n", udev_monitor, group);
 	return udev_monitor;
@@ -434,7 +431,6 @@ struct udev_device *udev_monitor_receive_device(struct udev_monitor *udev_monito
 	struct iovec iov;
 	char cred_msg[CMSG_SPACE(sizeof(struct ucred))];
 	struct cmsghdr *cmsg;
-	struct sockaddr_nl snl;
 	struct ucred *cred;
 	char buf[8192];
 	ssize_t buflen;
@@ -459,11 +455,6 @@ retry:
 	smsg.msg_control = cred_msg;
 	smsg.msg_controllen = sizeof(cred_msg);
 
-	if (udev_monitor->snl.nl_family != 0) {
-		smsg.msg_name = &snl;
-		smsg.msg_namelen = sizeof(snl);
-	}
-
 	buflen = recvmsg(udev_monitor->sock, &smsg, 0);
 	if (buflen < 0) {
 		if (errno != EINTR)
@@ -476,20 +467,6 @@ retry:
 		return NULL;
 	}
 
-	if (udev_monitor->snl.nl_family != 0) {
-		if (snl.nl_groups == 0) {
-			info(udev_monitor->udev, "unicast netlink message ignored\n");
-			return NULL;
-		}
-		if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
-			if (snl.nl_pid > 0) {
-				info(udev_monitor->udev, "multicast kernel netlink message from pid %d ignored\n", snl.nl_pid);
-				return NULL;
-			}
-			is_kernel = 1;
-		}
-	}
-
 	cmsg = CMSG_FIRSTHDR(&smsg);
 	if (cmsg == NULL || cmsg->cmsg_type != SCM_CREDENTIALS) {
 		info(udev_monitor->udev, "no sender credentials received, message ignored\n");
@@ -621,7 +598,7 @@ retry:
 	return udev_device;
 }
 
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device)
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid)
 {
 	struct msghdr smsg;
 	struct iovec iov[2];
@@ -660,6 +637,7 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 	} else if (udev_monitor->snl.nl_family != 0) {
 		const char *val;
 		struct udev_monitor_netlink_header nlh;
+		struct sockaddr_nl snl_peer;
 
 
 		/* add versioned header */
@@ -680,11 +658,18 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 		iov[1].iov_base = (char *)buf;
 		iov[1].iov_len = blen;
 
+		/* we will always get ECONNREFUSED when sending to the multicast group */
+		memset(&snl_peer, 0x00, sizeof(struct sockaddr_nl));
+		snl_peer.nl_family = AF_NETLINK;
+		if (pid > 0)
+			snl_peer.nl_pid = pid;
+		else
+			snl_peer.nl_groups = UDEV_MONITOR_UDEV;
+
 		memset(&smsg, 0x00, sizeof(struct msghdr));
 		smsg.msg_iov = iov;
 		smsg.msg_iovlen = 2;
-		/* no destination besides the muticast group, we will always get ECONNREFUSED */
-		smsg.msg_name = &udev_monitor->snl_peer;
+		smsg.msg_name = &snl_peer;
 		smsg.msg_namelen = sizeof(struct sockaddr_nl);
 	} else {
 		return -1;
diff --git a/udev/lib/libudev-private.h b/udev/lib/libudev-private.h
index dc02a84..e600802 100644
--- a/udev/lib/libudev-private.h
+++ b/udev/lib/libudev-private.h
@@ -86,7 +86,7 @@ int udev_device_delete_db(struct udev_device *udev_device);
 int udev_device_rename_db(struct udev_device *udev_device, const char *devpath);
 
 /* libudev-monitor - netlink/unix socket communication  */
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device);
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid);
 int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size);
 
 /* libudev-ctrl - daemon runtime setup */
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d521251..8ab262a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -734,18 +734,13 @@ int udev_event_execute_run(struct udev_event *event)
 			monitor = udev_monitor_new_from_socket(event->udev, &cmd[strlen("socket:")]);
 			if (monitor == NULL)
 				continue;
-			udev_monitor_send_device(monitor, event->dev);
+			udev_monitor_send_device(monitor, event->dev, 0);
 			udev_monitor_unref(monitor);
 		} else {
 			char program[UTIL_PATH_SIZE];
 			char **envp;
 
 			udev_event_apply_format(event, cmd, program, sizeof(program));
-			if (event->trace)
-				fprintf(stderr, "run  %s (%llu) '%s'\n",
-				       udev_device_get_syspath(event->dev),
-				       udev_device_get_seqnum(event->dev),
-				       program);
 			envp = udev_device_get_properties_envp(event->dev);
 			if (util_run_program(event->udev, program, envp, NULL, 0, NULL) != 0) {
 				if (!udev_list_entry_get_flag(list_entry))
diff --git a/udev/udev-sysdeps.h b/udev/udev-sysdeps.h
deleted file mode 100644
index 35671ba..0000000
--- a/udev/udev-sysdeps.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * wrapping of libc features and kernel interfaces
- *
- * Copyright (C) 2005-2008 Kay Sievers <kay.sievers@vrfy.org>
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef _UDEV_SYSDEPS_H_
-#define _UDEV_SYSDEPS_H_
-
-#include <stdint.h>
-#include <errno.h>
-
-#ifndef HAVE_INOTIFY
-static inline int inotify_init(void)
-{
-	errno = ENOSYS;
-	return -1;
-}
-
-static inline int inotify_add_watch(int fd, const char *name, uint32_t mask)
-{
-	return -1;
-}
-
-#define IN_CREATE	0
-#define IN_DELETE	0
-#define IN_MOVE		0
-#define IN_CLOSE_WRITE	0
-
-#endif /* HAVE_INOTIFY */
-#endif
diff --git a/udev/udev-watch.c b/udev/udev-watch.c
index 53492e5..5a49c96 100644
--- a/udev/udev-watch.c
+++ b/udev/udev-watch.c
@@ -26,27 +26,24 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#ifdef HAVE_INOTIFY
 #include <sys/inotify.h>
-#endif
 
 #include "udev.h"
 
-int inotify_fd = -1;
+static int inotify_fd = -1;
 
 /* inotify descriptor, will be shared with rules directory;
  * set to cloexec since we need our children to be able to add
  * watches for us
  */
-void udev_watch_init(struct udev *udev)
+int udev_watch_init(struct udev *udev)
 {
 	inotify_fd = inotify_init();
 	if (inotify_fd >= 0)
 		util_set_fd_cloexec(inotify_fd);
-	else if (errno == ENOSYS)
-		info(udev, "unable to use inotify, udevd will not monitor rule files changes\n");
 	else
 		err(udev, "inotify_init failed: %m\n");
+	return inotify_fd;
 }
 
 /* move any old watches directory out of the way, and then restore
diff --git a/udev/udev.h b/udev/udev.h
index 8f2c1c6..7187975 100644
--- a/udev/udev.h
+++ b/udev/udev.h
@@ -22,7 +22,6 @@
 #include <sys/types.h>
 #include <sys/param.h>
 
-#include "udev-sysdeps.h"
 #include "lib/libudev.h"
 #include "lib/libudev-private.h"
 
@@ -53,7 +52,6 @@ static inline void logging_close(void)
 }
 
 struct udev_event {
-	struct udev_list_node node;
 	struct udev *udev;
 	struct udev_device *dev;
 	struct udev_device *dev_parent;
@@ -64,10 +62,6 @@ struct udev_event {
 	uid_t uid;
 	gid_t gid;
 	struct udev_list_node run_list;
-	pid_t pid;
-	int exitstatus;
-	time_t queue_time;
-	unsigned long long int delaying_seqnum;
 	unsigned int group_final:1;
 	unsigned int owner_final:1;
 	unsigned int mode_final:1;
@@ -76,7 +70,6 @@ struct udev_event {
 	unsigned int run_final:1;
 	unsigned int ignore_device:1;
 	unsigned int inotify_watch:1;
-	unsigned int trace:1;
 };
 
 struct udev_watch {
@@ -101,8 +94,7 @@ int udev_event_apply_subsys_kernel(struct udev_event *event, const char *string,
 				   char *result, size_t maxsize, int read_value);
 
 /* udev-watch.c */
-extern int inotify_fd;
-void udev_watch_init(struct udev *udev);
+int udev_watch_init(struct udev *udev);
 void udev_watch_restore(struct udev *udev);
 void udev_watch_begin(struct udev *udev, struct udev_device *dev);
 void udev_watch_end(struct udev *udev, struct udev_device *dev);
diff --git a/udev/udevd.c b/udev/udevd.c
index 37b547a..7e6d244 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2004-2008 Kay Sievers <kay.sievers@vrfy.org>
+ * Copyright (C) 2004-2009 Kay Sievers <kay.sievers@vrfy.org>
  * Copyright (C) 2004 Chris Friesen <chris_friesen@sympatico.ca>
  * Copyright (C) 2009 Canonical Ltd.
  * Copyright (C) 2009 Scott James Remnant <scott@netsplit.com>
@@ -30,23 +30,21 @@
 #include <time.h>
 #include <getopt.h>
 #include <dirent.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/signalfd.h>
 #include <sys/select.h>
 #include <sys/poll.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
-#ifdef HAVE_INOTIFY
 #include <sys/inotify.h>
-#endif
 
 #include "udev.h"
 
 #define UDEVD_PRIORITY			-4
 #define UDEV_PRIORITY			-2
 
-/* maximum limit of forked childs */
-#define UDEVD_MAX_CHILDS		256
-
 static int debug;
 
 static void log_fn(struct udev *udev, int priority,
@@ -61,163 +59,311 @@ static void log_fn(struct udev *udev, int priority,
 	}
 }
 
-static void reap_sigchilds(void);
-
 static int debug_trace;
 static struct udev_rules *rules;
 static struct udev_queue_export *udev_queue_export;
 static struct udev_ctrl *udev_ctrl;
-static struct udev_monitor *kernel_monitor;
-static volatile sig_atomic_t sigchilds_waiting;
-static volatile sig_atomic_t udev_exit;
-static volatile sig_atomic_t reload_config;
-static volatile sig_atomic_t signal_received;
-static volatile pid_t settle_pid;
-static int run_exec_q;
-static int stop_exec_q;
+static struct udev_monitor *monitor;
+static int worker_socket[2];
+static pid_t settle_pid;
+static int stop_exec_queue;
+static int reload_config;
 static int max_childs;
 static int childs;
 static struct udev_list_node event_list;
+static struct udev_list_node worker_list;
+static int udev_exit;
+static volatile sig_atomic_t worker_exit;
+
+enum poll_fd {
+	FD_CONTROL,
+	FD_NETLINK,
+	FD_INOTIFY,
+	FD_SIGNAL,
+	FD_WORKER,
+};
+
+static struct pollfd pfd[] = {
+	[FD_NETLINK] = { .events = POLLIN },
+	[FD_WORKER] =  { .events = POLLIN },
+	[FD_SIGNAL] =  { .events = POLLIN },
+	[FD_INOTIFY] = { .events = POLLIN },
+	[FD_CONTROL] = { .events = POLLIN },
+};
+
+enum event_state {
+	EVENT_UNDEF,
+	EVENT_QUEUED,
+	EVENT_RUNNING,
+};
+
+struct event {
+	struct udev_list_node node;
+	struct udev *udev;
+	struct udev_device *dev;
+	enum event_state state;
+	int exitcode;
+	unsigned long long int delaying_seqnum;
+};
 
-static struct udev_event *node_to_event(struct udev_list_node *node)
+static struct event *node_to_event(struct udev_list_node *node)
 {
 	char *event;
 
 	event = (char *)node;
-	event -= offsetof(struct udev_event, node);
-	return (struct udev_event *)event;
+	event -= offsetof(struct event, node);
+	return (struct event *)event;
+}
+
+enum worker_state {
+	WORKER_UNDEF,
+	WORKER_RUNNING,
+	WORKER_IDLE,
+	WORKER_KILLED,
+};
+
+struct worker {
+	struct udev_list_node node;
+	pid_t pid;
+	enum worker_state state;
+	struct event *event;
+};
+
+/* passed from worker to main process */
+struct worker_message {
+	pid_t pid;
+	int exitcode;
+};
+
+static struct worker *node_to_worker(struct udev_list_node *node)
+{
+	char *worker;
+
+	worker = (char *)node;
+	worker -= offsetof(struct worker, node);
+	return (struct worker *)worker;
 }
 
-static void event_queue_delete(struct udev_event *event)
+static void event_queue_delete(struct event *event)
 {
 	udev_list_node_remove(&event->node);
 
 	/* mark as failed, if "add" event returns non-zero */
-	if (event->exitstatus && strcmp(udev_device_get_action(event->dev), "add") == 0)
+	if (event->exitcode && strcmp(udev_device_get_action(event->dev), "add") == 0)
 		udev_queue_export_device_failed(udev_queue_export, event->dev);
 	else
 		udev_queue_export_device_finished(udev_queue_export, event->dev);
 
 	udev_device_unref(event->dev);
-	udev_event_unref(event);
+	free(event);
 }
 
 static void event_sig_handler(int signum)
 {
-	if (signum == SIGALRM)
+	switch (signum) {
+	case SIGALRM:
 		_exit(1);
+		break;
+	case SIGTERM:
+		worker_exit = 1;
+		break;
+	}
 }
 
-static void event_fork(struct udev_event *event)
+static void worker_new(struct event *event)
 {
+	struct worker *worker;
 	pid_t pid;
 	struct sigaction act;
-	int err;
-
-#if 0
-	/* single process, no forking, just for testing/profiling */
-	err = udev_event_execute_rules(event, rules);
-	if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-		udev_event_execute_run(event);
-	info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
-	event_queue_delete(event);
-	return;
-#endif
 
-	if (debug_trace) {
-		event->trace = 1;
-		fprintf(stderr, "fork %s (%llu)\n",
-		       udev_device_get_syspath(event->dev),
-		       udev_device_get_seqnum(event->dev));
-	}
+	worker = calloc(1, sizeof(struct worker));
+	if (worker == NULL)
+		return;
 
 	pid = fork();
 	switch (pid) {
-	case 0:
-		/* child */
+	case 0: {
+		sigset_t mask;
+		struct udev_device *dev;
+
 		udev_queue_export_unref(udev_queue_export);
 		udev_ctrl_unref(udev_ctrl);
+		udev_monitor_unref(monitor);
+		close(pfd[FD_SIGNAL].fd);
+		close(worker_socket[READ_END]);
 		logging_close();
-		logging_init("udevd-event");
+		logging_init("udevd-work");
 		setpriority(PRIO_PROCESS, 0, UDEV_PRIORITY);
 
+		/* re-open socket to listen to udevd only, and send back libudev events */
+		monitor = udev_monitor_new_from_netlink(event->udev, NULL);
+		if (monitor == NULL)
+			_exit(2);
+		udev_monitor_enable_receiving(monitor);
+
 		/* set signal handlers */
 		memset(&act, 0x00, sizeof(act));
 		act.sa_handler = event_sig_handler;
 		sigemptyset (&act.sa_mask);
 		act.sa_flags = 0;
+		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGALRM, &act, NULL);
 
-		/* reset to default */
-		act.sa_handler = SIG_DFL;
-		sigaction(SIGINT, &act, NULL);
-		sigaction(SIGTERM, &act, NULL);
-		sigaction(SIGCHLD, &act, NULL);
-		sigaction(SIGHUP, &act, NULL);
+		/* unblock signals */
+		sigfillset(&mask);
+		sigdelset(&mask, SIGTERM);
+		sigdelset(&mask, SIGALRM);
+		sigprocmask(SIG_SETMASK, &mask, NULL);
 
-		/* set timeout to prevent hanging processes */
-		alarm(UDEV_EVENT_TIMEOUT);
+		/* request TERM signal if main daemon exits */
+		prctl(PR_SET_PDEATHSIG, SIGTERM);
 
-		/* apply rules, create node, symlinks */
-		err = udev_event_execute_rules(event, rules);
+		/* initial device */
+		dev = event->dev;
 
-		/* rules may change/disable the timeout */
-		if (udev_device_get_event_timeout(event->dev) >= 0)
-			alarm(udev_device_get_event_timeout(event->dev));
+		while (!worker_exit) {
+			struct udev_event *udev_event;
+			struct worker_message msg;
+			int err;
 
-		/* execute RUN= */
-		if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-			udev_event_execute_run(event);
+			udev_event = udev_event_new(dev);
+			if (udev_event == NULL)
+				_exit(3);
 
-		/* apply/restore inotify watch */
-		if (err == 0 && event->inotify_watch) {
-			udev_watch_begin(event->udev, event->dev);
-			udev_device_update_db(event->dev);
-		}
+			/* set timeout to prevent hanging processes */
+			alarm(UDEV_EVENT_TIMEOUT);
+
+			/* apply rules, create node, symlinks */
+			err = udev_event_execute_rules(udev_event, rules);
+
+			/* rules may change/disable the timeout */
+			if (udev_device_get_event_timeout(dev) >= 0)
+				alarm(udev_device_get_event_timeout(dev));
+
+			/* execute RUN= */
+			if (err == 0 && !udev_event->ignore_device && udev_get_run(udev_event->udev))
+				udev_event_execute_run(udev_event);
+
+			/* reset alarm */
+			alarm(0);
+
+			/* apply/restore inotify watch */
+			if (err == 0 && udev_event->inotify_watch) {
+				udev_watch_begin(udev_event->udev, dev);
+				udev_device_update_db(dev);
+			}
+
+			/* send processed event back to libudev listeners */
+			udev_monitor_send_device(monitor, dev, 0);
+
+			info(event->udev, "seq %llu finished with %i\n", udev_device_get_seqnum(dev), err);
+			udev_device_unref(dev);
+			udev_event_unref(udev_event);
+
+			/* send back the result of the event execution */
+			msg.exitcode = err;
+			msg.pid = getpid();
+			send(worker_socket[WRITE_END], &msg, sizeof(struct worker_message), 0);
 
-		/* send processed event back to the kernel netlink socket */
-		udev_monitor_send_device(kernel_monitor, event->dev);
+			/* wait for more device messages from udevd */
+			do
+				dev = udev_monitor_receive_device(monitor);
+			while (!worker_exit && dev == NULL);
+		}
 
-		info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
+		udev_monitor_unref(monitor);
 		logging_close();
-		if (err != 0)
-			exit(1);
 		exit(0);
+	}
 	case -1:
+		event->state = EVENT_QUEUED;
+		free(worker);
 		err(event->udev, "fork of child failed: %m\n");
-		event_queue_delete(event);
 		break;
 	default:
-		/* get SIGCHLD in main loop */
-		info(event->udev, "seq %llu forked, pid [%d], '%s' '%s', %ld seconds old\n",
-		     udev_device_get_seqnum(event->dev),
-		     pid,
-		     udev_device_get_action(event->dev),
-		     udev_device_get_subsystem(event->dev),
-		     time(NULL) - event->queue_time);
-		event->pid = pid;
+		event->state = EVENT_RUNNING;
+		worker->event = event;
+		worker->pid = pid;
+		worker->state = WORKER_RUNNING;
+		udev_list_node_append(&worker->node, &worker_list);
 		childs++;
+		break;
 	}
 }
 
-static void event_queue_insert(struct udev_event *event)
+static void event_run(struct event *event)
 {
-	event->queue_time = time(NULL);
+	struct udev_list_node *loop;
 
-	udev_queue_export_device_queued(udev_queue_export, event->dev);
-	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(event->dev),
-	     udev_device_get_action(event->dev), udev_device_get_subsystem(event->dev));
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+		ssize_t count;
 
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		event->state = EVENT_RUNNING;
+		count = udev_monitor_send_device(monitor, event->dev, worker->pid);
+		if (count < 0) {
+			err(event->udev, "worker [%u] did not accept message, kill it\n", worker->pid);
+			event->state = EVENT_QUEUED;
+			worker->state = WORKER_KILLED;
+			kill(worker->pid, SIGKILL);
+			continue;
+		}
+		return;
+	}
+
+	if (childs >= max_childs) {
+		info(event->udev, "maximum number (%i) of childs reached\n", childs);
+		return;
+	}
+
+	/* start new worker and pass initial device */
+	worker_new(event);
+}
+
+static void event_queue_insert(struct udev_device *dev)
+{
+	struct event *event;
+
+	event = calloc(1, sizeof(struct event));
+	if (event == NULL)
+		return;
+
+	event->udev = udev_device_get_udev(dev);
+	event->dev = dev;
+	udev_queue_export_device_queued(udev_queue_export, dev);
+	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(dev),
+	     udev_device_get_action(dev), udev_device_get_subsystem(dev));
+
+	event->state = EVENT_QUEUED;
 	udev_list_node_append(&event->node, &event_list);
-	run_exec_q = 1;
 
 	/* run all events with a timeout set immediately */
-	if (udev_device_get_timeout(event->dev) > 0) {
-		event_fork(event);
+	if (udev_device_get_timeout(dev) > 0) {
+		worker_new(event);
 		return;
 	}
 }
 
+static void worker_kill_idle(void)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->state = WORKER_KILLED;
+		kill(worker->pid, SIGTERM);
+	}
+}
+
 static int mem_size_mb(void)
 {
 	FILE *f;
@@ -265,13 +411,13 @@ static int compare_devpath(const char *running, const char *waiting)
 }
 
 /* lookup event for identical, parent, child device */
-static int devpath_busy(struct udev_event *event)
+static int devpath_busy(struct event *event)
 {
 	struct udev_list_node *loop;
 
 	/* check if queue contains events we depend on */
 	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *loop_event = node_to_event(loop);
 
 		/* we already found a later event, earlier can not block us, no need to check again */
 		if (udev_device_get_seqnum(loop_event->dev) < event->delaying_seqnum)
@@ -305,52 +451,6 @@ static int devpath_busy(struct udev_event *event)
 	return 0;
 }
 
-/* serializes events for the identical and parent and child devices */
-static void event_queue_manager(struct udev *udev)
-{
-	struct udev_list_node *loop;
-	struct udev_list_node *tmp;
-
-start_over:
-	if (udev_list_is_empty(&event_list)) {
-		if (childs > 0) {
-			err(udev, "event list empty, but childs count is %i", childs);
-			childs = 0;
-		}
-		return;
-	}
-
-	udev_list_node_foreach_safe(loop, tmp, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (childs >= max_childs) {
-			info(udev, "maximum number (%i) of childs reached\n", childs);
-			break;
-		}
-
-		if (loop_event->pid != 0)
-			continue;
-
-		/* do not start event if parent or child event is still running */
-		if (devpath_busy(loop_event) != 0) {
-			dbg(udev, "delay seq %llu (%s)\n",
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
-			continue;
-		}
-
-		event_fork(loop_event);
-		dbg(udev, "moved seq %llu to running list\n", udev_device_get_seqnum(loop_event->dev));
-
-		/* retry if events finished in the meantime */
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-			goto start_over;
-		}
-	}
-}
-
 /* receive the udevd message from userspace */
 static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 {
@@ -371,13 +471,12 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 
 	if (udev_ctrl_get_stop_exec_queue(ctrl_msg) > 0) {
 		info(udev, "udevd message (STOP_EXEC_QUEUE) received\n");
-		stop_exec_q = 1;
+		stop_exec_queue = 1;
 	}
 
 	if (udev_ctrl_get_start_exec_queue(ctrl_msg) > 0) {
 		info(udev, "udevd message (START_EXEC_QUEUE) received\n");
-		stop_exec_q = 0;
-		event_queue_manager(udev);
+		stop_exec_queue = 0;
 	}
 
 	if (udev_ctrl_get_reload_rules(ctrl_msg) > 0) {
@@ -420,6 +519,8 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 	settle_pid = udev_ctrl_get_settle(ctrl_msg);
 	if (settle_pid > 0) {
 		info(udev, "udevd message (SETTLE) received\n");
+		kill(settle_pid, SIGUSR1);
+		settle_pid = 0;
 	}
 	udev_ctrl_msg_unref(ctrl_msg);
 }
@@ -427,22 +528,20 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 /* read inotify messages */
 static int handle_inotify(struct udev *udev)
 {
-	int nbytes, pos;
+	ssize_t nbytes, pos;
 	char *buf;
 	struct inotify_event *ev;
 
-	if ((ioctl(inotify_fd, FIONREAD, &nbytes) < 0) || (nbytes <= 0))
+	if ((ioctl(pfd[FD_INOTIFY].fd, FIONREAD, &nbytes) < 0) || (nbytes <= 0))
 		return 0;
 
 	buf = malloc(nbytes);
 	if (buf == NULL) {
 		err(udev, "error getting buffer for inotify, disable watching\n");
-		close(inotify_fd);
-		inotify_fd = -1;
-		return 0;
+		return -1;
 	}
 
-	read(inotify_fd, buf, nbytes);
+	nbytes = read(pfd[FD_INOTIFY].fd, buf, nbytes);
 
 	for (pos = 0; pos < nbytes; pos += sizeof(struct inotify_event) + ev->len) {
 		struct udev_device *dev;
@@ -480,71 +579,6 @@ static int handle_inotify(struct udev *udev)
 	return 0;
 }
 
-static void sig_handler(int signum)
-{
-	switch (signum) {
-		case SIGINT:
-		case SIGTERM:
-			udev_exit = 1;
-			break;
-		case SIGCHLD:
-			/* set flag, then write to pipe if needed */
-			sigchilds_waiting = 1;
-			break;
-		case SIGHUP:
-			reload_config = 1;
-			break;
-	}
-
-	signal_received = 1;
-}
-
-static void udev_done(int pid, int exitstatus)
-{
-	struct udev_list_node *loop;
-
-	/* find event associated with pid and delete it */
-	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (loop_event->pid == pid) {
-			info(loop_event->udev, "seq %llu cleanup, pid [%d], status %i, %ld seconds old\n",
-			     udev_device_get_seqnum(loop_event->dev), loop_event->pid,
-			     exitstatus, time(NULL) - loop_event->queue_time);
-			loop_event->exitstatus = exitstatus;
-			if (debug_trace)
-				fprintf(stderr, "exit %s (%llu)\n",
-				       udev_device_get_syspath(loop_event->dev),
-				       udev_device_get_seqnum(loop_event->dev));
-			event_queue_delete(loop_event);
-			childs--;
-
-			/* there may be dependent events waiting */
-			run_exec_q = 1;
-			return;
-		}
-	}
-}
-
-static void reap_sigchilds(void)
-{
-	pid_t pid;
-	int status;
-
-	while (1) {
-		pid = waitpid(-1, &status, WNOHANG);
-		if (pid <= 0)
-			break;
-		if (WIFEXITED(status))
-			status = WEXITSTATUS(status);
-		else if (WIFSIGNALED(status))
-			status = WTERMSIG(status) + 128;
-		else
-			status = 0;
-		udev_done(pid, status);
-	}
-}
-
 static void startup_log(struct udev *udev)
 {
 	FILE *f;
@@ -576,7 +610,7 @@ int main(int argc, char *argv[])
 {
 	struct udev *udev;
 	int fd;
-	struct sigaction act;
+	sigset_t mask;
 	const char *value;
 	int daemonize = 0;
 	int resolve_names = 1;
@@ -669,29 +703,76 @@ int main(int argc, char *argv[])
 		rc = 1;
 		goto exit;
 	}
-
 	if (udev_ctrl_enable_receiving(udev_ctrl) < 0) {
 		fprintf(stderr, "error binding control socket, seems udevd is already running\n");
 		err(udev, "error binding control socket, seems udevd is already running\n");
 		rc = 1;
 		goto exit;
 	}
+	pfd[FD_CONTROL].fd = udev_ctrl_get_fd(udev_ctrl);
 
-	kernel_monitor = udev_monitor_new_from_netlink(udev, "kernel");
-	if (kernel_monitor == NULL || udev_monitor_enable_receiving(kernel_monitor) < 0) {
+	monitor = udev_monitor_new_from_netlink(udev, "kernel");
+	if (monitor == NULL || udev_monitor_enable_receiving(monitor) < 0) {
 		fprintf(stderr, "error initializing netlink socket\n");
 		err(udev, "error initializing netlink socket\n");
 		rc = 3;
 		goto exit;
 	}
-	udev_monitor_set_receive_buffer_size(kernel_monitor, 128*1024*1024);
+	udev_monitor_set_receive_buffer_size(monitor, 128*1024*1024);
+	pfd[FD_NETLINK].fd = udev_monitor_get_fd(monitor);
+
+	pfd[FD_INOTIFY].fd = udev_watch_init(udev);
+	if (pfd[FD_INOTIFY].fd < 0) {
+		fprintf(stderr, "error initializing inotify\n");
+		err(udev, "error initializing inotify\n");
+		rc = 4;
+		goto exit;
+	}
+
+	if (udev_get_rules_path(udev) != NULL) {
+		inotify_add_watch(pfd[FD_INOTIFY].fd, udev_get_rules_path(udev),
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+	} else {
+		char filename[UTIL_PATH_SIZE];
+
+		inotify_add_watch(pfd[FD_INOTIFY].fd, UDEV_PREFIX "/lib/udev/rules.d",
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+		inotify_add_watch(pfd[FD_INOTIFY].fd, SYSCONFDIR "/udev/rules.d",
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+
+		/* watch dynamic rules directory */
+		util_strscpyl(filename, sizeof(filename), udev_get_dev_path(udev), "/.udev/rules.d", NULL);
+		inotify_add_watch(pfd[FD_INOTIFY].fd, filename,
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+	}
+	udev_watch_restore(udev);
+
+	/* block and listen to all signals on signalfd */
+	sigfillset(&mask);
+	sigprocmask(SIG_SETMASK, &mask, NULL);
+	pfd[FD_SIGNAL].fd = signalfd(-1, &mask, 0);
+	if (pfd[FD_SIGNAL].fd < 0) {
+		fprintf(stderr, "error getting signalfd\n");
+		err(udev, "error getting signalfd\n");
+		rc = 5;
+		goto exit;
+	}
+
+	/* unnamed socket from workers to the main daemon */
+	if (socketpair(AF_LOCAL, SOCK_DGRAM, 0, worker_socket) < 0) {
+		fprintf(stderr, "error getting socketpair\n");
+		err(udev, "error getting socketpair\n");
+		rc = 6;
+		goto exit;
+	}
+	pfd[FD_WORKER].fd = worker_socket[READ_END];
 
 	rules = udev_rules_new(udev, resolve_names);
 	if (rules == NULL) {
 		err(udev, "error reading rules\n");
 		goto exit;
 	}
-	udev_list_init(&event_list);
+
 	udev_queue_export = udev_queue_export_new(udev);
 	if (udev_queue_export == NULL) {
 		err(udev, "error creating queue file\n");
@@ -704,19 +785,19 @@ int main(int argc, char *argv[])
 		pid = fork();
 		switch (pid) {
 		case 0:
-			dbg(udev, "daemonized fork running\n");
 			break;
 		case -1:
 			err(udev, "fork of daemon failed: %m\n");
 			rc = 4;
 			goto exit;
 		default:
-			dbg(udev, "child [%u] running, parent exits\n", pid);
 			rc = 0;
 			goto exit;
 		}
 	}
 
+	startup_log(udev);
+
 	/* redirect std{out,err} */
 	if (!debug && !debug_trace) {
 		dup2(fd, STDIN_FILENO);
@@ -742,159 +823,189 @@ int main(int argc, char *argv[])
 		close(fd);
 	}
 
-	startup_log(udev);
-
-	/* set signal handlers */
-	memset(&act, 0x00, sizeof(struct sigaction));
-	act.sa_handler = sig_handler;
-	sigemptyset(&act.sa_mask);
-	act.sa_flags = SA_RESTART;
-	sigaction(SIGINT, &act, NULL);
-	sigaction(SIGTERM, &act, NULL);
-	sigaction(SIGCHLD, &act, NULL);
-	sigaction(SIGHUP, &act, NULL);
-
-	/* watch rules directory */
-	udev_watch_init(udev);
-	if (inotify_fd >= 0) {
-		if (udev_get_rules_path(udev) != NULL) {
-			inotify_add_watch(inotify_fd, udev_get_rules_path(udev),
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-		} else {
-			char filename[UTIL_PATH_SIZE];
-
-			inotify_add_watch(inotify_fd, UDEV_PREFIX "/lib/udev/rules.d",
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-			inotify_add_watch(inotify_fd, SYSCONFDIR "/udev/rules.d",
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-
-			/* watch dynamic rules directory */
-			util_strscpyl(filename, sizeof(filename), udev_get_dev_path(udev), "/.udev/rules.d", NULL);
-			inotify_add_watch(inotify_fd, filename,
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-		}
-
-		udev_watch_restore(udev);
-	}
-
 	/* in trace mode run one event after the other */
 	if (debug_trace) {
 		max_childs = 1;
 	} else {
 		int memsize = mem_size_mb();
+
 		if (memsize > 0)
 			max_childs = 128 + (memsize / 4);
 		else
-			max_childs = UDEVD_MAX_CHILDS;
+			max_childs = 256;
 	}
+
 	/* possibly overwrite maximum limit of executed events */
 	value = getenv("UDEVD_MAX_CHILDS");
 	if (value)
 		max_childs = strtoul(value, NULL, 10);
 	info(udev, "initialize max_childs to %u\n", max_childs);
 
+	udev_list_init(&event_list);
+	udev_list_init(&worker_list);
+
 	while (!udev_exit) {
-		sigset_t blocked_mask, orig_mask;
-		struct pollfd pfd[4];
-		struct pollfd *ctrl_poll, *monitor_poll, *inotify_poll = NULL;
-		int nfds = 0;
 		int fdcount;
+		int timeout;
 
-		sigfillset(&blocked_mask);
-		sigprocmask(SIG_SETMASK, &blocked_mask, &orig_mask);
-		if (signal_received) {
-			sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-			goto handle_signals;
-		}
-
-		ctrl_poll = &pfd[nfds++];
-		ctrl_poll->fd = udev_ctrl_get_fd(udev_ctrl);
-		ctrl_poll->events = POLLIN;
-
-		monitor_poll = &pfd[nfds++];
-		monitor_poll->fd = udev_monitor_get_fd(kernel_monitor);
-		monitor_poll->events = POLLIN;
+		/* set timeout to kill idle workers */
+		if (udev_list_is_empty(&event_list) && !udev_list_is_empty(&worker_list))
+			timeout = 10 * 1000;
+		else
+			timeout = -1;
+		/* wait for events */
+		fdcount = poll(pfd, ARRAY_SIZE(pfd), timeout);
+		if (fdcount < 0)
+			continue;
 
-		if (inotify_fd >= 0) {
-			inotify_poll = &pfd[nfds++];
-			inotify_poll->fd = inotify_fd;
-			inotify_poll->events = POLLIN;
+		/* timeout - kill idle workers */
+		if (fdcount == 0)
+			worker_kill_idle();
+
+		/* event has finished */
+		if (pfd[FD_WORKER].revents & POLLIN) {
+			while (1) {
+				struct worker_message msg;
+				ssize_t size;
+				struct udev_list_node *loop;
+
+				size = recv(pfd[FD_WORKER].fd, &msg, sizeof(struct worker_message), MSG_DONTWAIT);
+				if (size != sizeof(struct worker_message))
+					break;
+
+				/* lookup worker who sent the signal */
+				udev_list_node_foreach(loop, &worker_list) {
+					struct worker *worker = node_to_worker(loop);
+
+					if (worker->pid != msg.pid)
+						continue;
+
+					/* worker returned */
+					worker->event->exitcode = msg.exitcode;
+					event_queue_delete(worker->event);
+					worker->event = NULL;
+					worker->state = WORKER_IDLE;
+					break;
+				}
+			}
 		}
 
-		fdcount = ppoll(pfd, nfds, NULL, &orig_mask);
-		sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-		if (fdcount < 0) {
-			if (errno == EINTR)
-				goto handle_signals;
-			err(udev, "error in select: %m\n");
-			continue;
+		/* get kernel uevent */
+		if (pfd[FD_NETLINK].revents & POLLIN) {
+			struct udev_device *dev;
+
+			dev = udev_monitor_receive_device(monitor);
+			if (dev != NULL)
+				event_queue_insert(dev);
+			else
+				udev_device_unref(dev);
 		}
 
 		/* get control message */
-		if (ctrl_poll->revents & POLLIN)
+		if (pfd[FD_CONTROL].revents & POLLIN)
 			handle_ctrl_msg(udev_ctrl);
 
-		/* get kernel uevent */
-		if (monitor_poll->revents & POLLIN) {
-			struct udev_device *dev;
+		/* start new events */
+		if (!udev_list_is_empty(&event_list) && !stop_exec_queue) {
+			struct udev_list_node *loop;
 
-			dev = udev_monitor_receive_device(kernel_monitor);
-			if (dev != NULL) {
-				struct udev_event *event;
+			udev_list_node_foreach(loop, &event_list) {
+				struct event *event = node_to_event(loop);
+
+				/* skip running events */
+				if (event->state != EVENT_QUEUED)
+					continue;
+
+				/* do not start event if parent or child event is still running */
+				if (devpath_busy(event) != 0) {
+					dbg(udev, "delay seq %llu (%s)\n",
+					    udev_device_get_seqnum(event->dev),
+					    udev_device_get_devpath(event->dev));
+					continue;
+				}
 
-				event = udev_event_new(dev);
-				if (event != NULL)
-					event_queue_insert(event);
-				else
-					udev_device_unref(dev);
+				event_run(event);
 			}
 		}
 
-		/* rules directory inotify watch */
-		if (inotify_poll && (inotify_poll->revents & POLLIN))
-			handle_inotify(udev);
+		/* get signal */
+		if (pfd[FD_SIGNAL].revents & POLLIN) {
+			struct signalfd_siginfo fdsi;
+			ssize_t size;
+
+			size = read(pfd[FD_SIGNAL].fd, &fdsi, sizeof(struct signalfd_siginfo));
+			if (size != sizeof(struct signalfd_siginfo))
+				continue;
+			switch (fdsi.ssi_signo) {
+			case SIGINT:
+			case SIGTERM:
+				udev_exit = 1;
+				break;
+			case SIGCHLD:
+				while (1) {
+					pid_t pid;
+					struct udev_list_node *loop, *tmp;
+
+					pid = waitpid(-1, NULL, WNOHANG);
+					if (pid <= 0)
+						break;
+
+					udev_list_node_foreach_safe(loop, tmp, &worker_list) {
+						struct worker *worker = node_to_worker(loop);
+
+						if (worker->pid != pid)
+							continue;
+
+						/* fail event, if worker died unexpectedly */
+						if (worker->event != NULL) {
+							worker->event->exitcode = 127;
+							event_queue_delete(worker->event);
+						}
+
+						udev_list_node_remove(&worker->node);
+						free(worker);
+						childs--;
+						break;
+					}
+				}
+				break;
+			case SIGHUP:
+				reload_config = 1;
+				break;
+			}
+		}
 
-handle_signals:
-		signal_received = 0;
+		/* device node and rules directory inotify watch */
+		if (pfd[FD_INOTIFY].revents & POLLIN)
+			handle_inotify(udev);
 
 		/* rules changed, set by inotify or a HUP signal */
 		if (reload_config) {
 			struct udev_rules *rules_new;
 
-			reload_config = 0;
+			worker_kill_idle();
 			rules_new = udev_rules_new(udev, resolve_names);
 			if (rules_new != NULL) {
 				udev_rules_unref(rules);
 				rules = rules_new;
 			}
-		}
-
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-		}
-
-		if (run_exec_q) {
-			run_exec_q = 0;
-			if (!stop_exec_q)
-				event_queue_manager(udev);
-		}
-
-		if (settle_pid > 0) {
-			kill(settle_pid, SIGUSR1);
-			settle_pid = 0;
+			reload_config = 0;
 		}
 	}
+
 	udev_queue_export_cleanup(udev_queue_export);
 	rc = 0;
 exit:
-
 	udev_queue_export_unref(udev_queue_export);
 	udev_rules_unref(rules);
 	udev_ctrl_unref(udev_ctrl);
-	if (inotify_fd >= 0)
-		close(inotify_fd);
-	udev_monitor_unref(kernel_monitor);
+	if (pfd[FD_SIGNAL].fd >= 0)
+		close(pfd[FD_SIGNAL].fd);
+	if (worker_socket[READ_END] >= 0)
+		close(worker_socket[READ_END]);
+	if (worker_socket[WRITE_END] >= 0)
+		close(worker_socket[WRITE_END]);
+	udev_monitor_unref(monitor);
 	udev_selinux_exit(udev);
 	udev_unref(udev);
 	logging_close();


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (14 preceding siblings ...)
  2009-06-02  4:58 ` Kay Sievers
@ 2009-06-02  9:13 ` Alan Jenkins
  2009-06-02  9:26 ` Alan Jenkins
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-06-02  9:13 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Mon, Jun 1, 2009 at 15:57, Kay Sievers <kay.sievers@vrfy.org> wrote: 
>
>   
>>> The code looked like you freed the worker, but left the event RUNNING, and
>>> it would never be released.  I would delete the event instead, just like the
>>> old system.
>>>
>>> I haven't read V2 yet though, maybe you fixed it.
>>>       
>> I just set it back to QUEUED for now. Not sure if dropping it or
>> re-trying it a few times would be better.
>>     
>
> Version 3, which should clean up events with a worker that died. Also
> kills all workers if the config has changed.
>   

For strict correctness, I guess we need to do the same for "udevadm 
control --log-priority" and "udevadm control --env".

Hmm, it says worker_kill_idle().  If there are running events at the 
time, their stale workers will survive.  At the moment kill(worker->pid, 
SIGTERM) allows the current event to finish before it terminates, so one 
solution would be to just kill all the workers.

Regards
Alan


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (15 preceding siblings ...)
  2009-06-02  9:13 ` Alan Jenkins
@ 2009-06-02  9:26 ` Alan Jenkins
  2009-06-02 11:39 ` Kay Sievers
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-06-02  9:26 UTC (permalink / raw)
  To: linux-hotplug

On 6/2/09, Kay Sievers <kay.sievers@vrfy.org> wrote:
> On Mon, Jun 1, 2009 at 21:39, Kay Sievers <kay.sievers@vrfy.org> wrote:
>> On Mon, Jun 1, 2009 at 18:24, Alan Jenkins <alan-jenkins@tuffmail.co.uk>
>> wrote:
>>
>>> I don't think the signal handler should be doing list traversal, in case
>>> it
>>> interrupts list manipulation in the main loop.  Or perhaps the main loop
>>> should only allow signals during ppoll().
>>
>> Makes sense. We can block them during list mangling, or just go for
>> signalfd, and convert all the signal handling to it.
>
> Here is a version that uses signalfd(), and a socketpair() from the
> workers to pass data back. Same numbers so far, but the code looks a
> lot simpler.
>
> Kay

Yes, that looks great.

Alan


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (16 preceding siblings ...)
  2009-06-02  9:26 ` Alan Jenkins
@ 2009-06-02 11:39 ` Kay Sievers
  2009-06-02 14:05 ` Kay Sievers
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-02 11:39 UTC (permalink / raw)
  To: linux-hotplug

On Tue, Jun 2, 2009 at 11:13, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>> Version 3, which should clean up events with a worker that died. Also
>> kills all workers if the config has changed.
>
> For strict correctness, I guess we need to do the same for "udevadm control
> --log-priority" and "udevadm control --env".

Yeah, makes sense, will add that.

> Hmm, it says worker_kill_idle().  If there are running events at the time,
> their stale workers will survive.  At the moment kill(worker->pid, SIGTERM)
> allows the current event to finish before it terminates, so one solution
> would be to just kill all the workers.

The main daemon will wake up with a timeout as long as workers exist.
There might be several rounds of killing until all workers are gone.

Thanks,
Kay


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (17 preceding siblings ...)
  2009-06-02 11:39 ` Kay Sievers
@ 2009-06-02 14:05 ` Kay Sievers
  2009-06-03 19:44 ` Kay Sievers
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-02 14:05 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 1829 bytes --]

On Tue, Jun 2, 2009 at 13:39, Kay Sievers <kay.sievers@vrfy.org> wrote:
> On Tue, Jun 2, 2009 at 11:13, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
>>> Version 3, which should clean up events with a worker that died. Also
>>> kills all workers if the config has changed.
>>
>> For strict correctness, I guess we need to do the same for "udevadm control
>> --log-priority" and "udevadm control --env".
>
> Yeah, makes sense, will add that.

Did that.

>> Hmm, it says worker_kill_idle().  If there are running events at the time,
>> their stale workers will survive.  At the moment kill(worker->pid, SIGTERM)
>> allows the current event to finish before it terminates, so one solution
>> would be to just kill all the workers.
>
> The main daemon will wake up with a timeout as long as workers exist.
> There might be several rounds of killing until all workers are gone.

We kill all workers now.

Version 5. I guess we should go for the signalfd stuff, it looks pretty
nice. The socketpair instead of the rtsignal is also nice, it allows
us to pass back arbitrary data from finished events to the main
daemon.

This is a full coldplug run for 520 devices:
  $ time udev/udevd
  []
  user 0m0.016s
  sys	0m0.049s

We need less than a second now (compared to 1.5 for the current version):
  $ time (udevadm trigger; udevadm settle)
  real	0m0.983s

It spawned 35 workers for 500 events:
  $ ps ax | grep udevd | wc -l
  35

Now that the page faults are gone, and the main daemon does not need
that much CPU anymore, all the silly things show up, like running a
few shell scripts. :)

  $ time (udevadm trigger; udevadm settle)
  real	0m0.959s

  $ mv /lib/udev/path_id /lib/udev/path_id.0

  $ time (udevadm trigger -Snet; udevadm settle)
  real	0m0.714s

Thanks,
Kay

[-- Attachment #2: worker5.patch --]
[-- Type: text/x-patch, Size: 41810 bytes --]

diff --git a/NEWS b/NEWS
index ac44d7a..8e9a1bc 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,8 @@ udev 143
 ========
 Bugfixes.
 
+Inotify support is required now to compile and run udev.
+
 The format of the queue exported by the udev damon has changed. There is
 no longer a /dev/.udev/queue/ directory. The queue can be accessed with
 udevadm settle and libudedv.
diff --git a/configure.ac b/configure.ac
index f1d008e..f126146 100644
--- a/configure.ac
+++ b/configure.ac
@@ -23,10 +23,6 @@ AC_SUBST(LIBUDEV_LT_AGE)
 
 AC_PATH_PROG([XSLTPROC], [xsltproc])
 
-AC_CHECK_LIB(c, inotify_init,
-	[AC_DEFINE([HAVE_INOTIFY], 1, [inotify available])],
-	[AC_MSG_WARN([inotify support disabled])])
-
 AC_ARG_WITH(udev-prefix,
 	AS_HELP_STRING([--with-udev-prefix=DIR], [add prefix to internal udev path names]),
 	[], [with_udev_prefix='${exec_prefix}'])
diff --git a/udev/Makefile.am b/udev/Makefile.am
index 6cd2f23..94989e6 100644
--- a/udev/Makefile.am
+++ b/udev/Makefile.am
@@ -14,7 +14,6 @@ common_ldadd =
 
 common_files = \
 	udev.h \
-	udev-sysdeps.h \
 	udev-event.c \
 	udev-watch.c \
 	udev-node.c \
diff --git a/udev/lib/libudev-monitor.c b/udev/lib/libudev-monitor.c
index 395a4d2..54c9576 100644
--- a/udev/lib/libudev-monitor.c
+++ b/udev/lib/libudev-monitor.c
@@ -32,7 +32,6 @@ struct udev_monitor {
 	int refcount;
 	int sock;
 	struct sockaddr_nl snl;
-	struct sockaddr_nl snl_peer;
 	struct sockaddr_un sun;
 	socklen_t addrlen;
 	struct udev_list_node filter_subsystem_list;
@@ -171,8 +170,8 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 		return NULL;
 
 	if (name == NULL)
-		return NULL;
-	if (strcmp(name, "kernel") == 0)
+		group = 0;
+	else if (strcmp(name, "kernel") == 0)
 		group = UDEV_MONITOR_KERNEL;
 	else if (strcmp(name, "udev") == 0)
 		group = UDEV_MONITOR_UDEV;
@@ -193,8 +192,6 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 
 	udev_monitor->snl.nl_family = AF_NETLINK;
 	udev_monitor->snl.nl_groups = group;
-	udev_monitor->snl_peer.nl_family = AF_NETLINK;
-	udev_monitor->snl_peer.nl_groups = UDEV_MONITOR_UDEV;
 
 	dbg(udev, "monitor %p created with NETLINK_KOBJECT_UEVENT (%u)\n", udev_monitor, group);
 	return udev_monitor;
@@ -434,7 +431,6 @@ struct udev_device *udev_monitor_receive_device(struct udev_monitor *udev_monito
 	struct iovec iov;
 	char cred_msg[CMSG_SPACE(sizeof(struct ucred))];
 	struct cmsghdr *cmsg;
-	struct sockaddr_nl snl;
 	struct ucred *cred;
 	char buf[8192];
 	ssize_t buflen;
@@ -459,11 +455,6 @@ retry:
 	smsg.msg_control = cred_msg;
 	smsg.msg_controllen = sizeof(cred_msg);
 
-	if (udev_monitor->snl.nl_family != 0) {
-		smsg.msg_name = &snl;
-		smsg.msg_namelen = sizeof(snl);
-	}
-
 	buflen = recvmsg(udev_monitor->sock, &smsg, 0);
 	if (buflen < 0) {
 		if (errno != EINTR)
@@ -476,20 +467,6 @@ retry:
 		return NULL;
 	}
 
-	if (udev_monitor->snl.nl_family != 0) {
-		if (snl.nl_groups == 0) {
-			info(udev_monitor->udev, "unicast netlink message ignored\n");
-			return NULL;
-		}
-		if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
-			if (snl.nl_pid > 0) {
-				info(udev_monitor->udev, "multicast kernel netlink message from pid %d ignored\n", snl.nl_pid);
-				return NULL;
-			}
-			is_kernel = 1;
-		}
-	}
-
 	cmsg = CMSG_FIRSTHDR(&smsg);
 	if (cmsg == NULL || cmsg->cmsg_type != SCM_CREDENTIALS) {
 		info(udev_monitor->udev, "no sender credentials received, message ignored\n");
@@ -621,7 +598,7 @@ retry:
 	return udev_device;
 }
 
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device)
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid)
 {
 	struct msghdr smsg;
 	struct iovec iov[2];
@@ -660,6 +637,7 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 	} else if (udev_monitor->snl.nl_family != 0) {
 		const char *val;
 		struct udev_monitor_netlink_header nlh;
+		struct sockaddr_nl snl_peer;
 
 
 		/* add versioned header */
@@ -680,11 +658,18 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 		iov[1].iov_base = (char *)buf;
 		iov[1].iov_len = blen;
 
+		/* we will always get ECONNREFUSED when sending to the muticast group */
+		memset(&snl_peer, 0x00, sizeof(struct sockaddr_nl));
+		snl_peer.nl_family = AF_NETLINK;
+		if (pid > 0)
+			snl_peer.nl_pid = pid;
+		else
+			snl_peer.nl_groups = UDEV_MONITOR_UDEV;
+
 		memset(&smsg, 0x00, sizeof(struct msghdr));
 		smsg.msg_iov = iov;
 		smsg.msg_iovlen = 2;
-		/* no destination besides the muticast group, we will always get ECONNREFUSED */
-		smsg.msg_name = &udev_monitor->snl_peer;
+		smsg.msg_name = &snl_peer;
 		smsg.msg_namelen = sizeof(struct sockaddr_nl);
 	} else {
 		return -1;
diff --git a/udev/lib/libudev-private.h b/udev/lib/libudev-private.h
index dc02a84..e600802 100644
--- a/udev/lib/libudev-private.h
+++ b/udev/lib/libudev-private.h
@@ -86,7 +86,7 @@ int udev_device_delete_db(struct udev_device *udev_device);
 int udev_device_rename_db(struct udev_device *udev_device, const char *devpath);
 
 /* libudev-monitor - netlink/unix socket communication  */
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device);
+int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device, pid_t pid);
 int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size);
 
 /* libudev-ctrl - daemon runtime setup */
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d521251..8ab262a 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -734,18 +734,13 @@ int udev_event_execute_run(struct udev_event *event)
 			monitor = udev_monitor_new_from_socket(event->udev, &cmd[strlen("socket:")]);
 			if (monitor == NULL)
 				continue;
-			udev_monitor_send_device(monitor, event->dev);
+			udev_monitor_send_device(monitor, event->dev, 0);
 			udev_monitor_unref(monitor);
 		} else {
 			char program[UTIL_PATH_SIZE];
 			char **envp;
 
 			udev_event_apply_format(event, cmd, program, sizeof(program));
-			if (event->trace)
-				fprintf(stderr, "run  %s (%llu) '%s'\n",
-				       udev_device_get_syspath(event->dev),
-				       udev_device_get_seqnum(event->dev),
-				       program);
 			envp = udev_device_get_properties_envp(event->dev);
 			if (util_run_program(event->udev, program, envp, NULL, 0, NULL) != 0) {
 				if (!udev_list_entry_get_flag(list_entry))
diff --git a/udev/udev-sysdeps.h b/udev/udev-sysdeps.h
deleted file mode 100644
index 35671ba..0000000
--- a/udev/udev-sysdeps.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * wrapping of libc features and kernel interfaces
- *
- * Copyright (C) 2005-2008 Kay Sievers <kay.sievers@vrfy.org>
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef _UDEV_SYSDEPS_H_
-#define _UDEV_SYSDEPS_H_
-
-#include <stdint.h>
-#include <errno.h>
-
-#ifndef HAVE_INOTIFY
-static inline int inotify_init(void)
-{
-	errno = ENOSYS;
-	return -1;
-}
-
-static inline int inotify_add_watch(int fd, const char *name, uint32_t mask)
-{
-	return -1;
-}
-
-#define IN_CREATE	0
-#define IN_DELETE	0
-#define IN_MOVE		0
-#define IN_CLOSE_WRITE	0
-
-#endif /* HAVE_INOTIFY */
-#endif
diff --git a/udev/udev-watch.c b/udev/udev-watch.c
index 53492e5..5a49c96 100644
--- a/udev/udev-watch.c
+++ b/udev/udev-watch.c
@@ -26,27 +26,24 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#ifdef HAVE_INOTIFY
 #include <sys/inotify.h>
-#endif
 
 #include "udev.h"
 
-int inotify_fd = -1;
+static int inotify_fd = -1;
 
 /* inotify descriptor, will be shared with rules directory;
  * set to cloexec since we need our children to be able to add
  * watches for us
  */
-void udev_watch_init(struct udev *udev)
+int udev_watch_init(struct udev *udev)
 {
 	inotify_fd = inotify_init();
 	if (inotify_fd >= 0)
 		util_set_fd_cloexec(inotify_fd);
-	else if (errno == ENOSYS)
-		info(udev, "unable to use inotify, udevd will not monitor rule files changes\n");
 	else
 		err(udev, "inotify_init failed: %m\n");
+	return inotify_fd;
 }
 
 /* move any old watches directory out of the way, and then restore
diff --git a/udev/udev.h b/udev/udev.h
index 8f2c1c6..7187975 100644
--- a/udev/udev.h
+++ b/udev/udev.h
@@ -22,7 +22,6 @@
 #include <sys/types.h>
 #include <sys/param.h>
 
-#include "udev-sysdeps.h"
 #include "lib/libudev.h"
 #include "lib/libudev-private.h"
 
@@ -53,7 +52,6 @@ static inline void logging_close(void)
 }
 
 struct udev_event {
-	struct udev_list_node node;
 	struct udev *udev;
 	struct udev_device *dev;
 	struct udev_device *dev_parent;
@@ -64,10 +62,6 @@ struct udev_event {
 	uid_t uid;
 	gid_t gid;
 	struct udev_list_node run_list;
-	pid_t pid;
-	int exitstatus;
-	time_t queue_time;
-	unsigned long long int delaying_seqnum;
 	unsigned int group_final:1;
 	unsigned int owner_final:1;
 	unsigned int mode_final:1;
@@ -76,7 +70,6 @@ struct udev_event {
 	unsigned int run_final:1;
 	unsigned int ignore_device:1;
 	unsigned int inotify_watch:1;
-	unsigned int trace:1;
 };
 
 struct udev_watch {
@@ -101,8 +94,7 @@ int udev_event_apply_subsys_kernel(struct udev_event *event, const char *string,
 				   char *result, size_t maxsize, int read_value);
 
 /* udev-watch.c */
-extern int inotify_fd;
-void udev_watch_init(struct udev *udev);
+int udev_watch_init(struct udev *udev);
 void udev_watch_restore(struct udev *udev);
 void udev_watch_begin(struct udev *udev, struct udev_device *dev);
 void udev_watch_end(struct udev *udev, struct udev_device *dev);
diff --git a/udev/udevadm.xml b/udev/udevadm.xml
index 538180b..2e03d98 100644
--- a/udev/udevadm.xml
+++ b/udev/udevadm.xml
@@ -285,9 +285,9 @@
               <term><option>--reload-rules</option></term>
               <listitem>
                 <para>Signal udevd to reload the rules files.
-                Usually the udev daemon detects changes automatically, this may
-                only be needed on systems without inotify support. Reloading rules
-                does not apply any changes to already existing devices.</para>
+                The udev daemon detects changes automatically, this option is
+                usually not needed. Reloading rules does not apply any changes
+                to already existing devices.</para>
               </listitem>
             </varlistentry>
             <varlistentry>
diff --git a/udev/udevd.c b/udev/udevd.c
index 37b547a..6b47a30 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2004-2008 Kay Sievers <kay.sievers@vrfy.org>
+ * Copyright (C) 2004-2009 Kay Sievers <kay.sievers@vrfy.org>
  * Copyright (C) 2004 Chris Friesen <chris_friesen@sympatico.ca>
  * Copyright (C) 2009 Canonical Ltd.
  * Copyright (C) 2009 Scott James Remnant <scott@netsplit.com>
@@ -30,23 +30,21 @@
 #include <time.h>
 #include <getopt.h>
 #include <dirent.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/signalfd.h>
 #include <sys/select.h>
 #include <sys/poll.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
-#ifdef HAVE_INOTIFY
 #include <sys/inotify.h>
-#endif
 
 #include "udev.h"
 
 #define UDEVD_PRIORITY			-4
 #define UDEV_PRIORITY			-2
 
-/* maximum limit of forked childs */
-#define UDEVD_MAX_CHILDS		256
-
 static int debug;
 
 static void log_fn(struct udev *udev, int priority,
@@ -61,163 +59,320 @@ static void log_fn(struct udev *udev, int priority,
 	}
 }
 
-static void reap_sigchilds(void);
-
 static int debug_trace;
 static struct udev_rules *rules;
 static struct udev_queue_export *udev_queue_export;
 static struct udev_ctrl *udev_ctrl;
-static struct udev_monitor *kernel_monitor;
-static volatile sig_atomic_t sigchilds_waiting;
-static volatile sig_atomic_t udev_exit;
-static volatile sig_atomic_t reload_config;
-static volatile sig_atomic_t signal_received;
-static volatile pid_t settle_pid;
-static int run_exec_q;
-static int stop_exec_q;
+static struct udev_monitor *monitor;
+static int worker_socket[2];
+static pid_t settle_pid;
+static int stop_exec_queue;
+static int reload_config;
 static int max_childs;
 static int childs;
 static struct udev_list_node event_list;
-
-static struct udev_event *node_to_event(struct udev_list_node *node)
+static struct udev_list_node worker_list;
+static int udev_exit;
+static volatile sig_atomic_t worker_exit;
+
+enum poll_fd {
+	FD_CONTROL,
+	FD_NETLINK,
+	FD_INOTIFY,
+	FD_SIGNAL,
+	FD_WORKER,
+};
+
+static struct pollfd pfd[] = {
+	[FD_NETLINK] = { .events = POLLIN },
+	[FD_WORKER] =  { .events = POLLIN },
+	[FD_SIGNAL] =  { .events = POLLIN },
+	[FD_INOTIFY] = { .events = POLLIN },
+	[FD_CONTROL] = { .events = POLLIN },
+};
+
+enum event_state {
+	EVENT_UNDEF,
+	EVENT_QUEUED,
+	EVENT_RUNNING,
+};
+
+struct event {
+	struct udev_list_node node;
+	struct udev *udev;
+	struct udev_device *dev;
+	enum event_state state;
+	int exitcode;
+	unsigned long long int delaying_seqnum;
+	unsigned long long int seqnum;
+	const char *devpath;
+	size_t devpath_len;
+	const char *devpath_old;
+};
+
+static struct event *node_to_event(struct udev_list_node *node)
 {
 	char *event;
 
 	event = (char *)node;
-	event -= offsetof(struct udev_event, node);
-	return (struct udev_event *)event;
+	event -= offsetof(struct event, node);
+	return (struct event *)event;
 }
 
-static void event_queue_delete(struct udev_event *event)
+enum worker_state {
+	WORKER_UNDEF,
+	WORKER_RUNNING,
+	WORKER_IDLE,
+	WORKER_KILLED,
+};
+
+struct worker {
+	struct udev_list_node node;
+	pid_t pid;
+	enum worker_state state;
+	struct event *event;
+};
+
+/* passed from worker to main process */
+struct worker_message {
+	pid_t pid;
+	int exitcode;
+};
+
+static struct worker *node_to_worker(struct udev_list_node *node)
+{
+	char *worker;
+
+	worker = (char *)node;
+	worker -= offsetof(struct worker, node);
+	return (struct worker *)worker;
+}
+
+static void event_queue_delete(struct event *event)
 {
 	udev_list_node_remove(&event->node);
 
 	/* mark as failed, if "add" event returns non-zero */
-	if (event->exitstatus && strcmp(udev_device_get_action(event->dev), "add") == 0)
+	if (event->exitcode && strcmp(udev_device_get_action(event->dev), "add") == 0)
 		udev_queue_export_device_failed(udev_queue_export, event->dev);
 	else
 		udev_queue_export_device_finished(udev_queue_export, event->dev);
 
 	udev_device_unref(event->dev);
-	udev_event_unref(event);
+	free(event);
 }
 
 static void event_sig_handler(int signum)
 {
-	if (signum == SIGALRM)
+	switch (signum) {
+	case SIGALRM:
 		_exit(1);
+		break;
+	case SIGTERM:
+		worker_exit = 1;
+		break;
+	}
 }
 
-static void event_fork(struct udev_event *event)
+static void worker_new(struct event *event)
 {
+	struct worker *worker;
 	pid_t pid;
 	struct sigaction act;
-	int err;
-
-#if 0
-	/* single process, no forking, just for testing/profiling */
-	err = udev_event_execute_rules(event, rules);
-	if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-		udev_event_execute_run(event);
-	info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
-	event_queue_delete(event);
-	return;
-#endif
 
-	if (debug_trace) {
-		event->trace = 1;
-		fprintf(stderr, "fork %s (%llu)\n",
-		       udev_device_get_syspath(event->dev),
-		       udev_device_get_seqnum(event->dev));
-	}
+	worker = calloc(1, sizeof(struct worker));
+	if (worker == NULL)
+		return;
 
 	pid = fork();
 	switch (pid) {
-	case 0:
-		/* child */
+	case 0: {
+		sigset_t mask;
+		struct udev_device *dev;
+
 		udev_queue_export_unref(udev_queue_export);
 		udev_ctrl_unref(udev_ctrl);
+		udev_monitor_unref(monitor);
+		close(pfd[FD_SIGNAL].fd);
+		close(worker_socket[READ_END]);
 		logging_close();
-		logging_init("udevd-event");
+		logging_init("udevd-work");
 		setpriority(PRIO_PROCESS, 0, UDEV_PRIORITY);
 
+		/* re-open socket to listen to udevd only, and send back libudev events */
+		monitor = udev_monitor_new_from_netlink(event->udev, NULL);
+		if (monitor == NULL)
+			_exit(2);
+		udev_monitor_enable_receiving(monitor);
+
 		/* set signal handlers */
 		memset(&act, 0x00, sizeof(act));
 		act.sa_handler = event_sig_handler;
 		sigemptyset (&act.sa_mask);
 		act.sa_flags = 0;
+		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGALRM, &act, NULL);
 
-		/* reset to default */
-		act.sa_handler = SIG_DFL;
-		sigaction(SIGINT, &act, NULL);
-		sigaction(SIGTERM, &act, NULL);
-		sigaction(SIGCHLD, &act, NULL);
-		sigaction(SIGHUP, &act, NULL);
+		/* unblock signals */
+		sigfillset(&mask);
+		sigdelset(&mask, SIGTERM);
+		sigdelset(&mask, SIGALRM);
+		sigprocmask(SIG_SETMASK, &mask, NULL);
 
-		/* set timeout to prevent hanging processes */
-		alarm(UDEV_EVENT_TIMEOUT);
+		/* request TERM signal if main daemon exits */
+		prctl(PR_SET_PDEATHSIG, SIGTERM);
 
-		/* apply rules, create node, symlinks */
-		err = udev_event_execute_rules(event, rules);
+		/* initial device */
+		dev = event->dev;
 
-		/* rules may change/disable the timeout */
-		if (udev_device_get_event_timeout(event->dev) >= 0)
-			alarm(udev_device_get_event_timeout(event->dev));
+		while (!worker_exit) {
+			struct udev_event *udev_event;
+			struct worker_message msg;
+			int err;
 
-		/* execute RUN= */
-		if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-			udev_event_execute_run(event);
+			udev_event = udev_event_new(dev);
+			if (udev_event == NULL)
+				_exit(3);
 
-		/* apply/restore inotify watch */
-		if (err == 0 && event->inotify_watch) {
-			udev_watch_begin(event->udev, event->dev);
-			udev_device_update_db(event->dev);
-		}
+			/* set timeout to prevent hanging processes */
+			alarm(UDEV_EVENT_TIMEOUT);
+
+			/* apply rules, create node, symlinks */
+			err = udev_event_execute_rules(udev_event, rules);
+
+			/* rules may change/disable the timeout */
+			if (udev_device_get_event_timeout(dev) >= 0)
+				alarm(udev_device_get_event_timeout(dev));
+
+			/* execute RUN= */
+			if (err == 0 && !udev_event->ignore_device && udev_get_run(udev_event->udev))
+				udev_event_execute_run(udev_event);
+
+			/* reset alarm */
+			alarm(0);
+
+			/* apply/restore inotify watch */
+			if (err == 0 && udev_event->inotify_watch) {
+				udev_watch_begin(udev_event->udev, dev);
+				udev_device_update_db(dev);
+			}
+
+			/* send processed event back to libudev listeners */
+			udev_monitor_send_device(monitor, dev, 0);
+
+			info(event->udev, "seq %llu finished with %i\n", udev_device_get_seqnum(dev), err);
+			udev_device_unref(dev);
+			udev_event_unref(udev_event);
+
+			/* send back the result of the event execution */
+			msg.exitcode = err;
+			msg.pid = getpid();
+			send(worker_socket[WRITE_END], &msg, sizeof(struct worker_message), 0);
 
-		/* send processed event back to the kernel netlink socket */
-		udev_monitor_send_device(kernel_monitor, event->dev);
+			/* wait for more device messages from udevd */
+			do
+				dev = udev_monitor_receive_device(monitor);
+			while (!worker_exit && dev == NULL);
+		}
 
-		info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
+		udev_monitor_unref(monitor);
 		logging_close();
-		if (err != 0)
-			exit(1);
 		exit(0);
+	}
 	case -1:
+		event->state = EVENT_QUEUED;
+		free(worker);
 		err(event->udev, "fork of child failed: %m\n");
-		event_queue_delete(event);
 		break;
 	default:
-		/* get SIGCHLD in main loop */
-		info(event->udev, "seq %llu forked, pid [%d], '%s' '%s', %ld seconds old\n",
-		     udev_device_get_seqnum(event->dev),
-		     pid,
-		     udev_device_get_action(event->dev),
-		     udev_device_get_subsystem(event->dev),
-		     time(NULL) - event->queue_time);
-		event->pid = pid;
+		event->state = EVENT_RUNNING;
+		worker->event = event;
+		worker->pid = pid;
+		worker->state = WORKER_RUNNING;
+		udev_list_node_append(&worker->node, &worker_list);
 		childs++;
+		break;
+	}
+}
+
+static void event_run(struct event *event)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+		ssize_t count;
+
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		event->state = EVENT_RUNNING;
+		count = udev_monitor_send_device(monitor, event->dev, worker->pid);
+		if (count < 0) {
+			err(event->udev, "worker [%u] did not accept message, kill it\n", worker->pid);
+			event->state = EVENT_QUEUED;
+			worker->state = WORKER_KILLED;
+			kill(worker->pid, SIGKILL);
+			continue;
+		}
+		return;
 	}
+
+	if (childs >= max_childs) {
+		info(event->udev, "maximum number (%i) of childs reached\n", childs);
+		return;
+	}
+
+	/* start new worker and pass initial device */
+	worker_new(event);
 }
 
-static void event_queue_insert(struct udev_event *event)
+static void event_queue_insert(struct udev_device *dev)
 {
-	event->queue_time = time(NULL);
+	struct event *event;
+
+	event = calloc(1, sizeof(struct event));
+	if (event == NULL)
+		return;
 
-	udev_queue_export_device_queued(udev_queue_export, event->dev);
-	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(event->dev),
-	     udev_device_get_action(event->dev), udev_device_get_subsystem(event->dev));
+	event->udev = udev_device_get_udev(dev);
+	event->dev = dev;
+	event->seqnum = udev_device_get_seqnum(dev);
+	event->devpath = udev_device_get_devpath(dev);
+	event->devpath_len = strlen(event->devpath);
+	event->devpath_old = udev_device_get_devpath_old(dev);
 
+	udev_queue_export_device_queued(udev_queue_export, dev);
+	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(dev),
+	     udev_device_get_action(dev), udev_device_get_subsystem(dev));
+
+	event->state = EVENT_QUEUED;
 	udev_list_node_append(&event->node, &event_list);
-	run_exec_q = 1;
 
 	/* run all events with a timeout set immediately */
-	if (udev_device_get_timeout(event->dev) > 0) {
-		event_fork(event);
+	if (udev_device_get_timeout(dev) > 0) {
+		worker_new(event);
 		return;
 	}
 }
 
+static void worker_kill(void)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+
+		if (worker->state == WORKER_KILLED)
+			continue;
+
+		worker->state = WORKER_KILLED;
+		kill(worker->pid, SIGTERM);
+	}
+}
+
 static int mem_size_mb(void)
 {
 	FILE *f;
@@ -241,112 +396,111 @@ static int mem_size_mb(void)
 	return memsize;
 }
 
-static int compare_devpath(const char *running, const char *waiting)
-{
-	int i = 0;
-
-	while (running[i] != '\0' && running[i] == waiting[i])
-		i++;
-
-	/* identical device event found */
-	if (running[i] == '\0' && waiting[i] == '\0')
-		return 1;
-
-	/* parent device event found */
-	if (running[i] == '\0' && waiting[i] == '/')
-		return 2;
-
-	/* child device event found */
-	if (running[i] == '/' && waiting[i] == '\0')
-		return 3;
-
-	/* no matching event */
-	return 0;
-}
-
 /* lookup event for identical, parent, child device */
-static int devpath_busy(struct udev_event *event)
+static int devpath_busy(struct event *event)
 {
 	struct udev_list_node *loop;
+	size_t common;
 
 	/* check if queue contains events we depend on */
 	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *loop_event = node_to_event(loop);
 
 		/* we already found a later event, earlier can not block us, no need to check again */
-		if (udev_device_get_seqnum(loop_event->dev) < event->delaying_seqnum)
+		if (loop_event->seqnum < event->delaying_seqnum)
 			continue;
 
 		/* event we checked earlier still exists, no need to check again */
-		if (udev_device_get_seqnum(loop_event->dev) == event->delaying_seqnum)
+		if (loop_event->seqnum == event->delaying_seqnum)
 			return 2;
 
 		/* found ourself, no later event can block us */
-		if (udev_device_get_seqnum(loop_event->dev) >= udev_device_get_seqnum(event->dev))
+		if (loop_event->seqnum >= event->seqnum)
 			break;
 
 		/* check our old name */
-		if (udev_device_get_devpath_old(event->dev) != NULL)
-			if (strcmp(udev_device_get_devpath(loop_event->dev), udev_device_get_devpath_old(event->dev)) == 0) {
-				event->delaying_seqnum = udev_device_get_seqnum(loop_event->dev);
+		if (event->devpath_old != NULL)
+			if (strcmp(loop_event->devpath, event->devpath_old) == 0) {
+				event->delaying_seqnum = loop_event->seqnum;
 				return 3;
 			}
 
-		/* check identical, parent, or child device event */
-		if (compare_devpath(udev_device_get_devpath(loop_event->dev), udev_device_get_devpath(event->dev)) != 0) {
-			dbg(event->udev, "%llu, device event still pending %llu (%s)\n",
-			    udev_device_get_seqnum(event->dev),
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
-			event->delaying_seqnum = udev_device_get_seqnum(loop_event->dev);
+		/* compare devpath */
+		common = MIN(loop_event->devpath_len, event->devpath_len);
+
+		/* one devpath is contained in the other? */
+		if (memcmp(loop_event->devpath, event->devpath, common) != 0)
+			continue;
+
+		/* identical device event found */
+		if (loop_event->devpath_len == event->devpath_len) {
+			event->delaying_seqnum = loop_event->seqnum;
 			return 4;
 		}
+
+		/* parent device event found */
+		if (event->devpath[common] == '/') {
+			event->delaying_seqnum = loop_event->seqnum;
+			return 5;
+		}
+
+		/* child device event found */
+		if (loop_event->devpath[common] == '/') {
+			event->delaying_seqnum = loop_event->seqnum;
+			return 6;
+		}
+
+		/* no matching device */
+		continue;
 	}
+
 	return 0;
 }
 
-/* serializes events for the identical and parent and child devices */
-static void event_queue_manager(struct udev *udev)
+static void events_start(struct udev *udev)
 {
 	struct udev_list_node *loop;
-	struct udev_list_node *tmp;
-
-start_over:
-	if (udev_list_is_empty(&event_list)) {
-		if (childs > 0) {
-			err(udev, "event list empty, but childs count is %i", childs);
-			childs = 0;
-		}
-		return;
-	}
-
-	udev_list_node_foreach_safe(loop, tmp, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
 
-		if (childs >= max_childs) {
-			info(udev, "maximum number (%i) of childs reached\n", childs);
-			break;
-		}
+	udev_list_node_foreach(loop, &event_list) {
+		struct event *event = node_to_event(loop);
 
-		if (loop_event->pid != 0)
+		if (event->state != EVENT_QUEUED)
 			continue;
 
 		/* do not start event if parent or child event is still running */
-		if (devpath_busy(loop_event) != 0) {
-			dbg(udev, "delay seq %llu (%s)\n",
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
+		if (devpath_busy(event) != 0) {
+			dbg(udev, "delay seq %llu (%s)\n", event->seqnum, event->devpath);
 			continue;
 		}
 
-		event_fork(loop_event);
-		dbg(udev, "moved seq %llu to running list\n", udev_device_get_seqnum(loop_event->dev));
+		event_run(event);
+	}
+}
+
+static void worker_returned(void)
+{
+	while (1) {
+		struct worker_message msg;
+		ssize_t size;
+		struct udev_list_node *loop;
+
+		size = recv(pfd[FD_WORKER].fd, &msg, sizeof(struct worker_message), MSG_DONTWAIT);
+		if (size != sizeof(struct worker_message))
+			break;
+
+		/* lookup worker who sent the signal */
+		udev_list_node_foreach(loop, &worker_list) {
+			struct worker *worker = node_to_worker(loop);
+
+			if (worker->pid != msg.pid)
+				continue;
 
-		/* retry if events finished in the meantime */
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-			goto start_over;
+			/* worker returned */
+			worker->event->exitcode = msg.exitcode;
+			event_queue_delete(worker->event);
+			worker->event = NULL;
+			worker->state = WORKER_IDLE;
+			break;
 		}
 	}
 }
@@ -367,17 +521,17 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 	if (i >= 0) {
 		info(udev, "udevd message (SET_LOG_PRIORITY) received, log_priority=%i\n", i);
 		udev_set_log_priority(udev, i);
+		worker_kill();
 	}
 
 	if (udev_ctrl_get_stop_exec_queue(ctrl_msg) > 0) {
 		info(udev, "udevd message (STOP_EXEC_QUEUE) received\n");
-		stop_exec_q = 1;
+		stop_exec_queue = 1;
 	}
 
 	if (udev_ctrl_get_start_exec_queue(ctrl_msg) > 0) {
 		info(udev, "udevd message (START_EXEC_QUEUE) received\n");
-		stop_exec_q = 0;
-		event_queue_manager(udev);
+		stop_exec_queue = 0;
 	}
 
 	if (udev_ctrl_get_reload_rules(ctrl_msg) > 0) {
@@ -409,6 +563,7 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 			}
 			free(key);
 		}
+		worker_kill();
 	}
 
 	i = udev_ctrl_get_set_max_childs(ctrl_msg);
@@ -420,6 +575,8 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 	settle_pid = udev_ctrl_get_settle(ctrl_msg);
 	if (settle_pid > 0) {
 		info(udev, "udevd message (SETTLE) received\n");
+		kill(settle_pid, SIGUSR1);
+		settle_pid = 0;
 	}
 	udev_ctrl_msg_unref(ctrl_msg);
 }
@@ -427,22 +584,20 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 /* read inotify messages */
 static int handle_inotify(struct udev *udev)
 {
-	int nbytes, pos;
+	ssize_t nbytes, pos;
 	char *buf;
 	struct inotify_event *ev;
 
-	if ((ioctl(inotify_fd, FIONREAD, &nbytes) < 0) || (nbytes <= 0))
+	if ((ioctl(pfd[FD_INOTIFY].fd, FIONREAD, &nbytes) < 0) || (nbytes <= 0))
 		return 0;
 
 	buf = malloc(nbytes);
 	if (buf == NULL) {
 		err(udev, "error getting buffer for inotify, disable watching\n");
-		close(inotify_fd);
-		inotify_fd = -1;
-		return 0;
+		return -1;
 	}
 
-	read(inotify_fd, buf, nbytes);
+	nbytes = read(pfd[FD_INOTIFY].fd, buf, nbytes);
 
 	for (pos = 0; pos < nbytes; pos += sizeof(struct inotify_event) + ev->len) {
 		struct udev_device *dev;
@@ -480,68 +635,44 @@ static int handle_inotify(struct udev *udev)
 	return 0;
 }
 
-static void sig_handler(int signum)
+static void handle_signal(int signo)
 {
-	switch (signum) {
-		case SIGINT:
-		case SIGTERM:
-			udev_exit = 1;
-			break;
-		case SIGCHLD:
-			/* set flag, then write to pipe if needed */
-			sigchilds_waiting = 1;
-			break;
-		case SIGHUP:
-			reload_config = 1;
-			break;
-	}
+	switch (signo) {
+	case SIGINT:
+	case SIGTERM:
+		udev_exit = 1;
+		break;
+	case SIGCHLD:
+		while (1) {
+			pid_t pid;
+			struct udev_list_node *loop, *tmp;
 
-	signal_received = 1;
-}
+			pid = waitpid(-1, NULL, WNOHANG);
+			if (pid <= 0)
+				break;
 
-static void udev_done(int pid, int exitstatus)
-{
-	struct udev_list_node *loop;
+			udev_list_node_foreach_safe(loop, tmp, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
 
-	/* find event associated with pid and delete it */
-	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (loop_event->pid == pid) {
-			info(loop_event->udev, "seq %llu cleanup, pid [%d], status %i, %ld seconds old\n",
-			     udev_device_get_seqnum(loop_event->dev), loop_event->pid,
-			     exitstatus, time(NULL) - loop_event->queue_time);
-			loop_event->exitstatus = exitstatus;
-			if (debug_trace)
-				fprintf(stderr, "exit %s (%llu)\n",
-				       udev_device_get_syspath(loop_event->dev),
-				       udev_device_get_seqnum(loop_event->dev));
-			event_queue_delete(loop_event);
-			childs--;
-
-			/* there may be dependent events waiting */
-			run_exec_q = 1;
-			return;
-		}
-	}
-}
+				if (worker->pid != pid)
+					continue;
 
-static void reap_sigchilds(void)
-{
-	pid_t pid;
-	int status;
+				/* fail event, if worker died unexpectedly */
+				if (worker->event != NULL) {
+					worker->event->exitcode = 127;
+					event_queue_delete(worker->event);
+				}
 
-	while (1) {
-		pid = waitpid(-1, &status, WNOHANG);
-		if (pid <= 0)
-			break;
-		if (WIFEXITED(status))
-			status = WEXITSTATUS(status);
-		else if (WIFSIGNALED(status))
-			status = WTERMSIG(status) + 128;
-		else
-			status = 0;
-		udev_done(pid, status);
+				udev_list_node_remove(&worker->node);
+				free(worker);
+				childs--;
+				break;
+			}
+		}
+		break;
+	case SIGHUP:
+		reload_config = 1;
+		break;
 	}
 }
 
@@ -576,7 +707,7 @@ int main(int argc, char *argv[])
 {
 	struct udev *udev;
 	int fd;
-	struct sigaction act;
+	sigset_t mask;
 	const char *value;
 	int daemonize = 0;
 	int resolve_names = 1;
@@ -669,29 +800,76 @@ int main(int argc, char *argv[])
 		rc = 1;
 		goto exit;
 	}
-
 	if (udev_ctrl_enable_receiving(udev_ctrl) < 0) {
 		fprintf(stderr, "error binding control socket, seems udevd is already running\n");
 		err(udev, "error binding control socket, seems udevd is already running\n");
 		rc = 1;
 		goto exit;
 	}
+	pfd[FD_CONTROL].fd = udev_ctrl_get_fd(udev_ctrl);
 
-	kernel_monitor = udev_monitor_new_from_netlink(udev, "kernel");
-	if (kernel_monitor == NULL || udev_monitor_enable_receiving(kernel_monitor) < 0) {
+	monitor = udev_monitor_new_from_netlink(udev, "kernel");
+	if (monitor == NULL || udev_monitor_enable_receiving(monitor) < 0) {
 		fprintf(stderr, "error initializing netlink socket\n");
 		err(udev, "error initializing netlink socket\n");
 		rc = 3;
 		goto exit;
 	}
-	udev_monitor_set_receive_buffer_size(kernel_monitor, 128*1024*1024);
+	udev_monitor_set_receive_buffer_size(monitor, 128*1024*1024);
+	pfd[FD_NETLINK].fd = udev_monitor_get_fd(monitor);
+
+	pfd[FD_INOTIFY].fd = udev_watch_init(udev);
+	if (pfd[FD_INOTIFY].fd < 0) {
+		fprintf(stderr, "error initializing inotify\n");
+		err(udev, "error initializing inotify\n");
+		rc = 4;
+		goto exit;
+	}
+
+	if (udev_get_rules_path(udev) != NULL) {
+		inotify_add_watch(pfd[FD_INOTIFY].fd, udev_get_rules_path(udev),
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+	} else {
+		char filename[UTIL_PATH_SIZE];
+
+		inotify_add_watch(pfd[FD_INOTIFY].fd, UDEV_PREFIX "/lib/udev/rules.d",
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+		inotify_add_watch(pfd[FD_INOTIFY].fd, SYSCONFDIR "/udev/rules.d",
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+
+		/* watch dynamic rules directory */
+		util_strscpyl(filename, sizeof(filename), udev_get_dev_path(udev), "/.udev/rules.d", NULL);
+		inotify_add_watch(pfd[FD_INOTIFY].fd, filename,
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+	}
+	udev_watch_restore(udev);
+
+	/* block and listen to all signals on signalfd */
+	sigfillset(&mask);
+	sigprocmask(SIG_SETMASK, &mask, NULL);
+	pfd[FD_SIGNAL].fd = signalfd(-1, &mask, 0);
+	if (pfd[FD_SIGNAL].fd < 0) {
+		fprintf(stderr, "error getting signalfd\n");
+		err(udev, "error getting signalfd\n");
+		rc = 5;
+		goto exit;
+	}
+
+	/* unnamed socket from workers to the main daemon */
+	if (socketpair(AF_LOCAL, SOCK_DGRAM, 0, worker_socket) < 0) {
+		fprintf(stderr, "error getting socketpair\n");
+		err(udev, "error getting socketpair\n");
+		rc = 6;
+		goto exit;
+	}
+	pfd[FD_WORKER].fd = worker_socket[READ_END];
 
 	rules = udev_rules_new(udev, resolve_names);
 	if (rules == NULL) {
 		err(udev, "error reading rules\n");
 		goto exit;
 	}
-	udev_list_init(&event_list);
+
 	udev_queue_export = udev_queue_export_new(udev);
 	if (udev_queue_export == NULL) {
 		err(udev, "error creating queue file\n");
@@ -704,19 +882,19 @@ int main(int argc, char *argv[])
 		pid = fork();
 		switch (pid) {
 		case 0:
-			dbg(udev, "daemonized fork running\n");
 			break;
 		case -1:
 			err(udev, "fork of daemon failed: %m\n");
 			rc = 4;
 			goto exit;
 		default:
-			dbg(udev, "child [%u] running, parent exits\n", pid);
 			rc = 0;
 			goto exit;
 		}
 	}
 
+	startup_log(udev);
+
 	/* redirect std{out,err} */
 	if (!debug && !debug_trace) {
 		dup2(fd, STDIN_FILENO);
@@ -742,159 +920,109 @@ int main(int argc, char *argv[])
 		close(fd);
 	}
 
-	startup_log(udev);
-
-	/* set signal handlers */
-	memset(&act, 0x00, sizeof(struct sigaction));
-	act.sa_handler = sig_handler;
-	sigemptyset(&act.sa_mask);
-	act.sa_flags = SA_RESTART;
-	sigaction(SIGINT, &act, NULL);
-	sigaction(SIGTERM, &act, NULL);
-	sigaction(SIGCHLD, &act, NULL);
-	sigaction(SIGHUP, &act, NULL);
-
-	/* watch rules directory */
-	udev_watch_init(udev);
-	if (inotify_fd >= 0) {
-		if (udev_get_rules_path(udev) != NULL) {
-			inotify_add_watch(inotify_fd, udev_get_rules_path(udev),
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-		} else {
-			char filename[UTIL_PATH_SIZE];
-
-			inotify_add_watch(inotify_fd, UDEV_PREFIX "/lib/udev/rules.d",
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-			inotify_add_watch(inotify_fd, SYSCONFDIR "/udev/rules.d",
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-
-			/* watch dynamic rules directory */
-			util_strscpyl(filename, sizeof(filename), udev_get_dev_path(udev), "/.udev/rules.d", NULL);
-			inotify_add_watch(inotify_fd, filename,
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-		}
-
-		udev_watch_restore(udev);
-	}
-
 	/* in trace mode run one event after the other */
 	if (debug_trace) {
 		max_childs = 1;
 	} else {
 		int memsize = mem_size_mb();
+
 		if (memsize > 0)
-			max_childs = 128 + (memsize / 4);
+			max_childs = 128 + (memsize / 8);
 		else
-			max_childs = UDEVD_MAX_CHILDS;
+			max_childs = 128;
 	}
+
 	/* possibly overwrite maximum limit of executed events */
 	value = getenv("UDEVD_MAX_CHILDS");
 	if (value)
 		max_childs = strtoul(value, NULL, 10);
 	info(udev, "initialize max_childs to %u\n", max_childs);
 
+	udev_list_init(&event_list);
+	udev_list_init(&worker_list);
+
 	while (!udev_exit) {
-		sigset_t blocked_mask, orig_mask;
-		struct pollfd pfd[4];
-		struct pollfd *ctrl_poll, *monitor_poll, *inotify_poll = NULL;
-		int nfds = 0;
 		int fdcount;
+		int timeout;
 
-		sigfillset(&blocked_mask);
-		sigprocmask(SIG_SETMASK, &blocked_mask, &orig_mask);
-		if (signal_received) {
-			sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-			goto handle_signals;
-		}
+		/* set timeout to kill idle workers */
+		if (udev_list_is_empty(&event_list) && !udev_list_is_empty(&worker_list))
+			timeout = 3 * 1000;
+		else
+			timeout = -1;
+		/* wait for events */
+		fdcount = poll(pfd, ARRAY_SIZE(pfd), timeout);
+		if (fdcount < 0)
+			continue;
 
-		ctrl_poll = &pfd[nfds++];
-		ctrl_poll->fd = udev_ctrl_get_fd(udev_ctrl);
-		ctrl_poll->events = POLLIN;
+		/* timeout - kill idle workers */
+		if (fdcount == 0)
+			worker_kill();
 
-		monitor_poll = &pfd[nfds++];
-		monitor_poll->fd = udev_monitor_get_fd(kernel_monitor);
-		monitor_poll->events = POLLIN;
+		/* event has finished */
+		if (pfd[FD_WORKER].revents & POLLIN)
+			worker_returned();
 
-		if (inotify_fd >= 0) {
-			inotify_poll = &pfd[nfds++];
-			inotify_poll->fd = inotify_fd;
-			inotify_poll->events = POLLIN;
-		}
+		/* get kernel uevent */
+		if (pfd[FD_NETLINK].revents & POLLIN) {
+			struct udev_device *dev;
 
-		fdcount = ppoll(pfd, nfds, NULL, &orig_mask);
-		sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-		if (fdcount < 0) {
-			if (errno == EINTR)
-				goto handle_signals;
-			err(udev, "error in select: %m\n");
-			continue;
+			dev = udev_monitor_receive_device(monitor);
+			if (dev != NULL)
+				event_queue_insert(dev);
+			else
+				udev_device_unref(dev);
 		}
 
 		/* get control message */
-		if (ctrl_poll->revents & POLLIN)
+		if (pfd[FD_CONTROL].revents & POLLIN)
 			handle_ctrl_msg(udev_ctrl);
 
-		/* get kernel uevent */
-		if (monitor_poll->revents & POLLIN) {
-			struct udev_device *dev;
+		/* start new events */
+		if (!udev_list_is_empty(&event_list) && !stop_exec_queue)
+			events_start(udev);
 
-			dev = udev_monitor_receive_device(kernel_monitor);
-			if (dev != NULL) {
-				struct udev_event *event;
+		/* get signal */
+		if (pfd[FD_SIGNAL].revents & POLLIN) {
+			struct signalfd_siginfo fdsi;
+			ssize_t size;
 
-				event = udev_event_new(dev);
-				if (event != NULL)
-					event_queue_insert(event);
-				else
-					udev_device_unref(dev);
-			}
+			size = read(pfd[FD_SIGNAL].fd, &fdsi, sizeof(struct signalfd_siginfo));
+			if (size == sizeof(struct signalfd_siginfo))
+				handle_signal(fdsi.ssi_signo);
 		}
 
-		/* rules directory inotify watch */
-		if (inotify_poll && (inotify_poll->revents & POLLIN))
+		/* device node and rules directory inotify watch */
+		if (pfd[FD_INOTIFY].revents & POLLIN)
 			handle_inotify(udev);
 
-handle_signals:
-		signal_received = 0;
-
 		/* rules changed, set by inotify or a HUP signal */
 		if (reload_config) {
 			struct udev_rules *rules_new;
 
-			reload_config = 0;
+			worker_kill();
 			rules_new = udev_rules_new(udev, resolve_names);
 			if (rules_new != NULL) {
 				udev_rules_unref(rules);
 				rules = rules_new;
 			}
-		}
-
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-		}
-
-		if (run_exec_q) {
-			run_exec_q = 0;
-			if (!stop_exec_q)
-				event_queue_manager(udev);
-		}
-
-		if (settle_pid > 0) {
-			kill(settle_pid, SIGUSR1);
-			settle_pid = 0;
+			reload_config = 0;
 		}
 	}
+
 	udev_queue_export_cleanup(udev_queue_export);
 	rc = 0;
 exit:
-
 	udev_queue_export_unref(udev_queue_export);
 	udev_rules_unref(rules);
 	udev_ctrl_unref(udev_ctrl);
-	if (inotify_fd >= 0)
-		close(inotify_fd);
-	udev_monitor_unref(kernel_monitor);
+	if (pfd[FD_SIGNAL].fd >= 0)
+		close(pfd[FD_SIGNAL].fd);
+	if (worker_socket[READ_END] >= 0)
+		close(worker_socket[READ_END]);
+	if (worker_socket[WRITE_END] >= 0)
+		close(worker_socket[WRITE_END]);
+	udev_monitor_unref(monitor);
 	udev_selinux_exit(udev);
 	udev_unref(udev);
 	logging_close();


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (18 preceding siblings ...)
  2009-06-02 14:05 ` Kay Sievers
@ 2009-06-03 19:44 ` Kay Sievers
  2009-06-03 20:46 ` Alan Jenkins
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-03 19:44 UTC (permalink / raw)
  To: linux-hotplug

[-- Attachment #1: Type: text/plain, Size: 550 bytes --]

On Tue, Jun 2, 2009 at 16:05, Kay Sievers <kay.sievers@vrfy.org> wrote:

> Version 5. I guess we should go for the signalfd stuff, it looks pretty
> nice. The socketpair instead of the rtsignal is also nice, it allows
> us to pass back arbitrary data from finished events to the main
> daemon.

Version 7. We need to handle the netlink unicast addresses: they are
not necessarily the process pid, so we need to remember the netlink
address.
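
As a rough illustration of why the address has to be remembered: when a
netlink socket is bound with nl_pid set to 0, the kernel picks the
address, and getsockname() is the way to learn it. The function name
netlink_open_bound() below is made up for this sketch; the patch does
the equivalent inside udev_monitor_enable_receiving().

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Hypothetical helper: open a uevent netlink socket, let the kernel
 * assign the unicast address, and report it through *nl_pid.
 * Returns the socket, or -1 on error. */
int netlink_open_bound(unsigned int *nl_pid)
{
	struct sockaddr_nl snl;
	socklen_t addrlen = sizeof(snl);
	int sock;

	assert(nl_pid != NULL);

	sock = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
	if (sock < 0)
		return -1;

	memset(&snl, 0, sizeof(snl));
	snl.nl_family = AF_NETLINK;	/* nl_pid == 0: kernel chooses */

	if (bind(sock, (struct sockaddr *)&snl, sizeof(snl)) < 0 ||
	    getsockname(sock, (struct sockaddr *)&snl, &addrlen) < 0) {
		close(sock);
		return -1;
	}

	*nl_pid = snl.nl_pid;
	return sock;
}
```

Opening two such sockets in one process yields two different
addresses, so at most one of them can equal the process pid.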

We also keep 2 workers around, and don't kill them, to handle incoming
events without forking.
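
One way to picture the retention policy, as a sketch only: walk the
worker list and retire idle workers beyond a fixed number to keep. The
function retire_idle_workers() and the array representation are
assumptions made for this example, and whether the patch counts idle
workers or all workers is not stated above. A real daemon would also
signal the retired process; here only the state changes.

```c
#include <assert.h>
#include <stddef.h>

enum worker_state {
	WORKER_UNDEF,
	WORKER_RUNNING,
	WORKER_IDLE,
	WORKER_KILLED,
};

/* Hypothetical helper: keep the first `keep` idle workers and mark
 * every further idle worker as killed. Returns how many were marked. */
int retire_idle_workers(enum worker_state *states, size_t count, size_t keep)
{
	size_t kept = 0;
	size_t i;
	int retired = 0;

	assert(states != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (states[i] != WORKER_IDLE)
			continue;
		if (kept < keep) {
			kept++;
			continue;
		}
		states[i] = WORKER_KILLED;
		retired++;
	}
	return retired;
}
```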

Thanks,
Kay

[-- Attachment #2: worker7.patch --]
[-- Type: text/x-patch, Size: 47190 bytes --]

diff --git a/NEWS b/NEWS
index ac44d7a..8b51c10 100644
--- a/NEWS
+++ b/NEWS
@@ -2,9 +2,17 @@ udev 143
 ========
 Bugfixes.
 
+Event processes now get re-used after they handled an event. This reduces
+pressure on the CPU significantly because cloned event processes no longer
+cause page faults in the main daemon. After the events have settled, the
+no longer needed worker processes get killed.
+
+To be able to use signalfd(), udev depends on kernel version 2.6.25 now.
+Also inotify support is required now to run udev.
+
 The format of the queue exported by the udev damon has changed. There is
-no longer a /dev/.udev/queue/ directory. The queue can be accessed with
-udevadm settle and libudedv.
+no longer a /dev/.udev/queue/ directory. The current event queue can be
+accessed with udevadm settle and libudev.
 
 udev 142
 ========
diff --git a/README b/README
index 773bc55..a14e5c0 100644
--- a/README
+++ b/README
@@ -9,11 +9,13 @@ Important Note:
   recommend to replace a distro's udev installation with the upstream version.
 
 Requirements:
-  - Version 2.6.22 of the Linux kernel for reliable operation of this release of
-    udev. The kernel must not use the CONFIG_SYSFS_DEPRECATED* option.
+  - Version 2.6.25 of the Linux kernel with sysfs, procfs, signalfd, inotify,
+    unix domain sockets, networking and hotplug enabled.
 
-  - The kernel must have sysfs, unix domain sockets and networking enabled.
-    Unix domain sockets (CONFIG_UNIX) as a loadable kernel module is not
+  - For reliable operation, the kernel must not use the CONFIG_SYSFS_DEPRECATED*
+    option.
+
+  - Unix domain sockets (CONFIG_UNIX) as a loadable kernel module is not
     supported.
 
   - The proc filesystem must be mounted on /proc/, the sysfs filesystem must
@@ -29,21 +31,18 @@ Operation:
   Udev creates and removes device nodes in /dev/, based on events the kernel
   sends out on device discovery or removal.
 
-  - Very early in the boot process, the /dev/ directory should get a 'tmpfs'
-    filesystem mounted, which is populated from scratch by udev. Created nodes
-    or changed permissions will not survive a reboot, which is intentional.
+  - Early in the boot process, the /dev/ directory should get a 'tmpfs'
+    filesystem mounted, which is maintained by udev. Created nodes or changed
+    permissions will not survive a reboot, which is intentional.
 
   - The content of /lib/udev/devices/ directory which contains the nodes,
     symlinks and directories, which are always expected to be in /dev, should
     be copied over to the tmpfs mounted /dev, to provide the required nodes
     to initialize udev and continue booting.
 
-  - The old hotplug helper /sbin/hotplug should be disabled on bootup, before
-    actions like loading kernel modules are taken, which may cause a lot of
-    events.
-
-  - The udevd daemon must be started on bootup to receive netlink uevents
-    from the kernel driver core.
+  - The old hotplug helper /sbin/hotplug should be disabled in the kernel
+    configuration, it is not needed, and may render the system unusable
+    because of a fork-bombing behavior.
 
   - All kernel events are matched against a set of specified rules in
     /lib/udev/rules.d/ which make it possible to hook into the event
diff --git a/TODO b/TODO
index 5b6af64..bedccdb 100644
--- a/TODO
+++ b/TODO
@@ -1,3 +1,4 @@
+
   o add tests for kernel provided DEVNAME logic
   o drop modprobe floppy alias (SUSE), it will be in the module (2.6.30)
   o remove MMC rules, they got a modalias now (2.6.30)
diff --git a/configure.ac b/configure.ac
index f1d008e..9857d52 100644
--- a/configure.ac
+++ b/configure.ac
@@ -5,6 +5,7 @@ AC_PREREQ(2.60)
 AM_INIT_AUTOMAKE([check-news foreign 1.9 dist-bzip2])
 AC_DISABLE_STATIC
 AC_USE_SYSTEM_EXTENSIONS
+dnl AM_SILENT_RULES
 AC_SYS_LARGEFILE
 AC_CONFIG_MACRO_DIR([m4])
 AC_PROG_LIBTOOL
@@ -23,10 +24,6 @@ AC_SUBST(LIBUDEV_LT_AGE)
 
 AC_PATH_PROG([XSLTPROC], [xsltproc])
 
-AC_CHECK_LIB(c, inotify_init,
-	[AC_DEFINE([HAVE_INOTIFY], 1, [inotify available])],
-	[AC_MSG_WARN([inotify support disabled])])
-
 AC_ARG_WITH(udev-prefix,
 	AS_HELP_STRING([--with-udev-prefix=DIR], [add prefix to internal udev path names]),
 	[], [with_udev_prefix='${exec_prefix}'])
diff --git a/udev/Makefile.am b/udev/Makefile.am
index 6cd2f23..94989e6 100644
--- a/udev/Makefile.am
+++ b/udev/Makefile.am
@@ -14,7 +14,6 @@ common_ldadd =
 
 common_files = \
 	udev.h \
-	udev-sysdeps.h \
 	udev-event.c \
 	udev-watch.c \
 	udev-node.c \
diff --git a/udev/lib/libudev-monitor.c b/udev/lib/libudev-monitor.c
index 395a4d2..33a0605 100644
--- a/udev/lib/libudev-monitor.c
+++ b/udev/lib/libudev-monitor.c
@@ -32,15 +32,17 @@ struct udev_monitor {
 	int refcount;
 	int sock;
 	struct sockaddr_nl snl;
-	struct sockaddr_nl snl_peer;
+	struct sockaddr_nl snl_trusted_sender;
+	struct sockaddr_nl snl_destination;
 	struct sockaddr_un sun;
 	socklen_t addrlen;
 	struct udev_list_node filter_subsystem_list;
 };
 
 enum udev_monitor_netlink_group {
-	UDEV_MONITOR_KERNEL	= 1,
-	UDEV_MONITOR_UDEV	= 2,
+	UDEV_MONITOR_NONE,
+	UDEV_MONITOR_KERNEL,
+	UDEV_MONITOR_UDEV,
 };
 
 #define UDEV_MONITOR_MAGIC		0xcafe1dea
@@ -171,11 +173,11 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 		return NULL;
 
 	if (name == NULL)
-		return NULL;
-	if (strcmp(name, "kernel") == 0)
-		group = UDEV_MONITOR_KERNEL;
+		group = UDEV_MONITOR_NONE;
 	else if (strcmp(name, "udev") == 0)
 		group = UDEV_MONITOR_UDEV;
+	else if (strcmp(name, "kernel") == 0)
+		group = UDEV_MONITOR_KERNEL;
 	else
 		return NULL;
 
@@ -193,8 +195,10 @@ struct udev_monitor *udev_monitor_new_from_netlink(struct udev *udev, const char
 
 	udev_monitor->snl.nl_family = AF_NETLINK;
 	udev_monitor->snl.nl_groups = group;
-	udev_monitor->snl_peer.nl_family = AF_NETLINK;
-	udev_monitor->snl_peer.nl_groups = UDEV_MONITOR_UDEV;
+
+	/* default destination for sending */
+	udev_monitor->snl_destination.nl_family = AF_NETLINK;
+	udev_monitor->snl_destination.nl_groups = UDEV_MONITOR_UDEV;
 
 	dbg(udev, "monitor %p created with NETLINK_KOBJECT_UEVENT (%u)\n", udev_monitor, group);
 	return udev_monitor;
@@ -281,6 +285,12 @@ int udev_monitor_filter_update(struct udev_monitor *udev_monitor)
 	return err;
 }
 
+int udev_monitor_allow_unicast_sender(struct udev_monitor *udev_monitor, struct udev_monitor *sender)
+{
+	udev_monitor->snl_trusted_sender.nl_pid = sender->snl.nl_pid;
+	return 0;
+}
+
 int udev_monitor_enable_receiving(struct udev_monitor *udev_monitor)
 {
 	int err;
@@ -293,6 +303,19 @@ int udev_monitor_enable_receiving(struct udev_monitor *udev_monitor)
 		udev_monitor_filter_update(udev_monitor);
 		err = bind(udev_monitor->sock,
 			   (struct sockaddr *)&udev_monitor->snl, sizeof(struct sockaddr_nl));
+		if (err == 0) {
+			struct sockaddr_nl snl;
+			socklen_t addrlen;
+
+			/*
+			 * get the address the kernel has assigned us
+			 * it is usually, but not necessarily the pid
+			 */
+			addrlen = sizeof(struct sockaddr_nl);
+			err = getsockname(udev_monitor->sock, (struct sockaddr *)&snl, &addrlen);
+			if (err == 0)
+				udev_monitor->snl.nl_pid = snl.nl_pid;
+		}
 	} else {
 		return -EINVAL;
 	}
@@ -314,6 +337,15 @@ int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int
 	return setsockopt(udev_monitor->sock, SOL_SOCKET, SO_RCVBUFFORCE, &size, sizeof(size));
 }
 
+int udev_monitor_disconnect(struct udev_monitor *udev_monitor)
+{
+	int err;
+
+	err = close(udev_monitor->sock);
+	udev_monitor->sock = -1;
+	return err;
+}
+
 /**
  * udev_monitor_ref:
  * @udev_monitor: udev monitor
@@ -478,10 +510,13 @@ retry:
 
 	if (udev_monitor->snl.nl_family != 0) {
 		if (snl.nl_groups == 0) {
-			info(udev_monitor->udev, "unicast netlink message ignored\n");
-			return NULL;
-		}
-		if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
+			/* unicast message, check if we trust the sender */
+			if (udev_monitor->snl_trusted_sender.nl_pid == 0 ||
+			    snl.nl_pid != udev_monitor->snl_trusted_sender.nl_pid) {
+				info(udev_monitor->udev, "unicast netlink message ignored\n");
+				return NULL;
+			}
+		} else if (snl.nl_groups == UDEV_MONITOR_KERNEL) {
 			if (snl.nl_pid > 0) {
 				info(udev_monitor->udev, "multicast kernel netlink message from pid %d ignored\n", snl.nl_pid);
 				return NULL;
@@ -621,7 +656,8 @@ retry:
 	return udev_device;
 }
 
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device)
+int udev_monitor_send_device(struct udev_monitor *udev_monitor,
+			     struct udev_monitor *destination, struct udev_device *udev_device)
 {
 	struct msghdr smsg;
 	struct iovec iov[2];
@@ -683,8 +719,16 @@ int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_devi
 		memset(&smsg, 0x00, sizeof(struct msghdr));
 		smsg.msg_iov = iov;
 		smsg.msg_iovlen = 2;
-		/* no destination besides the muticast group, we will always get ECONNREFUSED */
-		smsg.msg_name = &udev_monitor->snl_peer;
+		/*
+		 * Use custom address for target, or the default one.
+		 *
+		 * If we send to a multicast group, we will get
+		 * ECONNREFUSED, which is expected.
+		 */
+		if (destination != NULL)
+			smsg.msg_name = &destination->snl;
+		else
+			smsg.msg_name = &udev_monitor->snl_destination;
 		smsg.msg_namelen = sizeof(struct sockaddr_nl);
 	} else {
 		return -1;
diff --git a/udev/lib/libudev-private.h b/udev/lib/libudev-private.h
index dc02a84..5512341 100644
--- a/udev/lib/libudev-private.h
+++ b/udev/lib/libudev-private.h
@@ -86,7 +86,10 @@ int udev_device_delete_db(struct udev_device *udev_device);
 int udev_device_rename_db(struct udev_device *udev_device, const char *devpath);
 
 /* libudev-monitor - netlink/unix socket communication  */
-int udev_monitor_send_device(struct udev_monitor *udev_monitor, struct udev_device *udev_device);
+int udev_monitor_disconnect(struct udev_monitor *udev_monitor);
+int udev_monitor_allow_unicast_sender(struct udev_monitor *udev_monitor, struct udev_monitor *sender);
+int udev_monitor_send_device(struct udev_monitor *udev_monitor,
+			     struct udev_monitor *destination, struct udev_device *udev_device);
 int udev_monitor_set_receive_buffer_size(struct udev_monitor *udev_monitor, int size);
 
 /* libudev-ctrl - daemon runtime setup */
diff --git a/udev/udev-event.c b/udev/udev-event.c
index d521251..3f69c0b 100644
--- a/udev/udev-event.c
+++ b/udev/udev-event.c
@@ -734,18 +734,13 @@ int udev_event_execute_run(struct udev_event *event)
 			monitor = udev_monitor_new_from_socket(event->udev, &cmd[strlen("socket:")]);
 			if (monitor == NULL)
 				continue;
-			udev_monitor_send_device(monitor, event->dev);
+			udev_monitor_send_device(monitor, NULL, event->dev);
 			udev_monitor_unref(monitor);
 		} else {
 			char program[UTIL_PATH_SIZE];
 			char **envp;
 
 			udev_event_apply_format(event, cmd, program, sizeof(program));
-			if (event->trace)
-				fprintf(stderr, "run  %s (%llu) '%s'\n",
-				       udev_device_get_syspath(event->dev),
-				       udev_device_get_seqnum(event->dev),
-				       program);
 			envp = udev_device_get_properties_envp(event->dev);
 			if (util_run_program(event->udev, program, envp, NULL, 0, NULL) != 0) {
 				if (!udev_list_entry_get_flag(list_entry))
diff --git a/udev/udev-sysdeps.h b/udev/udev-sysdeps.h
deleted file mode 100644
index 35671ba..0000000
--- a/udev/udev-sysdeps.h
+++ /dev/null
@@ -1,44 +0,0 @@
-/*
- * wrapping of libc features and kernel interfaces
- *
- * Copyright (C) 2005-2008 Kay Sievers <kay.sievers@vrfy.org>
- *
- * This program is free software: you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation, either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program.  If not, see <http://www.gnu.org/licenses/>.
- */
-
-#ifndef _UDEV_SYSDEPS_H_
-#define _UDEV_SYSDEPS_H_
-
-#include <stdint.h>
-#include <errno.h>
-
-#ifndef HAVE_INOTIFY
-static inline int inotify_init(void)
-{
-	errno = ENOSYS;
-	return -1;
-}
-
-static inline int inotify_add_watch(int fd, const char *name, uint32_t mask)
-{
-	return -1;
-}
-
-#define IN_CREATE	0
-#define IN_DELETE	0
-#define IN_MOVE		0
-#define IN_CLOSE_WRITE	0
-
-#endif /* HAVE_INOTIFY */
-#endif
diff --git a/udev/udev-watch.c b/udev/udev-watch.c
index 53492e5..5a49c96 100644
--- a/udev/udev-watch.c
+++ b/udev/udev-watch.c
@@ -26,27 +26,24 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#ifdef HAVE_INOTIFY
 #include <sys/inotify.h>
-#endif
 
 #include "udev.h"
 
-int inotify_fd = -1;
+static int inotify_fd = -1;
 
 /* inotify descriptor, will be shared with rules directory;
  * set to cloexec since we need our children to be able to add
  * watches for us
  */
-void udev_watch_init(struct udev *udev)
+int udev_watch_init(struct udev *udev)
 {
 	inotify_fd = inotify_init();
 	if (inotify_fd >= 0)
 		util_set_fd_cloexec(inotify_fd);
-	else if (errno == ENOSYS)
-		info(udev, "unable to use inotify, udevd will not monitor rule files changes\n");
 	else
 		err(udev, "inotify_init failed: %m\n");
+	return inotify_fd;
 }
 
 /* move any old watches directory out of the way, and then restore
diff --git a/udev/udev.h b/udev/udev.h
index 8f2c1c6..7187975 100644
--- a/udev/udev.h
+++ b/udev/udev.h
@@ -22,7 +22,6 @@
 #include <sys/types.h>
 #include <sys/param.h>
 
-#include "udev-sysdeps.h"
 #include "lib/libudev.h"
 #include "lib/libudev-private.h"
 
@@ -53,7 +52,6 @@ static inline void logging_close(void)
 }
 
 struct udev_event {
-	struct udev_list_node node;
 	struct udev *udev;
 	struct udev_device *dev;
 	struct udev_device *dev_parent;
@@ -64,10 +62,6 @@ struct udev_event {
 	uid_t uid;
 	gid_t gid;
 	struct udev_list_node run_list;
-	pid_t pid;
-	int exitstatus;
-	time_t queue_time;
-	unsigned long long int delaying_seqnum;
 	unsigned int group_final:1;
 	unsigned int owner_final:1;
 	unsigned int mode_final:1;
@@ -76,7 +70,6 @@ struct udev_event {
 	unsigned int run_final:1;
 	unsigned int ignore_device:1;
 	unsigned int inotify_watch:1;
-	unsigned int trace:1;
 };
 
 struct udev_watch {
@@ -101,8 +94,7 @@ int udev_event_apply_subsys_kernel(struct udev_event *event, const char *string,
 				   char *result, size_t maxsize, int read_value);
 
 /* udev-watch.c */
-extern int inotify_fd;
-void udev_watch_init(struct udev *udev);
+int udev_watch_init(struct udev *udev);
 void udev_watch_restore(struct udev *udev);
 void udev_watch_begin(struct udev *udev, struct udev_device *dev);
 void udev_watch_end(struct udev *udev, struct udev_device *dev);
diff --git a/udev/udevadm.xml b/udev/udevadm.xml
index 538180b..2e03d98 100644
--- a/udev/udevadm.xml
+++ b/udev/udevadm.xml
@@ -285,9 +285,9 @@
               <term><option>--reload-rules</option></term>
               <listitem>
                 <para>Signal udevd to reload the rules files.
-                Usually the udev daemon detects changes automatically, this may
-                only be needed on systems without inotify support. Reloading rules
-                does not apply any changes to already existing devices.</para>
+                The udev daemon detects changes automatically, this option is
+                usually not needed. Reloading rules does not apply any changes
+                to already existing devices.</para>
               </listitem>
             </varlistentry>
             <varlistentry>
diff --git a/udev/udevd.c b/udev/udevd.c
index 37b547a..dce81ac 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2004-2008 Kay Sievers <kay.sievers@vrfy.org>
+ * Copyright (C) 2004-2009 Kay Sievers <kay.sievers@vrfy.org>
  * Copyright (C) 2004 Chris Friesen <chris_friesen@sympatico.ca>
  * Copyright (C) 2009 Canonical Ltd.
  * Copyright (C) 2009 Scott James Remnant <scott@netsplit.com>
@@ -30,23 +30,21 @@
 #include <time.h>
 #include <getopt.h>
 #include <dirent.h>
+#include <sys/prctl.h>
+#include <sys/socket.h>
+#include <sys/signalfd.h>
 #include <sys/select.h>
 #include <sys/poll.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
-#ifdef HAVE_INOTIFY
 #include <sys/inotify.h>
-#endif
 
 #include "udev.h"
 
 #define UDEVD_PRIORITY			-4
 #define UDEV_PRIORITY			-2
 
-/* maximum limit of forked childs */
-#define UDEVD_MAX_CHILDS		256
-
 static int debug;
 
 static void log_fn(struct udev *udev, int priority,
@@ -61,84 +59,159 @@ static void log_fn(struct udev *udev, int priority,
 	}
 }
 
-static void reap_sigchilds(void);
-
 static int debug_trace;
 static struct udev_rules *rules;
 static struct udev_queue_export *udev_queue_export;
 static struct udev_ctrl *udev_ctrl;
-static struct udev_monitor *kernel_monitor;
-static volatile sig_atomic_t sigchilds_waiting;
-static volatile sig_atomic_t udev_exit;
-static volatile sig_atomic_t reload_config;
-static volatile sig_atomic_t signal_received;
-static volatile pid_t settle_pid;
-static int run_exec_q;
-static int stop_exec_q;
+static struct udev_monitor *monitor;
+static int worker_watch[2];
+static pid_t settle_pid;
+static int stop_exec_queue;
+static int reload_config;
 static int max_childs;
 static int childs;
 static struct udev_list_node event_list;
-
-static struct udev_event *node_to_event(struct udev_list_node *node)
+static struct udev_list_node worker_list;
+static int udev_exit;
+static volatile sig_atomic_t worker_exit;
+
+enum poll_fd {
+	FD_CONTROL,
+	FD_NETLINK,
+	FD_INOTIFY,
+	FD_SIGNAL,
+	FD_WORKER,
+};
+
+static struct pollfd pfd[] = {
+	[FD_NETLINK] = { .events = POLLIN },
+	[FD_WORKER] =  { .events = POLLIN },
+	[FD_SIGNAL] =  { .events = POLLIN },
+	[FD_INOTIFY] = { .events = POLLIN },
+	[FD_CONTROL] = { .events = POLLIN },
+};
+
+enum event_state {
+	EVENT_UNDEF,
+	EVENT_QUEUED,
+	EVENT_RUNNING,
+};
+
+struct event {
+	struct udev_list_node node;
+	struct udev *udev;
+	struct udev_device *dev;
+	enum event_state state;
+	int exitcode;
+	unsigned long long int delaying_seqnum;
+	unsigned long long int seqnum;
+	const char *devpath;
+	size_t devpath_len;
+	const char *devpath_old;
+};
+
+static struct event *node_to_event(struct udev_list_node *node)
 {
 	char *event;
 
 	event = (char *)node;
-	event -= offsetof(struct udev_event, node);
-	return (struct udev_event *)event;
+	event -= offsetof(struct event, node);
+	return (struct event *)event;
+}
+
+enum worker_state {
+	WORKER_UNDEF,
+	WORKER_RUNNING,
+	WORKER_IDLE,
+	WORKER_KILLED,
+};
+
+struct worker {
+	struct udev_list_node node;
+	pid_t pid;
+	struct udev_monitor *monitor;
+	enum worker_state state;
+	struct event *event;
+};
+
+/* passed from worker to main process */
+struct worker_message {
+	pid_t pid;
+	int exitcode;
+};
+
+static struct worker *node_to_worker(struct udev_list_node *node)
+{
+	char *worker;
+
+	worker = (char *)node;
+	worker -= offsetof(struct worker, node);
+	return (struct worker *)worker;
 }
 
-static void event_queue_delete(struct udev_event *event)
+static void event_queue_delete(struct event *event)
 {
 	udev_list_node_remove(&event->node);
 
 	/* mark as failed, if "add" event returns non-zero */
-	if (event->exitstatus && strcmp(udev_device_get_action(event->dev), "add") == 0)
+	if (event->exitcode && strcmp(udev_device_get_action(event->dev), "add") == 0)
 		udev_queue_export_device_failed(udev_queue_export, event->dev);
 	else
 		udev_queue_export_device_finished(udev_queue_export, event->dev);
 
 	udev_device_unref(event->dev);
-	udev_event_unref(event);
+	free(event);
 }
 
 static void event_sig_handler(int signum)
 {
-	if (signum == SIGALRM)
+	switch (signum) {
+	case SIGALRM:
 		_exit(1);
+		break;
+	case SIGTERM:
+		worker_exit = 1;
+		break;
+	}
+}
+
+static void worker_unref(struct worker *worker)
+{
+	udev_monitor_unref(worker->monitor);
+	free(worker);
 }
 
-static void event_fork(struct udev_event *event)
+static void worker_new(struct event *event)
 {
+	struct worker *worker;
+	struct udev_monitor *worker_monitor;
 	pid_t pid;
 	struct sigaction act;
-	int err;
-
-#if 0
-	/* single process, no forking, just for testing/profiling */
-	err = udev_event_execute_rules(event, rules);
-	if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-		udev_event_execute_run(event);
-	info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
-	event_queue_delete(event);
-	return;
-#endif
 
-	if (debug_trace) {
-		event->trace = 1;
-		fprintf(stderr, "fork %s (%llu)\n",
-		       udev_device_get_syspath(event->dev),
-		       udev_device_get_seqnum(event->dev));
-	}
+	/* listen for new events */
+	worker_monitor = udev_monitor_new_from_netlink(event->udev, NULL);
+	if (worker_monitor == NULL)
+		return;
+	/* allow the main daemon netlink address to send devices to the worker */
+	udev_monitor_allow_unicast_sender(worker_monitor, monitor);
+	udev_monitor_enable_receiving(worker_monitor);
+
+	worker = calloc(1, sizeof(struct worker));
+	if (worker == NULL)
+		return;
 
 	pid = fork();
 	switch (pid) {
-	case 0:
-		/* child */
+	case 0: {
+		sigset_t mask;
+		struct udev_device *dev;
+
 		udev_queue_export_unref(udev_queue_export);
 		udev_ctrl_unref(udev_ctrl);
+		close(pfd[FD_SIGNAL].fd);
+		close(worker_watch[READ_END]);
 		logging_close();
-		logging_init("udevd-event");
+		logging_init("udevd-work");
 		setpriority(PRIO_PROCESS, 0, UDEV_PRIORITY);
 
 		/* set signal handlers */
@@ -146,78 +219,182 @@ static void event_fork(struct udev_event *event)
 		act.sa_handler = event_sig_handler;
 		sigemptyset (&act.sa_mask);
 		act.sa_flags = 0;
+		sigaction(SIGTERM, &act, NULL);
 		sigaction(SIGALRM, &act, NULL);
 
-		/* reset to default */
-		act.sa_handler = SIG_DFL;
-		sigaction(SIGINT, &act, NULL);
-		sigaction(SIGTERM, &act, NULL);
-		sigaction(SIGCHLD, &act, NULL);
-		sigaction(SIGHUP, &act, NULL);
+		/* unblock signals */
+		sigfillset(&mask);
+		sigdelset(&mask, SIGTERM);
+		sigdelset(&mask, SIGALRM);
+		sigprocmask(SIG_SETMASK, &mask, NULL);
 
-		/* set timeout to prevent hanging processes */
-		alarm(UDEV_EVENT_TIMEOUT);
+		/* request TERM signal if parent exits */
+		prctl(PR_SET_PDEATHSIG, SIGTERM);
 
-		/* apply rules, create node, symlinks */
-		err = udev_event_execute_rules(event, rules);
+		/* initial device */
+		dev = event->dev;
 
-		/* rules may change/disable the timeout */
-		if (udev_device_get_event_timeout(event->dev) >= 0)
-			alarm(udev_device_get_event_timeout(event->dev));
+		while (!worker_exit) {
+			struct udev_event *udev_event;
+			struct worker_message msg;
+			int err;
 
-		/* execute RUN= */
-		if (err == 0 && !event->ignore_device && udev_get_run(event->udev))
-			udev_event_execute_run(event);
+			udev_event = udev_event_new(dev);
+			if (udev_event == NULL)
+				_exit(3);
 
-		/* apply/restore inotify watch */
-		if (err == 0 && event->inotify_watch) {
-			udev_watch_begin(event->udev, event->dev);
-			udev_device_update_db(event->dev);
-		}
+			/* set timeout to prevent hanging processes */
+			alarm(UDEV_EVENT_TIMEOUT);
+
+			/* apply rules, create node, symlinks */
+			err = udev_event_execute_rules(udev_event, rules);
+
+			/* rules may change/disable the timeout */
+			if (udev_device_get_event_timeout(dev) >= 0)
+				alarm(udev_device_get_event_timeout(dev));
+
+			/* execute RUN= */
+			if (err == 0 && !udev_event->ignore_device && udev_get_run(udev_event->udev))
+				udev_event_execute_run(udev_event);
+
+			/* reset alarm */
+			alarm(0);
+
+			/* apply/restore inotify watch */
+			if (err == 0 && udev_event->inotify_watch) {
+				udev_watch_begin(udev_event->udev, dev);
+				udev_device_update_db(dev);
+			}
 
-		/* send processed event back to the kernel netlink socket */
-		udev_monitor_send_device(kernel_monitor, event->dev);
+			/* send processed event back to libudev listeners */
+			udev_monitor_send_device(worker_monitor, NULL, dev);
 
-		info(event->udev, "seq %llu exit with %i\n", udev_device_get_seqnum(event->dev), err);
+			info(event->udev, "seq %llu finished with %i\n", udev_device_get_seqnum(dev), err);
+			udev_device_unref(dev);
+			udev_event_unref(udev_event);
+
+			/* send back the result of the event execution */
+			msg.exitcode = err;
+			msg.pid = getpid();
+			send(worker_watch[WRITE_END], &msg, sizeof(struct worker_message), 0);
+
+			/* wait for more device messages from udevd */
+			do
+				dev = udev_monitor_receive_device(worker_monitor);
+			while (!worker_exit && dev == NULL);
+		}
+
+		udev_monitor_unref(worker_monitor);
 		logging_close();
-		if (err != 0)
-			exit(1);
 		exit(0);
+	}
 	case -1:
+		udev_monitor_unref(worker_monitor);
+		event->state = EVENT_QUEUED;
+		free(worker);
 		err(event->udev, "fork of child failed: %m\n");
-		event_queue_delete(event);
 		break;
 	default:
-		/* get SIGCHLD in main loop */
-		info(event->udev, "seq %llu forked, pid [%d], '%s' '%s', %ld seconds old\n",
-		     udev_device_get_seqnum(event->dev),
-		     pid,
-		     udev_device_get_action(event->dev),
-		     udev_device_get_subsystem(event->dev),
-		     time(NULL) - event->queue_time);
-		event->pid = pid;
+		/* close monitor, but keep address around */
+		udev_monitor_disconnect(worker_monitor);
+		worker->monitor = worker_monitor;
+		worker->pid = pid;
+		worker->state = WORKER_RUNNING;
+		worker->event = event;
+		event->state = EVENT_RUNNING;
+		udev_list_node_append(&worker->node, &worker_list);
 		childs++;
+		break;
+	}
+}
+
+static void event_run(struct event *event)
+{
+	struct udev_list_node *loop;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+		ssize_t count;
+
+		if (worker->state != WORKER_IDLE)
+			continue;
+
+		worker->event = event;
+		worker->state = WORKER_RUNNING;
+		event->state = EVENT_RUNNING;
+		count = udev_monitor_send_device(monitor, worker->monitor, event->dev);
+		if (count < 0) {
+			err(event->udev, "worker [%u] did not accept message, kill it\n", worker->pid);
+			event->state = EVENT_QUEUED;
+			worker->state = WORKER_KILLED;
+			kill(worker->pid, SIGKILL);
+			continue;
+		}
+		return;
+	}
+
+	if (childs >= max_childs) {
+		info(event->udev, "maximum number (%i) of childs reached\n", childs);
+		return;
 	}
+
+	/* start new worker and pass initial device */
+	worker_new(event);
 }
 
-static void event_queue_insert(struct udev_event *event)
+static void event_queue_insert(struct udev_device *dev)
 {
-	event->queue_time = time(NULL);
+	struct event *event;
 
-	udev_queue_export_device_queued(udev_queue_export, event->dev);
-	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(event->dev),
-	     udev_device_get_action(event->dev), udev_device_get_subsystem(event->dev));
+	event = calloc(1, sizeof(struct event));
+	if (event == NULL)
+		return;
+
+	event->udev = udev_device_get_udev(dev);
+	event->dev = dev;
+	event->seqnum = udev_device_get_seqnum(dev);
+	event->devpath = udev_device_get_devpath(dev);
+	event->devpath_len = strlen(event->devpath);
+	event->devpath_old = udev_device_get_devpath_old(dev);
+
+	udev_queue_export_device_queued(udev_queue_export, dev);
+	info(event->udev, "seq %llu queued, '%s' '%s'\n", udev_device_get_seqnum(dev),
+	     udev_device_get_action(dev), udev_device_get_subsystem(dev));
 
+	event->state = EVENT_QUEUED;
 	udev_list_node_append(&event->node, &event_list);
-	run_exec_q = 1;
 
 	/* run all events with a timeout set immediately */
-	if (udev_device_get_timeout(event->dev) > 0) {
-		event_fork(event);
+	if (udev_device_get_timeout(dev) > 0) {
+		worker_new(event);
 		return;
 	}
 }
 
+static void worker_kill(int retain)
+{
+	struct udev_list_node *loop;
+	int max;
+
+	if (childs <= retain)
+		return;
+
+	max = childs - retain;
+
+	udev_list_node_foreach(loop, &worker_list) {
+		struct worker *worker = node_to_worker(loop);
+
+		if (max-- <= 0)
+			break;
+
+		if (worker->state == WORKER_KILLED)
+			continue;
+
+		worker->state = WORKER_KILLED;
+		kill(worker->pid, SIGTERM);
+	}
+}
+
 static int mem_size_mb(void)
 {
 	FILE *f;
@@ -241,112 +418,111 @@ static int mem_size_mb(void)
 	return memsize;
 }
 
-static int compare_devpath(const char *running, const char *waiting)
-{
-	int i = 0;
-
-	while (running[i] != '\0' && running[i] == waiting[i])
-		i++;
-
-	/* identical device event found */
-	if (running[i] == '\0' && waiting[i] == '\0')
-		return 1;
-
-	/* parent device event found */
-	if (running[i] == '\0' && waiting[i] == '/')
-		return 2;
-
-	/* child device event found */
-	if (running[i] == '/' && waiting[i] == '\0')
-		return 3;
-
-	/* no matching event */
-	return 0;
-}
-
 /* lookup event for identical, parent, child device */
-static int devpath_busy(struct udev_event *event)
+static int devpath_busy(struct event *event)
 {
 	struct udev_list_node *loop;
+	size_t common;
 
 	/* check if queue contains events we depend on */
 	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
+		struct event *loop_event = node_to_event(loop);
 
 		/* we already found a later event, earlier can not block us, no need to check again */
-		if (udev_device_get_seqnum(loop_event->dev) < event->delaying_seqnum)
+		if (loop_event->seqnum < event->delaying_seqnum)
 			continue;
 
 		/* event we checked earlier still exists, no need to check again */
-		if (udev_device_get_seqnum(loop_event->dev) == event->delaying_seqnum)
+		if (loop_event->seqnum == event->delaying_seqnum)
 			return 2;
 
 		/* found ourself, no later event can block us */
-		if (udev_device_get_seqnum(loop_event->dev) >= udev_device_get_seqnum(event->dev))
+		if (loop_event->seqnum >= event->seqnum)
 			break;
 
 		/* check our old name */
-		if (udev_device_get_devpath_old(event->dev) != NULL)
-			if (strcmp(udev_device_get_devpath(loop_event->dev), udev_device_get_devpath_old(event->dev)) == 0) {
-				event->delaying_seqnum = udev_device_get_seqnum(loop_event->dev);
+		if (event->devpath_old != NULL)
+			if (strcmp(loop_event->devpath, event->devpath_old) == 0) {
+				event->delaying_seqnum = loop_event->seqnum;
 				return 3;
 			}
 
-		/* check identical, parent, or child device event */
-		if (compare_devpath(udev_device_get_devpath(loop_event->dev), udev_device_get_devpath(event->dev)) != 0) {
-			dbg(event->udev, "%llu, device event still pending %llu (%s)\n",
-			    udev_device_get_seqnum(event->dev),
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
-			event->delaying_seqnum = udev_device_get_seqnum(loop_event->dev);
+		/* compare devpath */
+		common = MIN(loop_event->devpath_len, event->devpath_len);
+
+		/* one devpath is contained in the other? */
+		if (memcmp(loop_event->devpath, event->devpath, common) != 0)
+			continue;
+
+		/* identical device event found */
+		if (loop_event->devpath_len == event->devpath_len) {
+			event->delaying_seqnum = loop_event->seqnum;
 			return 4;
 		}
+
+		/* parent device event found */
+		if (event->devpath[common] == '/') {
+			event->delaying_seqnum = loop_event->seqnum;
+			return 5;
+		}
+
+		/* child device event found */
+		if (loop_event->devpath[common] == '/') {
+			event->delaying_seqnum = loop_event->seqnum;
+			return 6;
+		}
+
+		/* no matching device */
+		continue;
 	}
+
 	return 0;
 }
 
-/* serializes events for the identical and parent and child devices */
-static void event_queue_manager(struct udev *udev)
+static void events_start(struct udev *udev)
 {
 	struct udev_list_node *loop;
-	struct udev_list_node *tmp;
 
-start_over:
-	if (udev_list_is_empty(&event_list)) {
-		if (childs > 0) {
-			err(udev, "event list empty, but childs count is %i", childs);
-			childs = 0;
-		}
-		return;
-	}
-
-	udev_list_node_foreach_safe(loop, tmp, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (childs >= max_childs) {
-			info(udev, "maximum number (%i) of childs reached\n", childs);
-			break;
-		}
+	udev_list_node_foreach(loop, &event_list) {
+		struct event *event = node_to_event(loop);
 
-		if (loop_event->pid != 0)
+		if (event->state != EVENT_QUEUED)
 			continue;
 
 		/* do not start event if parent or child event is still running */
-		if (devpath_busy(loop_event) != 0) {
-			dbg(udev, "delay seq %llu (%s)\n",
-			    udev_device_get_seqnum(loop_event->dev),
-			    udev_device_get_devpath(loop_event->dev));
+		if (devpath_busy(event) != 0) {
+			dbg(udev, "delay seq %llu (%s)\n", event->seqnum, event->devpath);
 			continue;
 		}
 
-		event_fork(loop_event);
-		dbg(udev, "moved seq %llu to running list\n", udev_device_get_seqnum(loop_event->dev));
+		event_run(event);
+	}
+}
+
+static void worker_returned(void)
+{
+	while (1) {
+		struct worker_message msg;
+		ssize_t size;
+		struct udev_list_node *loop;
+
+		size = recv(pfd[FD_WORKER].fd, &msg, sizeof(struct worker_message), MSG_DONTWAIT);
+		if (size != sizeof(struct worker_message))
+			break;
+
+		/* lookup worker who sent the signal */
+		udev_list_node_foreach(loop, &worker_list) {
+			struct worker *worker = node_to_worker(loop);
+
+			if (worker->pid != msg.pid)
+				continue;
 
-		/* retry if events finished in the meantime */
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-			goto start_over;
+			/* worker returned */
+			worker->event->exitcode = msg.exitcode;
+			event_queue_delete(worker->event);
+			worker->event = NULL;
+			worker->state = WORKER_IDLE;
+			break;
 		}
 	}
 }
@@ -367,17 +543,17 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 	if (i >= 0) {
 		info(udev, "udevd message (SET_LOG_PRIORITY) received, log_priority=%i\n", i);
 		udev_set_log_priority(udev, i);
+		worker_kill(0);
 	}
 
 	if (udev_ctrl_get_stop_exec_queue(ctrl_msg) > 0) {
 		info(udev, "udevd message (STOP_EXEC_QUEUE) received\n");
-		stop_exec_q = 1;
+		stop_exec_queue = 1;
 	}
 
 	if (udev_ctrl_get_start_exec_queue(ctrl_msg) > 0) {
 		info(udev, "udevd message (START_EXEC_QUEUE) received\n");
-		stop_exec_q = 0;
-		event_queue_manager(udev);
+		stop_exec_queue = 0;
 	}
 
 	if (udev_ctrl_get_reload_rules(ctrl_msg) > 0) {
@@ -409,6 +585,7 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 			}
 			free(key);
 		}
+		worker_kill(0);
 	}
 
 	i = udev_ctrl_get_set_max_childs(ctrl_msg);
@@ -420,6 +597,8 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 	settle_pid = udev_ctrl_get_settle(ctrl_msg);
 	if (settle_pid > 0) {
 		info(udev, "udevd message (SETTLE) received\n");
+		kill(settle_pid, SIGUSR1);
+		settle_pid = 0;
 	}
 	udev_ctrl_msg_unref(ctrl_msg);
 }
@@ -427,22 +606,20 @@ static void handle_ctrl_msg(struct udev_ctrl *uctrl)
 /* read inotify messages */
 static int handle_inotify(struct udev *udev)
 {
-	int nbytes, pos;
+	ssize_t nbytes, pos;
 	char *buf;
 	struct inotify_event *ev;
 
-	if ((ioctl(inotify_fd, FIONREAD, &nbytes) < 0) || (nbytes <= 0))
+	if ((ioctl(pfd[FD_INOTIFY].fd, FIONREAD, &nbytes) < 0) || (nbytes <= 0))
 		return 0;
 
 	buf = malloc(nbytes);
 	if (buf == NULL) {
 		err(udev, "error getting buffer for inotify, disable watching\n");
-		close(inotify_fd);
-		inotify_fd = -1;
-		return 0;
+		return -1;
 	}
 
-	read(inotify_fd, buf, nbytes);
+	nbytes = read(pfd[FD_INOTIFY].fd, buf, nbytes);
 
 	for (pos = 0; pos < nbytes; pos += sizeof(struct inotify_event) + ev->len) {
 		struct udev_device *dev;
@@ -476,72 +653,48 @@ static int handle_inotify(struct udev *udev)
 
 	}
 
-	free (buf);
+	free(buf);
 	return 0;
 }
 
-static void sig_handler(int signum)
+static void handle_signal(int signo)
 {
-	switch (signum) {
-		case SIGINT:
-		case SIGTERM:
-			udev_exit = 1;
-			break;
-		case SIGCHLD:
-			/* set flag, then write to pipe if needed */
-			sigchilds_waiting = 1;
-			break;
-		case SIGHUP:
-			reload_config = 1;
-			break;
-	}
+	switch (signo) {
+	case SIGINT:
+	case SIGTERM:
+		udev_exit = 1;
+		break;
+	case SIGCHLD:
+		while (1) {
+			pid_t pid;
+			struct udev_list_node *loop, *tmp;
 
-	signal_received = 1;
-}
+			pid = waitpid(-1, NULL, WNOHANG);
+			if (pid <= 0)
+				break;
 
-static void udev_done(int pid, int exitstatus)
-{
-	struct udev_list_node *loop;
+			udev_list_node_foreach_safe(loop, tmp, &worker_list) {
+				struct worker *worker = node_to_worker(loop);
 
-	/* find event associated with pid and delete it */
-	udev_list_node_foreach(loop, &event_list) {
-		struct udev_event *loop_event = node_to_event(loop);
-
-		if (loop_event->pid == pid) {
-			info(loop_event->udev, "seq %llu cleanup, pid [%d], status %i, %ld seconds old\n",
-			     udev_device_get_seqnum(loop_event->dev), loop_event->pid,
-			     exitstatus, time(NULL) - loop_event->queue_time);
-			loop_event->exitstatus = exitstatus;
-			if (debug_trace)
-				fprintf(stderr, "exit %s (%llu)\n",
-				       udev_device_get_syspath(loop_event->dev),
-				       udev_device_get_seqnum(loop_event->dev));
-			event_queue_delete(loop_event);
-			childs--;
-
-			/* there may be dependent events waiting */
-			run_exec_q = 1;
-			return;
-		}
-	}
-}
+				if (worker->pid != pid)
+					continue;
 
-static void reap_sigchilds(void)
-{
-	pid_t pid;
-	int status;
+				/* fail event, if worker died unexpectedly */
+				if (worker->event != NULL) {
+					worker->event->exitcode = 127;
+					event_queue_delete(worker->event);
+				}
 
-	while (1) {
-		pid = waitpid(-1, &status, WNOHANG);
-		if (pid <= 0)
-			break;
-		if (WIFEXITED(status))
-			status = WEXITSTATUS(status);
-		else if (WIFSIGNALED(status))
-			status = WTERMSIG(status) + 128;
-		else
-			status = 0;
-		udev_done(pid, status);
+				udev_list_node_remove(&worker->node);
+				worker_unref(worker);
+				childs--;
+				break;
+			}
+		}
+		break;
+	case SIGHUP:
+		reload_config = 1;
+		break;
 	}
 }
 
@@ -576,7 +729,7 @@ int main(int argc, char *argv[])
 {
 	struct udev *udev;
 	int fd;
-	struct sigaction act;
+	sigset_t mask;
 	const char *value;
 	int daemonize = 0;
 	int resolve_names = 1;
@@ -669,29 +822,76 @@ int main(int argc, char *argv[])
 		rc = 1;
 		goto exit;
 	}
-
 	if (udev_ctrl_enable_receiving(udev_ctrl) < 0) {
 		fprintf(stderr, "error binding control socket, seems udevd is already running\n");
 		err(udev, "error binding control socket, seems udevd is already running\n");
 		rc = 1;
 		goto exit;
 	}
+	pfd[FD_CONTROL].fd = udev_ctrl_get_fd(udev_ctrl);
 
-	kernel_monitor = udev_monitor_new_from_netlink(udev, "kernel");
-	if (kernel_monitor == NULL || udev_monitor_enable_receiving(kernel_monitor) < 0) {
+	monitor = udev_monitor_new_from_netlink(udev, "kernel");
+	if (monitor == NULL || udev_monitor_enable_receiving(monitor) < 0) {
 		fprintf(stderr, "error initializing netlink socket\n");
 		err(udev, "error initializing netlink socket\n");
 		rc = 3;
 		goto exit;
 	}
-	udev_monitor_set_receive_buffer_size(kernel_monitor, 128*1024*1024);
+	udev_monitor_set_receive_buffer_size(monitor, 128*1024*1024);
+	pfd[FD_NETLINK].fd = udev_monitor_get_fd(monitor);
+
+	pfd[FD_INOTIFY].fd = udev_watch_init(udev);
+	if (pfd[FD_INOTIFY].fd < 0) {
+		fprintf(stderr, "error initializing inotify\n");
+		err(udev, "error initializing inotify\n");
+		rc = 4;
+		goto exit;
+	}
+
+	if (udev_get_rules_path(udev) != NULL) {
+		inotify_add_watch(pfd[FD_INOTIFY].fd, udev_get_rules_path(udev),
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+	} else {
+		char filename[UTIL_PATH_SIZE];
+
+		inotify_add_watch(pfd[FD_INOTIFY].fd, UDEV_PREFIX "/lib/udev/rules.d",
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+		inotify_add_watch(pfd[FD_INOTIFY].fd, SYSCONFDIR "/udev/rules.d",
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+
+		/* watch dynamic rules directory */
+		util_strscpyl(filename, sizeof(filename), udev_get_dev_path(udev), "/.udev/rules.d", NULL);
+		inotify_add_watch(pfd[FD_INOTIFY].fd, filename,
+				  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
+	}
+	udev_watch_restore(udev);
+
+	/* block and listen to all signals on signalfd */
+	sigfillset(&mask);
+	sigprocmask(SIG_SETMASK, &mask, NULL);
+	pfd[FD_SIGNAL].fd = signalfd(-1, &mask, 0);
+	if (pfd[FD_SIGNAL].fd < 0) {
+		fprintf(stderr, "error getting signalfd\n");
+		err(udev, "error getting signalfd\n");
+		rc = 5;
+		goto exit;
+	}
+
+	/* unnamed socket from workers to the main daemon */
+	if (socketpair(AF_LOCAL, SOCK_DGRAM, 0, worker_watch) < 0) {
+		fprintf(stderr, "error getting socketpair\n");
+		err(udev, "error getting socketpair\n");
+		rc = 6;
+		goto exit;
+	}
+	pfd[FD_WORKER].fd = worker_watch[READ_END];
 
 	rules = udev_rules_new(udev, resolve_names);
 	if (rules == NULL) {
 		err(udev, "error reading rules\n");
 		goto exit;
 	}
-	udev_list_init(&event_list);
+
 	udev_queue_export = udev_queue_export_new(udev);
 	if (udev_queue_export == NULL) {
 		err(udev, "error creating queue file\n");
@@ -704,19 +904,19 @@ int main(int argc, char *argv[])
 		pid = fork();
 		switch (pid) {
 		case 0:
-			dbg(udev, "daemonized fork running\n");
 			break;
 		case -1:
 			err(udev, "fork of daemon failed: %m\n");
 			rc = 4;
 			goto exit;
 		default:
-			dbg(udev, "child [%u] running, parent exits\n", pid);
 			rc = 0;
 			goto exit;
 		}
 	}
 
+	startup_log(udev);
+
 	/* redirect std{out,err} */
 	if (!debug && !debug_trace) {
 		dup2(fd, STDIN_FILENO);
@@ -742,159 +942,109 @@ int main(int argc, char *argv[])
 		close(fd);
 	}
 
-	startup_log(udev);
-
-	/* set signal handlers */
-	memset(&act, 0x00, sizeof(struct sigaction));
-	act.sa_handler = sig_handler;
-	sigemptyset(&act.sa_mask);
-	act.sa_flags = SA_RESTART;
-	sigaction(SIGINT, &act, NULL);
-	sigaction(SIGTERM, &act, NULL);
-	sigaction(SIGCHLD, &act, NULL);
-	sigaction(SIGHUP, &act, NULL);
-
-	/* watch rules directory */
-	udev_watch_init(udev);
-	if (inotify_fd >= 0) {
-		if (udev_get_rules_path(udev) != NULL) {
-			inotify_add_watch(inotify_fd, udev_get_rules_path(udev),
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-		} else {
-			char filename[UTIL_PATH_SIZE];
-
-			inotify_add_watch(inotify_fd, UDEV_PREFIX "/lib/udev/rules.d",
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-			inotify_add_watch(inotify_fd, SYSCONFDIR "/udev/rules.d",
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-
-			/* watch dynamic rules directory */
-			util_strscpyl(filename, sizeof(filename), udev_get_dev_path(udev), "/.udev/rules.d", NULL);
-			inotify_add_watch(inotify_fd, filename,
-					  IN_CREATE | IN_DELETE | IN_MOVE | IN_CLOSE_WRITE);
-		}
-
-		udev_watch_restore(udev);
-	}
-
 	/* in trace mode run one event after the other */
 	if (debug_trace) {
 		max_childs = 1;
 	} else {
 		int memsize = mem_size_mb();
+
 		if (memsize > 0)
-			max_childs = 128 + (memsize / 4);
+			max_childs = 128 + (memsize / 8);
 		else
-			max_childs = UDEVD_MAX_CHILDS;
+			max_childs = 128;
 	}
+
 	/* possibly overwrite maximum limit of executed events */
 	value = getenv("UDEVD_MAX_CHILDS");
 	if (value)
 		max_childs = strtoul(value, NULL, 10);
 	info(udev, "initialize max_childs to %u\n", max_childs);
 
+	udev_list_init(&event_list);
+	udev_list_init(&worker_list);
+
 	while (!udev_exit) {
-		sigset_t blocked_mask, orig_mask;
-		struct pollfd pfd[4];
-		struct pollfd *ctrl_poll, *monitor_poll, *inotify_poll = NULL;
-		int nfds = 0;
 		int fdcount;
+		int timeout;
 
-		sigfillset(&blocked_mask);
-		sigprocmask(SIG_SETMASK, &blocked_mask, &orig_mask);
-		if (signal_received) {
-			sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-			goto handle_signals;
-		}
+		/* set timeout to kill idle workers */
+		if (udev_list_is_empty(&event_list) && childs > 2)
+			timeout = 3 * 1000;
+		else
+			timeout = -1;
+		/* wait for events */
+		fdcount = poll(pfd, ARRAY_SIZE(pfd), timeout);
+		if (fdcount < 0)
+			continue;
 
-		ctrl_poll = &pfd[nfds++];
-		ctrl_poll->fd = udev_ctrl_get_fd(udev_ctrl);
-		ctrl_poll->events = POLLIN;
+		/* timeout - kill idle workers */
+		if (fdcount == 0)
+			worker_kill(2);
 
-		monitor_poll = &pfd[nfds++];
-		monitor_poll->fd = udev_monitor_get_fd(kernel_monitor);
-		monitor_poll->events = POLLIN;
+		/* event has finished */
+		if (pfd[FD_WORKER].revents & POLLIN)
+			worker_returned();
 
-		if (inotify_fd >= 0) {
-			inotify_poll = &pfd[nfds++];
-			inotify_poll->fd = inotify_fd;
-			inotify_poll->events = POLLIN;
-		}
+		/* get kernel uevent */
+		if (pfd[FD_NETLINK].revents & POLLIN) {
+			struct udev_device *dev;
 
-		fdcount = ppoll(pfd, nfds, NULL, &orig_mask);
-		sigprocmask(SIG_SETMASK, &orig_mask, NULL);
-		if (fdcount < 0) {
-			if (errno == EINTR)
-				goto handle_signals;
-			err(udev, "error in select: %m\n");
-			continue;
+			dev = udev_monitor_receive_device(monitor);
+			if (dev != NULL)
+				event_queue_insert(dev);
+			else
+				udev_device_unref(dev);
 		}
 
 		/* get control message */
-		if (ctrl_poll->revents & POLLIN)
+		if (pfd[FD_CONTROL].revents & POLLIN)
 			handle_ctrl_msg(udev_ctrl);
 
-		/* get kernel uevent */
-		if (monitor_poll->revents & POLLIN) {
-			struct udev_device *dev;
+		/* start new events */
+		if (!udev_list_is_empty(&event_list) && !stop_exec_queue)
+			events_start(udev);
 
-			dev = udev_monitor_receive_device(kernel_monitor);
-			if (dev != NULL) {
-				struct udev_event *event;
+		/* get signal */
+		if (pfd[FD_SIGNAL].revents & POLLIN) {
+			struct signalfd_siginfo fdsi;
+			ssize_t size;
 
-				event = udev_event_new(dev);
-				if (event != NULL)
-					event_queue_insert(event);
-				else
-					udev_device_unref(dev);
-			}
+			size = read(pfd[FD_SIGNAL].fd, &fdsi, sizeof(struct signalfd_siginfo));
+			if (size == sizeof(struct signalfd_siginfo))
+				handle_signal(fdsi.ssi_signo);
 		}
 
-		/* rules directory inotify watch */
-		if (inotify_poll && (inotify_poll->revents & POLLIN))
+		/* device node and rules directory inotify watch */
+		if (pfd[FD_INOTIFY].revents & POLLIN)
 			handle_inotify(udev);
 
-handle_signals:
-		signal_received = 0;
-
 		/* rules changed, set by inotify or a HUP signal */
 		if (reload_config) {
 			struct udev_rules *rules_new;
 
-			reload_config = 0;
+			worker_kill(0);
 			rules_new = udev_rules_new(udev, resolve_names);
 			if (rules_new != NULL) {
 				udev_rules_unref(rules);
 				rules = rules_new;
 			}
-		}
-
-		if (sigchilds_waiting) {
-			sigchilds_waiting = 0;
-			reap_sigchilds();
-		}
-
-		if (run_exec_q) {
-			run_exec_q = 0;
-			if (!stop_exec_q)
-				event_queue_manager(udev);
-		}
-
-		if (settle_pid > 0) {
-			kill(settle_pid, SIGUSR1);
-			settle_pid = 0;
+			reload_config = 0;
 		}
 	}
+
 	udev_queue_export_cleanup(udev_queue_export);
 	rc = 0;
 exit:
-
 	udev_queue_export_unref(udev_queue_export);
 	udev_rules_unref(rules);
 	udev_ctrl_unref(udev_ctrl);
-	if (inotify_fd >= 0)
-		close(inotify_fd);
-	udev_monitor_unref(kernel_monitor);
+	if (pfd[FD_SIGNAL].fd >= 0)
+		close(pfd[FD_SIGNAL].fd);
+	if (worker_watch[READ_END] >= 0)
+		close(worker_watch[READ_END]);
+	if (worker_watch[WRITE_END] >= 0)
+		close(worker_watch[WRITE_END]);
+	udev_monitor_unref(monitor);
 	udev_selinux_exit(udev);
 	udev_unref(udev);
 	logging_close();
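
For anyone skimming the diff: the devpath check in devpath_busy() boils down to a prefix comparison on the two paths. Below is a stand-alone sketch of just that part, assuming plain C strings instead of struct event. The function name devpath_compare() and the enum values are invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* hypothetical result codes; the patch itself returns small integers */
enum devpath_relation {
	DEVPATH_UNRELATED = 0,
	DEVPATH_IDENTICAL,
	DEVPATH_PARENT,		/* queued path is a parent of ours */
	DEVPATH_CHILD,		/* queued path is a child of ours */
};

/* compare the devpath of an already queued event with our own devpath */
static enum devpath_relation devpath_compare(const char *queued, const char *ours)
{
	size_t queued_len, ours_len, common;

	assert(queued != NULL && ours != NULL);
	queued_len = strlen(queued);
	ours_len = strlen(ours);
	common = queued_len < ours_len ? queued_len : ours_len;

	/* is one devpath contained in the other? */
	if (memcmp(queued, ours, common) != 0)
		return DEVPATH_UNRELATED;

	if (queued_len == ours_len)
		return DEVPATH_IDENTICAL;

	/* the shorter path must end exactly at a '/' boundary of the longer one */
	if (ours[common] == '/')
		return DEVPATH_PARENT;
	if (queued[common] == '/')
		return DEVPATH_CHILD;

	return DEVPATH_UNRELATED;
}
```

The '/' boundary test is what keeps a path like ".../sda" from blocking ".../sda1": the prefix matches, but neither path continues with a separator, so the two events are treated as independent.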


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (19 preceding siblings ...)
  2009-06-03 19:44 ` Kay Sievers
@ 2009-06-03 20:46 ` Alan Jenkins
  2009-06-03 22:20 ` Kay Sievers
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Alan Jenkins @ 2009-06-03 20:46 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> On Tue, Jun 2, 2009 at 16:05, Kay Sievers <kay.sievers@vrfy.org> wrote:
>
>   
>> Version 5. I guess we should go for the signalfd stuff, it looks pretty
>> nice. The socketpair instead of the rtsignal is also nice, it allows
>> us to pass back arbitrary data from finished events to the main
>> daemon.
>>     
>
> Version 7. We need to handle the netlink unicast addresses, they are
> not necessarily the process pid, so we need to remember the netlink
> address.
>
> We also keep 2 workers around, and don't kill them, to handle incoming
> events without forking.
>
> Thanks,
> Kay
>   

Ok.

I think the patch breaks the settle control message.  It now sends the 
signal back to udevadm immediately, instead of postponing it until after 
handle_inotify(), which was apparently the point.

"
    Now udevadm settle will send a control message to udevd, which will
    respond by sending SIGUSR1 back to the waiting udevadm settle once it
    has completed the main loop iteration in which it received the control
    message.
   
    If there were no pending inotify events, this will simply wake up the
    udev daemon and allow settle to continue.  *If there are pending inotify
    events, they are handled first in the main loop* so when settle is
    continued they will have been turned into uevents and the kernel
    sequence number will have been incremented.
"

Perhaps it would be simplest to reorder the main loop so that 
handle_ctrl_msg() comes after handle_inotify().  If I'm right, it could 
benefit from a comment pointing out that this order is significant and 
should be preserved.
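
As a rough illustration of the ordering I mean (a toy model, not code from the patch): the handlers are replaced by a trace so the order can be checked, and dispatch_once(), struct loop_trace and the STEP_* names are made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* hypothetical indices, modelled on the pfd[] array in the patch */
enum poll_fd { FD_CONTROL, FD_INOTIFY, FD_COUNT };

enum step { STEP_INOTIFY, STEP_CONTROL };

/* records which handler ran, and in which order */
struct loop_trace {
	enum step steps[FD_COUNT];
	size_t count;
};

static void trace_add(struct loop_trace *trace, enum step step)
{
	assert(trace->count < (size_t)FD_COUNT);
	trace->steps[trace->count++] = step;
}

/*
 * One main loop iteration. The order is significant: inotify events are
 * handled before the control message, so a settle request is only
 * answered after pending inotify events have been turned into uevents.
 */
static void dispatch_once(const struct pollfd *pfd, struct loop_trace *trace)
{
	if (pfd[FD_INOTIFY].revents & POLLIN)
		trace_add(trace, STEP_INOTIFY);	/* stands in for handle_inotify() */

	if (pfd[FD_CONTROL].revents & POLLIN)
		trace_add(trace, STEP_CONTROL);	/* stands in for handle_ctrl_msg() */
}
```

With both descriptors readable in the same iteration, the inotify step always runs first, which is the property the settle reply relies on.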


That said, I don't completely understand the settle control message.  I 
don't get why udevadm-settle only sends it once at the start, instead of 
incorporating it as part of the delay loop.

Thanks
Alan


* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (20 preceding siblings ...)
  2009-06-03 20:46 ` Alan Jenkins
@ 2009-06-03 22:20 ` Kay Sievers
  2009-06-03 23:53 ` Kay Sievers
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-03 22:20 UTC (permalink / raw)
  To: linux-hotplug

On Wed, Jun 3, 2009 at 22:46, Alan Jenkins <alan-jenkins@tuffmail.co.uk> wrote:
> I think the patch breaks the settle control message.  It now sends the
> signal back to udevadm immediately, instead of postponing it until after
> handle_inotify(), which was apparently the point.

> Perhaps it would be simplest to reorder the main loop so that handle_ctrl_msg()
> comes after handle_inotify().  If I'm right, it could benefit from a comment
> pointing out that this order is significant and should be preserved.

Yeah, that sounds good. Thanks!

> That said, I don't completely understand the settle control message.  I
> don't get why udevadm-settle only sends it once at the start, instead of
> incorporating it as part of the delay loop.

I think it's only needed to cover the delayed wakeup of udevd. When
no event is pending after close() of a device file, there is nothing
we can miss after that point.

Thanks,
Kay

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (21 preceding siblings ...)
  2009-06-03 22:20 ` Kay Sievers
@ 2009-06-03 23:53 ` Kay Sievers
  2009-06-06 14:20 ` Kay Sievers
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-03 23:53 UTC (permalink / raw)
  To: linux-hotplug

On Thu, Jun 4, 2009 at 00:20, Kay Sievers <kay.sievers@vrfy.org> wrote:

No more patches. I've committed the current version to the git tree.

I get now:

  $ time (udevadm trigger; udevadm settle)
  real	0m0.558s

I guess the path_id script needs to be replaced with something better now. :)

  $ mv /lib/udev/path_id /lib/udev/path_id.0
  $ time (udevadm trigger; udevadm settle)
  real	0m0.368s

Thanks,
Kay

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (22 preceding siblings ...)
  2009-06-03 23:53 ` Kay Sievers
@ 2009-06-06 14:20 ` Kay Sievers
  2009-06-06 17:01 ` Bryan Kadzban
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 28+ messages in thread
From: Kay Sievers @ 2009-06-06 14:20 UTC (permalink / raw)
  To: linux-hotplug

On Thu, Jun 4, 2009 at 01:53, Kay Sievers<kay.sievers@vrfy.org> wrote:

> I guess the path_id script needs to be replaced with something better now. :)
>
>  $ mv /lib/udev/path_id /lib/udev/path_id.0
>  $ time (udevadm trigger; udevadm settle)
>  real  0m0.368s

We got a C version of path_id now:
  $ time (udevadm trigger; udevadm settle)
  real	0m0.378s

Fibre Channel, iSCSI, SAS, S390, IDE support needs to be ported to it.
If one of these is found for a block device, the old shell script is
called instead of the C version. If anybody uses any of these
subsystems, it would be great to get patches. :)

Thanks,
Kay

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (23 preceding siblings ...)
  2009-06-06 14:20 ` Kay Sievers
@ 2009-06-06 17:01 ` Bryan Kadzban
  2009-06-08 11:45 ` Scott James Remnant
  2009-06-08 16:29 ` Bryan Kadzban
  26 siblings, 0 replies; 28+ messages in thread
From: Bryan Kadzban @ 2009-06-06 17:01 UTC (permalink / raw)
  To: linux-hotplug

Kay Sievers wrote:
> Version 7.
>
> +Also inotify support is required now to run udev.

Hmm.  Unsure if this was the intention or not, but this breaks compiling
against "old" glibc versions (the change to unconditionally include
<sys/signalfd.h> doesn't help either).

I have two machines; both run fairly old glibcs, but very recent kernels
(2.6.28.x and 2.6.29.x).  I have no <sys/signalfd.h> on one, and neither
that nor <sys/inotify.h> on the other.  (I compiled these systems'
glibcs in about 2006 and 2003.  Upgrading the kernel is easy; upgrading
glibc to something that will have support for signalfd would entail
recompiling the entire machine, in both cases.)

Below is a patch to emulate <sys/signalfd.h> and <sys/inotify.h> if
those headers don't exist at configure time (via inline functions that
use syscall() directly).  It has to pull __NR_xxx (for the various xxx
syscalls) from the kernel includes, which are per-arch, so there are a
few new configure.ac tests to find the appropriate arch.  This takes a
while, but isn't run if both <sys/signalfd.h> and <sys/inotify.h> are
present.

Lots of this is copied from glibc-2.10.1, some from the kernel.
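
As a standalone illustration of the "inline functions that use syscall() directly" approach, here is a small sketch for the inotify calls. It is not part of the patch and differs from it in two assumed ways: it takes the syscall numbers from <sys/syscall.h> (SYS_xxx) instead of the kernel's per-arch unistd.h, and it uses inotify_init1 with flags 0 because plain inotify_init is not available on every architecture. The raw_ prefixed names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Create an inotify instance without relying on <sys/inotify.h>. */
static inline int raw_inotify_init(void)
{
	return (int)syscall(SYS_inotify_init1, 0);
}

/* Add a watch; 'mask' uses the kernel's IN_* bit values. */
static inline int raw_inotify_add_watch(int fd, const char *name,
		uint32_t mask)
{
	return (int)syscall(SYS_inotify_add_watch, fd, name, mask);
}

/* Remove a watch previously returned by raw_inotify_add_watch(). */
static inline int raw_inotify_rm_watch(int fd, int wd)
{
	return (int)syscall(SYS_inotify_rm_watch, fd, wd);
}
```

As with the libc wrappers, syscall() returns -1 and sets errno on failure, so callers can keep their existing error handling.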

----

Don't require glibc to support signalfd and inotify: provide syscall
wrappers if needed.

Signed-Off-By: Bryan Kadzban <bryan@kadzban.is-a-geek.net>

 configure.ac        |   18 ++++++++
 udev/Makefile.am    |    1 +
 udev/udev-sysdeps.h |  107 ++++++++++++++++++++++++++++++++++++++++++++++
 udev/udev-watch.c   |    2 +-
 udev/udevd.c        |    3 +-
 5 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index 9857d52..e2a39cb 100644
--- a/configure.ac
+++ b/configure.ac
@@ -24,6 +24,24 @@ AC_SUBST(LIBUDEV_LT_AGE)

 AC_PATH_PROG([XSLTPROC], [xsltproc])

+AC_CHECK_HEADERS([sys/signalfd.h])
+AC_CHECK_HEADERS([sys/inotify.h])
+
+if test x"$ac_cv_header_sys_signalfd_h" = xno || \
+		test x"$ac_cv_header_sys_inotify_h" = xno ; then
+	AC_MSG_CHECKING([kernel architecture string])
+	KERN_BASE=/lib/modules/`uname -r`/source
+	KERN_ARCH=`make -C $KERN_BASE -n -p | \
+		sed -n -e '/^hdr-arch/ { s/.*:= // p }'`
+	KERN_UNISTD=$KERN_BASE/arch/$KERN_ARCH/include/asm/unistd.h
+	AC_MSG_RESULT([$KERN_ARCH])
+else
+	KERN_UNISTD=
+fi
+
+AC_DEFINE_UNQUOTED([KERN_UNISTD], ["$KERN_UNISTD"],
+                   [Path to kernel unistd.h])
+
 AC_ARG_WITH(udev-prefix,
 	AS_HELP_STRING([--with-udev-prefix=DIR], [add prefix to internal udev path names]),
 	[], [with_udev_prefix='${exec_prefix}'])
diff --git a/udev/Makefile.am b/udev/Makefile.am
index 94989e6..6cd2f23 100644
--- a/udev/Makefile.am
+++ b/udev/Makefile.am
@@ -14,6 +14,7 @@ common_ldadd =

 common_files = \
 	udev.h \
+	udev-sysdeps.h \
 	udev-event.c \
 	udev-watch.c \
 	udev-node.c \
diff --git a/udev/udev-sysdeps.h b/udev/udev-sysdeps.h
new file mode 100644
index 0000000..718ee8b
--- /dev/null
+++ b/udev/udev-sysdeps.h
@@ -0,0 +1,107 @@
+/*
+ * wrapping of kernel interfaces
+ *
+ * Copyright (C) 2005-2009 Kay Sievers <kay.sievers@vrfy.org>
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _UDEV_SYSDEPS_H_
+#define _UDEV_SYSDEPS_H_
+
+#include <stdint.h>
+#include <errno.h>
+
+#if HAVE_SYS_INOTIFY_H
+#include <sys/inotify.h>
+#else
+#include <unistd.h>
+#include KERN_UNISTD
+
+struct inotify_event
+{
+	int wd;
+	uint32_t mask;
+	uint32_t cookie;
+	uint32_t len;
+	char name[];	/* flexible array member; name[1] would inflate sizeof() */
+};
+
+#define IN_CREATE	 0x00000100
+#define IN_DELETE	 0x00000200
+#define IN_MOVED_FROM	 0x00000040
+#define IN_MOVED_TO	 0x00000080
+#define IN_MOVE		 (IN_MOVED_FROM | IN_MOVED_TO)
+#define IN_CLOSE_WRITE	 0x00000008
+#define IN_IGNORED	 0x00008000
+
+static inline int inotify_init(void)
+{
+	return syscall(__NR_inotify_init);
+}
+
+static inline int inotify_add_watch(int fd, const char *name,
+		uint32_t mask)
+{
+	return syscall(__NR_inotify_add_watch, fd, name, mask);
+}
+#endif /* HAVE_SYS_INOTIFY_H */
+
+#if HAVE_SYS_SIGNALFD_H
+#include <sys/signalfd.h>
+#else
+#include <unistd.h>
+#include <signal.h>
+#include KERN_UNISTD
+
+struct signalfd_siginfo {
+	uint32_t ssi_signo;   /* Signal number */
+	int32_t  ssi_errno;   /* Error number (unused) */
+	int32_t  ssi_code;    /* Signal code */
+	uint32_t ssi_pid;     /* PID of sender */
+	uint32_t ssi_uid;     /* Real UID of sender */
+	int32_t  ssi_fd;      /* File descriptor (SIGIO) */
+	uint32_t ssi_tid;     /* Kernel timer ID (POSIX timers) */
+	uint32_t ssi_band;    /* Band event (SIGIO) */
+	uint32_t ssi_overrun; /* POSIX timer overrun count */
+	uint32_t ssi_trapno;  /* Trap number that caused signal */
+	int32_t  ssi_status;  /* Exit status or signal (SIGCHLD) */
+	int32_t  ssi_int;     /* Integer sent by sigqueue(2) */
+	uint64_t ssi_ptr;     /* Pointer sent by sigqueue(2) */
+	uint64_t ssi_utime;   /* User CPU time consumed (SIGCHLD) */
+	uint64_t ssi_stime;   /* System CPU time consumed (SIGCHLD) */
+	uint64_t ssi_addr;    /* Address that generated signal
+	                         (for hardware-generated signals) */
+	uint8_t  pad[48];     /* Pad size to 128 bytes (allow for
+	                         additional fields in the future) */
+};
+
+static inline int signalfd(int fd, sigset_t *mask, uint32_t flags)
+{
+	int rv = syscall(__NR_signalfd4, fd, mask, (size_t)8, flags);
+
+	if(rv < 0) {
+		if(flags != 0) {
+			errno = EINVAL;
+			return -1;
+		}
+
+		return syscall(__NR_signalfd, fd, mask, (size_t)8);
+	}
+	else
+		return rv;
+}
+#endif /* HAVE_SYS_SIGNALFD_H */
+
+#endif
diff --git a/udev/udev-watch.c b/udev/udev-watch.c
index 5a49c96..944cd4a 100644
--- a/udev/udev-watch.c
+++ b/udev/udev-watch.c
@@ -26,8 +26,8 @@
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
-#include <sys/inotify.h>

+#include "udev-sysdeps.h"
 #include "udev.h"

 static int inotify_fd = -1;
diff --git a/udev/udevd.c b/udev/udevd.c
index 2e7a179..a5d05fd 100644
--- a/udev/udevd.c
+++ b/udev/udevd.c
@@ -32,14 +32,13 @@
 #include <dirent.h>
 #include <sys/prctl.h>
 #include <sys/socket.h>
-#include <sys/signalfd.h>
 #include <sys/select.h>
 #include <sys/poll.h>
 #include <sys/wait.h>
 #include <sys/stat.h>
 #include <sys/ioctl.h>
-#include <sys/inotify.h>

+#include "udev-sysdeps.h"
 #include "udev.h"

 #define UDEVD_PRIORITY			-4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (24 preceding siblings ...)
  2009-06-06 17:01 ` Bryan Kadzban
@ 2009-06-08 11:45 ` Scott James Remnant
  2009-06-08 16:29 ` Bryan Kadzban
  26 siblings, 0 replies; 28+ messages in thread
From: Scott James Remnant @ 2009-06-08 11:45 UTC (permalink / raw)
  To: linux-hotplug

On Sat, 2009-06-06 at 10:01 -0700, Bryan Kadzban wrote:

> Below is a patch to emulate <sys/signalfd.h> and <sys/inotify.h> if
> those headers don't exist at configure time (via inline functions that
> use syscall() directly).  It has to pull __NR_xxx (for the various xxx
> syscalls) from the kernel includes, which are per-arch, so there are a
> few new configure.ac tests to find the appropriate arch.  This takes a
> while, but isn't run if both <sys/signalfd.h> and <sys/inotify.h> are
> present.
> 
For upstart, I made a deliberate decision not to do this anymore; if you
want to use an up-to-date set of plumbing tools, you should at least
update your glibc ;)  (my opinion, at least)

Scott
-- 
Scott James Remnant
scott@ubuntu.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [GIT] Experimental threaded udev
  2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
                   ` (25 preceding siblings ...)
  2009-06-08 11:45 ` Scott James Remnant
@ 2009-06-08 16:29 ` Bryan Kadzban
  26 siblings, 0 replies; 28+ messages in thread
From: Bryan Kadzban @ 2009-06-08 16:29 UTC (permalink / raw)
  To: linux-hotplug

Scott James Remnant wrote:
> On Sat, 2009-06-06 at 10:01 -0700, Bryan Kadzban wrote:
> 
>> Below is a patch to emulate <sys/signalfd.h> and <sys/inotify.h> if 
>> those headers don't exist at configure time (via inline functions that
>> use syscall() directly).  It has to pull __NR_xxx (for the various xxx
>> syscalls) from the kernel includes, which are per-arch, so there are a
>> few new configure.ac tests to find the appropriate arch.  This takes a
>> while, but isn't run if both <sys/signalfd.h> and <sys/inotify.h> are
>> present.
> 
> For upstart, I made a deliberate decision not to do this anymore; if you
> want to use an up-to-date set of plumbing tools, you should at least 
> update your glibc ;)  (my opinion, at least)

There are two issues with this that I can think of.  First is that the only
glibc versions that support signalfd are not that old; the older they were,
the less I'd complain.  :-)

Second is that I'm fairly sure upgrading glibc requires a recompile of
basically the entire rest of the system.  For the same reason that you
can't dump a new kernel's headers in /usr/include/linux at will: in that
case the (new) structure sizes won't match what the (old) syscall wrappers
in glibc are sending to the kernel when invoking the (old) syscalls when
new programs are compiled.  In this case, the new glibc will be calling the
new syscalls using the new kernel headers, and new structures; this works
great for newly-compiled programs.  But every existing binary on the system
was built against the older structures, so the syscall wrapper will call
the wrong thing for them.

If the old glibc's foo(2) wrapper calls syscall(__NR_foo, foo), which expects
a pointer to "struct foo", but the new glibc's foo(2) wrapper calls
syscall(__NR_foo2, foo2) instead, which expects a pointer to a "struct foo2"
with more members, then all existing programs break.
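
A tiny sketch of the size mismatch being described, using made-up types. struct foo, struct foo2 and fill_foo2() are hypothetical and only mirror the names used in the paragraph above; whether glibc really behaves this way is exactly the open question in the next paragraph.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo  { int a; };		/* layout old binaries were compiled against */
struct foo2 { int a; int b; };	/* hypothetical newer layout, one more member */

/* Stand-in for the newer syscall: it always produces a struct foo2.
 * The sketch takes the buffer size so that it can refuse instead of
 * overrunning; a real syscall only gets a pointer and would simply
 * write past the end of a too-small caller buffer. */
int fill_foo2(void *buf, size_t buflen)
{
	struct foo2 out = { .a = 1, .b = 2 };

	if (buflen < sizeof(out))
		return -1;
	memcpy(buf, &out, sizeof(out));
	return 0;
}
```

A caller that still allocates a struct foo gets -1 here, which stands in for the breakage an old binary would see.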

(Though I know glibc does weird stuff with symbol versioning: if that also
means they keep the "old" foo(2) wrapper around when a newer glibc is
compiled, then this argument is mostly wrong.  But I don't believe they do
this; I don't see multiple mount(2) wrappers, for instance.)

And I'd rather not spend a week rebuilding everything when I can spend a
couple hours writing this patch...

^ permalink raw reply	[flat|nested] 28+ messages in thread

Thread overview: 28+ messages
2009-05-28 14:35 [GIT] Experimental threaded udev Alan Jenkins
2009-05-28 15:09 ` Kay Sievers
2009-05-28 15:39 ` Alan Jenkins
2009-05-29 17:53 ` Alan Jenkins
2009-05-29 18:11 ` Kay Sievers
2009-06-01  2:41 ` Kay Sievers
2009-06-01  9:29 ` Alan Jenkins
2009-06-01 11:32 ` Kay Sievers
2009-06-01 12:33 ` Kay Sievers
2009-06-01 13:30 ` Kay Sievers
2009-06-01 13:46 ` Alan Jenkins
2009-06-01 13:57 ` Kay Sievers
2009-06-01 16:22 ` Kay Sievers
2009-06-01 16:24 ` Alan Jenkins
2009-06-01 19:39 ` Kay Sievers
2009-06-02  4:58 ` Kay Sievers
2009-06-02  9:13 ` Alan Jenkins
2009-06-02  9:26 ` Alan Jenkins
2009-06-02 11:39 ` Kay Sievers
2009-06-02 14:05 ` Kay Sievers
2009-06-03 19:44 ` Kay Sievers
2009-06-03 20:46 ` Alan Jenkins
2009-06-03 22:20 ` Kay Sievers
2009-06-03 23:53 ` Kay Sievers
2009-06-06 14:20 ` Kay Sievers
2009-06-06 17:01 ` Bryan Kadzban
2009-06-08 11:45 ` Scott James Remnant
2009-06-08 16:29 ` Bryan Kadzban
