* [patch 0/4] ipc sem improvements
@ 2009-08-11 11:09 npiggin
  2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel

Hi, this patchset contains some improvements I've been trying to make
to the sysv semaphore implementation, which is pretty suboptimal
(O(n^2) behaviour with n waiting processes, among other things: every
successful modifying operation rescans the whole pending queue).

I've been quite careful to try to retain FIFO ordering and avoid
starvation, even for complex operations.

Comments?

Thanks,
Nick



* [patch 1/4] ipc: sem optimise undo list search
  2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin
@ 2009-08-11 11:09 ` npiggin
  2009-08-16 13:17   ` Manfred Spraul
  2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel

[-- Attachment #1: ipc-sem-undo-optimise.patch --]
[-- Type: text/plain, Size: 1766 bytes --]

Move the last looked-up sem_undo struct to the head of the task's undo
list, so that frequently used entries migrate toward the front of the
list and search time is reduced. This reduces the lookup_undo cost in
an oprofile of the problematic SAP workload by 30% (see patch 4 for a
description of that workload).

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 ipc/sem.c |   26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

Index: linux-2.6/ipc/sem.c
===================================================================
--- linux-2.6.orig/ipc/sem.c
+++ linux-2.6/ipc/sem.c
@@ -961,17 +961,31 @@ static inline int get_undo_list(struct s
 	return 0;
 }
 
-static struct sem_undo *lookup_undo(struct sem_undo_list *ulp, int semid)
+static struct sem_undo *__lookup_undo(struct sem_undo_list *ulp, int semid)
 {
-	struct sem_undo *walk;
+	struct sem_undo *un;
 
-	list_for_each_entry_rcu(walk, &ulp->list_proc, list_proc) {
-		if (walk->semid == semid)
-			return walk;
+	list_for_each_entry_rcu(un, &ulp->list_proc, list_proc) {
+		if (un->semid == semid)
+			return un;
 	}
 	return NULL;
 }
 
+static struct sem_undo *lookup_undo(struct sem_undo_list *ulp, int semid)
+{
+	struct sem_undo *un;
+
+	assert_spin_locked(&ulp->lock);
+
+	un = __lookup_undo(ulp, semid);
+	if (un) {
+		list_del_rcu(&un->list_proc);
+		list_add_rcu(&un->list_proc, &ulp->list_proc);
+	}
+	return un;
+}
+
 /**
  * find_alloc_undo - Lookup (and if not present create) undo array
  * @ns: namespace
@@ -1307,7 +1321,7 @@ void exit_sem(struct task_struct *tsk)
 		if (IS_ERR(sma))
 			continue;
 
-		un = lookup_undo(ulp, semid);
+		un = __lookup_undo(ulp, semid);
 		if (un == NULL) {
 			/* exit_sem raced with IPC_RMID+semget() that created
 			 * exactly the same semid. Nothing to do.




* [patch 2/4] ipc: sem use list operations
  2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin
  2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin
@ 2009-08-11 11:09 ` npiggin
  2009-08-16 13:18   ` Manfred Spraul
  2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin
  2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
  3 siblings, 1 reply; 18+ messages in thread
From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel

[-- Attachment #1: ipc-sem-use-list-ops.patch --]
[-- Type: text/plain, Size: 2838 bytes --]

Convert the open-coded pending-queue walking in update_queue() to
list_for_each_entry_safe(), which simplifies the restart logic when a
completed operation has altered the array.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 ipc/sem.c |   79 +++++++++++++++++++++++++-------------------------------------
 1 file changed, 33 insertions(+), 46 deletions(-)

Index: linux-2.6/ipc/sem.c
===================================================================
--- linux-2.6.orig/ipc/sem.c
+++ linux-2.6/ipc/sem.c
@@ -402,58 +402,45 @@ undo:
  */
 static void update_queue (struct sem_array * sma)
 {
-	int error;
-	struct sem_queue * q;
+	struct sem_queue *q, *tq;
+
+again:
+	list_for_each_entry_safe(q, tq, &sma->sem_pending, list) {
+		int error;
+		int alter;
 
-	q = list_entry(sma->sem_pending.next, struct sem_queue, list);
-	while (&q->list != &sma->sem_pending) {
 		error = try_atomic_semop(sma, q->sops, q->nsops,
 					 q->undo, q->pid);
 
 		/* Does q->sleeper still need to sleep? */
-		if (error <= 0) {
-			struct sem_queue *n;
+		if (error > 0)
+			continue;
+
+		list_del(&q->list);
+
+		/*
+		 * The next operation that must be checked depends on the type
+		 * of the completed operation:
+		 * - if the operation modified the array, then restart from the
+		 *   head of the queue and check for threads that might be
+		 *   waiting for semaphore values to become 0.
+		 * - if the operation didn't modify the array, then just
+		 *   continue.
+		 */
+		alter = q->alter;
+
+		/* wake up the waiting thread */
+		q->status = IN_WAKEUP;
+
+		wake_up_process(q->sleeper);
+		/* hands-off: q will disappear immediately after
+		 * writing q->status.
+		 */
+		smp_wmb();
+		q->status = error;
 
-			/*
-			 * Continue scanning. The next operation
-			 * that must be checked depends on the type of the
-			 * completed operation:
-			 * - if the operation modified the array, then
-			 *   restart from the head of the queue and
-			 *   check for threads that might be waiting
-			 *   for semaphore values to become 0.
-			 * - if the operation didn't modify the array,
-			 *   then just continue.
-			 * The order of list_del() and reading ->next
-			 * is crucial: In the former case, the list_del()
-			 * must be done first [because we might be the
-			 * first entry in ->sem_pending], in the latter
-			 * case the list_del() must be done last
-			 * [because the list is invalid after the list_del()]
-			 */
-			if (q->alter) {
-				list_del(&q->list);
-				n = list_entry(sma->sem_pending.next,
-						struct sem_queue, list);
-			} else {
-				n = list_entry(q->list.next, struct sem_queue,
-						list);
-				list_del(&q->list);
-			}
-
-			/* wake up the waiting thread */
-			q->status = IN_WAKEUP;
-
-			wake_up_process(q->sleeper);
-			/* hands-off: q will disappear immediately after
-			 * writing q->status.
-			 */
-			smp_wmb();
-			q->status = error;
-			q = n;
-		} else {
-			q = list_entry(q->list.next, struct sem_queue, list);
-		}
+		if (alter)
+			goto again;
 	}
 }
 




* [patch 3/4] ipc: sem preempt improve
  2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin
  2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin
  2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin
@ 2009-08-11 11:09 ` npiggin
  2009-08-16 13:20   ` Manfred Spraul
  2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
  3 siblings, 1 reply; 18+ messages in thread
From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel

[-- Attachment #1: ipc-sem-preempt-improve.patch --]
[-- Type: text/plain, Size: 2573 bytes --]

The sysv semaphore wakeup scheme involves a kind of busy-wait lock
(IN_WAKEUP), which could deadlock if preemption were enabled while the
"lock" is held.

It is only an implementation detail (a spinlock happens to be held)
that this cannot happen today. If "spinlocks" are made preemptible, or
if the sem lock is changed to a sleeping lock for example, then the
wakeup would become buggy. So this might be a bugfix for -rt kernels.

Imagine the waker being preempted by the wakee before it clears
IN_WAKEUP: if the wakee has a higher RT priority, there is a priority
inversion deadlock. Even when a priority inversion does not cause a
deadlock, there is still time wasted spinning.
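
For reference, the waiter side in semtimedop() spins while the status
is IN_WAKEUP, roughly like this (a simplified sketch of the existing
ipc/sem.c logic, not part of this patch):

	error = queue.status;
	while (unlikely(error == IN_WAKEUP)) {
		cpu_relax();
		error = queue.status;
	}

If the waker is preempted between setting IN_WAKEUP and writing the
final status, the wakee can spin here for a long time -- indefinitely
in the RT priority inversion case above.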

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 ipc/sem.c |   38 +++++++++++++++++++++++---------------
 1 file changed, 23 insertions(+), 15 deletions(-)

Index: linux-2.6/ipc/sem.c
===================================================================
--- linux-2.6.orig/ipc/sem.c
+++ linux-2.6/ipc/sem.c
@@ -397,6 +397,27 @@ undo:
 	return result;
 }
 
+/*
+ * Wake up a process waiting on the sem queue with a given error.
+ * The queue is invalid (may not be accessed) after the function returns.
+ */
+static void wake_up_sem_queue(struct sem_queue *q, int error)
+{
+	/*
+	 * Hold preempt off so that we don't get preempted and have the
+	 * wakee busy-wait until we're scheduled back on. We're holding
+	 * locks here so it may not strictly be needed, however if the
+	 * locks become preemptible then this prevents such a problem.
+	 */
+	preempt_disable();
+	q->status = IN_WAKEUP;
+	wake_up_process(q->sleeper);
+	/* hands-off: q can disappear immediately after writing q->status. */
+	smp_wmb();
+	q->status = error;
+	preempt_enable();
+}
+
 /* Go through the pending queue for the indicated semaphore
  * looking for tasks that can be completed.
  */
@@ -428,17 +449,7 @@ again:
 		 *   continue.
 		 */
 		alter = q->alter;
-
-		/* wake up the waiting thread */
-		q->status = IN_WAKEUP;
-
-		wake_up_process(q->sleeper);
-		/* hands-off: q will disappear immediately after
-		 * writing q->status.
-		 */
-		smp_wmb();
-		q->status = error;
-
+		wake_up_sem_queue(q, error);
 		if (alter)
 			goto again;
 	}
@@ -522,10 +533,7 @@ static void freeary(struct ipc_namespace
 	list_for_each_entry_safe(q, tq, &sma->sem_pending, list) {
 		list_del(&q->list);
 
-		q->status = IN_WAKEUP;
-		wake_up_process(q->sleeper); /* doesn't sleep */
-		smp_wmb();
-		q->status = -EIDRM;	/* hands-off q */
+		wake_up_sem_queue(q, -EIDRM);
 	}
 
 	/* Remove the semaphore set from the IDR */




* [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 11:09 [patch 0/4] ipc sem improvements npiggin
                   ` (2 preceding siblings ...)
  2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin
@ 2009-08-11 11:09 ` npiggin
  2009-08-11 18:19   ` Manfred Spraul
                     ` (2 more replies)
  3 siblings, 3 replies; 18+ messages in thread
From: npiggin @ 2009-08-11 11:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, Nadia Derbey, Pierre Peiffer, linux-kernel

[-- Attachment #1: ipc-sem-simple-lists.patch --]
[-- Type: text/plain, Size: 9587 bytes --]

When sysv semaphores are used as a simple semaphore/mutex (single semop
operation on a single sem in the sem-array), the current implementation
is suboptimal because it has to handle the complex cases of multiple
operations atomically on multiple semaphores.

To optimise the simple case, add per-semaphore operation lists, onto
which simple operations are queued. While there are complex operations
pending, we ignore the simple lists and continue to use the old
algorithm; when there are no complex operations, subsequent simple
operations need only look at the simple lists, which can be done in a
much simpler way (no need to search the list multiple times).

Timed with a simple test program that forks a number of processes
which each lock/unlock a sysv semaphore a number of times (with a 1us
busyloop in each critical section and a 10us busyloop outside), run on
a 2-socket, 8-core Opteron. A sketch of such a program follows the
results.

procs    time per loop vanilla  patched
    1     11986ns               11837ns
   32     10650ns                7746ns
  128     17103ns                8472ns
 1024    385885ns                9699ns
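
A minimal sketch of such a benchmark program (a hypothetical
reconstruction -- the actual test program was not posted, and the loop
counts and busy-wait calibration here are assumptions; time it
externally, e.g. with time(1)):

/* sembench.c: each child locks/unlocks one sysv sem in a loop. */
#include <stdlib.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>

union semun { int val; };		/* caller-defined, as Linux requires */

static void busywait(volatile unsigned long n)
{
	while (n--)
		;			/* crude stand-in for a calibrated delay */
}

static void worker(int semid, int loops)
{
	struct sembuf lock = { 0, -1, 0 };
	struct sembuf unlock = { 0, 1, 0 };
	int i;

	for (i = 0; i < loops; i++) {
		semop(semid, &lock, 1);	/* "simple" op: 1 sem, 1 sop */
		busywait(1000);		/* ~1us critical section */
		semop(semid, &unlock, 1);
		busywait(10000);	/* ~10us outside */
	}
}

int main(int argc, char **argv)
{
	int procs = argc > 1 ? atoi(argv[1]) : 1;
	int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
	union semun arg = { .val = 1 };
	int i;

	semctl(semid, 0, SETVAL, arg);	/* binary semaphore, initially free */
	for (i = 0; i < procs; i++) {
		if (fork() == 0) {
			worker(semid, 100000);
			exit(0);
		}
	}
	while (wait(NULL) > 0)
		;			/* reap all children */
	semctl(semid, 0, IPC_RMID);
	return 0;
}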

We have a report of an SAP application with high idle time caused by
very high wait times on a sysv semaphore on which it performs these
"simple" operations. Semaphore wait times are somewhat improved with
this patch, and time spent in sysv semaphore kernel functions is
reduced by 40% in the profile (with this and the previous patches
combined). That is quite significant considering it was measured on
quite a small system (4 sockets, 16 cores) and with only 64 processes.

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 include/linux/sem.h |    6 ++
 ipc/sem.c           |  130 +++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 123 insertions(+), 13 deletions(-)

Index: linux-2.6/ipc/sem.c
===================================================================
--- linux-2.6.orig/ipc/sem.c
+++ linux-2.6/ipc/sem.c
@@ -240,6 +240,7 @@ static int newary(struct ipc_namespace *
 	key_t key = params->key;
 	int nsems = params->u.nsems;
 	int semflg = params->flg;
+	int i;
 
 	if (!nsems)
 		return -EINVAL;
@@ -272,6 +273,15 @@ static int newary(struct ipc_namespace *
 	ns->used_sems += nsems;
 
 	sma->sem_base = (struct sem *) &sma[1];
+
+	for (i = 0; i < nsems; i++) {
+		struct sem *s = &sma->sem_base[i];
+
+		INIT_LIST_HEAD(&s->negv_pending);
+		INIT_LIST_HEAD(&s->zero_pending);
+	}
+
+	sma->complex_count = 0;
 	INIT_LIST_HEAD(&sma->sem_pending);
 	INIT_LIST_HEAD(&sma->list_id);
 	sma->sem_nsems = nsems;
@@ -335,7 +345,7 @@ SYSCALL_DEFINE3(semget, key_t, key, int,
  * all at once. Return 0 if yes, 1 if need to sleep, else return error code.
  */
 
-static int try_atomic_semop (struct sem_array * sma, struct sembuf * sops,
+static int try_atomic_semop(struct sem_array * sma, struct sembuf * sops,
 			     int nsops, struct sem_undo *un, int pid)
 {
 	int result, sem_op;
@@ -421,7 +431,7 @@ static void wake_up_sem_queue(struct sem
 /* Go through the pending queue for the indicated semaphore
  * looking for tasks that can be completed.
  */
-static void update_queue (struct sem_array * sma)
+static void update_queue(struct sem_array *sma)
 {
 	struct sem_queue *q, *tq;
 
@@ -438,23 +448,94 @@ again:
 			continue;
 
 		list_del(&q->list);
+		if (q->nsops == 1)
+			list_del(&q->simple_list);
+		else
+			sma->complex_count--;
+
+		alter = q->alter;
+		wake_up_sem_queue(q, error);
 
 		/*
 		 * The next operation that must be checked depends on the type
 		 * of the completed operation:
 		 * - if the operation modified the array, then restart from the
 		 *   head of the queue and check for threads that might be
-		 *   waiting for semaphore values to become 0.
+		 *   waiting for this particular semaphore value.
 		 * - if the operation didn't modify the array, then just
 		 *   continue.
 		 */
-		alter = q->alter;
-		wake_up_sem_queue(q, error);
-		if (alter)
+		if (alter && !error)
 			goto again;
 	}
 }
 
+static void update_zero_queue(struct sem_array *sma, struct sem *sem)
+{
+	struct sem_queue *q, *tq;
+
+	list_for_each_entry_safe(q, tq, &sem->zero_pending, simple_list) {
+		int error;
+
+		BUG_ON(q->nsops != 1);
+		BUG_ON(q->alter);
+		error = try_atomic_semop(sma, q->sops, q->nsops,
+					 q->undo, q->pid);
+		BUG_ON(error);
+
+		list_del(&q->list);
+		list_del(&q->simple_list);
+		wake_up_sem_queue(q, error);
+	}
+}
+
+static void update_negv_queue(struct sem_array *sma, struct sem *sem)
+{
+	struct sem_queue *q, *tq;
+
+	list_for_each_entry_safe(q, tq, &sem->negv_pending, simple_list) {
+		int error;
+
+		BUG_ON(q->nsops != 1);
+		BUG_ON(!q->alter);
+		error = try_atomic_semop(sma, q->sops, q->nsops,
+					 q->undo, q->pid);
+
+		/* Does q->sleeper still need to sleep? */
+		if (error > 0)
+			continue;
+
+		/* Continue scanning as in update_queue */
+		list_del(&q->list);
+		list_del(&q->simple_list);
+		wake_up_sem_queue(q, error);
+
+		/*
+		 * Ops on the negv queue only decrease the sem value, so
+		 * there is no point retrying previously failed ops after
+		 * the value has been decreased here. And if the semval is
+		 * not greater than zero, there is nothing else to do either.
+		 */
+		if (sem->semval <= 0)
+			break;
+	}
+}
+
+static void update_queue_simple(struct sem_array *sma, ushort semnum)
+{
+	if (unlikely(sma->complex_count)) {
+		update_queue(sma);
+	} else {
+		struct sem *sem;
+
+		sem = &sma->sem_base[semnum];
+		if (sem->semval > 0)
+			update_negv_queue(sma, sem);
+		if (sem->semval == 0)
+			update_zero_queue(sma, sem);
+	}
+}
+
 /* The following counts are associated to each semaphore:
  *   semncnt        number of tasks waiting on semval being nonzero
  *   semzcnt        number of tasks waiting on semval being zero
@@ -532,6 +613,9 @@ static void freeary(struct ipc_namespace
 	/* Wake up all pending processes and let them fail with EIDRM. */
 	list_for_each_entry_safe(q, tq, &sma->sem_pending, list) {
 		list_del(&q->list);
+		list_del(&q->simple_list);
+		if (q->nsops > 1)
+			sma->complex_count--;
 
 		wake_up_sem_queue(q, -EIDRM);
 	}
@@ -796,7 +880,7 @@ static int semctl_main(struct ipc_namesp
 		curr->sempid = task_tgid_vnr(current);
 		sma->sem_ctime = get_seconds();
 		/* maybe some queued-up processes were waiting for this */
-		update_queue(sma);
+		update_queue_simple(sma, semnum);
 		err = 0;
 		goto out_unlock;
 	}
@@ -1169,10 +1253,14 @@ SYSCALL_DEFINE4(semtimedop, int, semid,
 	if (error)
 		goto out_unlock_free;
 
-	error = try_atomic_semop (sma, sops, nsops, un, task_tgid_vnr(current));
+	error = try_atomic_semop(sma, sops, nsops, un, task_tgid_vnr(current));
 	if (error <= 0) {
-		if (alter && error == 0)
-			update_queue (sma);
+		if (alter && !error) {
+			if (nsops == 1)
+				update_queue_simple(sma, sops->sem_num);
+			else
+				update_queue(sma);
+		}
 		goto out_unlock_free;
 	}
 
@@ -1189,6 +1277,17 @@ SYSCALL_DEFINE4(semtimedop, int, semid,
 		list_add_tail(&queue.list, &sma->sem_pending);
 	else
 		list_add(&queue.list, &sma->sem_pending);
+	if (nsops == 1) {
+		struct sem *curr;
+		curr = &sma->sem_base[sops->sem_num];
+		if (alter)
+			list_add_tail(&queue.simple_list, &curr->negv_pending);
+		else
+			list_add_tail(&queue.simple_list, &curr->zero_pending);
+	} else {
+		INIT_LIST_HEAD(&queue.simple_list);
+		sma->complex_count++;
+	}
 
 	queue.status = -EINTR;
 	queue.sleeper = current;
@@ -1232,6 +1331,10 @@ SYSCALL_DEFINE4(semtimedop, int, semid,
 	if (timeout && jiffies_left == 0)
 		error = -EAGAIN;
 	list_del(&queue.list);
+	if (nsops == 1)
+		list_del(&queue.simple_list);
+	else
+		sma->complex_count--;
 
 out_unlock_free:
 	sem_unlock(sma);
@@ -1356,11 +1459,14 @@ void exit_sem(struct task_struct *tsk)
 				if (semaphore->semval > SEMVMX)
 					semaphore->semval = SEMVMX;
 				semaphore->sempid = task_tgid_vnr(current);
+				if (likely(!sma->complex_count))
+					update_queue_simple(sma, i);
 			}
 		}
 		sma->sem_otime = get_seconds();
 		/* maybe some queued-up processes were waiting for this */
-		update_queue(sma);
+		if (unlikely(sma->complex_count))
+			update_queue(sma);
 		sem_unlock(sma);
 
 		call_rcu(&un->rcu, free_un);
@@ -1374,7 +1480,7 @@ static int sysvipc_sem_proc_show(struct
 	struct sem_array *sma = it;
 
 	return seq_printf(s,
-			  "%10d %10d  %4o %10lu %5u %5u %5u %5u %10lu %10lu\n",
+			  "%10d %10d  %4o %10u %5u %5u %5u %5u %10lu %10lu\n",
 			  sma->sem_perm.key,
 			  sma->sem_perm.id,
 			  sma->sem_perm.mode,
Index: linux-2.6/include/linux/sem.h
===================================================================
--- linux-2.6.orig/include/linux/sem.h
+++ linux-2.6/include/linux/sem.h
@@ -86,6 +86,8 @@ struct task_struct;
 struct sem {
 	int	semval;		/* current value */
 	int	sempid;		/* pid of last operation */
+	struct list_head	negv_pending;
+	struct list_head	zero_pending;
 };
 
 /* One sem_array data structure for each set of semaphores in the system. */
@@ -96,11 +98,13 @@ struct sem_array {
 	struct sem		*sem_base;	/* ptr to first semaphore in array */
 	struct list_head	sem_pending;	/* pending operations to be processed */
 	struct list_head	list_id;	/* undo requests on this array */
-	unsigned long		sem_nsems;	/* no. of semaphores in array */
+	int			sem_nsems;	/* no. of semaphores in array */
+	int			complex_count;	/* pending complex operations */
 };
 
 /* One queue for each sleeping process in the system. */
 struct sem_queue {
+	struct list_head	simple_list; /* queue of pending operations */
 	struct list_head	list;	 /* queue of pending operations */
 	struct task_struct	*sleeper; /* this process */
 	struct sem_undo		*undo;	 /* undo structure */




* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
@ 2009-08-11 18:19   ` Manfred Spraul
  2009-08-11 18:23     ` Zach Brown
  2009-08-11 20:07   ` Cyrill Gorcunov
  2009-08-14 18:48   ` Manfred Spraul
  2 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2009-08-11 18:19 UTC (permalink / raw)
  To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> Index: linux-2.6/include/linux/sem.h
> ===================================================================
> --- linux-2.6.orig/include/linux/sem.h
> +++ linux-2.6/include/linux/sem.h
> @@ -86,6 +86,8 @@ struct task_struct;
>   struct sem {
>   	int	semval;		/* current value */
>   	int	sempid;		/* pid of last operation */
> +	struct list_head	negv_pending;
> +	struct list_head	zero_pending;
>   };
>    
struct sem is increased from 8 to 24 bytes.
Is that still ok? Are there any apps that really use lots of
semaphores (i.e. far more semaphores than processes in the system)?

Postgres uses lots of small semaphore sets (each with 17 entries); I
have never had access to other apps.


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 18:19   ` Manfred Spraul
@ 2009-08-11 18:23     ` Zach Brown
  2009-08-12  4:07       ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Zach Brown @ 2009-08-11 18:23 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: npiggin, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

Manfred Spraul wrote:
> On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
>> Index: linux-2.6/include/linux/sem.h
>> ===================================================================
>> --- linux-2.6.orig/include/linux/sem.h
>> +++ linux-2.6/include/linux/sem.h
>> @@ -86,6 +86,8 @@ struct task_struct;
>>   struct sem {
>>       int    semval;        /* current value */
>>       int    sempid;        /* pid of last operation */
>> +    struct list_head    negv_pending;
>> +    struct list_head    zero_pending;
>>   };
>>    
> struct sem is increased from 8 to 24 bytes.

And larger still with 64bit pointers.

If it's a problem, this can be scaled back.  You can have pointers to
lists and you can have fewer lists.

Hopefully it won't be a problem, though.  We can close our eyes and
pretend that the size of the semaphore sets scales with the size of
the system, and that it's such a relatively small consumer of memory
that no one will notice :).

- z


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
  2009-08-11 18:19   ` Manfred Spraul
@ 2009-08-11 20:07   ` Cyrill Gorcunov
  2009-08-12  4:48     ` Nick Piggin
  2009-08-14 18:48   ` Manfred Spraul
  2 siblings, 1 reply; 18+ messages in thread
From: Cyrill Gorcunov @ 2009-08-11 20:07 UTC (permalink / raw)
  To: npiggin
  Cc: Andrew Morton, Manfred Spraul, Nadia Derbey, Pierre Peiffer,
	linux-kernel

[npiggin@suse.de - Tue, Aug 11, 2009 at 09:09:06PM +1000]
...
| +static void update_queue_simple(struct sem_array *sma, ushort semnum)
| +{
| +	if (unlikely(sma->complex_count)) {
| +		update_queue(sma);
| +	} else {
| +		struct sem *sem;
| +
| +		sem = &sma->sem_base[semnum];
| +		if (sem->semval > 0)
| +			update_negv_queue(sma, sem);
| +		if (sem->semval == 0)
| +			update_zero_queue(sma, sem);
| +	}
| +}
| +
...

Hi Nick,

Most probably I'm missing something, but can't we trigger the BUG_ON
when updating the zero queue, if the semaphore was created with an
undo list and a new operation brought the undo value to -ERANGE?

Again, I could be missing something or be plain wrong. Just a thought.

	-- Cyrill


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 18:23     ` Zach Brown
@ 2009-08-12  4:07       ` Nick Piggin
  2009-08-12 18:35         ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2009-08-12  4:07 UTC (permalink / raw)
  To: Zach Brown
  Cc: Manfred Spraul, Andrew Morton, Nadia Derbey, Pierre Peiffer,
	linux-kernel

On Tue, Aug 11, 2009 at 11:23:36AM -0700, Zach Brown wrote:
> Manfred Spraul wrote:
> > On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> >> Index: linux-2.6/include/linux/sem.h
> >> ===================================================================
> >> --- linux-2.6.orig/include/linux/sem.h
> >> +++ linux-2.6/include/linux/sem.h
> >> @@ -86,6 +86,8 @@ struct task_struct;
> >>   struct sem {
> >>       int    semval;        /* current value */
> >>       int    sempid;        /* pid of last operation */
> >> +    struct list_head    negv_pending;
> >> +    struct list_head    zero_pending;
> >>   };
> >>    
> > struct sem is increased from 8 to 24 bytes.
> 
> And larger still with 64bit pointers.

Yes, it is significant growth. To answer Manfred's question, I don't
know if there are applications using large numbers of semaphores per
set. A Google search for increasing SEMMSL turns up mostly Oracle,
which says to use 250 (which is our current default).

A semaphore set with 250 sems will use 2K before and 10K afterward. I
don't know that it is a huge amount really, given that they also
presumably have to be *protecting* something.

We could convert them to hlists (I was going to send a patch to do
everything with hlists, but hlists are missing some _rcu variants...
maybe I should just convert the pending lists to start with).
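
E.g. something like this (a hypothetical sketch; an hlist_head is a
single pointer, so this would halve the added per-sem overhead):

struct sem {
	int	semval;		/* current value */
	int	sempid;		/* pid of last operation */
	struct hlist_head	negv_pending;
	struct hlist_head	zero_pending;
};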

 
> If it's a problem, this can be scaled back.  You can have pointers to
> lists and you can have fewer lists.
> 
> Hopefully it won't be a problem, though.  We can close our eyes and
> pretend that the size of the semaphore sets scale with the size of the
> system and that it's such a relatively small consumer of memory that no
> one will notice :).

The other thing is that using semaphores as sets really won't scale
well at all. It will scale better now that there are per-sem lists,
but there is still a per-set lock, so sets should really be
discouraged.

It's not trivial to remove the shared cachelines completely. Possible,
I think, but it would further increase complexity without a proven
need at this point.



* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 20:07   ` Cyrill Gorcunov
@ 2009-08-12  4:48     ` Nick Piggin
  2009-08-12  5:43       ` Cyrill Gorcunov
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2009-08-12  4:48 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: Andrew Morton, Manfred Spraul, Nadia Derbey, Pierre Peiffer,
	linux-kernel

On Wed, Aug 12, 2009 at 12:07:11AM +0400, Cyrill Gorcunov wrote:
> [npiggin@suse.de - Tue, Aug 11, 2009 at 09:09:06PM +1000]
> ...
> | +static void update_queue_simple(struct sem_array *sma, ushort semnum)
> | +{
> | +	if (unlikely(sma->complex_count)) {
> | +		update_queue(sma);
> | +	} else {
> | +		struct sem *sem;
> | +
> | +		sem = &sma->sem_base[semnum];
> | +		if (sem->semval > 0)
> | +			update_negv_queue(sma, sem);
> | +		if (sem->semval == 0)
> | +			update_zero_queue(sma, sem);
> | +	}
> | +}
> | +
> ...
> 
> Hi Nick,
> 
> Most probably I'm missing something, but can't we trigger the BUG_ON
> when updating the zero queue, if the semaphore was created with an
> undo list and a new operation brought the undo value to -ERANGE?
> 
> Again, I could be missing something or be plain wrong. Just a thought.

Hi Cyrill,

Thanks for looking... Hmm, you mean the BUG_ON(error) firing because
try_atomic_semop returned -ERANGE? I don't think that is possible:
try_atomic_semop refuses to let any operation bring an undo value to
-ERANGE in the first place, so an operation which does not modify the
sem value can never go out of range.

(I think it would be a bug if we ever returned -ERANGE for a
wait-for-zero operation.)
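
The relevant checks in try_atomic_semop() go roughly like this (a
simplified excerpt; a wait-for-zero op has sem_op == 0 and never
modifies semval or the undo value):

	result = curr->semval;
	if (!sem_op && result)
		goto would_block;	/* wait-for-zero: semval != 0 */
	result += sem_op;
	if (result < 0)
		goto would_block;
	if (result > SEMVMX)
		goto out_of_range;
	if (sop->sem_flg & SEM_UNDO) {
		int undo = un->semadj[sop->sem_num] - sem_op;
		if (undo < (-SEMAEM - 1) || undo > SEMAEM)
			goto out_of_range;	/* not taken when sem_op == 0 */
	}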

Thanks,
Nick


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-12  4:48     ` Nick Piggin
@ 2009-08-12  5:43       ` Cyrill Gorcunov
  0 siblings, 0 replies; 18+ messages in thread
From: Cyrill Gorcunov @ 2009-08-12  5:43 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, Manfred Spraul, Nadia Derbey, Pierre Peiffer,
	linux-kernel

On 8/12/09, Nick Piggin <npiggin@suse.de> wrote:
> On Wed, Aug 12, 2009 at 12:07:11AM +0400, Cyrill Gorcunov wrote:
>> [npiggin@suse.de - Tue, Aug 11, 2009 at 09:09:06PM +1000]
>> ...
>> | +static void update_queue_simple(struct sem_array *sma, ushort semnum)
>> | +{
>> | +	if (unlikely(sma->complex_count)) {
>> | +		update_queue(sma);
>> | +	} else {
>> | +		struct sem *sem;
>> | +
>> | +		sem = &sma->sem_base[semnum];
>> | +		if (sem->semval > 0)
>> | +			update_negv_queue(sma, sem);
>> | +		if (sem->semval == 0)
>> | +			update_zero_queue(sma, sem);
>> | +	}
>> | +}
>> | +
>> ...
>>
>> Hi Nick,
>>
>> Most probably I'm missing something, but can't we trigger the BUG_ON
>> when updating the zero queue, if the semaphore was created with an
>> undo list and a new operation brought the undo value to -ERANGE?
>>
>> Again, I could be missing something or be plain wrong. Just a thought.
>
> Hi Cyrill,
>
> Thanks for looking... Hmm, you mean the BUG_ON(error) firing because
> try_atomic_semop returned -ERANGE? I don't think that is possible:
> try_atomic_semop refuses to let any operation bring an undo value to
> -ERANGE in the first place, so an operation which does not modify the
> sem value can never go out of range.
>
> (I think it would be a bug if we ever returned -ERANGE for a
> wait-for-zero operation.)
>
> Thanks,
> Nick
>
Thanks for the explanation, Nick! I meant exactly that.


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-12  4:07       ` Nick Piggin
@ 2009-08-12 18:35         ` Manfred Spraul
  2009-08-14  8:58           ` Nick Piggin
  0 siblings, 1 reply; 18+ messages in thread
From: Manfred Spraul @ 2009-08-12 18:35 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Zach Brown, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/12/2009 06:07 AM, Nick Piggin wrote:
> A semaphore set with 250 sems will use 2K before and 10K afterward. I
> don't know that it is a huge amount really, given that they also
> presumably have to be *protecting* something.
>
>    
The allocation uses vmalloc for larger allocations, thus 10k should not 
be an issue.
> We can convert them to hlists (I was going to send a patch to do
> everything in hlists, but hlists are missing some _rcu variants...
> maybe I should just convert the pending lists to start with).
>
>    
Is it possible to use list_add and list_add_tail instead? Add the
"waiting for zero" entries to the beginning and the "waiting for
nonzero" entries to the end.

Basically, the same approach for "simple" and "complex" operations:
- if all operations are simple operations, then every operation
affects only an individual semaphore
--> use the per-semaphore lists
- if there is at least one complex operation, then the global,
per-semaphore-array list is used.
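
I.e. at enqueue time something like this (a sketch; "pending" here is
a hypothetical single per-semaphore list replacing the two lists):

	if (alter)
		list_add_tail(&queue.simple_list, &curr->pending);
	else
		list_add(&queue.simple_list, &curr->pending);

so that wakeup processing always encounters the wait-for-zero entries
first.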


--
     Manfred


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-12 18:35         ` Manfred Spraul
@ 2009-08-14  8:58           ` Nick Piggin
  2009-08-14 17:53             ` Manfred Spraul
  0 siblings, 1 reply; 18+ messages in thread
From: Nick Piggin @ 2009-08-14  8:58 UTC (permalink / raw)
  To: Manfred Spraul
  Cc: Zach Brown, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On Wed, Aug 12, 2009 at 08:35:56PM +0200, Manfred Spraul wrote:
> On 08/12/2009 06:07 AM, Nick Piggin wrote:
> >A semaphore set with 250 sems will use 2K before and 10K afterward. I
> >don't know that it is a huge amount really, given that they also
> >presumably have to be *protecting* something.
> >
> >   
> The allocation uses vmalloc for larger allocations, thus 10k should not 
> be an issue.
> >We could convert them to hlists (I was going to send a patch to do
> >everything with hlists, but hlists are missing some _rcu variants...
> >maybe I should just convert the pending lists to start with).
> >
> >   
> Is it possible to use list_add and list_add_tail instead? Add the
> "waiting for zero" entries to the beginning and the "waiting for
> nonzero" entries to the end.

The only problem with this is that it means we have to walk through
the wait-for-zero ops on every increment operation, before getting to
the negative ops.

Though I guess the wait-for-zero list should be empty in the case of
simple modify operations which are strictly +1/-1.

I do like how it cleanly splits the modify and non-modify operations,
though. But if you feel strongly about saving the space, I will do as
you suggest.

Thanks,
Nick


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-14  8:58           ` Nick Piggin
@ 2009-08-14 17:53             ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2009-08-14 17:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Zach Brown, Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/14/2009 10:58 AM, Nick Piggin wrote:
> I do like how it cleanly splits the modify and non-modify operations,
> though. But if you feel strongly about saving the space, I will do as
> you suggest.
>
>    
No, saving space is secondary.

IMHO the global and the per-semaphore lists should use the same
algorithm, that's the main point.
Perhaps even the same function could be used: one global
struct sem_waiters { struct list_head zero; struct list_head decrease; },
plus one struct sem_waiters for each semaphore.

Then something like

     update_queue(sma, sma->complex_ops ? &sma->global_waiters :
                       &sma->sem_base[i].local_waiters);

--
     Manfred


* Re: [patch 4/4] ipc: sem optimise simple operations
  2009-08-11 11:09 ` [patch 4/4] ipc: sem optimise simple operations npiggin
  2009-08-11 18:19   ` Manfred Spraul
  2009-08-11 20:07   ` Cyrill Gorcunov
@ 2009-08-14 18:48   ` Manfred Spraul
  2 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2009-08-14 18:48 UTC (permalink / raw)
  To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> @@ -438,23 +448,94 @@ again:
>   			continue;
>
>   		list_del(&q->list);
> +		if (q->nsops == 1)
> +			list_del(&q->simple_list);
> +		else
> +			sma->complex_count--;
> +
> +		alter = q->alter;
> +		wake_up_sem_queue(q, error);
>    
Unlink approach one: list_del(&q->simple_list) only if q->nsops == 1.

> @@ -532,6 +613,9 @@ static void freeary(struct ipc_namespace
>   	/* Wake up all pending processes and let them fail with EIDRM. */
>   	list_for_each_entry_safe(q, tq,&sma->sem_pending, list) {
>   		list_del(&q->list);
> +		list_del(&q->simple_list);
> +		if (q->nsops>  1)
> +			sma->complex_count--;
>    
Unlink approach two: list_del(&q->simple_list) unconditionally
[i.e. relying on the fact that list_del() on an empty, self-pointing
list_head is permitted].

I would merge that into a helper function.
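
E.g. a helper along these lines (a sketch; the name is just a
suggestion):

static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
{
	list_del(&q->list);
	if (q->nsops == 1)
		list_del(&q->simple_list);
	else
		sma->complex_count--;
}

This follows approach one; it would also work in freeary(), since a
complex op's simple_list is only ever initialised, never linked into
any per-semaphore list.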

--
     Manfred


* Re: [patch 1/4] ipc: sem optimise undo list search
  2009-08-11 11:09 ` [patch 1/4] ipc: sem optimise undo list search npiggin
@ 2009-08-16 13:17   ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2009-08-16 13:17 UTC (permalink / raw)
  To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> Move the last looked-up sem_undo struct to the head of the task's undo
> list, so that frequently used entries migrate toward the front of the
> list and search time is reduced. This reduces the lookup_undo cost in
> an oprofile of the problematic SAP workload by 30% (see patch 4 for a
> description of that workload).
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>    
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>



* Re: [patch 2/4] ipc: sem use list operations
  2009-08-11 11:09 ` [patch 2/4] ipc: sem use list operations npiggin
@ 2009-08-16 13:18   ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2009-08-16 13:18 UTC (permalink / raw)
  To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>    
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>


* Re: [patch 3/4] ipc: sem preempt improve
  2009-08-11 11:09 ` [patch 3/4] ipc: sem preempt improve npiggin
@ 2009-08-16 13:20   ` Manfred Spraul
  0 siblings, 0 replies; 18+ messages in thread
From: Manfred Spraul @ 2009-08-16 13:20 UTC (permalink / raw)
  To: npiggin; +Cc: Andrew Morton, Nadia Derbey, Pierre Peiffer, linux-kernel

On 08/11/2009 01:09 PM, npiggin@suse.de wrote:
> The sysv semaphore wakeup scheme involves a kind of busy-wait lock
> (IN_WAKEUP), which could deadlock if preemption were enabled while the
> "lock" is held.
>
> It is only an implementation detail (a spinlock happens to be held)
> that this cannot happen today. If "spinlocks" are made preemptible, or
> if the sem lock is changed to a sleeping lock for example, then the
> wakeup would become buggy. So this might be a bugfix for -rt kernels.
>
> Imagine the waker being preempted by the wakee before it clears
> IN_WAKEUP: if the wakee has a higher RT priority, there is a priority
> inversion deadlock. Even when a priority inversion does not cause a
> deadlock, there is still time wasted spinning.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>    
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>

