linux-kernel.vger.kernel.org archive mirror
* ipc semaphore optimization
@ 2003-06-27 17:13 Chen, Kenneth W
  0 siblings, 0 replies; 6+ messages in thread
From: Chen, Kenneth W @ 2003-06-27 17:13 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4392 bytes --]

This patch proposes a performance fix for the current IPC semaphore
implementation.

There are two shortcomings in the current implementation.

First, try_atomic_semop() is called twice to wake up a blocked process:
once from update_queue() (executed by the process performing the
wake-up) and once in the retry path of the blocked process (executed by
the blocked process after it is woken up).

Second, when several sleeping processes are eligible for wake-up, they
are woken up in a daisy chain: each one in turn wakes up the next
process in line.  However, every time a process wakes up, it scans the
wait queue from the beginning rather than from where the last scan left
off.  This causes a large number of unnecessary scans when the wait
queue is deep.  Blocked processes come and go, but chances are quite a
few blocked processes still sit at the front of the queue.
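A toy userspace model of that daisy chain (names such as waiter, try_op,
wake, and old_update_queue are illustrative, not the kernel's; recursion
here stands in for the chain of context switches between woken
processes):

```c
/* Toy model of the pre-patch daisy chain.  Names are illustrative;
 * recursion stands in for the context switches between woken processes. */
#include <stddef.h>

struct waiter {
    int delta;               /* semaphore operation, e.g. -1 for a "down" */
    int status;              /* 1 means "woken, must retry the op itself" */
    int done;                /* stands in for removal from the queue */
    struct waiter *next;
};

static int semval;
static int try_calls;        /* counts calls, as the profile counts try_atomic_semop */
static struct waiter *pending;

/* Check (and optionally apply) one operation against the semaphore.
 * Returns 0 on success, 1 if it would block. */
static int try_op(int delta, int apply)
{
    try_calls++;
    if (semval + delta < 0)
        return 1;
    if (apply)
        semval += delta;
    return 0;
}

static void wake(struct waiter *w);

/* Old update_queue(): find one eligible altering waiter, wake it with
 * status 1 WITHOUT applying its op, and return; the waiter retries. */
static void old_update_queue(void)
{
    for (struct waiter *q = pending; q; q = q->next) {
        if (q->status == 1 || q->done)
            continue;        /* woken before, or already finished */
        if (try_op(q->delta, 0) == 0) {
            q->status = 1;
            wake(q);
            return;
        }
    }
}

/* Retry path from the bottom of the old sys_semtimedop(): redo the op,
 * then rescan the whole queue from the beginning to wake the next one. */
static void wake(struct waiter *w)
{
    if (try_op(w->delta, 1) == 0) {
        w->done = 1;
        old_update_queue();
    }
}
```

With a semaphore value of 2 and three waiters each decrementing by one,
this model performs five try_op calls to grant two operations: an
eligibility check plus a retry for each woken process, and one final
failing check, with every wake rescanning from the head of the queue.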

What we are proposing here is to merge the bottom part of
sys_semtimedop() (the code that executes when a sleeping process gets
woken up) into the update_queue() function.  The benefit is twofold:
(1) it reduces redundant calls to try_atomic_semop(), and (2) it
increases the efficiency of finding eligible processes to wake up and
allows higher concurrency for multiple wake-ups.
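A minimal userspace sketch of the merged wake-up loop (names such as
pending_op, try_op, and update_queue are illustrative, not the kernel's;
negative error statuses are omitted for brevity):

```c
/* Userspace model of the merged wake-up path.  Names are illustrative;
 * negative error statuses from the operation are omitted. */
#include <stddef.h>

struct pending_op {
    int delta;               /* semaphore operation, e.g. -1 for a "down" */
    int status;              /* result reported back to the sleeper */
    int awake;               /* stands in for wake_up_process() */
    struct pending_op *next;
};

static int try_calls;        /* counts calls, as the profile counts try_atomic_semop */

/* Apply one operation atomically against the semaphore value.
 * Returns 0 on success, 1 if the operation would block. */
static int try_op(int *semval, const struct pending_op *q)
{
    try_calls++;
    if (*semval + q->delta < 0)
        return 1;            /* would block: leave it queued */
    *semval += q->delta;
    return 0;
}

/* Merged update_queue(): complete the operation on behalf of the sleeper,
 * record the status, dequeue, and wake it exactly once -- no retry. */
static void update_queue(int *semval, struct pending_op **head)
{
    struct pending_op **pp = head;
    while (*pp) {
        struct pending_op *q = *pp;
        if (try_op(semval, q) == 0) {
            q->status = 0;   /* success reported to the sleeper */
            q->awake = 1;    /* wake_up_process(q->sleeper) */
            *pp = q->next;   /* remove_from_queue(sma, q) */
        } else {
            pp = &q->next;
        }
    }
}
```

Here each grantable waiter costs exactly one try_op call: the operation
completes on the waker's side, so the woken process has nothing left to
retry, and the queue is scanned once per wake-up event rather than once
per woken process.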

We have measured that this patch significantly improves the throughput
of a large application on an industry standard benchmark.

This patch is relative to 2.5.72.  Any feedback is very much
appreciated.

Some kernel profile data attached:

Kernel profile before optimization:
-----------------------------------------------
                0.05    0.14   40805/529060      sys_semop [133]
                0.55    1.73  488255/529060      ia64_ret_from_syscall [2]
[52]     2.5    0.59    1.88  529060         sys_semtimedop [52]
                0.05    0.83  477766/817966      schedule_timeout [62]
                0.34    0.46  529064/989340      update_queue [61]
                0.14    0.00 1006740/6473086     try_atomic_semop [75]
                0.06    0.00  529060/989336      ipcperms [149]
-----------------------------------------------

                0.30    0.40  460276/989340      semctl_main [68]
                0.34    0.46  529064/989340      sys_semtimedop [52]
[61]     1.5    0.64    0.87  989340         update_queue [61]
                0.75    0.00 5466346/6473086     try_atomic_semop [75]
                0.01    0.11  477676/576698      wake_up_process [146]
-----------------------------------------------
                0.14    0.00 1006740/6473086     sys_semtimedop [52]
                0.75    0.00 5466346/6473086     update_queue [61]
[75]     0.9    0.89    0.00 6473086         try_atomic_semop [75]
-----------------------------------------------


Kernel profile with optimization:

-----------------------------------------------
                0.03    0.05   26139/503178      sys_semop [155]
                0.46    0.92  477039/503178      ia64_ret_from_syscall [2]
[61]     1.2    0.48    0.97  503178         sys_semtimedop [61]
                0.04    0.79  470724/784394      schedule_timeout [62]
                0.05    0.00  503178/3301773     try_atomic_semop [109]
                0.05    0.00  503178/930934      ipcperms [149]
                0.00    0.03   32454/460210      update_queue [99]
-----------------------------------------------
                0.00    0.03   32454/460210      sys_semtimedop [61]
                0.06    0.36  427756/460210      semctl_main [75]
[99]     0.4    0.06    0.39  460210         update_queue [99]
                0.30    0.00 2798595/3301773     try_atomic_semop [109]
                0.00    0.09  470630/614097      wake_up_process [146]
-----------------------------------------------
                0.05    0.00  503178/3301773     sys_semtimedop [61]
                0.30    0.00 2798595/3301773     update_queue [99]
[109]    0.3    0.35    0.00 3301773         try_atomic_semop [109]
----------------------------------------------- 

The number of calls to both try_atomic_semop() and update_queue() is
reduced by about 50% as a result of the merge.  The execution time of
sys_semtimedop() drops because of the reduction in these low-level
functions.
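For context, the syscall path being profiled is what userspace enters
through semop()/semtimedop().  A minimal, hypothetical round trip
(Linux-specific; not part of the patch, and error handling kept
deliberately thin):

```c
/* Hypothetical userspace round trip through the SysV semaphore calls
 * that the profile above measures.  Linux-specific sketch. */
#include <sys/ipc.h>
#include <sys/sem.h>

/* Returns the final semaphore value (expected 0), or -1 on failure. */
int sem_roundtrip(void)
{
    /* Private set with one semaphore, initial value 0. */
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id < 0)
        return -1;

    struct sembuf up   = { 0, +1, 0 };          /* increment: may wake sleepers */
    struct sembuf down = { 0, -1, IPC_NOWAIT }; /* decrement: would sleep without IPC_NOWAIT */
    int val = -1;

    if (semop(id, &up, 1) == 0 && semop(id, &down, 1) == 0)
        val = semctl(id, 0, GETVAL);            /* back to zero */

    semctl(id, 0, IPC_RMID);                    /* clean up the set */
    return val;
}
```

Every semop() above enters sys_semtimedop(); a blocking decrement
(without IPC_NOWAIT) is what lands a process on the wait queue that
update_queue() scans.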

[-- Attachment #2: sem25.patch --]
[-- Type: application/octet-stream, Size: 3670 bytes --]

diff -Nur linux-2.5.72/ipc/sem.c linux-2.5.72.sem/ipc/sem.c
--- linux-2.5.72/ipc/sem.c	Mon Jun 16 21:19:59 2003
+++ linux-2.5.72.sem/ipc/sem.c	Wed Jun 25 20:29:56 2003
@@ -258,8 +258,7 @@
  */
 
 static int try_atomic_semop (struct sem_array * sma, struct sembuf * sops,
-			     int nsops, struct sem_undo *un, int pid,
-			     int do_undo)
+			     int nsops, struct sem_undo *un, int pid)
 {
 	int result, sem_op;
 	struct sembuf *sop;
@@ -289,10 +288,6 @@
 		curr->semval = result;
 	}
 
-	if (do_undo) {
-		result = 0;
-		goto undo;
-	}
 	sop--;
 	while (sop >= sops) {
 		sma->sem_base[sop->sem_num].sempid = pid;
@@ -334,23 +329,14 @@
 
 	for (q = sma->sem_pending; q; q = q->next) {
 			
-		if (q->status == 1)
-			continue;	/* this one was woken up before */
-
 		error = try_atomic_semop(sma, q->sops, q->nsops,
-					 q->undo, q->pid, q->alter);
+					 q->undo, q->pid);
 
 		/* Does q->sleeper still need to sleep? */
 		if (error <= 0) {
-				/* Found one, wake it up */
-			wake_up_process(q->sleeper);
-			if (error == 0 && q->alter) {
-				/* if q-> alter let it self try */
-				q->status = 1;
-				return;
-			}
 			q->status = error;
 			remove_from_queue(sma,q);
+			wake_up_process(q->sleeper);
 		}
 	}
 }
@@ -1077,7 +1063,7 @@
 	} else
 		un = NULL;
 
-	error = try_atomic_semop (sma, sops, nsops, un, current->pid, 0);
+	error = try_atomic_semop (sma, sops, nsops, un, current->pid);
 	if (error <= 0)
 		goto update;
 
@@ -1090,7 +1076,6 @@
 	queue.nsops = nsops;
 	queue.undo = un;
 	queue.pid = current->pid;
-	queue.alter = decrease;
 	queue.id = semid;
 	if (alter)
 		append_to_queue(sma ,&queue);
@@ -1098,53 +1083,45 @@
 		prepend_to_queue(sma ,&queue);
 	current->sysvsem.sleep_list = &queue;
 
-	for (;;) {
-		queue.status = -EINTR;
-		queue.sleeper = current;
-		current->state = TASK_INTERRUPTIBLE;
-		sem_unlock(sma);
-		unlock_semundo();
+	queue.status = -EINTR;
+	queue.sleeper = current;
+	current->state = TASK_INTERRUPTIBLE;
+	sem_unlock(sma);
+	unlock_semundo();
 
-		if (timeout)
-			jiffies_left = schedule_timeout(jiffies_left);
-		else
-			schedule();
+	if (timeout)
+		jiffies_left = schedule_timeout(jiffies_left);
+	else
+		schedule();
 
-		lock_semundo();
-		sma = sem_lock(semid);
-		if(sma==NULL) {
-			if(queue.prev != NULL)
-				BUG();
-			current->sysvsem.sleep_list = NULL;
-			error = -EIDRM;
-			goto out_semundo_free;
-		}
-		/*
-		 * If queue.status == 1 we where woken up and
-		 * have to retry else we simply return.
-		 * If an interrupt occurred we have to clean up the
-		 * queue
-		 *
-		 */
-		if (queue.status == 1)
-		{
-			error = try_atomic_semop (sma, sops, nsops, un,
-						  current->pid,0);
-			if (error <= 0) 
-				break;
-		} else {
-			error = queue.status;
-			if (error == -EINTR && timeout && jiffies_left == 0)
-				error = -EAGAIN;
-			if (queue.prev) /* got Interrupt */
-				break;
-			/* Everything done by update_queue */
-			current->sysvsem.sleep_list = NULL;
-			goto out_unlock_semundo_free;
-		}
+	lock_semundo();
+	sma = sem_lock(semid);
+	if(sma==NULL) {
+		if(queue.prev != NULL)
+			BUG();
+		current->sysvsem.sleep_list = NULL;
+		error = -EIDRM;
+		goto out_semundo_free;
 	}
+
+	/*
+	 * If queue.status != -EINTR we are woken up by another process
+	 */
+	error = queue.status;
+	if (queue.status != -EINTR) {
+		current->sysvsem.sleep_list = NULL;
+		goto out_unlock_semundo_free;
+	}
+
+	/*
+	 * If an interrupt occurred we have to clean up the queue
+	 */
+	if (timeout && jiffies_left == 0)
+		error = -EAGAIN;
 	current->sysvsem.sleep_list = NULL;
 	remove_from_queue(sma,&queue);
+	goto out_unlock_semundo_free;
+
 update:
 	if (alter)
 		update_queue (sma);

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: ipc semaphore optimization
@ 2003-07-01 16:36 Chen, Kenneth W
  0 siblings, 0 replies; 6+ messages in thread
From: Chen, Kenneth W @ 2003-07-01 16:36 UTC (permalink / raw)
  To: Luis Miguel Garcia; +Cc: linux-kernel

The last patch posted was relative to 2.5.73.  It is in Andrew Morton's
2.5.73-mm2 tree.

- Ken

-----Original Message-----
From: Luis Miguel Garcia [mailto:ktech@wanadoo.es] 
Sent: Tuesday, July 01, 2003 8:06 AM
To: linux-kernel@vger.kernel.org
Cc: Chen, Kenneth W
Subject: Re:ipc semaphore optimization


Well, it doesn't apply to the current tree.

Perhaps it was already applied or something?

Thanks!


-- 
=============================================================
Luis Miguel Garcia Mancebo
Ingenieria Tecnica en Informatica de Gestion
Universidad de Deusto / University of Deusto
Bilbao / Spain
=============================================================


* RE: ipc semaphore optimization
@ 2003-06-27 22:48 Chen, Kenneth W
  0 siblings, 0 replies; 6+ messages in thread
From: Chen, Kenneth W @ 2003-06-27 22:48 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Manfred Spraul, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 328 bytes --]

Hi Andrew,

> ipc/sem.c changed significantly between 2.5.72 and 2.5.73.  Could
> you please update this, retest, resend?
> 
> Please Cc Manfred on the update too, thanks.

Updated patch relative to 2.5.73, tested and verified. Manfred's change
is in the sem_undo area while this patch is in the wake-up path.

- Ken

[-- Attachment #2: sem2573.patch --]
[-- Type: application/octet-stream, Size: 4286 bytes --]

diff -Nur linux-2.5.73/ipc/sem.c linux-2.5.73.sem/ipc/sem.c
--- linux-2.5.73/ipc/sem.c	Sun Jun 22 11:32:41 2003
+++ linux-2.5.73.sem/ipc/sem.c	Fri Jun 27 15:23:26 2003
@@ -49,6 +49,10 @@
  *      increase. If there are decrement operations in the operations
  *      array we do the same as before.
  *
+ * With the introduction of the O(1) scheduler, it becomes unnecessary to
+ * perform the check/retry algorithm when waking up blocked processes, as the
+ * new scheduler handles thread switches better than the old one.
+ *
  * /proc/sysvipc/sem support (c) 1999 Dragos Acostachioaie <dragos@iname.com>
  *
  * SMP-threaded, sysctl's added
@@ -258,8 +262,7 @@
  */
 
 static int try_atomic_semop (struct sem_array * sma, struct sembuf * sops,
-			     int nsops, struct sem_undo *un, int pid,
-			     int do_undo)
+			     int nsops, struct sem_undo *un, int pid)
 {
 	int result, sem_op;
 	struct sembuf *sop;
@@ -289,10 +292,6 @@
 		curr->semval = result;
 	}
 
-	if (do_undo) {
-		result = 0;
-		goto undo;
-	}
 	sop--;
 	while (sop >= sops) {
 		sma->sem_base[sop->sem_num].sempid = pid;
@@ -334,23 +333,14 @@
 
 	for (q = sma->sem_pending; q; q = q->next) {
 			
-		if (q->status == 1)
-			continue;	/* this one was woken up before */
-
 		error = try_atomic_semop(sma, q->sops, q->nsops,
-					 q->undo, q->pid, q->alter);
+					 q->undo, q->pid);
 
 		/* Does q->sleeper still need to sleep? */
 		if (error <= 0) {
-				/* Found one, wake it up */
-			wake_up_process(q->sleeper);
-			if (error == 0 && q->alter) {
-				/* if q-> alter let it self try */
-				q->status = 1;
-				return;
-			}
 			q->status = error;
 			remove_from_queue(sma,q);
+			wake_up_process(q->sleeper);
 		}
 	}
 }
@@ -1062,7 +1052,7 @@
 	if (error)
 		goto out_unlock_free;
 
-	error = try_atomic_semop (sma, sops, nsops, un, current->pid, 0);
+	error = try_atomic_semop (sma, sops, nsops, un, current->pid);
 	if (error <= 0)
 		goto update;
 
@@ -1075,55 +1065,46 @@
 	queue.nsops = nsops;
 	queue.undo = un;
 	queue.pid = current->pid;
-	queue.alter = decrease;
 	queue.id = semid;
 	if (alter)
 		append_to_queue(sma ,&queue);
 	else
 		prepend_to_queue(sma ,&queue);
 
-	for (;;) {
-		queue.status = -EINTR;
-		queue.sleeper = current;
-		current->state = TASK_INTERRUPTIBLE;
-		sem_unlock(sma);
+	queue.status = -EINTR;
+	queue.sleeper = current;
+	current->state = TASK_INTERRUPTIBLE;
+	sem_unlock(sma);
 
-		if (timeout)
-			jiffies_left = schedule_timeout(jiffies_left);
-		else
-			schedule();
+	if (timeout)
+		jiffies_left = schedule_timeout(jiffies_left);
+	else
+		schedule();
 
-		sma = sem_lock(semid);
-		if(sma==NULL) {
-			if(queue.prev != NULL)
-				BUG();
-			error = -EIDRM;
-			goto out_free;
-		}
-		/*
-		 * If queue.status == 1 we where woken up and
-		 * have to retry else we simply return.
-		 * If an interrupt occurred we have to clean up the
-		 * queue
-		 *
-		 */
-		if (queue.status == 1)
-		{
-			error = try_atomic_semop (sma, sops, nsops, un,
-						  current->pid,0);
-			if (error <= 0) 
-				break;
-		} else {
-			error = queue.status;
-			if (error == -EINTR && timeout && jiffies_left == 0)
-				error = -EAGAIN;
-			if (queue.prev) /* got Interrupt */
-				break;
-			/* Everything done by update_queue */
-			goto out_unlock_free;
-		}
+	sma = sem_lock(semid);
+	if(sma==NULL) {
+		if(queue.prev != NULL)
+			BUG();
+		error = -EIDRM;
+		goto out_free;
+	}
+
+	/*
+	 * If queue.status != -EINTR we are woken up by another process
+	 */
+	error = queue.status;
+	if (queue.status != -EINTR) {
+		goto out_unlock_free;
 	}
+
+	/*
+	 * If an interrupt occurred we have to clean up the queue
+	 */
+	if (timeout && jiffies_left == 0)
+		error = -EAGAIN;
 	remove_from_queue(sma,&queue);
+	goto out_unlock_free;
+
 update:
 	if (alter)
 		update_queue (sma);
diff -Nur linux-2.5.73/include/linux/sem.h linux-2.5.73.sem/include/linux/sem.h
--- linux-2.5.73/include/linux/sem.h	Sun Jun 22 11:32:42 2003
+++ linux-2.5.73.sem/include/linux/sem.h	Fri Jun 27 14:55:44 2003
@@ -109,7 +109,6 @@
 	int			id;	 /* internal sem id */
 	struct sembuf *		sops;	 /* array of pending operations */
 	int			nsops;	 /* number of operations */
-	int			alter;	 /* operation will alter semaphore */
 };
 
 /* Each task has a list of undo requests. They are executed automatically


* RE: ipc semaphore optimization
@ 2003-06-27 20:22 Chen, Kenneth W
  0 siblings, 0 replies; 6+ messages in thread
From: Chen, Kenneth W @ 2003-06-27 20:22 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: linux-kernel

> Perhaps the O(1) scheduler is better at handling the thread switches
> than the old scheduler. Could you include an update of the comments into
> your patch?

Yes, our opinion aligns with your observation.  The O(1) scheduler
handles the context switches much better.

- Ken


* Re: ipc semaphore optimization
@ 2003-06-27 19:43 Manfred Spraul
  0 siblings, 0 replies; 6+ messages in thread
From: Manfred Spraul @ 2003-06-27 19:43 UTC (permalink / raw)
  To: Chen, Kenneth W; +Cc: linux-kernel

Chen wrote:

>What we are proposing here is to merge the portion of the code in the
>bottom part of sys_semtimedop() (code that gets executed when a sleeping
>process gets woken up) into update_queue() function.  The benefit is two
>folds: (1) is to reduce redundant calls to try_atomic_semop() and (2) to
>increase efficiency of finding eligible processes to wake up and higher
>concurrency for multiple wake-ups.
>  
>
Interesting. I'm not sure if you noticed it, but your patch backs out a 
change from Christoph Rohland from 1998, probably from SAP DB benchmarking:

linux/ipc/sem.c:
 * - The previous code had two flaws:
 *   1) It actively gave the semaphore to the next waiting process
 *      sleeping on the semaphore. Since this process did not have the
 *      cpu this led to many unnecessary context switches and bad
 *      performance. Now we only check which process should be able to
 *      get the semaphore and if this process wants to reduce some
 *      semaphore value we simply wake it up without doing the
 *      operation. So it has to try to get it later. Thus e.g. the
 *      running process may reacquire the semaphore during the current
 *      time slice. If it only waits for zero or increases the semaphore,
 *      we do the operation in advance and wake it up.

Perhaps the O(1) scheduler is better at handling the thread switches 
than the old scheduler. Could you include an update of the comments into 
your patch?

--
    Manfred



* Re: ipc semaphore optimization
@ 2003-06-27 18:03 Margit Schubert-While
  0 siblings, 0 replies; 6+ messages in thread
From: Margit Schubert-While @ 2003-06-27 18:03 UTC (permalink / raw)
  To: linux-kernel

Looks interesting.  I'll give it a shot tomorrow.
Any chance you and/or Manfred Spraul can cook up a combined
patch with Manfred's changes from here :
http://marc.theaimsgroup.com/?l=linux-kernel&m=105567416526840&w=2

Margit



Thread overview: 6+ messages
2003-06-27 17:13 ipc semaphore optimization Chen, Kenneth W
2003-06-27 18:03 Margit Schubert-While
2003-06-27 19:43 Manfred Spraul
2003-06-27 20:22 Chen, Kenneth W
2003-06-27 22:48 Chen, Kenneth W
2003-07-01 16:36 Chen, Kenneth W
