linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 2.4.7-pre9..
@ 2001-07-20  5:17 Linus Torvalds
  2001-07-20  7:22 ` 2.4.7-pre9 Jens Axboe
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Linus Torvalds @ 2001-07-20  5:17 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: Alexander Viro, David S. Miller, Andrea Arcangeli, Alan Cox,
	David Woodhouse, linux-scsi, Andrew Morton


I'm getting ready to do a 2.4.7, but one of the fixes in 2.4.7 is a nasty
SMP race that was found and made it clear that using an old trick of
having a semaphore on the stack and doing "down()" on it to wait for some
event (that would do the "up()") was a really bad idea.

This kind of trick was used in the kernel vfork() implementation, and also
in block device "wait for request completion". I've fixed both with a new
and fairly simple "wait for completion" infrastructure, but I'd like
especially SCSI device driver writers to check their own drivers as a
result before I make the final 2.4.7.

I've changed all generic code, so drivers are all expected to compile and
work. However, some SCSI drivers use the semaphore trick in their own
code, and I've not mucked with that. It's not worth worrying about too
much, as the race is basically impossible to hit (famous last words), but
I wanted a heads-up and people to give it a quick look. I also wanted to
have people who actually have the hardware in question to verify that my
untested (but on the face of it obvious) changes are indeed working.

So please give it a quick spin,

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20  5:17 2.4.7-pre9 Linus Torvalds
@ 2001-07-20  7:22 ` Jens Axboe
  2001-07-20  8:15   ` 2.4.7-pre9 Linus Torvalds
  2001-07-20 10:23 ` 2.4.7-pre9 David Woodhouse
  2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek
  2 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2001-07-20  7:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kernel Mailing List, Alexander Viro, David S. Miller,
	Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi,
	Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 492 bytes --]

On Thu, Jul 19 2001, Linus Torvalds wrote:
> 
> I'm getting ready to do a 2.4.7, but one of the fixes in 2.4.7 is a nasty
> SMP race that was found and made it clear that using an old trick of
> having a semaphore on the stack and doing "down()" on it to wait for some
> event (that would do the "up()") was a really bad idea.

Attached are two patches -- one that should fix DAC960 for this new
completion scheme, and one that corrects the corrected comment in
blkdev.h :-)

-- 
Jens Axboe


[-- Attachment #2: DAC960-completion-1 --]
[-- Type: text/plain, Size: 6169 bytes --]

--- linux/drivers/block/DAC960.c~	Fri Jul 20 09:18:43 2001
+++ linux/drivers/block/DAC960.c	Fri Jul 20 09:18:44 2001
@@ -40,6 +40,7 @@
 #include <linux/spinlock.h>
 #include <linux/timer.h>
 #include <linux/pci.h>
+#include <linux/completion.h>
 #include <asm/io.h>
 #include <asm/segment.h>
 #include <asm/uaccess.h>
@@ -484,14 +485,14 @@
 static void DAC960_ExecuteCommand(DAC960_Command_T *Command)
 {
   DAC960_Controller_T *Controller = Command->Controller;
-  DECLARE_MUTEX_LOCKED(Semaphore);
+  DECLARE_COMPLETION(Wait);
   unsigned long ProcessorFlags;
-  Command->Semaphore = &Semaphore;
+  Command->Waiting = &Wait;
   DAC960_AcquireControllerLock(Controller, &ProcessorFlags);
   DAC960_QueueCommand(Command);
   DAC960_ReleaseControllerLock(Controller, &ProcessorFlags);
   if (in_interrupt()) return;
-  down(&Semaphore);
+  wait_for_completion(&Wait);
 }
 
 
@@ -1316,7 +1317,7 @@
 						 *Controller)
 {
   DAC960_V1_DCDB_T DCDBs[DAC960_V1_MaxChannels], *DCDB;
-  Semaphore_T Semaphores[DAC960_V1_MaxChannels], *Semaphore;
+  Completion_T Wait[DAC960_V1_MaxChannels], *wait;
   unsigned long ProcessorFlags;
   int Channel, TargetID;
   for (TargetID = 0; TargetID < Controller->Targets; TargetID++)
@@ -1327,12 +1328,12 @@
 	  DAC960_SCSI_Inquiry_T *InquiryStandardData =
 	    &Controller->V1.InquiryStandardData[Channel][TargetID];
 	  InquiryStandardData->PeripheralDeviceType = 0x1F;
-	  Semaphore = &Semaphores[Channel];
-	  init_MUTEX_LOCKED(Semaphore);
+	  wait = &Wait[Channel];
+	  init_completion(wait);
 	  DCDB = &DCDBs[Channel];
 	  DAC960_V1_ClearCommand(Command);
 	  Command->CommandType = DAC960_ImmediateCommand;
-	  Command->Semaphore = Semaphore;
+	  Command->Waiting = wait;
 	  Command->V1.CommandMailbox.Type3.CommandOpcode = DAC960_V1_DCDB;
 	  Command->V1.CommandMailbox.Type3.BusAddress = Virtual_to_Bus32(DCDB);
 	  DCDB->Channel = Channel;
@@ -1363,11 +1364,11 @@
 	  DAC960_SCSI_Inquiry_UnitSerialNumber_T *InquiryUnitSerialNumber =
 	    &Controller->V1.InquiryUnitSerialNumber[Channel][TargetID];
 	  InquiryUnitSerialNumber->PeripheralDeviceType = 0x1F;
-	  Semaphore = &Semaphores[Channel];
-	  down(Semaphore);
+	  wait = &Wait[Channel];
+	  wait_for_completion(wait);
 	  if (Command->V1.CommandStatus != DAC960_V1_NormalCompletion)
 	    continue;
-	  Command->Semaphore = Semaphore;
+	  Command->Waiting = wait;
 	  DCDB = &DCDBs[Channel];
 	  DCDB->TransferLength = sizeof(DAC960_SCSI_Inquiry_UnitSerialNumber_T);
 	  DCDB->BusAddress = Virtual_to_Bus32(InquiryUnitSerialNumber);
@@ -1381,7 +1382,7 @@
 	  DAC960_AcquireControllerLock(Controller, &ProcessorFlags);
 	  DAC960_QueueCommand(Command);
 	  DAC960_ReleaseControllerLock(Controller, &ProcessorFlags);
-	  down(Semaphore);
+	  wait_for_completion(wait);
 	}
     }
   return true;
@@ -2768,7 +2769,7 @@
   if (Request->cmd == READ)
     Command->CommandType = DAC960_ReadCommand;
   else Command->CommandType = DAC960_WriteCommand;
-  Command->Semaphore = Request->sem;
+  Command->Waiting = Request->waiting;
   Command->LogicalDriveNumber = DAC960_LogicalDriveNumber(Request->rq_dev);
   Command->BlockNumber =
     Request->sector
@@ -2924,10 +2925,10 @@
 	  /*
 	    Wake up requestor for swap file paging requests.
 	  */
-	  if (Command->Semaphore != NULL)
+	  if (Command->Waiting)
 	    {
-	      up(Command->Semaphore);
-	      Command->Semaphore = NULL;
+	      complete(Command->Waiting);
+	      Command->Waiting = NULL;
 	    }
 	  add_blkdev_randomness(DAC960_MAJOR + Controller->ControllerNumber);
 	}
@@ -2969,13 +2970,10 @@
 	      DAC960_ProcessCompletedBuffer(BufferHeader, false);
 	      BufferHeader = NextBufferHeader;
 	    }
-	  /*
-	    Wake up requestor for swap file paging requests.
-	  */
-	  if (Command->Semaphore != NULL)
+	  if (Command->Waiting)
 	    {
-	      up(Command->Semaphore);
-	      Command->Semaphore = NULL;
+	      complete(Command->Waiting);
+	      Command->Waiting = NULL;
 	    }
 	}
     }
@@ -3589,8 +3587,8 @@
     }
   if (CommandType == DAC960_ImmediateCommand)
     {
-      up(Command->Semaphore);
-      Command->Semaphore = NULL;
+      complete(Command->Waiting);
+      Command->Waiting = NULL;
       return;
     }
   if (CommandType == DAC960_QueuedCommand)
@@ -3931,13 +3929,10 @@
 	      DAC960_ProcessCompletedBuffer(BufferHeader, true);
 	      BufferHeader = NextBufferHeader;
 	    }
-	  /*
-	    Wake up requestor for swap file paging requests.
-	  */
-	  if (Command->Semaphore != NULL)
+	  if (Command->Waiting)
 	    {
-	      up(Command->Semaphore);
-	      Command->Semaphore = NULL;
+	      complete(Command->Waiting);
+	      Command->Waiting = NULL;
 	    }
 	  add_blkdev_randomness(DAC960_MAJOR + Controller->ControllerNumber);
 	}
@@ -3979,13 +3974,10 @@
 	      DAC960_ProcessCompletedBuffer(BufferHeader, false);
 	      BufferHeader = NextBufferHeader;
 	    }
-	  /*
-	    Wake up requestor for swap file paging requests.
-	  */
-	  if (Command->Semaphore != NULL)
+	  if (Command->Waiting)
 	    {
-	      up(Command->Semaphore);
-	      Command->Semaphore = NULL;
+	      complete(Command->Waiting);
+	      Command->Waiting = NULL;
 	    }
 	}
     }
@@ -4539,8 +4531,8 @@
     }
   if (CommandType == DAC960_ImmediateCommand)
     {
-      up(Command->Semaphore);
-      Command->Semaphore = NULL;
+      complete(Command->Waiting);
+      Command->Waiting = NULL;
       return;
     }
   if (CommandType == DAC960_QueuedCommand)
--- linux/drivers/block/DAC960.h~	Fri Jul 20 09:14:39 2001
+++ linux/drivers/block/DAC960.h	Fri Jul 20 09:14:48 2001
@@ -2153,7 +2153,7 @@
 typedef struct pt_regs Registers_T;
 typedef struct request IO_Request_T;
 typedef request_queue_t RequestQueue_T;
-typedef struct semaphore Semaphore_T;
+typedef struct completion Completion_T;
 typedef struct super_block SuperBlock_T;
 typedef struct timer_list Timer_T;
 typedef wait_queue_head_t WaitQueue_T;
@@ -2220,7 +2220,7 @@
   DAC960_CommandType_T CommandType;
   struct DAC960_Controller *Controller;
   struct DAC960_Command *Next;
-  Semaphore_T *Semaphore;
+  Completion_T *Waiting;
   unsigned int LogicalDriveNumber;
   unsigned int BlockNumber;
   unsigned int BlockCount;

[-- Attachment #3: blkdev-1 --]
[-- Type: text/plain, Size: 434 bytes --]

--- linux/include/linux/blkdev.h~	Fri Jul 20 09:20:02 2001
+++ linux/include/linux/blkdev.h	Fri Jul 20 09:20:14 2001
@@ -14,9 +14,7 @@
 
 /*
  * Ok, this is an expanded form so that we can use the same
- * request for paging requests when that is implemented. In
- * paging, 'bh' is NULL, and the completion is used to wait
- * for the IO to be ready.
+ * request for paging requests.
  */
 struct request {
 	struct list_head queue;

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20  7:22 ` 2.4.7-pre9 Jens Axboe
@ 2001-07-20  8:15   ` Linus Torvalds
  2001-07-20  8:26     ` 2.4.7-pre9 Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2001-07-20  8:15 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Kernel Mailing List, Alexander Viro, David S. Miller,
	Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi,
	Andrew Morton



On Fri, 20 Jul 2001, Jens Axboe wrote:
>
> Attached are two patches -- one that should fix DAC960 for this new
> completion scheme, and one that corrects the corrected comment in
> blkdev.h :-)

No, the correction of the correction is worse.

The paging stuff doesn't use any of this. The paging stuff use the page
cache lock bit, and always has.

The only thing that uses request completion checking are special commands,
like the initial SCSI spin-up etc (scsi_init_one()).

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20  8:15   ` 2.4.7-pre9 Linus Torvalds
@ 2001-07-20  8:26     ` Jens Axboe
  2001-07-20 16:42       ` 2.4.7-pre9 Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2001-07-20  8:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kernel Mailing List, Alexander Viro, David S. Miller,
	Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi,
	Andrew Morton

On Fri, Jul 20 2001, Linus Torvalds wrote:
> 
> 
> On Fri, 20 Jul 2001, Jens Axboe wrote:
> >
> > Attached are two patches -- one that should fix DAC960 for this new
> > completion scheme, and one that corrects the corrected comment in
> > blkdev.h :-)
> 
> No, the correction of the correction is worse.

?

> The paging stuff doesn't use any of this. The paging stuff use the page
> cache lock bit, and always has.

Paging still hits a request, I assumed that's what the (really really)
old comment meant to say.

> The only thing that uses request completion checking are special commands,
> like the initial SCSI spin-up etc (scsi_init_one()).

Sure, and IDE ide_wait etc.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20  5:17 2.4.7-pre9 Linus Torvalds
  2001-07-20  7:22 ` 2.4.7-pre9 Jens Axboe
@ 2001-07-20 10:23 ` David Woodhouse
  2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek
  2 siblings, 0 replies; 14+ messages in thread
From: David Woodhouse @ 2001-07-20 10:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List, jgarzik, alan, zaitcev, jerdfelt


torvalds@transmeta.com said:
> I'm getting ready to do a 2.4.7, but one of the fixes in 2.4.7 is a
> nasty SMP race that was found and made it clear that using an old
> trick of having a semaphore on the stack and doing "down()" on it to
> wait for some event (that would do the "up()") was a really bad idea. 

We need s/up_and_exit/complete_and_exit/ then.

Index: drivers/i2o/i2o_block.c
===================================================================
RCS file: /inst/cvs/linux/drivers/i2o/i2o_block.c,v
retrieving revision 1.3.2.25
diff -u -r1.3.2.25 i2o_block.c
--- drivers/i2o/i2o_block.c	2001/07/20 08:52:40	1.3.2.25
+++ drivers/i2o/i2o_block.c	2001/07/20 09:18:16
@@ -61,6 +61,7 @@
 
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
+#include <linux/completion.h>
 #include <asm/io.h>
 #include <asm/atomic.h>
 #include <linux/smp_lock.h>
@@ -187,7 +188,7 @@
  * evt_msg contains the last event.
  */
 static DECLARE_MUTEX_LOCKED(i2ob_evt_sem);
-static DECLARE_MUTEX_LOCKED(i2ob_thread_dead);
+static DECLARE_COMPLETION(i2ob_thread_dead);
 static spinlock_t i2ob_evt_lock = SPIN_LOCK_UNLOCKED;
 static u32 evt_msg[MSG_FRAME_SIZE>>2];
 
@@ -859,7 +860,7 @@
 		}
 	};
 
-	up_and_exit(&i2ob_thread_dead,0);
+	complete_and_exit(&i2ob_thread_dead,0);
 	return 0;
 }
 
@@ -2002,7 +2003,7 @@
 			printk("waiting...");
 		}
 		/* Be sure it died */
-		down(&i2ob_thread_dead);
+		wait_for_completion(&i2ob_thread_dead);
 		printk("done.\n");
 	}
 
Index: drivers/i2o/i2o_core.c
===================================================================
RCS file: /inst/cvs/linux/drivers/i2o/i2o_core.c,v
retrieving revision 1.3.2.25
diff -u -r1.3.2.25 i2o_core.c
--- drivers/i2o/i2o_core.c	2001/05/14 10:36:06	1.3.2.25
+++ drivers/i2o/i2o_core.c	2001/07/20 09:17:54
@@ -43,6 +43,7 @@
 #include <linux/interrupt.h>
 #include <linux/sched.h>
 #include <asm/semaphore.h>
+#include <linux/completion.h>
 
 #include <asm/io.h>
 #include <linux/reboot.h>
@@ -206,7 +207,7 @@
  */
  
 static DECLARE_MUTEX(evt_sem);
-static DECLARE_MUTEX_LOCKED(evt_dead);
+static DECLARE_COMPLETION(evt_dead);
 DECLARE_WAIT_QUEUE_HEAD(evt_wait);
 
 static struct notifier_block i2o_reboot_notifier =
@@ -905,7 +906,7 @@
 			dprintk(KERN_INFO "I2O event thread dead\n");
 			printk("exiting...");
 			evt_running = 0;
-			up_and_exit(&evt_dead, 0);
+			complete_and_exit(&evt_dead, 0);
 		}
 
 		/* 
@@ -3427,7 +3428,7 @@
 		stat = kill_proc(evt_pid, SIGTERM, 1);
 		if(!stat) {
 			printk("waiting...");
-			down(&evt_dead);
+			wait_for_completion(&evt_dead);
 		}
 		printk("done.\n");
 	}
Index: drivers/net/8139too.c
===================================================================
RCS file: /inst/cvs/linux/drivers/net/8139too.c,v
retrieving revision 1.1.2.28
diff -u -r1.1.2.28 8139too.c
--- drivers/net/8139too.c	2001/07/19 07:54:26	1.1.2.28
+++ drivers/net/8139too.c	2001/07/20 09:20:09
@@ -152,10 +152,10 @@
 #include <linux/delay.h>
 #include <linux/ethtool.h>
 #include <linux/mii.h>
+#include <linux/completion.h>
 #include <asm/io.h>
 #include <asm/uaccess.h>
 
-
 #define RTL8139_DRIVER_NAME   DRV_NAME " Fast Ethernet driver " DRV_VERSION
 #define PFX DRV_NAME ": "
 
@@ -585,7 +585,7 @@
 	chip_t chipset;
 	pid_t thr_pid;
 	wait_queue_head_t thr_wait;
-	struct semaphore thr_exited;
+	struct completion thr_exited;
 	u32 rx_config;
 	struct rtl_extra_stats xstats;
 };
@@ -970,7 +970,7 @@
 	tp->mmio_addr = ioaddr;
 	spin_lock_init (&tp->lock);
 	init_waitqueue_head (&tp->thr_wait);
-	init_MUTEX_LOCKED (&tp->thr_exited);
+	init_completion (&tp->thr_exited);
 
 	/* dev is fully set up and ready to use now */
 	DPRINTK("about to register device named %s (%p)...\n", dev->name, dev);
@@ -1632,7 +1632,7 @@
 		rtnl_unlock ();
 	}
 
-	up_and_exit (&tp->thr_exited, 0);
+	complete_and_exit (&tp->thr_exited, 0);
 }
 
 
@@ -2138,7 +2138,7 @@
 			printk (KERN_ERR "%s: unable to signal thread\n", dev->name);
 			return ret;
 		}
-		down (&tp->thr_exited);
+		wait_for_completion (&tp->thr_exited);
 	}
 
 	DPRINTK ("%s: Shutting down ethercard, status was 0x%4.4x.\n",
Index: drivers/usb/hub.c
===================================================================
RCS file: /inst/cvs/linux/drivers/usb/hub.c,v
retrieving revision 1.2.2.41
diff -u -r1.2.2.41 hub.c
--- drivers/usb/hub.c	2001/05/14 09:53:16	1.2.2.41
+++ drivers/usb/hub.c	2001/07/20 09:20:29
@@ -36,7 +36,7 @@
 
 static DECLARE_WAIT_QUEUE_HEAD(khubd_wait);
 static int khubd_pid = 0;			/* PID of khubd */
-static DECLARE_MUTEX_LOCKED(khubd_exited);
+static DECLARE_COMPLETION(khubd_exited);
 
 static int usb_get_hub_descriptor(struct usb_device *dev, void *data, int size)
 {
@@ -781,7 +781,7 @@
 
 	dbg("usb_hub_thread exiting");
 
-	up_and_exit(&khubd_exited, 0);
+	complete_and_exit(&khubd_exited, 0);
 }
 
 static struct usb_device_id hub_id_table [] = {
@@ -834,7 +834,7 @@
 	/* Kill the thread */
 	ret = kill_proc(khubd_pid, SIGTERM, 1);
 
-	down(&khubd_exited);
+	wait_for_completion(&khubd_exited);
 
 	/*
 	 * Hub resources are freed for us by usb_deregister. It calls
Index: fs/jffs/inode-v23.c
===================================================================
RCS file: /inst/cvs/linux/fs/jffs/Attic/inode-v23.c,v
retrieving revision 1.1.2.12
diff -u -r1.1.2.12 inode-v23.c
--- fs/jffs/inode-v23.c	2001/06/28 11:16:55	1.1.2.12
+++ fs/jffs/inode-v23.c	2001/07/20 09:13:04
@@ -157,7 +157,7 @@
 		D1(printk (KERN_NOTICE "jffs_put_super(): Telling gc thread to die.\n"));
 		send_sig(SIGKILL, c->gc_task, 1);
 	}
-	down (&c->gc_thread_sem);
+	wait_for_completion(&c->gc_thread_comp);
 
 	D1(printk (KERN_NOTICE "jffs_put_super(): Successfully waited on thread.\n"));
 
Index: fs/jffs/intrep.c
===================================================================
RCS file: /inst/cvs/linux/fs/jffs/Attic/intrep.c,v
retrieving revision 1.1.2.5
diff -u -r1.1.2.5 intrep.c
--- fs/jffs/intrep.c	2001/02/24 19:01:45	1.1.2.5
+++ fs/jffs/intrep.c	2001/07/20 09:11:15
@@ -3008,7 +3008,7 @@
 
 	current->session = 1;
 	current->pgrp = 1;
-	init_MUTEX_LOCKED(&c->gc_thread_sem); /* barrier */ 
+	init_completion(&c->gc_thread_comp); /* barrier */ 
 	spin_lock_irq(&current->sigmask_lock);
 	siginitsetinv (&current->blocked, sigmask(SIGHUP) | sigmask(SIGKILL) | sigmask(SIGSTOP) | sigmask(SIGCONT));
 	recalc_sigpending(current);
@@ -3055,7 +3055,7 @@
 				D1(printk("jffs_garbage_collect_thread(): SIGKILL received.\n"));
 				c->gc_task = NULL;
 				unlock_kernel();
-				up_and_exit(&c->gc_thread_sem, 0);
+				complete_and_exit(&c->gc_thread_comp, 0);
 			}
 		}
 
Index: include/linux/jffs.h
===================================================================
RCS file: /inst/cvs/linux/include/linux/jffs.h,v
retrieving revision 1.1.1.1.2.2
diff -u -r1.1.1.1.2.2 jffs.h
--- include/linux/jffs.h	2000/12/05 10:51:21	1.1.1.1.2.2
+++ include/linux/jffs.h	2001/07/20 09:58:49
@@ -22,6 +22,8 @@
 
 #define JFFS_VERSION_STRING "1.0"
 
+#include <linux/completion.h>
+
 /* This is a magic number that is used as an identification number for
    this file system.  It is written to the super_block structure.  */
 #define JFFS_MAGIC_SB_BITMASK 0x07c0  /* 1984 */
@@ -185,7 +187,7 @@
 	struct jffs_delete_list *delete_list; /* Track deleted files.  */
 	pid_t thread_pid;		/* GC thread's PID */
 	struct task_struct *gc_task;	/* GC task struct */
-	struct semaphore gc_thread_sem; /* GC thread exit mutex */
+	struct completion gc_thread_comp; /* GC thread exit mutex */
 	__u32 gc_minfree_threshold;	/* GC trigger thresholds */
 	__u32 gc_maxdirty_threshold;
 };
Index: include/linux/kernel.h
===================================================================
RCS file: /inst/cvs/linux/include/linux/kernel.h,v
retrieving revision 1.1.1.1.2.18
diff -u -r1.1.1.1.2.18 kernel.h
--- include/linux/kernel.h	2001/06/13 06:41:40	1.1.1.1.2.18
+++ include/linux/kernel.h	2001/07/20 09:17:05
@@ -45,14 +45,14 @@
 #define FASTCALL(x)	x
 #endif
 
-struct semaphore;
+struct completion;
 
 extern struct notifier_block *panic_notifier_list;
 NORET_TYPE void panic(const char * fmt, ...)
 	__attribute__ ((NORET_AND format (printf, 1, 2)));
 NORET_TYPE void do_exit(long error_code)
 	ATTRIB_NORET;
-NORET_TYPE void up_and_exit(struct semaphore *, long)
+NORET_TYPE void complete_and_exit(struct completion *, long)
 	ATTRIB_NORET;
 extern int abs(int);
 extern unsigned long simple_strtoul(const char *,char **,unsigned int);
Index: kernel/exit.c
===================================================================
RCS file: /inst/cvs/linux/kernel/exit.c,v
retrieving revision 1.3.2.32
diff -u -r1.3.2.32 exit.c
--- kernel/exit.c	2001/06/02 14:40:18	1.3.2.32
+++ kernel/exit.c	2001/07/20 09:09:16
@@ -473,10 +473,10 @@
 	goto fake_volatile;
 }
 
-NORET_TYPE void up_and_exit(struct semaphore *sem, long code)
+NORET_TYPE void complete_and_exit(struct completion *comp, long code)
 {
-	if (sem)
-		up(sem);
+	if (comp)
+		complete(comp);
 	
 	do_exit(code);
 }
Index: kernel/ksyms.c
===================================================================
RCS file: /inst/cvs/linux/kernel/ksyms.c,v
retrieving revision 1.3.2.78
diff -u -r1.3.2.78 ksyms.c
--- kernel/ksyms.c	2001/07/19 07:54:30	1.3.2.78
+++ kernel/ksyms.c	2001/07/20 09:20:39
@@ -422,7 +422,7 @@
 EXPORT_SYMBOL(iomem_resource);
 
 /* process management */
-EXPORT_SYMBOL(up_and_exit);
+EXPORT_SYMBOL(complete_and_exit);
 EXPORT_SYMBOL(__wake_up);
 EXPORT_SYMBOL(wake_up_process);
 EXPORT_SYMBOL(sleep_on);




--
dwmw2



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20  8:26     ` 2.4.7-pre9 Jens Axboe
@ 2001-07-20 16:42       ` Linus Torvalds
  2001-07-20 18:57         ` 2.4.7-pre9 Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2001-07-20 16:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Kernel Mailing List, Alexander Viro, David S. Miller,
	Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi,
	Andrew Morton



On Fri, 20 Jul 2001, Jens Axboe wrote:
>
> > The paging stuff doesn't use any of this. The paging stuff use the page
> > cache lock bit, and always has.
>
> Paging still hits a request, I assumed that's what the (really really)
> old comment meant to say.

No. Tha paging stuff (and _all_ regular IO) uses a regular request, with a
NULL waiter. That's the normal path. That normal path depends on the
buffer heads associated with the requests and their "bh->b_end_io()"
function marking other state up-to-date, and then waits on the page being
locked.

The req->sem (and now req->completion) thing is a very different thing: it
is a "we are waiting for this particular request", and is used when it's
not really IO and doesn't have a bh, but something that has side effects.
Like an ioctl that causes a special command to the disk to change some
parameters, or query the size of the disk or something.

So the comment has just always been wrong, I think. It may be that the
original swapping code was doing raw requests instead of real IO, so maybe
the comment was actually correct back in 1992 or something. My memory
isn't good enough..

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20 16:42       ` 2.4.7-pre9 Linus Torvalds
@ 2001-07-20 18:57         ` Jens Axboe
  0 siblings, 0 replies; 14+ messages in thread
From: Jens Axboe @ 2001-07-20 18:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kernel Mailing List, Alexander Viro, David S. Miller,
	Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi,
	Andrew Morton

On Fri, Jul 20 2001, Linus Torvalds wrote:
> 
> 
> On Fri, 20 Jul 2001, Jens Axboe wrote:
> >
> > > The paging stuff doesn't use any of this. The paging stuff use the page
> > > cache lock bit, and always has.
> >
> > Paging still hits a request, I assumed that's what the (really really)
> > old comment meant to say.
> 
> No. Tha paging stuff (and _all_ regular IO) uses a regular request, with a
> NULL waiter. That's the normal path. That normal path depends on the
> buffer heads associated with the requests and their "bh->b_end_io()"
> function marking other state up-to-date, and then waits on the page being
> locked.

This is perfectly clear. I'm saying 'paging uses a request just like any
other I/O', and you seem to disagree and restate the same thing?! In
fact the lower layers have no way of knowing what is what, paging or
not.

> The req->sem (and now req->completion) thing is a very different thing: it
> is a "we are waiting for this particular request", and is used when it's
> not really IO and doesn't have a bh, but something that has side effects.
> Like an ioctl that causes a special command to the disk to change some
> parameters, or query the size of the disk or something.

Ditto! Are you reading what I write?

> So the comment has just always been wrong, I think. It may be that the
> original swapping code was doing raw requests instead of real IO, so maybe
> the comment was actually correct back in 1992 or something. My memory
> isn't good enough..

Good, so now you agree that the corrected comment (as per pre9) is crap,
and the patch I sent that changes the wording to:

"Ok, this is an expanded form so that we can use the same
request for paging requests."

is so much better than _you_ mixing ->waiting and ->sem into this paging
or non-paging pool?

But in fact the whole comment block should just be removed. It gives no
useful or additional information.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-20  5:17 2.4.7-pre9 Linus Torvalds
  2001-07-20  7:22 ` 2.4.7-pre9 Jens Axboe
  2001-07-20 10:23 ` 2.4.7-pre9 David Woodhouse
@ 2001-07-23 12:56 ` Pavel Machek
  2001-07-27 16:18   ` 2.4.7-pre9 Matthew Dharm
  2 siblings, 1 reply; 14+ messages in thread
From: Pavel Machek @ 2001-07-23 12:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kernel Mailing List, Alexander Viro, David S. Miller,
	Andrea Arcangeli, Alan Cox, David Woodhouse, linux-scsi,
	Andrew Morton

Hi!

> I've changed all generic code, so drivers are all expected to compile and
> work. However, some SCSI drivers use the semaphore trick in their own
> code, and I've not mucked with that. It's not worth worrying about too
> much, as the race is basically impossible to hit (famous last words), but

I smell problems in usb/storage/*...

<evil>
What about adding strategic udelay() somewhere so race is easier to hit?
</evil>
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek
@ 2001-07-27 16:18   ` Matthew Dharm
  2001-07-27 17:46     ` 2.4.7-pre9 Linus Torvalds
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Dharm @ 2001-07-27 16:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Kernel Mailing List, Alexander Viro,
	David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse,
	linux-scsi, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

Oh damn....

It looks like I missed an important discussion in the torrent of e-mail
that I receive... could someone give me the 30-second executive summary so
I can look at what may need to change in usb-storage?

Matt

On Mon, Jul 23, 2001 at 12:56:35PM +0000, Pavel Machek wrote:
> Hi!
> 
> > I've changed all generic code, so drivers are all expected to compile and
> > work. However, some SCSI drivers use the semaphore trick in their own
> > code, and I've not mucked with that. It's not worth worrying about too
> > much, as the race is basically impossible to hit (famous last words), but
> 
> I smell problems in usb/storage/*...
> 
> <evil>
> What about adding strategic udelay() somewhere so race is easier to hit?
> </evil>
> 								Pavel
> -- 
> Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
> details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

You should try to see the techs say "three piece suit".
					-- The Chief
User Friendly, 11/23/1997

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-27 16:18   ` 2.4.7-pre9 Matthew Dharm
@ 2001-07-27 17:46     ` Linus Torvalds
  2001-07-27 19:47       ` 2.4.7-pre9 Matthew Dharm
  0 siblings, 1 reply; 14+ messages in thread
From: Linus Torvalds @ 2001-07-27 17:46 UTC (permalink / raw)
  To: Matthew Dharm
  Cc: Pavel Machek, Kernel Mailing List, Alexander Viro,
	David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse,
	linux-scsi, Andrew Morton


On Fri, 27 Jul 2001, Matthew Dharm wrote:
>
> It looks like I missed an important discussion in the torrent of e-mail
> that I receive... could someone give me the 30-second executive summary so
> I can look at what may need to change in usb-storage?

The basic summary is that we had this (fairly common) way of waiting for
certain events by having a locked semaphore on the stack of the waiter,
and then having the waiter do a "down()" which caused it to block until
the thing it was waiting for did an "up()".

This works fairly well, _but_ it has a really small (and quite unlikely)
race on SMP, that is not so much a race of the idea itself, as of the
implementation of the semaphores. We could have fixed the semaphores, but
there were a few reasons not to:

 - the semaphores are optimized (on purpose) for the non-contention case.
   The "wait for completion" usage has the opposite default case
 - the semaphores are quite involved and architecture-specific, exactly
   due to this optimization. Trying to change them is painful as hell.

So instead, I introduced the notion of "wait for completion":

	struct completion event;

	init_completion(&event);
	.. pass of event pointer to waker ..
	wait_for_completion(&event);

where the thing we're waiting for just does "complete(event)" and we're
all done.

This has the advantage of being a bit more obvious just from a syntactic
angle about what is going on. It also ends up being slightly more
efficient than semaphores because we can handle the right expected case,
and it also avoids the implementation issue that made for the race in the
first place.

Switching over to the new format is really trivial:

 struct semaphore	-> struct completion
 init_MUTEX_LOCKED	-> init_completion
 DECLARE_MUTEX_LOCKED	-> DECLARE_COMPLETION
 down()			-> wait_for_completion()
 up()			-> complete()

and you can in fact maintain 2.2.x compatibility by just having a 2.2.x
compatibility file that does the reverse mappings.

In case anybody cares, the race was that Linux semaphores only protect the
accesses _inside_ the semaphore, while the accesses by the semaphores
themselves can "race" in the internal implementation. That helps make an
efficient implementation, but it means that the race was:

	cpu #1				cpu #2

	DECLARE_MUTEX_LOCKED(sem);
	..
	down(&sem);			up(&sem);
	return;
					wake_up(&sem.wait) /*BOOM*/

where the waker still touches the semaphore data structure after the
sleeper has become happy with it no longer being locked - and free'd the
data structure by virtue of freeing the stack.

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-27 17:46     ` 2.4.7-pre9 Linus Torvalds
@ 2001-07-27 19:47       ` Matthew Dharm
  2001-07-27 21:00         ` 2.4.7-pre9 Linus Torvalds
  2001-07-28  3:53         ` 2.4.7-pre9 David Woodhouse
  0 siblings, 2 replies; 14+ messages in thread
From: Matthew Dharm @ 2001-07-27 19:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Pavel Machek, Kernel Mailing List, Alexander Viro,
	David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse,
	linux-scsi, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 3425 bytes --]

Hrm... just to be clear, then... this only is a problem with semaphores
that are declared on the local stack?

IIRC, usb-storage only uses semaphores that are allocated via kfree, so I
think we're okay.  Tho, I think the new semantics are probably better, and
will probably switch to them.  Later.

Matt

On Fri, Jul 27, 2001 at 10:46:44AM -0700, Linus Torvalds wrote:
> 
> On Fri, 27 Jul 2001, Matthew Dharm wrote:
> >
> > It looks like I missed an important discussion in the torrent of e-mail
> > that I receive... could someone give me the 30-second executive summary so
> > I can look at what may need to change in usb-storage?
> 
> The basic summary is that we had this (fairly common) way of waiting for
> certain events by having a locked semaphore on the stack of the waiter,
> and then having the waiter do a "down()" which caused it to block until
> the thing it was waiting for did an "up()".
> 
> This works fairly well, _but_ it has a really small (and quite unlikely)
> race on SMP, that is not so much a race of the idea itself, as of the
> implementation of the semaphores. We could have fixed the semaphores, but
> there were a few reasons not to:
> 
>  - the semaphores are optimized (on purpose) for the non-contention case.
>    The "wait for completion" usage has the opposite default case
>  - the semaphores are quite involved and architecture-specific, exactly
>    due to this optimization. Trying to change them is painful as hell.
> 
> So instead, I introduced the notion of "wait for completion":
> 
> 	struct completion event;
> 
> 	init_completion(&event);
> 	.. pass of event pointer to waker ..
> 	wait_for_completion(&event);
> 
> where the thing we're waiting for just does "complete(event)" and we're
> all done.
> 
> This has the advantage of being a bit more obvious just from a syntactic
> angle about what is going on. It also ends up being slightly more
> efficient than semaphores because we can handle the right expected case,
> and it also avoids the implementation issue that made for the race in the
> first place.
> 
> Switching over to the new format is really trivial:
> 
>  struct semaphore	-> struct completion
>  init_MUTEX_LOCKED	-> init_completion
>  DECLARE_MUTEX_LOCKED	-> DECLARE_COMPLETION
>  down()			-> wait_for_completion()
>  up()			-> complete()
> 
> and you can in fact maintain 2.2.x compatibility by just having a 2.2.x
> compatibility file that does the reverse mappings.
> 
> In case anybody cares, the race was that Linux semaphores only protect the
> accesses _inside_ the semaphore, while the accesses by the semaphores
> themselves can "race" in the internal implementation. That helps make an
> efficient implementation, but it means that the race was:
> 
> 	cpu #1				cpu #2
> 
> 	DECLARE_MUTEX_LOCKED(sem);
> 	..
> 	down(&sem);			up(&sem);
> 	return;
> 					wake_up(&sem.wait) /*BOOM*/
> 
> where the waker still touches the semaphore data structure after the
> sleeper has become happy with it no longer being locked - and free'd the
> data structure by virtue of freeing the stack.
> 
> 		Linus

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

Hey Chief.  We've figured out how to save the technical department.  We 
need to be committed.
					-- The Techs
User Friendly, 1/22/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-28  3:53         ` 2.4.7-pre9 David Woodhouse
@ 2001-07-27 19:55           ` Matthew Dharm
  0 siblings, 0 replies; 14+ messages in thread
From: Matthew Dharm @ 2001-07-27 19:55 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Kernel Mailing List, linux-scsi

[-- Attachment #1: Type: text/plain, Size: 809 bytes --]

Ah... okay.. now I see the issue.  Now let's see if I can fix and test this
over the weekend....

Matt

On Sat, Jul 28, 2001 at 04:53:30AM +0100, David Woodhouse wrote:
> 
> (Cc list trimmed)
> 
> On Fri, 27 Jul 2001, Matthew Dharm wrote:
> 
> > IIRC, usb-storage only uses semaphores that are allocated via kfree, so I
> > think we're okay.  Tho, I think the new semantics are probably better, and
> > will probably switch to them.  Later.
> 
> If the exit (or indeed any) path does down(); kfree(); you suffer the same 
> problem.
> 
> -- 
> dwmw2

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

How would you like this tie wrapped around your hairy round head?
					-- Greg
User Friendly, 9/2/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-27 19:47       ` 2.4.7-pre9 Matthew Dharm
@ 2001-07-27 21:00         ` Linus Torvalds
  2001-07-28  3:53         ` 2.4.7-pre9 David Woodhouse
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2001-07-27 21:00 UTC (permalink / raw)
  To: Matthew Dharm
  Cc: Pavel Machek, Kernel Mailing List, Alexander Viro,
	David S. Miller, Andrea Arcangeli, Alan Cox, David Woodhouse,
	linux-scsi, Andrew Morton



On Fri, 27 Jul 2001, Matthew Dharm wrote:
>
> Hrm... just to be clear, then... this only is a problem with semaphores
> that are declared on the local stack?

Yes.

In theory you can have the same problem on any allocation that is free'd
after the semaphore has been used, but most (hopefully all) other forms of
allocations should be using proper reference counting etc, so that it is
impossible for a semaphore user to have the semaphore disappear from under
it.

> IIRC, usb-storage only uses semaphores that are allocated via kfree, so I
> think we're okay.  Tho, I think the new semantics are probably better, and
> will probably switch to them.  Later.

If you're using kmalloc/kfree, just make sure that the kfree never happens
early (ie the "down()" does not protect the kfree, but some other
reference count or lock does).

If you have

	CPU #1					CPU #2

	down(mem->semaphore);			up(mem->semaphore);
	kfree(mem);

you can still have the same bug.

But if you have

	CPU #1					CPU #2

	down(mem->semaphore)			up(mem->semaphore);
	put(mem);				put(mem);

where "put(mem)" does something like

	if (atomic_dec_and_test(&mem->refcount))
		kfree(mem);

then you're obvoiusly ok, because now the free'ing of the data structure
cannot happen until all users have handed off on it.

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 2.4.7-pre9..
  2001-07-27 19:47       ` 2.4.7-pre9 Matthew Dharm
  2001-07-27 21:00         ` 2.4.7-pre9 Linus Torvalds
@ 2001-07-28  3:53         ` David Woodhouse
  2001-07-27 19:55           ` 2.4.7-pre9 Matthew Dharm
  1 sibling, 1 reply; 14+ messages in thread
From: David Woodhouse @ 2001-07-28  3:53 UTC (permalink / raw)
  To: Matthew Dharm; +Cc: Kernel Mailing List, linux-scsi


(Cc list trimmed)

On Fri, 27 Jul 2001, Matthew Dharm wrote:

> IIRC, usb-storage only uses semaphores that are allocated via kfree, so I
> think we're okay.  Tho, I think the new semantics are probably better, and
> will probably switch to them.  Later.

If the exit (or indeed any) path does down(); kfree(); you suffer the same 
problem.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2001-07-28  0:27 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-07-20  5:17 2.4.7-pre9 Linus Torvalds
2001-07-20  7:22 ` 2.4.7-pre9 Jens Axboe
2001-07-20  8:15   ` 2.4.7-pre9 Linus Torvalds
2001-07-20  8:26     ` 2.4.7-pre9 Jens Axboe
2001-07-20 16:42       ` 2.4.7-pre9 Linus Torvalds
2001-07-20 18:57         ` 2.4.7-pre9 Jens Axboe
2001-07-20 10:23 ` 2.4.7-pre9 David Woodhouse
2001-07-23 12:56 ` 2.4.7-pre9 Pavel Machek
2001-07-27 16:18   ` 2.4.7-pre9 Matthew Dharm
2001-07-27 17:46     ` 2.4.7-pre9 Linus Torvalds
2001-07-27 19:47       ` 2.4.7-pre9 Matthew Dharm
2001-07-27 21:00         ` 2.4.7-pre9 Linus Torvalds
2001-07-28  3:53         ` 2.4.7-pre9 David Woodhouse
2001-07-27 19:55           ` 2.4.7-pre9 Matthew Dharm

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).