All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] IBM Power RAID driver (ipr)
@ 2004-01-16 21:59 Brian King
  2004-01-16 23:01 ` Muli Ben-Yehuda
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Brian King @ 2004-01-16 21:59 UTC (permalink / raw)
  To: linux-scsi

Hi,

I have just written a LLD for the IBM Power RAID family of adapters.
This includes several IBM iSeries and pSeries SCSI adapters. Please
review for 2.6 inclusion.

The driver can be found at:

http://www-124.ibm.com/storageio/ipr/patch-2.6.1-ipr-2.0.0-1.gz


Thanks

-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-16 21:59 [RFC] IBM Power RAID driver (ipr) Brian King
@ 2004-01-16 23:01 ` Muli Ben-Yehuda
  2004-01-16 23:46   ` Brian King
  2004-01-18 14:10 ` Anton Blanchard
  2004-01-19 18:34 ` Christoph Hellwig
  2 siblings, 1 reply; 28+ messages in thread
From: Muli Ben-Yehuda @ 2004-01-16 23:01 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 3046 bytes --]

On Fri, Jan 16, 2004 at 03:59:54PM -0600, Brian King wrote:

> http://www-124.ibm.com/storageio/ipr/patch-2.6.1-ipr-2.0.0-1.gz

Hi, I took a look and I have some comments / questions. I will try to
go over all of it in the next few days, but here's what I went over so
far: 

> +static int ipr_unsafe = 1;		/* xxx set to 0 for ship */

Should it be set to 0 or to 1 for inclusion in the kernel? any driver
that is first included is considered experimental, so just how unsafe
is ipr_unsafe?

> +module_param_named(unsafe, ipr_unsafe, int, 0);
> +MODULE_PARM_DESC(unsafe, "Do not use!");

Same question, I guess, and if you decide to keep it, please add a
more descriptive MODULE_PARM_DESC string... 

> +static void ipr_sleep_no_lock(signed long delay)
> +{
> +	DECLARE_WAIT_QUEUE_HEAD(internal_wait);
> +
> +	sleep_on_timeout(&internal_wait, (delay * HZ) / 1000);
> +	return;

just a nitpick, but the Linux coding style says no 'return' from void
functions. 

>+static void ipr_sleep(struct ipr_ioa_cfg *ioa_cfg, signed long delay)
> +{
> +	spin_unlock_irq(ioa_cfg->host->host_lock);
> +	ipr_sleep_no_lock(delay);
> +	spin_lock_irq(ioa_cfg->host->host_lock);

Hmpf... it's called from ipr_diagnostic_ioctl(), which does
spin_lock_irqsave()
ipr_sleep()
spin_unlock_irqrestore()

So I guess you really do intend to enable interrupts before going to
sleep? it's somewhat fragile... also, is the dropping and
reacquisition of the lock safe here (in the sense that none of the
data the lock is protecting that we care about can change while the
lock is not held?)

Same questions wrt ipr_interruptible_sleep_on(), which does the same
thing. By the way, ipr_dump_ioctl does not check the return value from
ipr_interruptible_sleep_on(). If this is intentional, perhaps call it
using (void)ipr_interruptible_sleep_on()?

> +static void ipr_block_requests(struct ipr_ioa_cfg *ioa_cfg)
> +{
> +	ENTER;
> +
> +	if (0 == ioa_cfg->block_host_ops++) {
> +	   ioa_cfg->allow_cmds = 0;

This is racy unless a lock is held. I think a lock is held here in all
callers of this function, but a comment to that effect would be nice
before the function. This also applies to every other function which
either must be called with a lock taken, or modifies the "locking
status" - takes a lock which is released elsewhere, enables interrupts
unconditionally, etc. 

+static void ipr_unblock_requests(struct ipr_ioa_cfg *ioa_cfg)
+{
+	ENTER;
+
+	if (0 == --ioa_cfg->block_host_ops) {
+	   ioa_cfg->allow_cmds = 1;
+	   spin_unlock_irq(ioa_cfg->host->host_lock);
+	   scsi_unblock_requests(ioa_cfg->host);
+	   spin_lock_irq(ioa_cfg->host->host_lock);

Eeek! What happens if on another CPU a call comes in that does
ipr_block_requests() after the lock is released and before it is taken
again?

That's as far as I got, for now. 

Cheers, 
Muli 
-- 
Muli Ben-Yehuda
http://www.mulix.org | http://mulix.livejournal.com/

"the nucleus of linux oscillates my world" - gccbot@#offtopic


[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-16 23:01 ` Muli Ben-Yehuda
@ 2004-01-16 23:46   ` Brian King
  0 siblings, 0 replies; 28+ messages in thread
From: Brian King @ 2004-01-16 23:46 UTC (permalink / raw)
  To: Muli Ben-Yehuda; +Cc: linux-scsi

Muli Ben-Yehuda wrote:
> Hi, I took a look and I have some comments / questions. I will try to
> go over all of it in the next few days, but here's what I went over so
> far: 

Thanks!

>>+static int ipr_unsafe = 1;		/* xxx set to 0 for ship */
> 
> 
> Should it be set to 0 or to 1 for inclusion in the kernel? any driver
> that is first included is considered experimental, so just how unsafe
> is ipr_unsafe?

Perhaps this is poorly named. If you take a look at ipr_invalid_adapter,
you will see how ipr_unsafe is used. If set to 1, then it will allow 
certain adapters to run in systems which have known problems.

>>+static void ipr_sleep_no_lock(signed long delay)
>>+{
>>+	DECLARE_WAIT_QUEUE_HEAD(internal_wait);
>>+
>>+	sleep_on_timeout(&internal_wait, (delay * HZ) / 1000);
>>+	return;
> 
> 
> just a nitpick, but the Linux coding style says no 'return' from void
> functions. 

Done.

>>+static void ipr_sleep(struct ipr_ioa_cfg *ioa_cfg, signed long delay)
>>+{
>>+	spin_unlock_irq(ioa_cfg->host->host_lock);
>>+	ipr_sleep_no_lock(delay);
>>+	spin_lock_irq(ioa_cfg->host->host_lock);
> 
> 
> Hmpf... it's called from ipr_diagnostic_ioctl(), which does
> spin_lock_irqsave()
> ipr_sleep()
> spin_unlock_irqrestore()
> 
> So I guess you really do intend to enable interrupts before going to
> sleep? it's somewhat fragile... also, is the dropping and
> reacquisition of the lock safe here (in the sense that none of the
> data the lock is protecting that we care about can change while the
> lock is not held?)

I certainly don't want to go to sleep with interrupts masked. I need to 
wait a second and then check to see if some shared data has changed, so 
I am actually counting on the data protected by the spinlock being able 
to change. I can certainly remove ipr_sleep since it is only used in the 
one spot now and

> Same questions wrt ipr_interruptible_sleep_on(), which does the same
> thing. By the way, ipr_dump_ioctl does not check the return value from
> ipr_interruptible_sleep_on(). If this is intentional, perhaps call it
> using (void)ipr_interruptible_sleep_on()?

Yes. It checks the sdt_state, which is sufficient. I will add the (void).


>>+static void ipr_block_requests(struct ipr_ioa_cfg *ioa_cfg)
>>+{
>>+	ENTER;
>>+
>>+	if (0 == ioa_cfg->block_host_ops++) {
>>+	   ioa_cfg->allow_cmds = 0;
> 
> 
> This is racy unless a lock is held. I think a lock is held here in all
> callers of this function, but a comment to that effect would be nice
> before the function. This also applies to every other function which
> either must be called with a lock taken, or modifies the "locking
> status" - takes a lock which is released elsewhere, enables interrupts
> unconditionally, etc. 

Yes, the host lock must be held by the caller. I can certainly add the 
comments you suggest as well.


> +static void ipr_unblock_requests(struct ipr_ioa_cfg *ioa_cfg)
> +{
> +	ENTER;
> +
> +	if (0 == --ioa_cfg->block_host_ops) {
> +	   ioa_cfg->allow_cmds = 1;
> +	   spin_unlock_irq(ioa_cfg->host->host_lock);
> +	   scsi_unblock_requests(ioa_cfg->host);
> +	   spin_lock_irq(ioa_cfg->host->host_lock);
> 
> Eeek! What happens if on another CPU a call comes in that does
> ipr_block_requests() after the lock is released and before it is taken
> again?

Looks like I have other issues with the block/unblock interfaces as 
well. I'll clean this up.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-16 21:59 [RFC] IBM Power RAID driver (ipr) Brian King
  2004-01-16 23:01 ` Muli Ben-Yehuda
@ 2004-01-18 14:10 ` Anton Blanchard
  2004-01-19 16:16   ` Brian King
  2004-01-19 18:34 ` Christoph Hellwig
  2 siblings, 1 reply; 28+ messages in thread
From: Anton Blanchard @ 2004-01-18 14:10 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi


Hi Brian,

> I have just written a LLD for the IBM Power RAID family of adapters.
> This includes several IBM iSeries and pSeries SCSI adapters. Please
> review for 2.6 inclusion.

Nice work! It does a good job of using all the recent PCI and SCSI
infrastructure. I had a quick look through it and can only offer some
minor suggestions.

- Since ipr_sleep_no_lock/ipr_sleep is used once, we can open code
  a schedule_timeout there instead.

- Add a comment to explain why the wakeup strategy isnt racy. I
  automatically thought it was (just like the kernel version of
  sleep_on) until I could verify all wake ups are done under the scsi
  host lock. I also have a trivial change to make the two sleep
  functions alike.

- Im a bit worried about memory barriers wrt the allow_* things. As an
  example I added some mb()s wherever ->allow_interrupts is set.
  Actually, do we need this flag at all? Cant we just set the bits on the
  card to disable interrupts, then do a synchronize_irq()? Similarly it
  would be nice to get rid of some of the other allow_ flags if we can
  get around them.

- The forcing of the pci cacheline size worries me, could we have problems 
  with enabling MWI etc after doing this?

- Do we need the extra ipr_set_pcix_cmd_reg during init? We are going to
  do it after a reset arent we?

- It looks like the kmalloc in ipr_sdt_copy is called from process
  context without any locks held, so we dont need an atomic allocation.

Anton


diff -urp t.orig/drivers/scsi/ipr.c t/drivers/scsi/ipr.c
--- t.orig/drivers/scsi/ipr.c	2004-01-01 00:01:37.000000000 +1100
+++ t/drivers/scsi/ipr.c	2004-01-19 01:08:38.958959407 +1100
@@ -670,46 +670,12 @@ static const struct ipr_ses_table_entry 
 };
 
 /**
- * ipr_sleep_no_lock - Sleep for the specified amount of time
- * @delay:		time to sleep
- *
- * Sleep for the specified amount of time
- * 
- * Return value:
- * 	none
- **/
-static void ipr_sleep_no_lock(signed long delay)
-{
-	DECLARE_WAIT_QUEUE_HEAD(internal_wait);
-
-	sleep_on_timeout(&internal_wait, (delay * HZ) / 1000);
-	return;
-}
-
-/**
- * ipr_sleep - Sleep for the specified amount of time
- * @ioa_cfg:	ioa config struct
- * @delay:		time to sleep in msecs
- *
- * Release the host lock and sleep for the specified amount of time
- * 
- * Return value:
- * 	none
- **/
-static void ipr_sleep(struct ipr_ioa_cfg *ioa_cfg, signed long delay)
-{
-	spin_unlock_irq(ioa_cfg->host->host_lock);
-	ipr_sleep_no_lock(delay);
-	spin_lock_irq(ioa_cfg->host->host_lock);
-	return;
-}
-
-/**
  * ipr_interruptible_sleep_on - Sleep on the wait queue
  * @ioa_cfg:	ioa config struct
  * @wait_head:	wait queue to sleep on
  *
- * Sleep interruptibly on the wait queue.
+ * Sleep interruptibly on the wait queue. We avoid wakeup races by
+ * having the scsi host lock held on both sleep and wakeup.
  * 
  * Return value:
  * 	0 on success / -EINTR if woken due to a signal
@@ -717,7 +683,9 @@ static void ipr_sleep(struct ipr_ioa_cfg
 static int ipr_interruptible_sleep_on(struct ipr_ioa_cfg *ioa_cfg,
 				      wait_queue_head_t *wait_head)
 {
-	DECLARE_WAITQUEUE(wait_q_entry, current);
+	wait_queue_t wait_q_entry;
+
+	init_waitqueue_entry(&wait_q_entry);
 
 	set_current_state(TASK_INTERRUPTIBLE);
 
@@ -742,7 +710,8 @@ static int ipr_interruptible_sleep_on(st
  * @ioa_cfg:	ioa config struct
  * @wait_head:	wait queue to sleep on
  *
- * Sleep uninterruptibly on the wait queue.
+ * Sleep uninterruptibly on the wait queue. We avoid wakeup races by
+ * having the scsi host lock held on both sleep and wakeup.
  * 
  * Return value:
  * 	none
@@ -1076,6 +1045,7 @@ static void ipr_mask_and_clear_all_inter
 
 	/* Stop new interrupts */
 	ioa_cfg->allow_interrupts = 0;
+	mb();
 
 	/* Set interrupt mask to stop all new interrupts */
 	writel(~0, ioa_cfg->regs.set_interrupt_mask_reg);
@@ -1395,7 +1365,7 @@ static int __devinit ipr_probe_ioa(struc
 				   ioa_cfg->chip_cfg->cache_line_size);
 
 	if (rc != PCIBIOS_SUCCESSFUL) {
-		dev_err(&pdev->dev, "Read of config space failed\n");
+		dev_err(&pdev->dev, "Write of config space failed\n");
 		rc = -EIO;
 		goto cleanup_nomem;
 	}
@@ -1412,9 +1382,6 @@ static int __devinit ipr_probe_ioa(struc
 	if ((rc = ipr_save_pcix_cmd_reg(ioa_cfg)))
 		goto cleanup_nomem;
 
-	if ((rc = ipr_set_pcix_cmd_reg(ioa_cfg)))
-		goto cleanup_nomem;
-
 	if ((rc = ipr_alloc_mem(ioa_cfg)))
 		goto cleanup;
 
@@ -2276,6 +2243,7 @@ static int ipr_reset_enable_ioa(struct i
 
 	/* Enable interrupts */
 	ioa_cfg->allow_interrupts = 1;
+	mb();
 	writel(IPR_PCII_OPER_INTERRUPTS, ioa_cfg->regs.clr_interrupt_mask_reg);
 	temp_reg = readl(ioa_cfg->regs.sense_interrupt_mask_reg);
 
@@ -6249,13 +6217,12 @@ static int ipr_diagnostic_ioctl(struct i
 	ipr_unblock_requests(ioa_cfg);
 
 	/* Wait for a second for any errors to be logged */
-	ipr_sleep(ioa_cfg, 1000);
+	spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
+	schedule_timeout(HZ);
 
 	if (ioa_cfg->errors_logged)
 		rc = -EIO;
 
-	spin_unlock_irqrestore(ioa_cfg->host->host_lock, lock_flags);
-
 	return rc;
 }
 
@@ -7490,7 +7457,7 @@ static int ipr_sdt_copy(struct ipr_ioa_c
 	       ((ioa_cfg->ioa_dump->hdr.length + bytes_copied) < IPR_MAX_IOA_DUMP_SIZE)) {
 		if ((ioa_cfg->ioa_dump->page_offset >= PAGE_SIZE) ||
 		    (ioa_cfg->ioa_dump->page_offset == 0)) {
-			page = (u32 *)__get_free_page(GFP_ATOMIC);
+			page = (u32 *)__get_free_page(GFP_KERNEL);
 
 			if (!page) {
 				ipr_trace;

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-18 14:10 ` Anton Blanchard
@ 2004-01-19 16:16   ` Brian King
  0 siblings, 0 replies; 28+ messages in thread
From: Brian King @ 2004-01-19 16:16 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2705 bytes --]

Anton Blanchard wrote:
> - Since ipr_sleep_no_lock/ipr_sleep is used once, we can open code
>   a schedule_timeout there instead.

I agree. I will remove them.

> - Add a comment to explain why the wakeup strategy isnt racy. I
>   automatically thought it was (just like the kernel version of
>   sleep_on) until I could verify all wake ups are done under the scsi
>   host lock. I also have a trivial change to make the two sleep
>   functions alike.

I was actually thinking of changing the wakeup strategy because it 
doesn't handle spurious wakeups like the new interfaces in the kernel 
do. See diff below.

> - Im a bit worried about memory barriers wrt the allow_* things. As an
>   example I added some mb()s wherever ->allow_interrupts is set.

Can you elaborate on this? These flags should all be accessed/set while 
the host lock is held. If this is not sufficient, then wouldn't we have 
problems with more than just these flags, but rather any shared data?

>   Actually, do we need this flag at all? Cant we just set the bits on the
>   card to disable interrupts, then do a synchronize_irq()? Similarly it
>   would be nice to get rid of some of the other allow_ flags if we can
>   get around them.

The reason I put this flag in was for shared interrupts. If the isr gets 
called due to another adapter's interrupt while this adapter currently 
has memory space disabled because it is being reset, I wanted to avoid 
issuing a readl.

> - The forcing of the pci cacheline size worries me, could we have problems 
>   with enabling MWI etc after doing this?

Shouldn't be a problem, but I will certainly remove it if it doesn't 
belong here.

> - Do we need the extra ipr_set_pcix_cmd_reg during init? We are going to
>   do it after a reset arent we?

Yes, good catch. Removed.

> - It looks like the kmalloc in ipr_sdt_copy is called from process
>   context without any locks held, so we dont need an atomic allocation.

Actually, this does need to be an atomic allocation. If the adapter 
ipr_sdt_copy is being called for is the root/paging device a GFP_KERNEL 
allocation might hang indefinitely since at the point that this function 
is called, the adapter is not capable of doing I/O.


> @@ -1076,6 +1045,7 @@ static void ipr_mask_and_clear_all_inter
>  
>  	/* Stop new interrupts */
>  	ioa_cfg->allow_interrupts = 0;
> +	mb();

On the topic of memory barriers wrt adapter I/O, at one point I remember 
seeing sync instructions in the readl/writel macros for PPC64. Looking 
now, all I see is an eieio instruction before the readl. Looks like I 
will need to sprinkle a few more memory barriers in the driver...




-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

[-- Attachment #2: ipr-diff --]
[-- Type: application/x-java-vm, Size: 12289 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-16 21:59 [RFC] IBM Power RAID driver (ipr) Brian King
  2004-01-16 23:01 ` Muli Ben-Yehuda
  2004-01-18 14:10 ` Anton Blanchard
@ 2004-01-19 18:34 ` Christoph Hellwig
  2004-01-19 19:33   ` Mike Anderson
  2004-01-19 20:30   ` Brian King
  2 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-19 18:34 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Fri, Jan 16, 2004 at 03:59:54PM -0600, Brian King wrote:
> Hi,
> 
> I have just written a LLD for the IBM Power RAID family of adapters.
> This includes several IBM iSeries and pSeries SCSI adapters. Please
> review for 2.6 inclusion.

Here's some comments for thing I stumbled over.  Don't consider this
a full review yet.


+#include <linux/version.h>

You don't seem to need this one.

+#include <linux/proc_fs.h>

Dito.

+#include <linux/kdev_t.h>

You shouldn't need this one (haven't checked whether we actually do, but it
would be a bug to need it in a scsi LLDD)

+#include <linux/pci_ids.h>

You get this one from pci.h

+#include <linux/ctype.h>

Do you really need this one.

+#include <linux/devfs_fs.h>
+#include <linux/devfs_fs_kernel.h>

You shouldn't need this in a scsi LLDD.

+#include <linux/smp.h>

What do you need this for?

+#include <linux/smp_lock.h>

no needed.

+#include <linux/errno.h>

You already included this above..

+#include <asm/processor.h>

Why do you need this one?

+#include <asm/page.h>

And this?

+#ifdef CONFIG_COMPAT
+#include <linux/ioctl32.h>
+#endif

Shouldn't hurt to include this unconditionally.

+/*
+ *  Function Prototypes
+ */
+int ipr_biosparam(struct scsi_device *, struct block_device *, sector_t, int *);
+const char * ipr_ioa_info(struct Scsi_Host *);
+int ipr_release(struct Scsi_Host *);
+int ipr_eh_abort(struct scsi_cmnd *);
+int ipr_eh_dev_reset(struct scsi_cmnd *);
+int ipr_eh_host_reset(struct scsi_cmnd *);
+int ipr_slave_alloc(struct scsi_device *);
+int ipr_slave_configure(struct scsi_device *);
+void ipr_slave_destroy(struct scsi_device *);
+int ipr_queue(struct scsi_cmnd *, void (*done) (struct scsi_cmnd *));
+int ipr_task_thread(void *);
+irqreturn_t ipr_isr(int, void *, struct pt_regs *);

A single source-file driver shouldn't ever have non-static entry points.


+static int __devinit ipr_probe_ioa_part2(struct ipr_ioa_cfg *);
+static void ipr_initiate_ioa_reset(struct ipr_ioa_cfg *,
+				   enum ipr_shutdown_type);

...

You have lots of forward-declarations.  This is usually a sign of a bad
source file structure as code reads much easier if functions called by another
function are above them in the source.  Also there's some other strangeness
in the code, e.g. modern drivers usually have the module_init/exit routines
at the very end, before that the pci driver methods and objects, then the
scsi host template and before it it's methods.

+/**
+ * ipr_version_show - Show the driver version
+ * @dd:	device driver struct
+ * @buf:	buffer
+ * 
+ * Return value:
+ * 	number of bytes printed to buffer
+ **/
+static ssize_t ipr_version_show(struct device_driver *dd, char *buf)
+{
+	return snprintf(buf, PAGE_SIZE, "%s\n", IPR_DRIVER_VERSION);
+}
+
+static DRIVER_ATTR(version, S_IRUGO, ipr_version_show, NULL);

Please don't do this in sysfs.  We're still hoping for a MODULE_VERSION
macro, but every driver crafting it's own version telling mechanism
doesn't scale.

+MODULE_AUTHOR("Brian King <brking@us.ibm.com>");
+module_param_named(disable_tcq, ipr_disable_tcq, int, 0);
+MODULE_PARM_DESC(disable_tcq, "Set to 1 to disable tagged command queuing");
+module_param_named(safe, ipr_safe_settings, int, 0);
+MODULE_PARM_DESC(safe, "Use safe settings (slow)");
+module_param_named(verbose, ipr_verbose, int, 0);
+MODULE_PARM_DESC(verbose, "Set to 0 - 4 for increasing verbosity of device driver");
+module_param_named(unsafe, ipr_unsafe, int, 0);
+MODULE_PARM_DESC(unsafe, "Do not use!");
+MODULE_LICENSE("GPL");

What about a MODULE_DESCRIPTION?

+static struct file_operations ipr_fops = {
+	.ioctl = ipr_ioctl,
+	.open = ipr_open,
+	.release = ipr_close,
+};

Please don't add new 'management' character devices to scsi LLDDs anymore.

+unsigned int ipr_ioctls[] =
+{
+	IPR_IOCTL_PASSTHRU,
+	IPR_IOCTL_RUN_DIAGNOSTICS,
+	IPR_IOCTL_DUMP_IOA,
+	IPR_IOCTL_RESET_IOA,
+	IPR_IOCTL_READ_DRIVER_CFG,
+	IPR_IOCTL_WRITE_DRIVER_CFG,
+	IPR_IOCTL_GET_BUS_CAPABILTIES,
+	IPR_IOCTL_SET_BUS_ATTRIBUTES,
+	IPR_IOCTL_GET_TRACE,
+	IPR_IOCTL_RECLAIM_CACHE,
+	IPR_IOCTL_QUERY_CONFIGURATION,
+	IPR_IOCTL_UCODE_DOWNLOAD,
+	IPR_IOCTL_CHANGE_ADAPTER_ASSIGNMENT
+};

Please provide a brief explanation of every ioctl and why you need it as
ioctl.  At least the GET_BUS_CAPABILTIES / SET_BUS_ATTRIBUTES seem to be
better expressed in the transport sysfs API discussed on linux-scsi.

+/**
+ * ipr_sleep_no_lock - Sleep for the specified amount of time
+ * @delay:		time to sleep
+ *
+ * Sleep for the specified amount of time
+ * 
+ * Return value:
+ * 	none
+ **/
+static void ipr_sleep_no_lock(signed long delay)
+{
+	DECLARE_WAIT_QUEUE_HEAD(internal_wait);
+
+	sleep_on_timeout(&internal_wait, (delay * HZ) / 1000);
+	return;
+}

Okay, these sleep_on thingies seem to be fixed up with your additional
patch, I don't need to comment on them thankfully.

+/**
+ * ipr_get_mode_page - Locate specified mode page
+ * @mode_pages:	mode page buffer
+ * @page_code:	page code to find
+ * @len:		minimum required length for mode page
+ *
+ * Return value:
+ * 	pointer to mode page / NULL on failure
+ **/
+static void *ipr_get_mode_page(struct ipr_mode_pages *mode_pages,
+			       u32 page_code, u32 len)

Okay, this isn't good at all.  A LLDD driver shouldn't care about mode
pages and all the stuff you do with it.  Please do it from userspace using
the sg driver.

+	spin_lock_init(&ipr_driver_lock);
+	INIT_LIST_HEAD(&ipr_ioa_head);

These could be initialized at compile-time (but shouldn't be needed at all
without the character devices)

+	ipr_regs = (unsigned long)ioremap(ipr_regs_pci,
+					  pci_resource_len(pdev, 0));

You need to check the return value.

+/**
+ * ipr_scan_vsets - Scans for VSET devices
+ * @ioa_cfg:	ioa config struct
+ *
+ * Description: Since the VSET resources do not follow SAM in that we can have
+ * sparse LUNs with no LUN 0, we have to scan for these ourselves.
+ *
+ * Return value:
+ * 	none
+ **/
+static void ipr_scan_vsets(struct ipr_ioa_cfg *ioa_cfg)
+{
+	int target, lun;
+
+	for (target = 0; target < IPR_MAX_NUM_TARGETS_PER_BUS; target++)
+		for (lun = 0; lun < IPR_MAX_NUM_VSET_LUNS_PER_TARGET; lun++ )
+			scsi_add_device(ioa_cfg->host, IPR_VSET_BUS, target, lun);
+}

Can you explain what these vsets are supposed to do?  And why a scsi_scan quirk can't
be done instead of doing this in a LLDD?

+/**
+ * ipr_inquiry - Send an Inquiry to the adapter.
+ * @ipr_cmd:	ipr command struct
+ *
+ * This utility function sends an inquiry to the adapter.
+ *
+ * Return value:
+ * 	none
+ **/
+static void ipr_inquiry(struct ipr_cmnd *ipr_cmd, u8 flags, u8 page,
+			u32 dma_addr, u8 xfer_len)

Doing inquiries from a LLDD looks very, very wrong.  Explanations please.

+/**
+ * ipr_find_ses_entry - Find matching SES in SES table
+ * @res:	resource entry struct of SES
+ * 
+ * Return value:
+ * 	pointer to SES table entry / NULL on failure
+ **/

This is an extreme layering violation.  Any SES handling belongs into an
upper level scsi driver (or userspace via sg)

+/**
+ * ipr_do_req -  Send driver initiated requests.
+ * @ipr_cmd:		ipr command struct
+ * @done:			done function
+ * @timeout_func:	timeout function
+ * @timeout:		timeout value
+ *
+ * This function sends the specified command to the adapter with the
+ * timeout given. The done function is invoked on command completion.

This looks very wrong.  And fortunately it seems to be only called from
code that is totally misplaced in a LLDD anyway.  All scsi commands
should go through the midlayer queueing.

+/**
+ * ipr_vset_stop_unit - Send a STOP UNIT to a VSET resource
+ * @ioa_cfg:	ioa config struct
+ * @res:		resource entry struct
+ *
+ * This function sends a STOP UNIT to a VSET resource to
+ * flush the write cache..
+ *
+ * Return value:
+ * 	IOASC of the command
+ **/

Again, a LLDD is the very wrong place for something like that.
Is there a spec for that VSET stuff somewhere?

+/**
+ * ipr_send_cmd - Build and send a mid-layer command
+ * @ioa_cfg:	ioa config struct
+ * @scsi_cmd:	scsi command struct
+ * @res:		ipr resource entry
+ *
+ * Return value:
+ * 	0 on success / SCSI_MLQUEUE_HOST_BUSY if DMA mapping fails
+ **/
+static int ipr_send_cmd(struct ipr_ioa_cfg *ioa_cfg,
+			struct scsi_cmnd *scsi_cmd,
+			struct ipr_resource_entry *res)
+{
+	struct ipr_cmnd *ipr_cmd;
+	int rc = 0;
+	void (*timeout_func) (struct scsi_cmnd *);
+	int timeout;
+	struct ipr_ioarcb *ioarcb;
+
+	ipr_cmd = ipr_get_free_ipr_cmnd(ioa_cfg);
+	ioarcb = &ipr_cmd->ioarcb;
+
+	list_add_tail(&ipr_cmd->queue, &ioa_cfg->pending_q);

Can you explain the per-command list mess please?  In theory a properly
written driver shouldn't need lists like that, just start a command in
->queuecommand if possible, else return an error and immediately call
->done when you're done with a scsi command.

+	/*
+	 * Double the timeout value to use as we will use the adapter
+	 * as the primary timing mechanism
+	 */
+	timeout_func = (void (*)(struct scsi_cmnd *)) scsi_cmd->eh_timeout.function;
+	timeout = scsi_cmd->timeout_per_command;
+
+	scsi_add_timer(scsi_cmd, timeout * IPR_TIMEOUT_MULTIPLIER, timeout_func);

Okay, another driver that would like hooks into the EH timer.  We should probably
better fix this correctly by adding a host method for it.

+ **/
+int ipr_queue(struct scsi_cmnd *scsi_cmd, void (*done) (struct scsi_cmnd *))

ipr_queuecommand would be a better name.

+
+	/* xxx can I remove some of these checks? */
+	if (unlikely(ioa_cfg->ioa_is_dead || !res || res->del_from_ml)) {
+		memset(scsi_cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
+		scsi_cmd->result = (DID_NO_CONNECT << 16);
+		scsi_cmd->scsi_done(scsi_cmd);
+		return 0;
+	}

this should be handled by the midlayer per-device and per-host state bits.

+	rc = ipr_send_cmd(ioa_cfg, scsi_cmd, res);

merge this into the quecommand routine?

+int ipr_biosparam(struct scsi_device *scsi_device,
+		  struct block_device *block_device,
+		  sector_t capacity, int *parm)
+{
+	int heads, sectors, cylinders;
+
+	heads = 128;
+	sectors = 32;
+
+	cylinders = (capacity / (128 * 32));

You need to use sector_div here to avoid blowing up on 32bit systems.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-19 18:34 ` Christoph Hellwig
@ 2004-01-19 19:33   ` Mike Anderson
  2004-01-19 19:35     ` Christoph Hellwig
  2004-01-20  5:57     ` Douglas Gilbert
  2004-01-19 20:30   ` Brian King
  1 sibling, 2 replies; 28+ messages in thread
From: Mike Anderson @ 2004-01-19 19:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian King, linux-scsi

Christoph Hellwig [hch@infradead.org] wrote:
> +/**
> + * ipr_version_show - Show the driver version
> + * @dd:	device driver struct
> + * @buf:	buffer
> + * 
> + * Return value:
> + * 	number of bytes printed to buffer
> + **/
> +static ssize_t ipr_version_show(struct device_driver *dd, char *buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%s\n", IPR_DRIVER_VERSION);
> +}
> +
> +static DRIVER_ATTR(version, S_IRUGO, ipr_version_show, NULL);
> 
> Please don't do this in sysfs.  We're still hoping for a MODULE_VERSION
> macro, but every driver crafting it's own version telling mechanism
> doesn't scale.
> 

This is my fault. We recommended not to include procfs info in this
driver and pointed to sysfs. Do we have a timeline on MODULE_VERSION?

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-19 19:33   ` Mike Anderson
@ 2004-01-19 19:35     ` Christoph Hellwig
  2004-01-20  5:57     ` Douglas Gilbert
  1 sibling, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-19 19:35 UTC (permalink / raw)
  To: Brian King, linux-scsi

On Mon, Jan 19, 2004 at 11:33:54AM -0800, Mike Anderson wrote:
> This is my fault. We recommended not to include procfs info in this
> driver and pointed to sysfs. Do we have a timeline on MODULE_VERSION?

Ask rusty, he's been pushing for this for a while.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-19 18:34 ` Christoph Hellwig
  2004-01-19 19:33   ` Mike Anderson
@ 2004-01-19 20:30   ` Brian King
  2004-01-20 13:38     ` Christoph Hellwig
  2004-01-20 22:37     ` Brian King
  1 sibling, 2 replies; 28+ messages in thread
From: Brian King @ 2004-01-19 20:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote:

Thanks for looking at the driver.

I removed all the includes that you suggested with the exception of:

> +#include <asm/processor.h>
> 
> Why do you need this one?

I need this for __is_processor and for defines like PV_NORTHSTAR. Look 
at the comment in ipr_invalid_adapter for an explanation.

> +#ifdef CONFIG_COMPAT
> +#include <linux/ioctl32.h>
> +#endif
> 
> Shouldn't hurt to include this unconditionally.

The reason I didn't is because the ioctl32 stuff is somewhat broken and 
should be fixed. We really need stubs for register_ioctl32_conversion 
and unregister_ioctl32_conversion when CONFIG_COMPAT is not defined. 
just haven't gotten around to submitting a patch for this yet.

> A single source-file driver shouldn't ever have non-static entry points.

Ok.

> You have lots of forward-declarations.  This is usually a sign of a bad
> source file structure as code reads much easier if functions called by another
> function are above them in the source.  Also there's some other strangeness
> in the code, e.g. modern drivers usually have the module_init/exit routines
> at the very end, before that the pci driver methods and objects, then the
> scsi host template and before it it's methods.

I'll get this cleaned up.

> +/**
> + * ipr_version_show - Show the driver version
> + * @dd:	device driver struct
> + * @buf:	buffer
> + * 
> + * Return value:
> + * 	number of bytes printed to buffer
> + **/
> +static ssize_t ipr_version_show(struct device_driver *dd, char *buf)
> +{
> +	return snprintf(buf, PAGE_SIZE, "%s\n", IPR_DRIVER_VERSION);
> +}
> +
> +static DRIVER_ATTR(version, S_IRUGO, ipr_version_show, NULL);
> 
> Please don't do this in sysfs.  We're still hoping for a MODULE_VERSION
> macro, but every driver crafting it's own version telling mechanism
> doesn't scale.

Would it be ok to rename the sysfs file to ipr_version so I have a way 
to get this information until a better way comes along?

> What about a MODULE_DESCRIPTION?

Added.

> +unsigned int ipr_ioctls[] =
> +{
> +	IPR_IOCTL_PASSTHRU,
> +	IPR_IOCTL_RUN_DIAGNOSTICS,
> +	IPR_IOCTL_DUMP_IOA,
> +	IPR_IOCTL_RESET_IOA,
> +	IPR_IOCTL_READ_DRIVER_CFG,
> +	IPR_IOCTL_WRITE_DRIVER_CFG,
> +	IPR_IOCTL_GET_BUS_CAPABILTIES,
> +	IPR_IOCTL_SET_BUS_ATTRIBUTES,
> +	IPR_IOCTL_GET_TRACE,
> +	IPR_IOCTL_RECLAIM_CACHE,
> +	IPR_IOCTL_QUERY_CONFIGURATION,
> +	IPR_IOCTL_UCODE_DOWNLOAD,
> +	IPR_IOCTL_CHANGE_ADAPTER_ASSIGNMENT
> +};
> 
> Please provide a brief explanation of every ioctl and why you need it as
> ioctl.  At least the GET_BUS_CAPABILTIES / SET_BUS_ATTRIBUTES seem to be
> better expressed in the transport sysfs API discussed on linux-scsi

IPR_IOCTL_READ_DRIVER_CFG, IPR_IOCTL_WRITE_DRIVER_CFG, 
IPR_IOCTL_RUN_DIAGNOSTICS, IPR_IOCTL_RESET_IOA, 
IPR_IOCTL_CHANGE_ADAPTER_ASSIGNMENT: these could be sysfs files.

IPR_IOCTL_PASSTHRU: Allows user space to build adapter/device commands 
to send to the adapter. This includes RAID configuration, cache recovery 
commands, etc. This ioctl allows me to push all the RAID configuration 
complexity into userspace.

IPR_IOCTL_RUN_DIAGNOSTICS: This one could be a sysfs file.

IPR_IOCTL_DUMP_IOA: This ioctl is called by a userspace daemon. When 
called, it goes to sleep and waits (interruptibly) for a severe adapter 
error to occur. When this happens, the driver "dumps" the adapter's 
memory. After the adapter is reset, the daemon wakes up and writes the 
dump file to the filesystem. The dump file is usually around 2-4Meg in 
size.

IPR_IOCTL_GET_TRACE: This returns the driver's internal trace. This one 
needs a buffer > 4k. I could create a binary sysfs file for it since, 
AFAIK, this size limitation does not exist for binary sysfs files.

IPR_IOCTL_RECLAIM_CACHE: This is used to perform cache storage 
reclaimation on the adapter. It takes parameters from userspace, sends a 
command to the adapter, resets the adapter, then returns data to the user.

IPR_IOCTL_UCODE_DOWNLOAD: Used to update the microcode on the adapter 
and on the physical disks in RAID arrays (these devices are not seen by 
the SCSI midlayer). These microcode images are generally between 500k 
and 1M in size.

Overall the issues I struggled with regarding removing all the IOCTLs 
stemmed from bidirectional data flow, and large amounts of data. sysfs 
does not seem to be able to accomplish this. One suggestion made to me 
was to create a filesystem to get this work done, but I'm not sure that 
would be any cleaner.

> +/**
> + * ipr_get_mode_page - Locate specified mode page
> + * @mode_pages:	mode page buffer
> + * @page_code:	page code to find
> + * @len:		minimum required length for mode page
> + *
> + * Return value:
> + * 	pointer to mode page / NULL on failure
> + **/
> +static void *ipr_get_mode_page(struct ipr_mode_pages *mode_pages,
> +			       u32 page_code, u32 len)
> 
> Okay, this isn't good at all.  A LLDD driver shouldn't care about mode
> pages and all the stuff you do with it.  Please do it from userspace using
> the sg driver.

Ok. Here are my reasons:

1. First of all, this driver is setting up mode pages for devices 
according to how the adapter requires them to be setup to run properly. 
Arguably, the adapter should do this, but it doesn't.

2. There are 2 types of devices that I setup mode pages for. The first 
are generic SCSI disk devices. These are seen by the mid-layer and could 
feasibly have their mode pages setup by userspace. But, the adapter 
requires qerr=1 in order to run TCQing. So I am setting that up. If I 
moved that into userspace, then I couldn't run TCQing until userspace 
setup mode page 0x0A. The other type of device is a physical disk in a 
RAID array. These devices are not seen by the mid-layer. The only way 
for userspace to talk to them (at least in the current implementation) 
is through the adapter ioctl interface discussed above. For these 
devices, the adapter requires the mode pages to be setup as I am doing 
in order to run reliably.

> +	spin_lock_init(&ipr_driver_lock);
> +	INIT_LIST_HEAD(&ipr_ioa_head);
> 
> These could be initialized at compile-time (but shouldn't be needed at all
> without the character devices)

Ok.

> +	ipr_regs = (unsigned long)ioremap(ipr_regs_pci,
> +					  pci_resource_len(pdev, 0));
> 
> You need to check the return value.

Ok.

> +/**
> + * ipr_scan_vsets - Scans for VSET devices
> + * @ioa_cfg:	ioa config struct
> + *
> + * Description: Since the VSET resources do not follow SAM in that we can have
> + * sparse LUNs with no LUN 0, we have to scan for these ourselves.
> + *
> + * Return value:
> + * 	none
> + **/
> +static void ipr_scan_vsets(struct ipr_ioa_cfg *ioa_cfg)
> +{
> +	int target, lun;
> +
> +	for (target = 0; target < IPR_MAX_NUM_TARGETS_PER_BUS; target++)
> +		for (lun = 0; lun < IPR_MAX_NUM_VSET_LUNS_PER_TARGET; lun++ )
> +			scsi_add_device(ioa_cfg->host, IPR_VSET_BUS, target, lun);
> +}
> 
> Can you explain what these vsets are supposed to do?  And why a scsi_scan quirk can't
> be done instead of doing this in a LLDD?

I was expecting this one;) VSET stands for Volume Set. A VSET is a 
logical disk created by the adapter firmware representing a disk array. 
The reason a scsi_scan quirk does not work is because they also do not 
have any reasonable product ID in their standard inquiry data to parse on.

> +/**
> + * ipr_inquiry - Send an Inquiry to the adapter.
> + * @ipr_cmd:	ipr command struct
> + *
> + * This utility function sends an inquiry to the adapter.
> + *
> + * Return value:
> + * 	none
> + **/
> +static void ipr_inquiry(struct ipr_cmnd *ipr_cmd, u8 flags, u8 page,
> +			u32 dma_addr, u8 xfer_len)
> 
> Doing inquiries from a LLDD looks very, very wrong.  Explanations please.

Ok. This particular inquiry is an inquiry to the adapter, not a device. 
However, I do inquiries to devices that are in disk arrays, but only 
because I need to (ipr_std_inquiry). This stems from the fact that the 
adapter will not start any new advanced function (RAID, adapter write 
caching, etc) to a device until it is told by the system that the device 
is a "supported device" by using the "set supported devices" command.

Complicated example: You have a RAID 5 disk array with n devices. One of 
them fails. You put in a replacement device of a different type. In 
order to be able to rebuild the disk array using the new device, the 
adapter needs to have a set supported devices command sent down with the 
vendor/product ID of the replacement device. An inquiry has to be done 
to get this data.

> +/**
> + * ipr_find_ses_entry - Find matching SES in SES table
> + * @res:	resource entry struct of SES
> + * 
> + * Return value:
> + * 	pointer to SES table entry / NULL on failure
> + **/
> 
> This is an extreme layering violation.  Any SES handling belongs into an
> upper level scsi driver (or userspace via sg)

I'm not actually talking to the SES, I'm only looking at the adapter's 
device configuration table. What I am trying to accomplish here is I am 
looking at each SCSI bus and trying to determine the max scsi speed that 
is safe to run on them. There are certain older backplanes that do not 
run reliably at faster speeds. Hence, the SES table.

> +/**
> + * ipr_do_req -  Send driver initiated requests.
> + * @ipr_cmd:		ipr command struct
> + * @done:			done function
> + * @timeout_func:	timeout function
> + * @timeout:		timeout value
> + *
> + * This function sends the specified command to the adapter with the
> + * timeout given. The done function is invoked on command completion.
> 
> This looks very wrong.  And fortunately it seems to be only called from
> code that is totally misplaced in a LLDD anyway.  All scsi commands
> should go through the midlayer queueing.

It is primarily used when talking to devices not known to the mid-layer, 
like the adapter itself, and devices that are in a disk array.

> +/**
> + * ipr_vset_stop_unit - Send a STOP UNIT to a VSET resource
> + * @ioa_cfg:	ioa config struct
> + * @res:		resource entry struct
> + *
> + * This function sends a STOP UNIT to a VSET resource to
> + * flush the write cache..
> + *
> + * Return value:
> + * 	IOASC of the command
> + **/
> 
> Again, a LLDD is the very wrong place for something like that.
> Is there a spec for that VSET stuff somewhere?

Unfortunately no. But I can answer any questions you might have. I 
wasn't sure if this was an ok thing to do or not. The midlayer is 
unconfiguring the device, it seemed reasonable that I should have the 
adapter flush the write cache and stop talking to that logical device.

> +/**
> + * ipr_send_cmd - Build and send a mid-layer command
> + * @ioa_cfg:	ioa config struct
> + * @scsi_cmd:	scsi command struct
> + * @res:		ipr resource entry
> + *
> + * Return value:
> + * 	0 on success / SCSI_MLQUEUE_HOST_BUSY if DMA mapping fails
> + **/
> +static int ipr_send_cmd(struct ipr_ioa_cfg *ioa_cfg,
> +			struct scsi_cmnd *scsi_cmd,
> +			struct ipr_resource_entry *res)
> +{
> +	struct ipr_cmnd *ipr_cmd;
> +	int rc = 0;
> +	void (*timeout_func) (struct scsi_cmnd *);
> +	int timeout;
> +	struct ipr_ioarcb *ioarcb;
> +
> +	ipr_cmd = ipr_get_free_ipr_cmnd(ioa_cfg);
> +	ioarcb = &ipr_cmd->ioarcb;
> +
> +	list_add_tail(&ipr_cmd->queue, &ioa_cfg->pending_q);
> 
> Can you explain the per-command list mess please?  In theory a properly
> written driver shouldn't need lists like that, just start a command in
> ->queuecommand if possible, else return an error and immediately call
> ->done when you're done with a scsi command.

The command transport for the adapter is like this:

Build a command in DMA'able host memory (struct ipr_ioarcb), write the 
PCI address to a register on the adapter, the adapter DMAs down the 
control block, executes the command, writes the 
ipr_ioarcb.host_response_handle in the command control block to the host 
request/response queue in host memory, and signals an interrupt.

Does that help at all?

> +	/*
> +	 * Double the timeout value to use as we will use the adapter
> +	 * as the primary timing mechanism
> +	 */
> +	timeout_func = (void (*)(struct scsi_cmnd *)) scsi_cmd->eh_timeout.function;
> +	timeout = scsi_cmd->timeout_per_command;
> +
> +	scsi_add_timer(scsi_cmd, timeout * IPR_TIMEOUT_MULTIPLIER, timeout_func);
> 
> Okay, another driver that would like hooks into the EH timer.  We should probabl
> better fix this correctly by adding a host method for it.

That would be wonderful.

> + **/
> +int ipr_queue(struct scsi_cmnd *scsi_cmd, void (*done) (struct scsi_cmnd *))
> 
> ipr_queuecommand would be a better name.
> 
> +
> +	/* xxx can I remove some of these checks? */
> +	if (unlikely(ioa_cfg->ioa_is_dead || !res || res->del_from_ml)) {
> +		memset(scsi_cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
> +		scsi_cmd->result = (DID_NO_CONNECT << 16);
> +		scsi_cmd->scsi_done(scsi_cmd);
> +		return 0;
> +	}
> 
> this should be handled by the midlayer per-device and per-host state bits.

I didn't realize a LLD was allowed to modify these.

> +	rc = ipr_send_cmd(ioa_cfg, scsi_cmd, res);
> 
> merge this into the quecommand routine?

Sure.

> +int ipr_biosparam(struct scsi_device *scsi_device,
> +		  struct block_device *block_device,
> +		  sector_t capacity, int *parm)
> +{
> +	int heads, sectors, cylinders;
> +
> +	heads = 128;
> +	sectors = 32;
> +
> +	cylinders = (capacity / (128 * 32));
> 
> You need to use sector_div here to avoid blowing up on 32bit systems.

Will do.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-19 19:33   ` Mike Anderson
  2004-01-19 19:35     ` Christoph Hellwig
@ 2004-01-20  5:57     ` Douglas Gilbert
  2004-01-20 13:21       ` Christoph Hellwig
  1 sibling, 1 reply; 28+ messages in thread
From: Douglas Gilbert @ 2004-01-20  5:57 UTC (permalink / raw)
  To: Mike Anderson; +Cc: Christoph Hellwig, Brian King, linux-scsi

Mike Anderson wrote:
> Christoph Hellwig [hch@infradead.org] wrote:
> 
>>+/**
>>+ * ipr_version_show - Show the driver version
>>+ * @dd:	device driver struct
>>+ * @buf:	buffer
>>+ * 
>>+ * Return value:
>>+ * 	number of bytes printed to buffer
>>+ **/
>>+static ssize_t ipr_version_show(struct device_driver *dd, char *buf)
>>+{
>>+	return snprintf(buf, PAGE_SIZE, "%s\n", IPR_DRIVER_VERSION);
>>+}
>>+
>>+static DRIVER_ATTR(version, S_IRUGO, ipr_version_show, NULL);
>>
>>Please don't do this in sysfs.  We're still hoping for a MODULE_VERSION
>>macro, but every driver crafting it's own version telling mechanism
>>doesn't scale.
>>
> 
> 
> This is my fault. We recommended not to include procfs info in this
> driver and pointed to sysfs. Do we have a timeline on MODULE_VERSION?

... and will MODULE_VERSION be defined when the driver is built in?

Doug Gilbert



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20  5:57     ` Douglas Gilbert
@ 2004-01-20 13:21       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-20 13:21 UTC (permalink / raw)
  To: Douglas Gilbert; +Cc: Mike Anderson, Christoph Hellwig, Brian King, linux-scsi

On Tue, Jan 20, 2004 at 03:57:00PM +1000, Douglas Gilbert wrote:
> ... and will MODULE_VERSION be defined when the driver is built in?

Given the tradition of the MODULE_ macros it will certinaly be defined.
For details on what it actually does in the built-in case ask Rusty.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-19 20:30   ` Brian King
@ 2004-01-20 13:38     ` Christoph Hellwig
  2004-01-20 16:41       ` Brian King
  2004-01-20 22:37     ` Brian King
  1 sibling, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-20 13:38 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Mon, Jan 19, 2004 at 02:30:40PM -0600, Brian King wrote:
> Would it be ok to rename the sysfs file to ipr_version so I have a way 
> to get this information until a better way comes along?

I'm not too happy.  Can't you just grep dmesg output like we do for other
drivers?

> IPR_IOCTL_READ_DRIVER_CFG, IPR_IOCTL_WRITE_DRIVER_CFG, 
> IPR_IOCTL_RUN_DIAGNOSTICS, IPR_IOCTL_RESET_IOA, 
> IPR_IOCTL_CHANGE_ADAPTER_ASSIGNMENT: these could be sysfs files.

Please change them over.

> IPR_IOCTL_PASSTHRU: Allows user space to build adapter/device commands 
> to send to the adapter. This includes RAID configuration, cache recovery 
> commands, etc. This ioctl allows me to push all the RAID configuration 
> complexity into userspace.

Ok, fine with me

> IPR_IOCTL_RUN_DIAGNOSTICS: This one could be a sysfs file.

See above.

> IPR_IOCTL_DUMP_IOA: This ioctl is called by a userspace daemon. When 
> called, it goes to sleep and waits (interruptibly) for a severe adapter 
> error to occur. When this happens, the driver "dumps" the adapter's 
> memory. After the adapter is reset, the daemon wakes up and writes the 
> dump file to the filesystem. The dump file is usually around 2-4Meg in 
> size.

Okay..


> IPR_IOCTL_GET_TRACE: This returns the driver's internal trace. This one 
> needs a buffer > 4k. I could create a binary sysfs file for it since, 
> AFAIK, this size limitation does not exist for binary sysfs files.

Please do.  Also what about making this and the above one conditional on
some debugging option?

> IPR_IOCTL_RECLAIM_CACHE: This is used to perform cache storage 
> reclaimation on the adapter. It takes parameters from userspace, sends a 
> command to the adapter, resets the adapter, then returns data to the user.

Okay..

> IPR_IOCTL_UCODE_DOWNLOAD: Used to update the microcode on the adapter 
> and on the physical disks in RAID arrays (these devices are not seen by 
> the SCSI midlayer). These microcode images are generally between 500k 
> and 1M in size.

Can't you use the request_firmware interface for that?

> > Okay, this isn't good at all.  A LLDD driver shouldn't care about mode
> > pages and all the stuff you do with it.  Please do it from userspace using
> > the sg driver.
> 
> Ok. Here are my reasons:
> 
> 1. First of all, this driver is setting up mode pages for devices 
> according to how the adapter requires them to be setup to run properly. 
> Arguably, the adapter should do this, but it doesn't.
> 
> 2. There are 2 types of devices that I setup mode pages for. The first 
> are generic SCSI disk devices. These are seen by the mid-layer and could 
> feasibly have their mode pages setup by userspace. But, the adapter 
> requires qerr=1 in order to run TCQing. So I am setting that up. If I 
> moved that into userspace, then I couldn't run TCQing until userspace 
> setup mode page 0x0A.

Doesn't sound to bad, does it?  Bootup ir hardly performance critical.

> The other type of device is a physical disk in a 
> RAID array. These devices are not seen by the mid-layer. The only way 
> for userspace to talk to them (at least in the current implementation) 
> is through the adapter ioctl interface discussed above. For these 
> devices, the adapter requires the mode pages to be setup as I am doing 
> in order to run reliably.

Given all the mess these invisble devices create I think they shouldn't
be invisble but rather exported as a second pseudo-channel on the host.

> > Can you explain what these vsets are supposed to do?  And why a scsi_scan quirk can't
> > be done instead of doing this in a LLDD?
> 
> I was expecting this one;) VSET stands for Volume Set. A VSET is a 
> logical disk created by the adapter firmware representing a disk array. 
> The reason a scsi_scan quirk does not work is because they also do not 
> have any reasonable product ID in their standard inquiry data to parse on.

Another reason why we should allow the driver setting quirk flags in
struct scsi_device in ->slave_alloc.  Unfortunately the patch from the usb
people to do it braindeadly in the host template got accepted instead.

> because I need to (ipr_std_inquiry). This stems from the fact that the 
> adapter will not start any new advanced function (RAID, adapter write 
> caching, etc) to a device until it is told by the system that the device 
> is a "supported device" by using the "set supported devices" command.
> 
> Complicated example: You have a RAID 5 disk array with n devices. One of 
> them fails. You put in a replacement device of a different type. In 
> order to be able to rebuild the disk array using the new device, the 
> adapter needs to have a set supported devices command sent down with the 
> vendor/product ID of the replacement device. An inquiry has to be done 
> to get this data.

Yikes.  I'd love to see designer of this hardware face by face..

Well, whats done is done and we can fix these fuckups now.  Can't you at
least do the inquiries from userland?

> > This looks very wrong.  And fortunately it seems to be only called from
> > code that is totally misplaced in a LLDD anyway.  All scsi commands
> > should go through the midlayer queueing.
> 
> It is primarily used when talking to devices not known to the mid-layer, 
> like the adapter itself, and devices that are in a disk array.


Well, you shouldn't do that.  We do in fact have interfaces to allocate
proper scsi device without registering them with the midlayer, currently
it's only used for the host itself (scsi_get_host_dev/scsi_free_host_dev),
but I don't see why this can't work for other devices.

> > Again, a LLDD is the very wrong place for something like that.
> > Is there a spec for that VSET stuff somewhere?
> 
> Unfortunately no. But I can answer any questions you might have. I 
> wasn't sure if this was an ok thing to do or not. The midlayer is 
> unconfiguring the device, it seemed reasonable that I should have the 
> adapter flush the write cache and stop talking to that logical device.

We should probably always send a STOP UNIT when unconfiguring devices
from the midlayer or and upper driver.

> Build a command in DMA'able host memory (struct ipr_ioarcb), write the 
> PCI address to a register on the adapter, the adapter DMAs down the 
> control block, executes the command, writes the 
> ipr_ioarcb.host_response_handle in the command control block to the host 
> request/response queue in host memory, and signals an interrupt.
> 
> Does that help at all?

What's the relation of all these command lists to that?

> > 
> > this should be handled by the midlayer per-device and per-host state bits.
> 
> I didn't realize a LLD was allowed to modify these.

Not directly.  But we have function to e.g. set a device offline.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 13:38     ` Christoph Hellwig
@ 2004-01-20 16:41       ` Brian King
  2004-01-20 17:18         ` Mike Anderson
                           ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Brian King @ 2004-01-20 16:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote:
>>Would it be ok to rename the sysfs file to ipr_version so I have a way 
>>to get this information until a better way comes along?
> 
> 
> I'm not too happy.  Can't you just grep dmesg output like we do for other
> drivers?

Probably. Rusty already has a patch for MODULE_VERSION, so it sounds 
like we might have this function in a reasonable amount of time.

>>IPR_IOCTL_READ_DRIVER_CFG, IPR_IOCTL_WRITE_DRIVER_CFG, 
>>IPR_IOCTL_RUN_DIAGNOSTICS, IPR_IOCTL_RESET_IOA, 
>>IPR_IOCTL_CHANGE_ADAPTER_ASSIGNMENT: these could be sysfs files.
> 
> Please change them over.

Done.

>>IPR_IOCTL_GET_TRACE: This returns the driver's internal trace. This one 
>>needs a buffer > 4k. I could create a binary sysfs file for it since, 
>>AFAIK, this size limitation does not exist for binary sysfs files.
> 
> 
> Please do.  Also what about making this and the above one conditional on
> some debugging option?

How about a kernel config option for this? I don't want to make it too 
difficult for people to enable it as it is exceedingly valuable for 
failure analysis.

>>IPR_IOCTL_UCODE_DOWNLOAD: Used to update the microcode on the adapter 
>>and on the physical disks in RAID arrays (these devices are not seen by 
>>the SCSI midlayer). These microcode images are generally between 500k 
>>and 1M in size.
> 
> 
> Can't you use the request_firmware interface for that?

I'll look into this again.

>>>Okay, this isn't good at all.  A LLDD driver shouldn't care about mode
>>>pages and all the stuff you do with it.  Please do it from userspace using
>>>the sg driver.
>>
>>Ok. Here are my reasons:
>>
>>1. First of all, this driver is setting up mode pages for devices 
>>according to how the adapter requires them to be setup to run properly. 
>>Arguably, the adapter should do this, but it doesn't.
>>
>>2. There are 2 types of devices that I setup mode pages for. The first 
>>are generic SCSI disk devices. These are seen by the mid-layer and could 
>>feasibly have their mode pages setup by userspace. But, the adapter 
>>requires qerr=1 in order to run TCQing. So I am setting that up. If I 
>>moved that into userspace, then I couldn't run TCQing until userspace 
>>setup mode page 0x0A.
> 
> 
> Doesn't sound to bad, does it?  Bootup ir hardly performance critical.

I'll get started on this.

>>The other type of device is a physical disk in a 
>>RAID array. These devices are not seen by the mid-layer. The only way 
>>for userspace to talk to them (at least in the current implementation) 
>>is through the adapter ioctl interface discussed above. For these 
>>devices, the adapter requires the mode pages to be setup as I am doing 
>>in order to run reliably.
> 
> 
> Given all the mess these invisble devices create I think they shouldn't
> be invisble but rather exported as a second pseudo-channel on the host.

Ok. That would solve a lot of problems. I will report these devices in 
as scsi disk devices. They will respond like normal SCSI disks, except 
when issued media commands (read/write) it will fail with data protect 
(07) sense data.


>>because I need to (ipr_std_inquiry). This stems from the fact that the 
>>adapter will not start any new advanced function (RAID, adapter write 
>>caching, etc) to a device until it is told by the system that the device 
>>is a "supported device" by using the "set supported devices" command.
>>
>>Complicated example: You have a RAID 5 disk array with n devices. One of 
>>them fails. You put in a replacement device of a different type. In 
>>order to be able to rebuild the disk array using the new device, the 
>>adapter needs to have a set supported devices command sent down with the 
>>vendor/product ID of the replacement device. An inquiry has to be done 
>>to get this data.
> 
> 
> Yikes.  I'd love to see designer of this hardware face by face..
> 
> Well, whats done is done and we can fix these fuckups now.  Can't you at
> least do the inquiries from userland?

Yes. Should be able to. By getting rid of all this code, one problem we 
will need to solve is the following:

1. Driver is up and running with a disk array.
2. Adapter gets reset for some reason (could be due to an adapter error)
3. Disk array device now needs a START_UNIT command.

The current mid-layer/sd will not handle this. Would you accept a patch 
to add this function?

>>>This looks very wrong.  And fortunately it seems to be only called from
>>>code that is totally misplaced in a LLDD anyway.  All scsi commands
>>>should go through the midlayer queueing.
>>
>>It is primarily used when talking to devices not known to the mid-layer, 
>>like the adapter itself, and devices that are in a disk array.
> 
> Well, you shouldn't do that.  We do in fact have interfaces to allocate
> proper scsi device without registering them with the midlayer, currently
> it's only used for the host itself (scsi_get_host_dev/scsi_free_host_dev),
> but I don't see why this can't work for other devices.

Ok, but I will still need that interface to send commands directed to 
the adapter. There is an entire sequence of commands that needs to be 
sent to the adapter before it can accept commands. If the adapter gets 
reset asynchronously at runtime due to an error, I need to be able to 
block mid-layer requests, reset the adapter, send a bunch of commands to 
the adapter, then unblock. I should be able to get rid of all the code 
that sends commands to scsi devices, though.

>>>Again, a LLDD is the very wrong place for something like that.
>>>Is there a spec for that VSET stuff somewhere?
>>
>>Unfortunately no. But I can answer any questions you might have. I 
>>wasn't sure if this was an ok thing to do or not. The midlayer is 
>>unconfiguring the device, it seemed reasonable that I should have the 
>>adapter flush the write cache and stop talking to that logical device.
> 
> We should probably always send a STOP UNIT when unconfiguring devices
> from the midlayer or and upper driver.

Sounds reasonable.

>>Build a command in DMA'able host memory (struct ipr_ioarcb), write the 
>>PCI address to a register on the adapter, the adapter DMAs down the 
>>control block, executes the command, writes the 
>>ipr_ioarcb.host_response_handle in the command control block to the host 
>>request/response queue in host memory, and signals an interrupt.
>>
>>Does that help at all?
> 
> What's the relation of all these command lists to that?

Sorry, I guess I'm missing something here. Are you saying a LLD 
shouldn't need to maintain of queue of its command blocks (like my 
pending_q and free_q)?

>>>this should be handled by the midlayer per-device and per-host state bits.
>>
>>I didn't realize a LLD was allowed to modify these.
> 
> Not directly.  But we have function to e.g. set a device offline.

Not in 2.6.1. I think this was in, then removed at some point. Anyone 
know the story on why it was removed? Should we have a 
scsi_set_host_offline as well?


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 16:41       ` Brian King
@ 2004-01-20 17:18         ` Mike Anderson
  2004-01-20 18:01         ` Christoph Hellwig
  2004-01-20 20:35         ` Patrick Mansfield
  2 siblings, 0 replies; 28+ messages in thread
From: Mike Anderson @ 2004-01-20 17:18 UTC (permalink / raw)
  To: Brian King; +Cc: Christoph Hellwig, linux-scsi

Brian King [brking@us.ibm.com] wrote:
> >>>this should be handled by the midlayer per-device and per-host state 
> >>>bits.
> >>
> >>I didn't realize a LLD was allowed to modify these.
> >
> >Not directly.  But we have function to e.g. set a device offline.
> 
> Not in 2.6.1. I think this was in, then removed at some point. Anyone 
> know the story on why it was removed? Should we have a 
> scsi_set_host_offline as well?

We removed the interface through the interations of the state model for
scsi_device / scsi_host. We where also overloading the online flag. If
we are going to have LLDDs modify state we may want to close on issues
of using test_bit or not and the locking around these functions.

-andmike
--
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 16:41       ` Brian King
  2004-01-20 17:18         ` Mike Anderson
@ 2004-01-20 18:01         ` Christoph Hellwig
  2004-01-20 19:13           ` Brian King
  2004-01-21 20:49           ` Brian King
  2004-01-20 20:35         ` Patrick Mansfield
  2 siblings, 2 replies; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-20 18:01 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Tue, Jan 20, 2004 at 10:41:12AM -0600, Brian King wrote:
> > Please do.  Also what about making this and the above one conditional on
> > some debugging option?
> 
> How about a kernel config option for this? I don't want to make it too 
> difficult for people to enable it as it is exceedingly valuable for 
> failure analysis.

Yes, that's what I meant.

> > Given all the mess these invisble devices create I think they shouldn't
> > be invisble but rather exported as a second pseudo-channel on the host.
> 
> Ok. That would solve a lot of problems. I will report these devices in 
> as scsi disk devices. They will respond like normal SCSI disks, except 
> when issued media commands (read/write) it will fail with data protect 
> (07) sense data.

Now reading this again it might be better to use the "fake" as in not
registered with the device model devices I mentioned below for this.

> Yes. Should be able to. By getting rid of all this code, one problem we 
> will need to solve is the following:
> 
> 1. Driver is up and running with a disk array.
> 2. Adapter gets reset for some reason (could be due to an adapter error)
> 3. Disk array device now needs a START_UNIT command.
> 
> The current mid-layer/sd will not handle this. Would you accept a patch 
> to add this function?

I'm not against this, but I wonder how such a patch would look like

> Ok, but I will still need that interface to send commands directed to 
> the adapter.

commands as in scsi commands or as in internal commands?

> >>Build a command in DMA'able host memory (struct ipr_ioarcb), write the 
> >>PCI address to a register on the adapter, the adapter DMAs down the 
> >>control block, executes the command, writes the 
> >>ipr_ioarcb.host_response_handle in the command control block to the host 
> >>request/response queue in host memory, and signals an interrupt.
> >>
> >>Does that help at all?
> > 
> > What's the relation of all these command lists to that?
> 
> Sorry, I guess I'm missing something here. Are you saying a LLD 
> shouldn't need to maintain of queue of its command blocks (like my 
> pending_q and free_q)?

Unless you allocate the blocks in ->queuecommand (like most modern drivers
do) you'll need some form of freelist of course.  What you shouldn't need iß
the pendig_q as the midlayer already keeps track of pending commands.
This of course only works if you actually do send all commands via the
midlayer (which you should).

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 18:01         ` Christoph Hellwig
@ 2004-01-20 19:13           ` Brian King
  2004-01-20 19:28             ` Christoph Hellwig
  2004-01-21 20:49           ` Brian King
  1 sibling, 1 reply; 28+ messages in thread
From: Brian King @ 2004-01-20 19:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote
>>Ok, but I will still need that interface to send commands directed to 
>>the adapter.
> 
> 
> commands as in scsi commands or as in internal commands?

Internal commands. As you look through the driver, they will look very 
similar, as the interface to the adapter maps to scsi pretty close. I 
could use the mid-layer for these commands, but I would need some way to 
send them when the adapter is self blocked and would probably need to 
call scsi_add_host before the adapter is functional so I could send the 
adapter initialization sequence.


>>>>Build a command in DMA'able host memory (struct ipr_ioarcb), write the 
>>>>PCI address to a register on the adapter, the adapter DMAs down the 
>>>>control block, executes the command, writes the 
>>>>ipr_ioarcb.host_response_handle in the command control block to the host 
>>>>request/response queue in host memory, and signals an interrupt.
>>>>
>>>>Does that help at all?
>>>
>>>What's the relation of all these command lists to that?
>>
>>Sorry, I guess I'm missing something here. Are you saying a LLD 
>>shouldn't need to maintain of queue of its command blocks (like my 
>>pending_q and free_q)?
> 
> 
> Unless you allocate the blocks in ->queuecommand (like most modern drivers
> do) you'll need some form of freelist of course.  What you shouldn't need iß
> the pendig_q as the midlayer already keeps track of pending commands.
> This of course only works if you actually do send all commands via the
> midlayer (which you should).

The reason I have a pending_q is so that when I reset the adapter 
without the mid-layer knowing about it, I need to fail back any 
outstanding ops so they get retried as appropriate. I suppose a way 
around this would be a mid-layer interface that an LLD could call which 
would cause a host reset. Then the midlayer could drive the reset and 
take care of doing the right thing with any outstanding ops.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 19:13           ` Brian King
@ 2004-01-20 19:28             ` Christoph Hellwig
  2004-01-20 20:13               ` Brian King
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-20 19:28 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Tue, Jan 20, 2004 at 01:13:38PM -0600, Brian King wrote:
> > commands as in scsi commands or as in internal commands?
> 
> Internal commands. As you look through the driver, they will look very 
> similar, as the interface to the adapter maps to scsi pretty close. I 
> could use the mid-layer for these commands, but I would need some way to 
> send them when the adapter is self blocked and would probably need to 
> call scsi_add_host before the adapter is functional so I could send the 
> adapter initialization sequence.

Okay, so maybe we'll have to keep the internals commands without midlayer
interaction for the adapter intialization.

> The reason I have a pending_q is so that when I reset the adapter 
> without the mid-layer knowing about it, I need to fail back any 
> outstanding ops so they get retried as appropriate. I suppose a way 
> around this would be a mid-layer interface that an LLD could call which 
> would cause a host reset. Then the midlayer could drive the reset and 
> take care of doing the right thing with any outstanding ops.

Does scsi_reset_provider fit your needs?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 19:28             ` Christoph Hellwig
@ 2004-01-20 20:13               ` Brian King
  0 siblings, 0 replies; 28+ messages in thread
From: Brian King @ 2004-01-20 20:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote:
>>The reason I have a pending_q is so that when I reset the adapter 
>>without the mid-layer knowing about it, I need to fail back any 
>>outstanding ops so they get retried as appropriate. I suppose a way 
>>around this would be a mid-layer interface that an LLD could call which 
>>would cause a host reset. Then the midlayer could drive the reset and 
>>take care of doing the right thing with any outstanding ops.
> 
> 
> Does scsi_reset_provider fit your needs?

scsi_reset_provider needs to be called at task level, I often need to 
initiate a reset from interrupt level. Also, it doesn't seem to touch 
outstanding ops. The interface would really need to walk all the device 
queues, failing all the outstanding ops, putting them on the eh queue, 
then wake the eh thread to do the reset.

Also, regarding the pending_q discussion, one issue with not having a 
pending_q would be additional complexity in my eh_host_reset_handler. I 
would need to be able to find all the ops that timed out so I could free 
up my command block, unmap the data buffer, etc. Currently I can just 
walk my pending_q. Without it, I would need to walk all the device 
queues, which seems like it might be a bad thing for a LLD to do.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 16:41       ` Brian King
  2004-01-20 17:18         ` Mike Anderson
  2004-01-20 18:01         ` Christoph Hellwig
@ 2004-01-20 20:35         ` Patrick Mansfield
  2 siblings, 0 replies; 28+ messages in thread
From: Patrick Mansfield @ 2004-01-20 20:35 UTC (permalink / raw)
  To: Brian King; +Cc: Christoph Hellwig, linux-scsi

On Tue, Jan 20, 2004 at 10:41:12AM -0600, Brian King wrote:
> Christoph Hellwig wrote:

> > We should probably always send a STOP UNIT when unconfiguring devices
> > from the midlayer or and upper driver.
> 
> Sounds reasonable.

That would screw up clustered or shared storage, where there are other
hosts attached, and sending a STOP UNIT would prevent them from talking to
the LUNs. And, on addition of the scsi_host (hot add, reboot, or modprobe)
the drives would have to all be spun up again, and we do not handle that
well at all.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-19 20:30   ` Brian King
  2004-01-20 13:38     ` Christoph Hellwig
@ 2004-01-20 22:37     ` Brian King
  2004-01-20 22:39       ` Christoph Hellwig
  1 sibling, 1 reply; 28+ messages in thread
From: Brian King @ 2004-01-20 22:37 UTC (permalink / raw)
  To: Brian King; +Cc: Christoph Hellwig, linux-scsi

>> +/**
>> + * ipr_find_ses_entry - Find matching SES in SES table
>> + * @res:    resource entry struct of SES
>> + * + * Return value:
>> + *     pointer to SES table entry / NULL on failure
>> + **/
>>
>> This is an extreme layering violation.  Any SES handling belongs into an
>> upper level scsi driver (or userspace via sg)
> 
> 
> I'm not actually talking to the SES, I'm only looking at the adapter's 
> device configuration table. What I am trying to accomplish here is I am 
> looking at each SCSI bus and trying to determine the max scsi speed that 
> is safe to run on them. There are certain older backplanes that do not 
> run reliably at faster speeds. Hence, the SES table.

Did my explanation help, or would you still like this code removed? I don't
have a huge problem moving it into userspace, the only downside is that
I will only be able to run at 80MB/sec until the configuration tool runs.



-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 22:37     ` Brian King
@ 2004-01-20 22:39       ` Christoph Hellwig
  0 siblings, 0 replies; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-20 22:39 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Tue, Jan 20, 2004 at 04:37:12PM -0600, Brian King wrote:
> Did my explanation help, or would you still like this code removed? I don't
> have a huge problem moving it into userspace, the only downside is that
> I will only be able to run at 80MB/sec until the configuration tool runs.

Keep it for now.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-20 18:01         ` Christoph Hellwig
  2004-01-20 19:13           ` Brian King
@ 2004-01-21 20:49           ` Brian King
  2004-01-22 14:02             ` Christoph Hellwig
  1 sibling, 1 reply; 28+ messages in thread
From: Brian King @ 2004-01-21 20:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-scsi

Christoph Hellwig wrote:

>>>Given all the mess these invisble devices create I think they shouldn't
>>>be invisble but rather exported as a second pseudo-channel on the host.
>>
>>Ok. That would solve a lot of problems. I will report these devices in 
>>as scsi disk devices. They will respond like normal SCSI disks, except 
>>when issued media commands (read/write) it will fail with data protect 
>>(07) sense data.
> 
> 
> Now reading this again it might be better to use the "fake" as in not
> registered with the device model devices I mentioned below for this.

How about a flag in the scsi_device struct that an LLD could set in
slave_configure, which would prevent upper layer drivers from binding
to it? Then these devices could be found through the normal scsi scan
and would have nice sysfs entries and sg devices for them. Not sure what
we would call the flag... psuedo_device, fake_device? Then upper level
drivers could check this flag, or if we didn't like that we could move
the upper layer binding logic into scsi_bus_match and keep it all
in one place.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-21 20:49           ` Brian King
@ 2004-01-22 14:02             ` Christoph Hellwig
  2004-01-22 16:39               ` Patrick Mansfield
  0 siblings, 1 reply; 28+ messages in thread
From: Christoph Hellwig @ 2004-01-22 14:02 UTC (permalink / raw)
  To: Brian King; +Cc: linux-scsi

On Wed, Jan 21, 2004 at 02:49:42PM -0600, Brian King wrote:
> How about a flag in the scsi_device struct that an LLD could set in
> slave_configure, which would prevent upper layer drivers from binding
> to it?

I'm not that happy about this.  But the other solutions I could think
of are ven worse.

> Then these devices could be found through the normal scsi scan
> and would have nice sysfs entries and sg devices for them. Not sure what
> we would call the flag... psuedo_device, fake_device?

ghost? hidden?

Could you cook up a patch?

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-22 14:02             ` Christoph Hellwig
@ 2004-01-22 16:39               ` Patrick Mansfield
  2004-01-22 16:56                 ` Patrick Mansfield
  2004-01-22 17:09                 ` Brian King
  0 siblings, 2 replies; 28+ messages in thread
From: Patrick Mansfield @ 2004-01-22 16:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian King, linux-scsi

On Thu, Jan 22, 2004 at 02:02:48PM +0000, Christoph Hellwig wrote:
> On Wed, Jan 21, 2004 at 02:49:42PM -0600, Brian King wrote:
> > How about a flag in the scsi_device struct that an LLD could set in
> > slave_configure, which would prevent upper layer drivers from binding
> > to it?
> 
> I'm not that happy about this.  But the other solutions I could think
> of are ven worse.
> 
> > Then these devices could be found through the normal scsi scan
> > and would have nice sysfs entries and sg devices for them. Not sure what
> > we would call the flag... psuedo_device, fake_device?
> 
> ghost? hidden?
> 
> Could you cook up a patch?

Why not just have the sd's show up? This is not much different then use of
a volume manager. Finding the /dev/sg of the physical disk is the same
problem as finding the sd.

And then what about making them read only? I would rather any flag force
sd to be read only rather than prevent upper level binding.  

Though having them show up as writable means the ipr can also be used to
with the raw disks (not just for management of the physical disks).

Also if we can tell via sysfs attributes that a bus (host:channel or maybe
just a host) has physical disks attached, udev can be used to create
entries in their own directory, like:

	/dev/physical/sdN

Making the node read only can also be done with udev, but that does not
help with root write access.

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-22 16:39               ` Patrick Mansfield
@ 2004-01-22 16:56                 ` Patrick Mansfield
  2004-01-22 17:09                 ` Brian King
  1 sibling, 0 replies; 28+ messages in thread
From: Patrick Mansfield @ 2004-01-22 16:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Brian King, linux-scsi

On Thu, Jan 22, 2004 at 08:39:31AM -0800, Patrick Mansfield wrote:

> Also if we can tell via sysfs attributes that a bus (host:channel or maybe
> just a host) has physical disks attached, udev can be used to create
> entries in their own directory, like:
> 
> 	/dev/physical/sdN
> 
> Making the node read only can also be done with udev, but that does not
> help with root write access.

And we can prevent even root from opening for write with blockdev --setro:

elm3b79:~# dd if=/dev/zero of=/udev/physical/sd_f  count=1
1+0 records in
1+0 records out
elm3b79:~# blockdev --setro /udev/physical/sd_f
elm3b79:~# dd if=/dev/zero of=/udev/physical/sd_f  count=1
dd: writing to `/udev/physical/sd_f': Operation not permitted
1+0 records in
0+0 records out

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-22 16:39               ` Patrick Mansfield
  2004-01-22 16:56                 ` Patrick Mansfield
@ 2004-01-22 17:09                 ` Brian King
  2004-01-22 17:27                   ` Patrick Mansfield
  1 sibling, 1 reply; 28+ messages in thread
From: Brian King @ 2004-01-22 17:09 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Christoph Hellwig, linux-scsi

Patrick Mansfield wrote:
> On Thu, Jan 22, 2004 at 02:02:48PM +0000, Christoph Hellwig wrote:
> 
>>On Wed, Jan 21, 2004 at 02:49:42PM -0600, Brian King wrote:
>>
>>>How about a flag in the scsi_device struct that an LLD could set in
>>>slave_configure, which would prevent upper layer drivers from binding
>>>to it?
>>
>>I'm not that happy about this.  But the other solutions I could think
>>of are ven worse.
>>
>>
>>>Then these devices could be found through the normal scsi scan
>>>and would have nice sysfs entries and sg devices for them. Not sure what
>>>we would call the flag... psuedo_device, fake_device?
>>
>>ghost? hidden?
>>
>>Could you cook up a patch?
> 
> 
> Why not just have the sd's show up? This is not much different then use of
> a volume manager. Finding the /dev/sg of the physical disk is the same
> problem as finding the sd.

We will still get errors logged when the partition table is read,
as the devices are also read protected.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-22 17:09                 ` Brian King
@ 2004-01-22 17:27                   ` Patrick Mansfield
  2004-01-22 17:33                     ` Brian King
  0 siblings, 1 reply; 28+ messages in thread
From: Patrick Mansfield @ 2004-01-22 17:27 UTC (permalink / raw)
  To: Brian King; +Cc: Christoph Hellwig, linux-scsi

On Thu, Jan 22, 2004 at 11:09:07AM -0600, Brian King wrote:

> We will still get errors logged when the partition table is read,
> as the devices are also read protected.

So it is the hardware and not the driver enforcing the read protection?
But other SCSI commands are OK?

-- Patrick Mansfield

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC] IBM Power RAID driver (ipr)
  2004-01-22 17:27                   ` Patrick Mansfield
@ 2004-01-22 17:33                     ` Brian King
  0 siblings, 0 replies; 28+ messages in thread
From: Brian King @ 2004-01-22 17:33 UTC (permalink / raw)
  To: Patrick Mansfield; +Cc: Christoph Hellwig, linux-scsi

Patrick Mansfield wrote:
> On Thu, Jan 22, 2004 at 11:09:07AM -0600, Brian King wrote:
> 
> 
>>We will still get errors logged when the partition table is read,
>>as the devices are also read protected.
> 
> 
> So it is the hardware and not the driver enforcing the read protection?
> But other SCSI commands are OK?

Correct. It is the adapter firmware enforcing this protection. Other
SCSI commands work fine.


-- 
Brian King
eServer Storage I/O
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2004-01-22 17:33 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-16 21:59 [RFC] IBM Power RAID driver (ipr) Brian King
2004-01-16 23:01 ` Muli Ben-Yehuda
2004-01-16 23:46   ` Brian King
2004-01-18 14:10 ` Anton Blanchard
2004-01-19 16:16   ` Brian King
2004-01-19 18:34 ` Christoph Hellwig
2004-01-19 19:33   ` Mike Anderson
2004-01-19 19:35     ` Christoph Hellwig
2004-01-20  5:57     ` Douglas Gilbert
2004-01-20 13:21       ` Christoph Hellwig
2004-01-19 20:30   ` Brian King
2004-01-20 13:38     ` Christoph Hellwig
2004-01-20 16:41       ` Brian King
2004-01-20 17:18         ` Mike Anderson
2004-01-20 18:01         ` Christoph Hellwig
2004-01-20 19:13           ` Brian King
2004-01-20 19:28             ` Christoph Hellwig
2004-01-20 20:13               ` Brian King
2004-01-21 20:49           ` Brian King
2004-01-22 14:02             ` Christoph Hellwig
2004-01-22 16:39               ` Patrick Mansfield
2004-01-22 16:56                 ` Patrick Mansfield
2004-01-22 17:09                 ` Brian King
2004-01-22 17:27                   ` Patrick Mansfield
2004-01-22 17:33                     ` Brian King
2004-01-20 20:35         ` Patrick Mansfield
2004-01-20 22:37     ` Brian King
2004-01-20 22:39       ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.