linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Boot failure with block/for-next
@ 2010-12-22 17:27 James Bottomley
  2010-12-22 17:53 ` Tejun Heo
  2010-12-23  4:31 ` James Bottomley
  0 siblings, 2 replies; 11+ messages in thread
From: James Bottomley @ 2010-12-22 17:27 UTC (permalink / raw)
  To: Jens Axboe, Tejun Heo; +Cc: linux-scsi, linux-kernel

Trying to test out the SCSI post merge tree, I've found it won't boot on
my SCSI test system.  The reason is a failure to read the partition
table of the root disc.  It just gives

sda: unable to read partition table

The other four discs in the system seem to read their partition tables
OK.

The obvious candidate is the partition code rework in block/for-next.
Simply reverting 

commit d2bf1b6723ed0eab378363649d15b7893bf14e91
Author: Tejun Heo <tj@kernel.org>
Date:   Wed Dec 8 20:57:36 2010 +0100

    block: move register_disk() and del_gendisk() to block/genhd.c
 
Doesn't fix the boot failure, so it's obviously something deeper.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-22 17:27 Boot failure with block/for-next James Bottomley
@ 2010-12-22 17:53 ` Tejun Heo
  2010-12-23  4:31 ` James Bottomley
  1 sibling, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2010-12-22 17:53 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, linux-scsi, linux-kernel

Hello,

On Wed, Dec 22, 2010 at 11:27:03AM -0600, James Bottomley wrote:
> Trying to test out the SCSI post merge tree, I've found it won't boot on
> my SCSI test system.  The reason is a failure to read the partition
> table of the root disc.  It just gives
> 
> sda: unable to read partition table
> 
> The other four discs in the system seem to read their partition tables
> OK.

That's really weird.

> The obvious candidate is the partition code rework in block/for-next.
> Simply reverting 
> 
> commit d2bf1b6723ed0eab378363649d15b7893bf14e91
> Author: Tejun Heo <tj@kernel.org>
> Date:   Wed Dec 8 20:57:36 2010 +0100
> 
>     block: move register_disk() and del_gendisk() to block/genhd.c
>  
> Doesn't fix the boot failure, so it's obviously something deeper.

Can you please apply the following patch and see what the kernel says?

Thanks.

diff --git a/fs/partitions/check.c b/fs/partitions/check.c
index bdf8d3c..dd0b0f3 100644
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -186,6 +186,8 @@ check_partition(struct gendisk *hd, struct block_device *bdev)
 			/* We have hit an I/O error which we don't report now.
 		 	* But record it, and let the others do their job.
 		 	*/
+			printk(KERN_WARNING "XXX %s: check_part %pf failed with %d\n",
+			       state->name, check_part[i-1], res);
 			err = res;
 			res = 0;
 		}
@@ -197,8 +199,11 @@ check_partition(struct gendisk *hd, struct block_device *bdev)
 		free_page((unsigned long)state->pp_buf);
 		return state;
 	}
-	if (state->access_beyond_eod)
+	if (state->access_beyond_eod) {
+		printk(KERN_WARNING "XXX %s: access_beyond_eod, err=-ENOSPC\n",
+		       state->name);
 		err = -ENOSPC;
+	}
 	if (err)
 	/* The partition is unrecognized. So report I/O errors if there were any */
 		res = err;
diff --git a/fs/partitions/check.h b/fs/partitions/check.h
index d68bf4d..f836fb2 100644
--- a/fs/partitions/check.h
+++ b/fs/partitions/check.h
@@ -26,6 +26,8 @@ static inline void *read_part_sector(struct parsed_partitions *state,
 				     sector_t n, Sector *p)
 {
 	if (n >= get_capacity(state->bdev->bd_disk)) {
+		printk(KERN_WARNING "XXX %s: setting access_beyond_eod for sector %llu\n",
+		       state->name, (unsigned long long)n);
 		state->access_beyond_eod = true;
 		return NULL;
 	}

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-22 17:27 Boot failure with block/for-next James Bottomley
  2010-12-22 17:53 ` Tejun Heo
@ 2010-12-23  4:31 ` James Bottomley
  2010-12-23 10:09   ` Tejun Heo
  1 sibling, 1 reply; 11+ messages in thread
From: James Bottomley @ 2010-12-23  4:31 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tejun Heo, linux-scsi, linux-kernel

On Wed, 2010-12-22 at 11:27 -0600, James Bottomley wrote:
> Trying to test out the SCSI post merge tree, I've found it won't boot on
> my SCSI test system.  The reason is a failure to read the partition
> table of the root disc.  It just gives
> 
> sda: unable to read partition table
> 
> The other four discs in the system seem to read their partition tables
> OK.
> 
> The obvious candidate is the partition code rework in block/for-next.
> Simply reverting 

Actually, surprisingly, bisection confirms it's this patch:

Author: Tejun Heo <tj@kernel.org>
Date:   Wed Dec 8 20:57:42 2010 +0100

    sd: implement sd_check_events()

I can't quite see how yet.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23  4:31 ` James Bottomley
@ 2010-12-23 10:09   ` Tejun Heo
  2010-12-23 15:27     ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2010-12-23 10:09 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, linux-scsi, linux-kernel

Hello, James.

On Wed, Dec 22, 2010 at 10:31:13PM -0600, James Bottomley wrote:
 > Actually, surprisingly, bisection confirms it's this patch:
 > 
 > Author: Tejun Heo <tj@kernel.org>
 > Date:   Wed Dec 8 20:57:42 2010 +0100
 > 
 >     sd: implement sd_check_events()
 > 
 > I can't quite see how yet.

Does the following patch fix the problem?  Thanks.

Subject: sd: don't clear media presence unless it's removable

set_media_not_present() can be called for a non-removable device if it
raises UA for whatever reason; however, block layer calls into
sd_check_events() only if the device is removable.  As the function is
responsible for setting media presence, sd continues to think that
media is missing on the non-removable device making it inaccessible.

This patch makes set_media_not_present() clear media presence iff the
device is removable.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 drivers/scsi/sd.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Index: work/drivers/scsi/sd.c
===================================================================
--- work.orig/drivers/scsi/sd.c
+++ work/drivers/scsi/sd.c
@@ -993,8 +993,11 @@ static void set_media_not_present(struct
 {
 	if (sdkp->media_present)
 		sdkp->device->changed = 1;
-	sdkp->media_present = 0;
-	sdkp->capacity = 0;
+
+	if (sdkp->device->removable) {
+		sdkp->media_present = 0;
+		sdkp->capacity = 0;
+	}
 }
 
 static int media_not_present(struct scsi_disk *sdkp,

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23 10:09   ` Tejun Heo
@ 2010-12-23 15:27     ` James Bottomley
  2010-12-23 15:52       ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2010-12-23 15:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-scsi, linux-kernel

On Thu, 2010-12-23 at 11:09 +0100, Tejun Heo wrote:
> Hello, James.
> 
> On Wed, Dec 22, 2010 at 10:31:13PM -0600, James Bottomley wrote:
>  > Actually, surprisingly, bisection confirms it's this patch:
>  > 
>  > Author: Tejun Heo <tj@kernel.org>
>  > Date:   Wed Dec 8 20:57:42 2010 +0100
>  > 
>  >     sd: implement sd_check_events()
>  > 
>  > I can't quite see how yet.
> 
> Does the following patch fix the problem?  Thanks.
> 
> Subject: sd: don't clear media presence unless it's removable

No, still the same failure.  Here's an excised log of an unsuccessful
and successful disk probe:

/bin/sh: can't access tty; job control turned off
(initramfs) dmesg|grep sda
sd 2:0:1:0: [sda] 71132960 512-byte logical blocks: (36.4 GB/33.9 GiB)
sd 2:0:1:0: [sda] Write Protect is off
sd 2:0:1:0: [sda] Mode Sense: d3 00 10 08
sd 2:0:1:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
device: 'sda': device_add
PM: Adding info for No Bus:sda
 sda: unable to read partition table
sd 2:0:1:0: [sda] Attached SCSI disk
(initramfs) dmesg|grep sdb
sd 2:0:2:0: [sdb] 17942584 512-byte logical blocks: (9.18 GB/8.55 GiB)
sd 2:0:2:0: [sdb] Write Protect is off
sd 2:0:2:0: [sdb] Mode Sense: e3 00 10 08
sd 2:0:2:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
device: 'sdb': device_add
PM: Adding info for No Bus:sdb
 sdb: sdb1 sdb2
device: 'sdb1': device_add
PM: Adding info for No Bus:sdb1
device: 'sdb2': device_add
PM: Adding info for No Bus:sdb2
sd 2:0:2:0: [sdb] Attached SCSI disk

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23 15:27     ` James Bottomley
@ 2010-12-23 15:52       ` Tejun Heo
  2010-12-23 16:10         ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2010-12-23 15:52 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, linux-scsi, linux-kernel

Hey,

On Thu, Dec 23, 2010 at 09:27:21AM -0600, James Bottomley wrote:
> > Subject: sd: don't clear media presence unless it's removable
> 
> No, still the same failure.  Here's an excised log of an unsuccessful
> and successful disk probe:

Hmmm... That's unfortunate.

> /bin/sh: can't access tty; job control turned off
> (initramfs) dmesg|grep sda
> sd 2:0:1:0: [sda] 71132960 512-byte logical blocks: (36.4 GB/33.9 GiB)
> sd 2:0:1:0: [sda] Write Protect is off
> sd 2:0:1:0: [sda] Mode Sense: d3 00 10 08
> sd 2:0:1:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
> device: 'sda': device_add
> PM: Adding info for No Bus:sda
>  sda: unable to read partition table
> sd 2:0:1:0: [sda] Attached SCSI disk
> (initramfs) dmesg|grep sdb
> sd 2:0:2:0: [sdb] 17942584 512-byte logical blocks: (9.18 GB/8.55 GiB)
> sd 2:0:2:0: [sdb] Write Protect is off
> sd 2:0:2:0: [sdb] Mode Sense: e3 00 10 08
> sd 2:0:2:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
> device: 'sdb': device_add
> PM: Adding info for No Bus:sdb
>  sdb: sdb1 sdb2
> device: 'sdb1': device_add
> PM: Adding info for No Bus:sdb1
> device: 'sdb2': device_add
> PM: Adding info for No Bus:sdb2
> sd 2:0:2:0: [sdb] Attached SCSI disk

Can you please apply the debug patch I posted in the other message and
post the boot log?  Let's see how the partition read is failing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23 15:52       ` Tejun Heo
@ 2010-12-23 16:10         ` James Bottomley
  2010-12-23 16:13           ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2010-12-23 16:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-scsi, linux-kernel

On Thu, 2010-12-23 at 16:52 +0100, Tejun Heo wrote:
> Hey,
> 
> On Thu, Dec 23, 2010 at 09:27:21AM -0600, James Bottomley wrote:
> > > Subject: sd: don't clear media presence unless it's removable
> > 
> > No, still the same failure.  Here's an excised log of an unsuccessful
> > and successful disk probe:
> 
> Hmmm... That's unfortunate.
> 
> > /bin/sh: can't access tty; job control turned off
> > (initramfs) dmesg|grep sda
> > sd 2:0:1:0: [sda] 71132960 512-byte logical blocks: (36.4 GB/33.9 GiB)
> > sd 2:0:1:0: [sda] Write Protect is off
> > sd 2:0:1:0: [sda] Mode Sense: d3 00 10 08
> > sd 2:0:1:0: [sda] Write cache: disabled, read cache: enabled, supports DPO and FUA
> > device: 'sda': device_add
> > PM: Adding info for No Bus:sda
> >  sda: unable to read partition table
> > sd 2:0:1:0: [sda] Attached SCSI disk
> > (initramfs) dmesg|grep sdb
> > sd 2:0:2:0: [sdb] 17942584 512-byte logical blocks: (9.18 GB/8.55 GiB)
> > sd 2:0:2:0: [sdb] Write Protect is off
> > sd 2:0:2:0: [sdb] Mode Sense: e3 00 10 08
> > sd 2:0:2:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
> > device: 'sdb': device_add
> > PM: Adding info for No Bus:sdb
> >  sdb: sdb1 sdb2
> > device: 'sdb1': device_add
> > PM: Adding info for No Bus:sdb1
> > device: 'sdb2': device_add
> > PM: Adding info for No Bus:sdb2
> > sd 2:0:2:0: [sdb] Attached SCSI disk
> 
> Can you please apply the debug patch I posted in the other message and
> post the boot log?  Let's see how the partition read is failing.

That's actually a red herring ... the disk isn't spinning up, so the
partition read is getting a not ready.

James



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23 16:10         ` James Bottomley
@ 2010-12-23 16:13           ` Tejun Heo
  2010-12-23 18:25             ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2010-12-23 16:13 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, linux-scsi, linux-kernel

On Thu, Dec 23, 2010 at 10:10:14AM -0600, James Bottomley wrote:
> > Can you please apply the debug patch I posted in the other message and
> > post the boot log?  Let's see how the partition read is failing.
> 
> That's actually a red herring ... the disk isn't spinning up, so the
> partition read is getting a not ready.

Oh, yay, but at any rate I think the don't-clear-media-presence patch
is probably a good idea just in case UA gets reported somehow.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23 16:13           ` Tejun Heo
@ 2010-12-23 18:25             ` James Bottomley
  2010-12-24 11:03               ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: James Bottomley @ 2010-12-23 18:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-scsi, linux-kernel

On Thu, 2010-12-23 at 17:13 +0100, Tejun Heo wrote:
> On Thu, Dec 23, 2010 at 10:10:14AM -0600, James Bottomley wrote:
> > > Can you please apply the debug patch I posted in the other message and
> > > post the boot log?  Let's see how the partition read is failing.
> > 
> > That's actually a red herring ... the disk isn't spinning up, so the
> > partition read is getting a not ready.
> 
> Oh, yay, but at any rate I think the don't-clear-media-presence patch
> is probably a good idea just in case UA gets reported somehow.

Well, it wasn't this either.  It turns out that this disk alone reports
UNIT_ATTENTION RESET_OCCURRED on the first TEST UNIT READY of spin up.
Ordinarily this is harmless, but the new medium change code wrongly
interprets any UNIT_ATTENTION as medium changed (and then refuses to
talk to the disk).  This is actually a change from the previous code, so
the fix is to put it back the way it was.

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 6a88fc1..7d25746 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1008,8 +1008,6 @@ static int media_not_present(struct scsi_disk *sdkp,
 	/* not invoked for commands that could return deferred errors */
 	switch (sshdr->sense_key) {
 	case UNIT_ATTENTION:
-		sdkp->device->changed = 1;
-		/* fall through */
 	case NOT_READY:
 		/* medium not present */
 		if (sshdr->asc == 0x3A) {



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-23 18:25             ` James Bottomley
@ 2010-12-24 11:03               ` Tejun Heo
  2010-12-24 15:47                 ` James Bottomley
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2010-12-24 11:03 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, linux-scsi, linux-kernel

Hello, James.

On Thu, Dec 23, 2010 at 12:25:17PM -0600, James Bottomley wrote:
> On Thu, 2010-12-23 at 17:13 +0100, Tejun Heo wrote:
> > On Thu, Dec 23, 2010 at 10:10:14AM -0600, James Bottomley wrote:
> > > > Can you please apply the debug patch I posted in the other message and
> > > > post the boot log?  Let's see how the partition read is failing.
> > > 
> > > That's actually a red herring ... the disk isn't spinning up, so the
> > > partition read is getting a not ready.
> > 
> > Oh, yay, but at any rate I think the don't-clear-media-presence patch
> > is probably a good idea just in case UA gets reported somehow.
> 
> Well, it wasn't this either.  It turns out that this disk alone reports
> UNIT_ATTENTION RESET_OCCURRED on the first TEST UNIT READY of spin up.
> Ordinarily this is harmless, but the new medium change code wrongly
> interprets any UNIT_ATTENTION as medium changed (and then refuses to
> talk to the disk).  This is actually a change from the previous code, so
> the fix is to put it back the way it was.

I see.  I wonder why the previous patch didn't work.  It should have
had about the same effect.  I think the UA change went in there while
trying to bring sr and sd behaviors closer to each other, but yes it
seems the original code didn't have that.  That said, now there are
paths where UA would be consumed without setting ->changed and thus sd
may miss media change events.  This has been like this for quite some
time and there aren't many removable sd devices these days, so maybe
this doesn't matter too much.

Anyways, for now, the change looks good to me too.  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Boot failure with block/for-next
  2010-12-24 11:03               ` Tejun Heo
@ 2010-12-24 15:47                 ` James Bottomley
  0 siblings, 0 replies; 11+ messages in thread
From: James Bottomley @ 2010-12-24 15:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Jens Axboe, linux-scsi, linux-kernel

On Fri, 2010-12-24 at 12:03 +0100, Tejun Heo wrote:
> Hello, James.
> 
> On Thu, Dec 23, 2010 at 12:25:17PM -0600, James Bottomley wrote:
> > On Thu, 2010-12-23 at 17:13 +0100, Tejun Heo wrote:
> > > On Thu, Dec 23, 2010 at 10:10:14AM -0600, James Bottomley wrote:
> > > > > Can you please apply the debug patch I posted in the other message and
> > > > > post the boot log?  Let's see how the partition read is failing.
> > > > 
> > > > That's actually a red herring ... the disk isn't spinning up, so the
> > > > partition read is getting a not ready.
> > > 
> > > Oh, yay, but at any rate I think the don't-clear-media-presence patch
> > > is probably a good idea just in case UA gets reported somehow.
> > 
> > Well, it wasn't this either.  It turns out that this disk alone reports
> > UNIT_ATTENTION RESET_OCCURRED on the first TEST UNIT READY of spin up.
> > Ordinarily this is harmless, but the new medium change code wrongly
> > interprets any UNIT_ATTENTION as medium changed (and then refuses to
> > talk to the disk).  This is actually a change from the previous code, so
> > the fix is to put it back the way it was.
> 
> I see.  I wonder why the previous patch didn't work.

Oh, you didn't put a ->removable check in enough paths.  The setting on
UA was still unconditional, as was the check in sd_prep_fn().

>   It should have
> had about the same effect.  I think the UA change went in there while
> trying to bring sr and sd behaviors closer to each other, but yes it
> seems the original code didn't have that.  That said, now there are
> paths where UA would be consumed without setting ->changed and thus sd
> may miss media change events.  This has been like this for quite some
> time and there aren't many removable sd devices these days, so maybe
> this doesn't matter too much.

The code can never really be merged.  for CD/DVD, UA pretty much does
mean medium removal.  Discs and arrays emit a panoply of UA events (it's
the SCSI asynchronous event mechanism) if you assume media change on all
of them, there'll be terrible confusion.  We can narrow to 28/00 that
means medium may have changed.  It could really do with checking by
someone who has a removable disc device, though ... it looks like some
of the 3B/xx might be applicable.

> Anyways, for now, the change looks good to me too.  Thanks.

Thanks,

James

---

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7d25746..1995533 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -578,7 +578,7 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
 		goto out;
 	}
 
-	if (sdp->changed) {
+	if (sdp->removable && sdp->changed) {
 		/*
 		 * quietly refuse to do anything to a changed disc until 
 		 * the changed bit has been reset
@@ -1008,6 +1008,9 @@ static int media_not_present(struct scsi_disk *sdkp,
 	/* not invoked for commands that could return deferred errors */
 	switch (sshdr->sense_key) {
 	case UNIT_ATTENTION:
+		if (sdkp->device->removable && sshdr->asc == 0x28 &&
+		    sshdr->ascq == 0x00)
+			sdkp->device->changed = 1;
 	case NOT_READY:
 		/* medium not present */
 		if (sshdr->asc == 0x3A) {



^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2010-12-24 15:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-22 17:27 Boot failure with block/for-next James Bottomley
2010-12-22 17:53 ` Tejun Heo
2010-12-23  4:31 ` James Bottomley
2010-12-23 10:09   ` Tejun Heo
2010-12-23 15:27     ` James Bottomley
2010-12-23 15:52       ` Tejun Heo
2010-12-23 16:10         ` James Bottomley
2010-12-23 16:13           ` Tejun Heo
2010-12-23 18:25             ` James Bottomley
2010-12-24 11:03               ` Tejun Heo
2010-12-24 15:47                 ` James Bottomley

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).