linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
       [not found] <fa.eWQqoO1e4Mdhu8SK5jzMFAI/3NU@ifi.uio.no>
@ 2008-07-21 19:37 ` Robert Hancock
  2008-07-22  2:37   ` Alan Stern
  2008-07-22  5:11   ` Tomas Styblo
  0 siblings, 2 replies; 39+ messages in thread
From: Robert Hancock @ 2008-07-21 19:37 UTC (permalink / raw)
  To: Tomas Styblo; +Cc: linux-kernel, linux-usb, usb-storage

(adding CCs)

Tomas Styblo wrote:
> 
> Hello,
> 
> this message includes a patch that provides a workaround for
> a silent data corruption bug caused by incorrect error handling in
> the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> USB device id 152d:2338.
> 
> 
> - the problem occurs quite rarely, approx. once for 
>   every 20 GB of transfered data during heavy load
> 
> - it seems that only read operations are affected
> 
> - the problem is accompanied by these messages in syslog each
>   time it occurs:
> 
> May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] Sense Key : 0x0 [current] 
> May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
> 
> - the bug is not detected as an error and incorrect data is returned, 
>   causing insidious data corruption
> 
> - tested with 3 external disk enclosures (Akasa Integral AK-ENP2SATA-BL) 
>   with different disks on different computers, with kernel 2.6.24 and 2.6.25
> 
> - the patch provides a crude workaround by detecting the error condition
>   and retrying the faulty transfer
> 
> 
> The fix needs a review as I don't know much about USB and SCSI.  
> It's possible that this approach is wrong and that the problem should
> be fixed somewhere else.
> 
> There are other problems with this chipset that make it necessary 
> to disconnect and power off the enclosure from time to time, but at least
> there's no data corruption anymore.

I'm not sure this is a good approach. More that this code right above in 
usb_stor_invoke_transport, which your code undoes the effect of for this 
device, doesn't seem right:

	/* If things are really okay, then let's show that.  Zero
	 * out the sense buffer so the higher layers won't realize
	 * we did an unsolicited auto-sense. */
	if (result == USB_STOR_TRANSPORT_GOOD &&
		/* Filemark 0, ignore EOM, ILI 0, no sense */
			(srb->sense_buffer[2] & 0xaf) == 0 &&
		/* No ASC or ASCQ */
			srb->sense_buffer[12] == 0 &&
			srb->sense_buffer[13] == 0) {
		srb->result = SAM_STAT_GOOD;
		srb->sense_buffer[0] = 0x0;
	}

So if the transport initially gets a failure, but then request sense 
doesn't show any error, we just go "hmm, guess it was ok after all". 
That seems kind of dangerous, I shouldn't think we should assume a 
successful transfer occurred if we got any kind of error.

If you just delete that code above, does the corruption go away?

Original attached patch was (likely whitespace damaged now):

--- linux-2.6.25.9/drivers/usb/storage/transport.c.orig	2008-06-24 
23:09:06.000000000 +0200
+++ linux-2.6.25.9/drivers/usb/storage/transport.c	2008-07-20 
05:14:32.000000000 +0200
@@ -661,6 +661,21 @@ void usb_stor_invoke_transport(struct sc
  			srb->result = SAM_STAT_GOOD;
  			srb->sense_buffer[0] = 0x0;
  		}
+
+        /* JMicron JM20337 chipset bug workaround - BEGIN */
+		if (us->pusb_dev->descriptor.idVendor == 0x152d &&
+            us->pusb_dev->descriptor.idProduct == 0x2338 &&
+            result == USB_STOR_TRANSPORT_FAILED &&
+			/* Filemark 0, ignore EOM, ILI 0, no sense */
+				(srb->sense_buffer[2] & 0xaf) == 0 &&
+			/* No ASC or ASCQ */
+				srb->sense_buffer[12] == 0 &&
+				srb->sense_buffer[13] == 0) {
+            printk(KERN_WARNING "USB Storage - Working around the 
JMicron JM20337 chipset bug (idVendor=%04x, idProduct=%04x, NO_SENSE, 
ASC=0, ASCQ=0) - retrying the read operation\n", 
us->pusb_dev->descriptor.idVendor, us->pusb_dev->descriptor.idProduct);
+		    srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
+            return;
+        }
+        /* JMicron JM20337 chipset bug workaround - END */
  	}

  	/* Did we transfer less than the minimum amount required? */

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-21 19:37 ` [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 Robert Hancock
@ 2008-07-22  2:37   ` Alan Stern
  2008-07-22  9:03     ` Tomas Styblo
  2008-07-22  5:11   ` Tomas Styblo
  1 sibling, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-07-22  2:37 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage

On Mon, 21 Jul 2008, Robert Hancock wrote:

> (adding CCs)
> 
> Tomas Styblo wrote:
> > 
> > Hello,
> > 
> > this message includes a patch that provides a workaround for
> > a silent data corruption bug caused by incorrect error handling in
> > the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> > USB device id 152d:2338.

The two of you should read through

	http://bugzilla.kernel.org/show_bug.cgi?id=9638

which concerns this very problem.

> > - the problem occurs quite rarely, approx. once for 
> >   every 20 GB of transfered data during heavy load
> > 
> > - it seems that only read operations are affected
> > 
> > - the problem is accompanied by these messages in syslog each
> >   time it occurs:
> > 
> > May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] Sense Key : 0x0 [current] 
> > May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] ASC=0x0 ASCQ=0x0
> > 
> > - the bug is not detected as an error and incorrect data is returned, 
> >   causing insidious data corruption
> > 
> > - tested with 3 external disk enclosures (Akasa Integral AK-ENP2SATA-BL) 
> >   with different disks on different computers, with kernel 2.6.24 and 2.6.25
> > 
> > - the patch provides a crude workaround by detecting the error condition
> >   and retrying the faulty transfer
> > 
> > 
> > The fix needs a review as I don't know much about USB and SCSI.  
> > It's possible that this approach is wrong and that the problem should
> > be fixed somewhere else.
> > 
> > There are other problems with this chipset that make it necessary 
> > to disconnect and power off the enclosure from time to time, but at least
> > there's no data corruption anymore.
> 
> I'm not sure this is a good approach. More that this code right above in 
> usb_stor_invoke_transport, which your code undoes the effect of for this 
> device, doesn't seem right:
> 
> 	/* If things are really okay, then let's show that.  Zero
> 	 * out the sense buffer so the higher layers won't realize
> 	 * we did an unsolicited auto-sense. */
> 	if (result == USB_STOR_TRANSPORT_GOOD &&
> 		/* Filemark 0, ignore EOM, ILI 0, no sense */
> 			(srb->sense_buffer[2] & 0xaf) == 0 &&
> 		/* No ASC or ASCQ */
> 			srb->sense_buffer[12] == 0 &&
> 			srb->sense_buffer[13] == 0) {
> 		srb->result = SAM_STAT_GOOD;
> 		srb->sense_buffer[0] = 0x0;
> 	}
> 
> So if the transport initially gets a failure, but then request sense 
> doesn't show any error, we just go "hmm, guess it was ok after all". 
> That seems kind of dangerous, I shouldn't think we should assume a 

No, no -- you have misread the code.  If the transport initially got a 
failure then result would be equal to USB_STOR_TRANSPORT_FAILED, not 
USB_STOR_TRANSPORT_GOOD, so this code wouldn't run.

> If you just delete that code above, does the corruption go away?

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-21 19:37 ` [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 Robert Hancock
  2008-07-22  2:37   ` Alan Stern
@ 2008-07-22  5:11   ` Tomas Styblo
  2008-07-22  5:31     ` Robert Hancock
  1 sibling, 1 reply; 39+ messages in thread
From: Tomas Styblo @ 2008-07-22  5:11 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, linux-usb, usb-storage

* Robert Hancock <hancockr@shaw.ca> [Mon, 21 Jul 2008]:
>
> I'm not sure this is a good approach. More that this code right above in 
> usb_stor_invoke_transport, which your code undoes the effect of for this 
> device, doesn't seem right:
>
> 	/* If things are really okay, then let's show that.  Zero
> 	 * out the sense buffer so the higher layers won't realize
> 	 * we did an unsolicited auto-sense. */
> 	if (result == USB_STOR_TRANSPORT_GOOD &&
> 		/* Filemark 0, ignore EOM, ILI 0, no sense */
> 			(srb->sense_buffer[2] & 0xaf) == 0 &&
> 		/* No ASC or ASCQ */
> 			srb->sense_buffer[12] == 0 &&
> 			srb->sense_buffer[13] == 0) {
> 		srb->result = SAM_STAT_GOOD;
> 		srb->sense_buffer[0] = 0x0;
> 	}
>

The patch doesn't exactly undo the effect of the code above,
because the value of _result_ is different. When this problem
happens, the condition above is false, _result_ is
USB_STOR_TRANSPORT_FAILED and scsi_get_resid(srb) > 0, but the
chipset doesn't report any error (NO_SENSE,ASC==0,ASCQ==0). That's
why I think there's something wrong with the chipset. 

There are Windows users on various message boards who report the
same problem with this chipset - a kind of silent data corruption
that occurs only when copying large amounts of data.

But as I said I know little about SCSI and USB. I tried to locate
and fix the problem, but I can't tell whether the current error
handling code is written according to the relevant standards.

A more generic approach would certainly be better than hardcoded
device ids. Perhaps this check should be enabled for all devices?
Why not?



-- 
Tomas Styblo <tripie@cpan.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  5:11   ` Tomas Styblo
@ 2008-07-22  5:31     ` Robert Hancock
  2008-07-22  6:11       ` Tomas Styblo
  0 siblings, 1 reply; 39+ messages in thread
From: Robert Hancock @ 2008-07-22  5:31 UTC (permalink / raw)
  To: Tomas Styblo; +Cc: linux-kernel, linux-usb, usb-storage, Alan Stern

Tomas Styblo wrote:
> * Robert Hancock <hancockr@shaw.ca> [Mon, 21 Jul 2008]:
>> I'm not sure this is a good approach. More that this code right above in 
>> usb_stor_invoke_transport, which your code undoes the effect of for this 
>> device, doesn't seem right:
>>
>> 	/* If things are really okay, then let's show that.  Zero
>> 	 * out the sense buffer so the higher layers won't realize
>> 	 * we did an unsolicited auto-sense. */
>> 	if (result == USB_STOR_TRANSPORT_GOOD &&
>> 		/* Filemark 0, ignore EOM, ILI 0, no sense */
>> 			(srb->sense_buffer[2] & 0xaf) == 0 &&
>> 		/* No ASC or ASCQ */
>> 			srb->sense_buffer[12] == 0 &&
>> 			srb->sense_buffer[13] == 0) {
>> 		srb->result = SAM_STAT_GOOD;
>> 		srb->sense_buffer[0] = 0x0;
>> 	}
>>
> 
> The patch doesn't exactly undo the effect of the code above,
> because the value of _result_ is different. When this problem

Yes, you and Alan are right, that code only triggers on a good result.

> happens, the condition above is false, _result_ is
> USB_STOR_TRANSPORT_FAILED and scsi_get_resid(srb) > 0, but the
> chipset doesn't report any error (NO_SENSE,ASC==0,ASCQ==0). That's
> why I think there's something wrong with the chipset. 

I'm assuming what triggers the transport failure is it getting a 
US_BULK_STAT_FAIL result from the transfer inside 
usb_stor_Bulk_transport. It could be there's some condition in the chip 
that causes that error, yet doesn't provide any sense data.

In any case, given that your code apparently fixes the corruption it 
seems that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the 
DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the 
  SCSI layer looks at the sense data and says "hmm, nothing to worry 
about here" and carries on.

I think we do need something like your patch, though it should likely be 
moved inside the if (need_auto_sense) check, and I don't see a reason to 
limit to this device ID only.

> 
> There are Windows users on various message boards who report the
> same problem with this chipset - a kind of silent data corruption
> that occurs only when copying large amounts of data.
> 
> But as I said I know little about SCSI and USB. I tried to locate
> and fix the problem, but I can't tell whether the current error
> handling code is written according to the relevant standards.
> 
> A more generic approach would certainly be better than hardcoded
> device ids. Perhaps this check should be enabled for all devices?
> Why not?



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  5:31     ` Robert Hancock
@ 2008-07-22  6:11       ` Tomas Styblo
  2008-07-22  8:45         ` Robert Hancock
  0 siblings, 1 reply; 39+ messages in thread
From: Tomas Styblo @ 2008-07-22  6:11 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel, linux-usb, usb-storage, Alan Stern

* Robert Hancock <hancockr@shaw.ca> [Tue, 22 Jul 2008]:
>
> In any case, given that your code apparently fixes the corruption it seems 
> that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the 
> DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the  
> SCSI layer looks at the sense data and says "hmm, nothing to worry about 
> here" and carries on.

That's exactly what I thought was happening, after a cursory 
look at the SCSI code.

>
> I think we do need something like your patch, though it should likely be 
> moved inside the if (need_auto_sense) check, and I don't see a reason to 
> limit to this device ID only.

Thank you. This is a very insidious bug as it doesn't manifest
itself very often, months of data corruption may pass before you
notice it.

So is there a bug in the chipset, or does the error handling code 
not follow specifications?

I wonder if the company that makes the chipset should be notified
about this problem?


-- 
Tomas Styblo <tripie@cpan.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  6:11       ` Tomas Styblo
@ 2008-07-22  8:45         ` Robert Hancock
  2008-07-24  6:15           ` Alex Buell
  2008-07-29 21:09           ` Alex Buell
  0 siblings, 2 replies; 39+ messages in thread
From: Robert Hancock @ 2008-07-22  8:45 UTC (permalink / raw)
  To: Tomas Styblo; +Cc: linux-kernel, linux-usb, usb-storage, Alan Stern

Tomas Styblo wrote:
> * Robert Hancock <hancockr@shaw.ca> [Tue, 22 Jul 2008]:
>> In any case, given that your code apparently fixes the corruption it seems 
>> that srb->result is being set to SAM_STAT_CHECK_CONDITION, but the 
>> DID_ERROR and SUGGEST_RETRY flags are not being set. Presumably then the  
>> SCSI layer looks at the sense data and says "hmm, nothing to worry about 
>> here" and carries on.
> 
> That's exactly what I thought was happening, after a cursory 
> look at the SCSI code.
> 
>> I think we do need something like your patch, though it should likely be 
>> moved inside the if (need_auto_sense) check, and I don't see a reason to 
>> limit to this device ID only.
> 
> Thank you. This is a very insidious bug as it doesn't manifest
> itself very often, months of data corruption may pass before you
> notice it.
> 
> So is there a bug in the chipset, or does the error handling code 
> not follow specifications?

It looks clear to me that it's a bug in the chipset. It's supposed to 
set some valid sense data if an error occurs, not just set the "failed" 
flag in the USB storage status word. (Presumably the fact that these 
errors are occurring in the first place is a bug in itself.. though that 
could be a problem with the enclosure or drive as well.)

However the kernel should be more robust and not ignore the error 
indication that it is giving.

> I wonder if the company that makes the chipset should be notified
> about this problem?

I suppose it wouldn't hurt to let JMicron know about this. I doubt they 
could do anything for existing chipsets, but it might help them avoid 
this bug in future designs.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  2:37   ` Alan Stern
@ 2008-07-22  9:03     ` Tomas Styblo
  2008-07-23  2:38       ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Tomas Styblo @ 2008-07-22  9:03 UTC (permalink / raw)
  To: Alan Stern; +Cc: Robert Hancock, linux-kernel, linux-usb, usb-storage

* Alan Stern <stern@rowland.harvard.edu> [Tue, 22 Jul 2008]:
> On Mon, 21 Jul 2008, Robert Hancock wrote:
> > > this message includes a patch that provides a workaround for
> > > a silent data corruption bug caused by incorrect error handling in
> > > the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> > > USB device id 152d:2338.
> 
> The two of you should read through
> 
> 	http://bugzilla.kernel.org/show_bug.cgi?id=9638
> 
> which concerns this very problem.

I had found this bugreport and read through it before I posted my
patch. I don't think this is the same problem. The error messages
and the description of the problem are different from what I've
been trying to fix.

Anyway, I'll send the patch to this person so he can try it. 
I guess it won't fix his problem. This patch is much simpler and doesn't 
need any delays - I really think this is a different situation.

I sometimes experience the problems described by this person, as I
noted in the first message with the patch. When these "reset high
speed USB device" messages appear, it is usually necessary to
disconnect and power off the device. In my experience, data
corruption can be prevented in this situation by setting:

# tune2fs -e remount-ro /dev/sdX

But this is a different problem - in this situation the error
actually IS detected.



-- 
Tomas Styblo <tripie@cpan.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  9:03     ` Tomas Styblo
@ 2008-07-23  2:38       ` Alan Stern
  2008-07-23 23:20         ` Robert Hancock
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-07-23  2:38 UTC (permalink / raw)
  To: Tomas Styblo; +Cc: Robert Hancock, linux-kernel, linux-usb, usb-storage

On Tue, 22 Jul 2008, Tomas Styblo wrote:

> * Alan Stern <stern@rowland.harvard.edu> [Tue, 22 Jul 2008]:
> > On Mon, 21 Jul 2008, Robert Hancock wrote:
> > > > this message includes a patch that provides a workaround for
> > > > a silent data corruption bug caused by incorrect error handling in
> > > > the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
> > > > USB device id 152d:2338.
> > 
> > The two of you should read through
> > 
> > 	http://bugzilla.kernel.org/show_bug.cgi?id=9638
> > 
> > which concerns this very problem.
> 
> I had found this bugreport and read through it before I posted my
> patch. I don't think this is the same problem. The error messages
> and the description of the problem are different from what I've
> been trying to fix.

Maybe the initial descriptions are.  If you read through the entire
report, though, you'll see that the underlying problems are exactly
the same:

	A READ fails to transfer all the data requested.  The device
	sends back Check Condition status, but the sense data is all
	0, indicating that nothing was wrong.  The system does not
	retry the READ, leading to data corruption.

	A WRITE fails and times out after 30 seconds.  The system
	tries to reset the device, but the reset fails.

The system doesn't detect the first problem for two reasons: The
scsi_eh_prep_cmnd routine fails to preserve scmd->underflow, and
usb-storage fails to check for underflow when the device returns Check
Condition status.  The Bugzilla report contains fixes for both of
these, and they solve the problem.

The second problem is harder.  The device is supposed to send a STALL
to cut the WRITE short -- and a USB trace under Windows seems to
indicate that it does -- but under Linux no STALL is received.  I
don't know why not.  But since Linux does not respond in the way the
device expects, the device crashes.

> Anyway, I'll send the patch to this person so he can try it. 
> I guess it won't fix his problem. This patch is much simpler and doesn't 
> need any delays - I really think this is a different situation.

It isn't.  And your patch is an ad-hoc correction that doesn't address
the true underlying reasons for the errors.

You should also try adding the delay mentioned in the bug report.
There's an excellent chance it will also prevent your problems.

> I sometimes experience the problems described by this person, as I
> noted in the first message with the patch. When these "reset high
> speed USB device" messages appear, it is usually necessary to
> disconnect and power off the device.

Because the device's firmware has crashed.  That's why the reset fails.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-23  2:38       ` Alan Stern
@ 2008-07-23 23:20         ` Robert Hancock
  2008-07-24  3:42           ` Alan Stern
  2008-07-25  8:44           ` Tomas Styblo
  0 siblings, 2 replies; 39+ messages in thread
From: Robert Hancock @ 2008-07-23 23:20 UTC (permalink / raw)
  To: Alan Stern; +Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage

Alan Stern wrote:
> The system doesn't detect the first problem for two reasons: The
> scsi_eh_prep_cmnd routine fails to preserve scmd->underflow, and
> usb-storage fails to check for underflow when the device returns Check
> Condition status.  The Bugzilla report contains fixes for both of
> these, and they solve the problem.
> 
> The second problem is harder.  The device is supposed to send a STALL
> to cut the WRITE short -- and a USB trace under Windows seems to
> indicate that it does -- but under Linux no STALL is received.  I
> don't know why not.  But since Linux does not respond in the way the
> device expects, the device crashes.
> 
>> Anyway, I'll send the patch to this person so he can try it. 
>> I guess it won't fix his problem. This patch is much simpler and doesn't 
>> need any delays - I really think this is a different situation.
> 
> It isn't.  And your patch is an ad-hoc correction that doesn't address
> the true underlying reasons for the errors.

Tomas, you should try the patch listed in that bug report (well, except 
maybe the part that's actually adding the JMicron unusual devices entry) 
and see if that fixes the problem.

It remains an issue, though, that if there's no underflow, if the device 
reports an error in the CSW but doesn't provide sense data, we assume 
nothing bad happened and don't retry. That definitely does not seem 
correct. The device is not supposed to do this, but with how crappily 
some of these devices are designed we should be more defensive.

It's pretty gross to add random delays in without some good evidence or 
input from the manufacturer that it will really fix the problem. (We do 
have one GO_SLOW flag with a delay in there already for some Genesys 
chips, but I believe that was based on some actual input from Genesys.)

> 
> You should also try adding the delay mentioned in the bug report.
> There's an excellent chance it will also prevent your problems.
> 
>> I sometimes experience the problems described by this person, as I
>> noted in the first message with the patch. When these "reset high
>> speed USB device" messages appear, it is usually necessary to
>> disconnect and power off the device.
> 
> Because the device's firmware has crashed.  That's why the reset fails.
> 
> Alan Stern
> 
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-23 23:20         ` Robert Hancock
@ 2008-07-24  3:42           ` Alan Stern
  2008-07-30 19:55             ` Robert Hancock
  2008-07-25  8:44           ` Tomas Styblo
  1 sibling, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-07-24  3:42 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage

On Wed, 23 Jul 2008, Robert Hancock wrote:

> It remains an issue, though, that if there's no underflow, if the device 
> reports an error in the CSW but doesn't provide sense data, we assume 
> nothing bad happened and don't retry. That definitely does not seem 
> correct. The device is not supposed to do this, but with how crappily 
> some of these devices are designed we should be more defensive.

The problem is, what can you do?  The device has said that something 
was wrong, but it hasn't told you what.  Without knowing what went 
wrong, you can't know how to recover.

I suppose in such cases we could simply report that the command failed
completely.

> It's pretty gross to add random delays in without some good evidence or 
> input from the manufacturer that it will really fix the problem. (We do 
> have one GO_SLOW flag with a delay in there already for some Genesys 
> chips, but I believe that was based on some actual input from Genesys.)

That's right.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  8:45         ` Robert Hancock
@ 2008-07-24  6:15           ` Alex Buell
  2008-07-29 21:09           ` Alex Buell
  1 sibling, 0 replies; 39+ messages in thread
From: Alex Buell @ 2008-07-24  6:15 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage, Alan Stern

On Tue, 22 Jul 2008 02:45:32 -0600, I waved a wand and this message
magically appears in front of Robert Hancock:

> I suppose it wouldn't hurt to let JMicron know about this. I doubt
> they could do anything for existing chipsets, but it might help them
> avoid this bug in future designs.

I, too, have the same chipet. I have observed the same corruption bug
with SATA disks only. PATA disks works perfectly. 

When I have more time and a spare SATA disk, I will try and reproduce
the same problem and post results on here, in the hope someone will
come up with a fix for it. 

It is very annoying not being able to use cheaper SATA disks for
backing up my boxes instead of slightly more expensive PATA disks. 
-- 
http://www.munted.org.uk

Fearsome grindings.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-23 23:20         ` Robert Hancock
  2008-07-24  3:42           ` Alan Stern
@ 2008-07-25  8:44           ` Tomas Styblo
  2008-07-25  8:54             ` Robert Hancock
  1 sibling, 1 reply; 39+ messages in thread
From: Tomas Styblo @ 2008-07-25  8:44 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Alan Stern, linux-kernel, linux-usb, usb-storage

* Robert Hancock <hancockr@shaw.ca> [Thu, 24 Jul 2008]:
>>> Anyway, I'll send the patch to this person so he can try it. I guess it 
>>> won't fix his problem. This patch is much simpler and doesn't need any 
>>> delays - I really think this is a different situation.
>>
>> It isn't.  And your patch is an ad-hoc correction that doesn't address
>> the true underlying reasons for the errors.
>
> Tomas, you should try the patch listed in that bug report (well, except 
> maybe the part that's actually adding the JMicron unusual devices entry) 
> and see if that fixes the problem.

I'll try that patch next week - but I have no confidence in the
reliability of fixing the problem by inserting random delays. All
I have wanted to achieve is to detect the errors and thus prevent
the silent data corruption. 

I don't really mind those sporadic forced resets since they cause
no data corruption, at least in my setup. My simple patch
accomplishes the goal of preventing data corruption more reliably,
I think.


-- 
Tomas Styblo <tripie@cpan.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-25  8:44           ` Tomas Styblo
@ 2008-07-25  8:54             ` Robert Hancock
  0 siblings, 0 replies; 39+ messages in thread
From: Robert Hancock @ 2008-07-25  8:54 UTC (permalink / raw)
  To: Tomas Styblo; +Cc: Alan Stern, linux-kernel, linux-usb, usb-storage

Tomas Styblo wrote:
> * Robert Hancock <hancockr@shaw.ca> [Thu, 24 Jul 2008]:
>>>> Anyway, I'll send the patch to this person so he can try it. I guess it 
>>>> won't fix his problem. This patch is much simpler and doesn't need any 
>>>> delays - I really think this is a different situation.
>>> It isn't.  And your patch is an ad-hoc correction that doesn't address
>>> the true underlying reasons for the errors.
>> Tomas, you should try the patch listed in that bug report (well, except 
>> maybe the part that's actually adding the JMicron unusual devices entry) 
>> and see if that fixes the problem.
> 
> I'll try that patch next week - but I have no confidence in the
> reliability of fixing the problem by inserting random delays. All
> I have wanted to achieve is to detect the errors and thus prevent
> the silent data corruption. 
> 
> I don't really mind those sporadic forced resets since they cause
> no data corruption, at least in my setup. My simple patch
> accomplishes the goal of preventing data corruption more reliably,
> I think.

I agree, that's why I mentioned to leave out the part that adds the 
unusual devices entry which actually enables the random delays. There's 
another part of the patch which fixes underflow detection, which might 
have the same effect as your patch.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-22  8:45         ` Robert Hancock
  2008-07-24  6:15           ` Alex Buell
@ 2008-07-29 21:09           ` Alex Buell
  2008-07-29 22:33             ` [usb-storage] " Matthew Dharm
  1 sibling, 1 reply; 39+ messages in thread
From: Alex Buell @ 2008-07-29 21:09 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage, Alan Stern

On Tue, 22 Jul 2008 02:45:32 -0600, I waved a wand and this message
magically appears in front of Robert Hancock:

> I suppose it wouldn't hurt to let JMicron know about this. I doubt
> they could do anything for existing chipsets, but it might help them
> avoid this bug in future designs.

I can reproduce this EVERY time with SATA disks using the same JM20337
chipset. 

1. mke2fs /dev/sda1
2. tunefs -j /dev/sda1
3. e2fsck /dev/sda1

This spits out a whole load of:
Jul 29 22:04:41 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:04:41 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:43 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:04:43 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:44 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:04:44 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:04:45 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:31 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:06:31 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:32 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:06:32 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:35 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:06:35 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:06:51 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:06:54 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:06:54 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:07:37 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:07:47 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:07:47 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:08:04 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:08:04 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Jul 29 22:08:06 lithium sd 4:0:0:0: [sda] Sense Key : 0x0 [current] 
Jul 29 22:08:06 lithium sd 4:0:0:0: [sda] ASC=0x0 ASCQ=0x0

-- 
http://www.munted.org.uk

Fearsome grindings.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-29 21:09           ` Alex Buell
@ 2008-07-29 22:33             ` Matthew Dharm
  0 siblings, 0 replies; 39+ messages in thread
From: Matthew Dharm @ 2008-07-29 22:33 UTC (permalink / raw)
  To: Alex Buell
  Cc: Robert Hancock, usb-storage, Tomas Styblo, linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 824 bytes --]

On Tue, Jul 29, 2008 at 10:09:08PM +0100, Alex Buell wrote:
> On Tue, 22 Jul 2008 02:45:32 -0600, I waved a wand and this message
> magically appears in front of Robert Hancock:
> 
> > I suppose it wouldn't hurt to let JMicron know about this. I doubt
> > they could do anything for existing chipsets, but it might help them
> > avoid this bug in future designs.
> 
> I can reproduce this EVERY time with SATA disks using the same JM20337
> chipset. 
> 
> 1. mke2fs /dev/sda1
> 2. tunefs -j /dev/sda1
> 3. e2fsck /dev/sda1

I already sent a note to my contact at JMicron, but got no response.

Matt

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

Somebody call an exorcist!
					-- Dust Puppy
User Friendly, 5/16/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-24  3:42           ` Alan Stern
@ 2008-07-30 19:55             ` Robert Hancock
  2008-07-30 20:00               ` [usb-storage] " Matthew Dharm
  2008-07-30 21:07               ` Alan Stern
  0 siblings, 2 replies; 39+ messages in thread
From: Robert Hancock @ 2008-07-30 19:55 UTC (permalink / raw)
  To: Alan Stern; +Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage

Alan Stern wrote:
> On Wed, 23 Jul 2008, Robert Hancock wrote:
> 
>> It remains an issue, though, that if there's no underflow, if the device 
>> reports an error in the CSW but doesn't provide sense data, we assume 
>> nothing bad happened and don't retry. That definitely does not seem 
>> correct. The device is not supposed to do this, but with how crappily 
>> some of these devices are designed we should be more defensive.
> 
> The problem is, what can you do?  The device has said that something 
> was wrong, but it hasn't told you what.  Without knowing what went 
> wrong, you can't know how to recover.
> 
> I suppose in such cases we could simply report that the command failed
> completely.

I think that is what we need to do. The SCSI/block layers should retry 
the command or report a failure to userspace. Above all else we can't 
just continue on our merry way and assume success, otherwise data will 
get silently corrupted.

It could be that's what Windows is doing, and it's just silently 
retrying commands. Though, some people have reported corruption on these 
chipsets on Windows too, so maybe it has the same issue.

Is that SCSI/USB underflow fix from 
http://bugzilla.kernel.org/show_bug.cgi?id=9638 being pushed forward by 
anybody? If not, I can try and cook up a patch to take care of that as 
well as the non-underflow failed CSW/no sense data issue.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-30 19:55             ` Robert Hancock
@ 2008-07-30 20:00               ` Matthew Dharm
  2008-07-30 20:46                 ` Robert Hancock
  2008-07-30 21:18                 ` Alan Stern
  2008-07-30 21:07               ` Alan Stern
  1 sibling, 2 replies; 39+ messages in thread
From: Matthew Dharm @ 2008-07-30 20:00 UTC (permalink / raw)
  To: Robert Hancock
  Cc: Alan Stern, usb-storage, Tomas Styblo, linux-usb, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1804 bytes --]

On Wed, Jul 30, 2008 at 01:55:25PM -0600, Robert Hancock wrote:
> Alan Stern wrote:
> > On Wed, 23 Jul 2008, Robert Hancock wrote:
> > 
> >> It remains an issue, though, that if there's no underflow, if the device 
> >> reports an error in the CSW but doesn't provide sense data, we assume 
> >> nothing bad happened and don't retry. That definitely does not seem 
> >> correct. The device is not supposed to do this, but with how crappily 
> >> some of these devices are designed we should be more defensive.
> > 
> > The problem is, what can you do?  The device has said that something 
> > was wrong, but it hasn't told you what.  Without knowing what went 
> > wrong, you can't know how to recover.

Yes and no.  If ASC/ASCQ is clear, then it's telling you that nothing is
wrong.  The device is contradicting itself. That doesn't really help us
here, but it's a point I like to be clear on.

> > I suppose in such cases we could simply report that the command failed
> > completely.
> 
> I think that is what we need to do. The SCSI/block layers should retry 
> the command or report a failure to userspace. Above all else we can't 
> just continue on our merry way and assume success, otherwise data will 
> get silently corrupted.

The code path to supress the reporting of an error when auto-sense shows no
ASC/ASCQ was added for a reason.  That reason has likely been lost to time,
but I worry about devices that are out there that rely on the current
behavior to function properly....

Matt

-- 
Matthew Dharm                              Home: mdharm-usb@one-eyed-alien.net 
Maintainer, Linux USB Mass Storage Driver

A:  The most ironic oxymoron wins ...
DP: "Microsoft Works"
A:  Uh, okay, you win.
					-- A.J. & Dust Puppy
User Friendly, 1/18/1998

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-30 20:00               ` [usb-storage] " Matthew Dharm
@ 2008-07-30 20:46                 ` Robert Hancock
  2008-07-30 21:18                 ` Alan Stern
  1 sibling, 0 replies; 39+ messages in thread
From: Robert Hancock @ 2008-07-30 20:46 UTC (permalink / raw)
  To: Robert Hancock, Alan Stern, usb-storage, Tomas Styblo, linux-usb,
	linux-kernel

Matthew Dharm wrote:
> On Wed, Jul 30, 2008 at 01:55:25PM -0600, Robert Hancock wrote:
>> Alan Stern wrote:
>>> On Wed, 23 Jul 2008, Robert Hancock wrote:
>>>
>>>> It remains an issue, though, that if there's no underflow, if the device 
>>>> reports an error in the CSW but doesn't provide sense data, we assume 
>>>> nothing bad happened and don't retry. That definitely does not seem 
>>>> correct. The device is not supposed to do this, but with how crappily 
>>>> some of these devices are designed we should be more defensive.
>>> The problem is, what can you do?  The device has said that something 
>>> was wrong, but it hasn't told you what.  Without knowing what went 
>>> wrong, you can't know how to recover.
> 
> Yes and no.  If ASC/ASCQ is clear, then it's telling you that nothing is
> wrong.  The device is contradicting itself. That doesn't really help us
> here, but it's a point I like to be clear on.
> 
>>> I suppose in such cases we could simply report that the command failed
>>> completely.
>> I think that is what we need to do. The SCSI/block layers should retry 
>> the command or report a failure to userspace. Above all else we can't 
>> just continue on our merry way and assume success, otherwise data will 
>> get silently corrupted.
> 
> The code path to supress the reporting of an error when auto-sense shows no
> ASC/ASCQ was added for a reason.  That reason has likely been lost to time,
> but I worry about devices that are out there that rely on the current
> behavior to function properly....

My original comment was that that code should be removed, but this is 
incorrect. In fact that code path is unrelated to this problem since it 
only executes if no transport error was detected. This code path is 
needed since retrieving sense data is done for multiple reasons other 
than a transport failure. For one, "If we're running the CB transport, 
which is incapable of determining status on its own, we will auto-sense 
unless the operation involved a data-in transfer." In this case, for a 
successful transfer the status must be reset to good after getting the 
sense data.

In the case in question here, the BOT transport reports a failure, and 
we retrieve sense data, but the sense data doesn't indicate an error. 
This results in the failure essentially being ignored. In this case I 
think we should be doing the same thing as we do on detecting an underflow:

srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-30 19:55             ` Robert Hancock
  2008-07-30 20:00               ` [usb-storage] " Matthew Dharm
@ 2008-07-30 21:07               ` Alan Stern
  2008-08-01 21:15                 ` Alex Buell
  1 sibling, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-07-30 21:07 UTC (permalink / raw)
  To: Robert Hancock; +Cc: Tomas Styblo, linux-kernel, linux-usb, usb-storage

On Wed, 30 Jul 2008, Robert Hancock wrote:

> > I suppose in such cases we could simply report that the command failed
> > completely.
> 
> I think that is what we need to do. The SCSI/block layers should retry 
> the command or report a failure to userspace. Above all else we can't 
> just continue on our merry way and assume success, otherwise data will 
> get silently corrupted.
> 
> It could be that's what Windows is doing, and it's just silently 
> retrying commands. Though, some people have reported corruption on these 
> chipsets on Windows too, so maybe it has the same issue.

It's probably a combination of problems.

> Is that SCSI/USB underflow fix from 
> http://bugzilla.kernel.org/show_bug.cgi?id=9638 being pushed forward by 
> anybody? If not, I can try and cook up a patch to take care of that as 
> well as the non-underflow failed CSW/no sense data issue.

The patch situation is a little confused.  The change to the SCSI stack
are already present in 2.6.27-rc1.  I am not pushing the added delay
patch; I want to do some testing of my own and hear from other people
about it before submitting it.

The USB change discussed there is written but not yet submitted.  It
also needs testing.  Would you like to try it under 2.6.27-rc1?  Here
it is.

Alan Stern


Index: usb-2.6/drivers/usb/storage/transport.c
===================================================================
--- usb-2.6.orig/drivers/usb/storage/transport.c
+++ usb-2.6/drivers/usb/storage/transport.c
@@ -663,7 +663,7 @@ void usb_stor_invoke_transport(struct sc
 	}
 
 	/* Did we transfer less than the minimum amount required? */
-	if (srb->result == SAM_STAT_GOOD &&
+	if ((srb->result == SAM_STAT_GOOD || srb->sense_buffer[2] == 0) &&
 			scsi_bufflen(srb) - scsi_get_resid(srb) < srb->underflow)
 		srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
 


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [usb-storage] [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-30 20:00               ` [usb-storage] " Matthew Dharm
  2008-07-30 20:46                 ` Robert Hancock
@ 2008-07-30 21:18                 ` Alan Stern
  1 sibling, 0 replies; 39+ messages in thread
From: Alan Stern @ 2008-07-30 21:18 UTC (permalink / raw)
  To: Matthew Dharm
  Cc: Robert Hancock, usb-storage, Tomas Styblo, linux-usb, linux-kernel

On Wed, 30 Jul 2008, Matthew Dharm wrote:

> The code path to supress the reporting of an error when auto-sense shows no
> ASC/ASCQ was added for a reason.  That reason has likely been lost to time,
> but I worry about devices that are out there that rely on the current
> behavior to function properly....

In this case we shouldn't have to worry about that code path.  It 
applies to situations where there actually is no error.  In fact, if 
you follow the logic flow you'll see that the supression code gets 
invoked only for CB-like transports, i.e., unsolicited auto-sense.

But in this situation there really _is_ an error: We get an underflow.  
The existing logic checks for underflow only when the device reports 
Okay status; my patch changes the logic so that it also checks for 
underflow when the device reports Check Condition status with no sense 
data.

The idea is that the higher SCSI layers don't know what to do with
Check Condition plus no-sense -- so they treat the transfer as a
success.  But that's definitely the wrong thing to do when there was an
underflow; in that situation we want to report the underflow so that
the midlayer will retry the command.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-07-30 21:07               ` Alan Stern
@ 2008-08-01 21:15                 ` Alex Buell
  2008-08-01 22:22                   ` Alex Buell
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Buell @ 2008-08-01 21:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Robert Hancock, Tomas Styblo, linux-kernel, linux-usb, usb-storage

On Wed, 30 Jul 2008 17:07:30 -0400 (EDT), I waved a wand and this
message magically appears in front of Alan Stern:

> Index: usb-2.6/drivers/usb/storage/transport.c
[ snip ]

I've just applied this patch to 2.6.25 tonight, and so far, despite my
number of tests, I've yet to experience any more of these failures.

I've done a lot of mkfs and fsck, haven't seen any problems so far.
Fingers crossed that this patch works. 
-- 
http://www.munted.org.uk

Fearsome grindings.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-01 21:15                 ` Alex Buell
@ 2008-08-01 22:22                   ` Alex Buell
  2008-08-02  2:32                     ` Robert Hancock
  2008-08-02 23:49                     ` Alan Stern
  0 siblings, 2 replies; 39+ messages in thread
From: Alex Buell @ 2008-08-01 22:22 UTC (permalink / raw)
  To: linux-kernel
  Cc: Alan Stern, Robert Hancock, Tomas Styblo, linux-usb, usb-storage

On Fri, 1 Aug 2008 22:15:04 +0100, I waved a wand and this message
magically appears in front of Alex Buell:

> I've done a lot of mkfs and fsck, haven't seen any problems so far.
> Fingers crossed that this patch works. 

Spoke too soon, that patch that was posted doesn't work for me. Here's
my log:

Aug  1 23:18:08 lithium scsi 5:0:0:0: Direct-Access     SAMSUNG  SP1213C          0-30 PQ: 0 ANSI: 2 CCS
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
Aug  1 23:18:08 lithium sda: sda1
Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Attached SCSI disk
Aug  1 23:18:08 lithium sd 5:0:0:0: Attached scsi generic sg1 type 0
Aug  1 23:18:08 lithium usb-storage: device scan complete
Aug  1 23:19:11 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
Aug  1 23:19:11 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug  1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page
Aug  1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page
Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
Aug  1 23:20:01 lithium cron[13504]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
Aug  1 23:20:34 lithium usb 1-3.4.3: USB disconnect, address 8

:-(
-- 
http://www.munted.org.uk

Fearsome grindings.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-01 22:22                   ` Alex Buell
@ 2008-08-02  2:32                     ` Robert Hancock
  2008-08-02 23:49                     ` Alan Stern
  1 sibling, 0 replies; 39+ messages in thread
From: Robert Hancock @ 2008-08-02  2:32 UTC (permalink / raw)
  To: Alex Buell; +Cc: linux-kernel, Alan Stern, Tomas Styblo, linux-usb, usb-storage

Alex Buell wrote:
> On Fri, 1 Aug 2008 22:15:04 +0100, I waved a wand and this message
> magically appears in front of Alex Buell:
> 
>> I've done a lot of mkfs and fsck, haven't seen any problems so far.
>> Fingers crossed that this patch works. 
> 
> Spoke too soon, that patch that was posted doesn't work for me. Here's
> my log:
> 
> Aug  1 23:18:08 lithium scsi 5:0:0:0: Direct-Access     SAMSUNG  SP1213C          0-30 PQ: 0 ANSI: 2 CCS
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] 234493056 512-byte hardware sectors (120060 MB)
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Write Protect is off
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Mode Sense: 00 38 00 00
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Assuming drive cache: write through
> Aug  1 23:18:08 lithium sda: sda1
> Aug  1 23:18:08 lithium sd 5:0:0:0: [sda] Attached SCSI disk
> Aug  1 23:18:08 lithium sd 5:0:0:0: Attached scsi generic sg1 type 0
> Aug  1 23:18:08 lithium usb-storage: device scan complete
> Aug  1 23:19:11 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
> Aug  1 23:19:11 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0

These patches (mine or Alan's) won't stop those messages from showing 
up. The request should get retried, though.

> Aug  1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page
> Aug  1 23:19:11 lithium ERROR: (device sda1): dbAllocNext: Corrupt dmap page

Not sure what is producing these messages?

> Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
> Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
> Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] Sense Key : 0x0 [current] 
> Aug  1 23:19:20 lithium sd 5:0:0:0: [sda] ASC=0x0 ASCQ=0x0
> Aug  1 23:20:01 lithium cron[13504]: (root) CMD (test -x /usr/sbin/run-crons && /usr/sbin/run-crons )
> Aug  1 23:20:34 lithium usb 1-3.4.3: USB disconnect, address 8
> 
> :-(

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-01 22:22                   ` Alex Buell
  2008-08-02  2:32                     ` Robert Hancock
@ 2008-08-02 23:49                     ` Alan Stern
  2008-08-03  9:07                       ` Alex Buell
  1 sibling, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-08-02 23:49 UTC (permalink / raw)
  To: Alex Buell
  Cc: linux-kernel, Robert Hancock, Tomas Styblo, linux-usb, usb-storage

On Fri, 1 Aug 2008, Alex Buell wrote:

> On Fri, 1 Aug 2008 22:15:04 +0100, I waved a wand and this message
> magically appears in front of Alex Buell:
> 
> > I've done a lot of mkfs and fsck, haven't seen any problems so far.
> > Fingers crossed that this patch works. 
> 
> Spoke too soon, that patch that was posted doesn't work for me. Here's
> my log:

Which patch?  Several different ones have been posted.

The log doesn't provide much help.  We need to see a log with 
CONFIG_USB_STORAGE_DEBUG enabled.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-02 23:49                     ` Alan Stern
@ 2008-08-03  9:07                       ` Alex Buell
  2008-08-04 16:48                         ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Buell @ 2008-08-03  9:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, Robert Hancock, Tomas Styblo, linux-usb, usb-storage

On Sat, 2 Aug 2008 19:49:22 -0400 (EDT), I waved a wand and this
message magically appears in front of Alan Stern:

> > > I've done a lot of mkfs and fsck, haven't seen any problems so
> > > far. Fingers crossed that this patch works. 
> > 
> > Spoke too soon, that patch that was posted doesn't work for me.
> > Here's my log:
> 
> Which patch?  Several different ones have been posted.

This is the one that I applied:
Index: usb-2.6/drivers/usb/storage/transport.c
===================================================================
--- usb-2.6.orig/drivers/usb/storage/transport.c
+++ usb-2.6/drivers/usb/storage/transport.c
@@ -663,7 +663,7 @@ void usb_stor_invoke_transport(struct sc
 	}
 
 	/* Did we transfer less than the minimum amount required? */
-	if (srb->result == SAM_STAT_GOOD &&
+	if ((srb->result == SAM_STAT_GOOD || srb->sense_buffer[2] == 0) && scsi_bufflen(srb) - scsi_get_resid(srb) < srb->underflow)
 		srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24); 

> The log doesn't provide much help.  We need to see a log with 
> CONFIG_USB_STORAGE_DEBUG enabled.

Which patch should I be trying out? 

Thanks,
Alex
-- 
http://www.munted.org.uk

Fearsome grindings.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-03  9:07                       ` Alex Buell
@ 2008-08-04 16:48                         ` Alan Stern
  2008-08-04 20:17                           ` Alex Buell
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-08-04 16:48 UTC (permalink / raw)
  To: Alex Buell
  Cc: linux-kernel, Robert Hancock, Tomas Styblo, linux-usb, usb-storage

On Sun, 3 Aug 2008, Alex Buell wrote:

> > Which patch?  Several different ones have been posted.
> 
> This is the one that I applied:
> Index: usb-2.6/drivers/usb/storage/transport.c
> ===================================================================
> --- usb-2.6.orig/drivers/usb/storage/transport.c
> +++ usb-2.6/drivers/usb/storage/transport.c
> @@ -663,7 +663,7 @@ void usb_stor_invoke_transport(struct sc
>  	}
>  
>  	/* Did we transfer less than the minimum amount required? */
> -	if (srb->result == SAM_STAT_GOOD &&
> +	if ((srb->result == SAM_STAT_GOOD || srb->sense_buffer[2] == 0) && scsi_bufflen(srb) - scsi_get_resid(srb) < srb->underflow)
>  		srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24); 
> 
> > The log doesn't provide much help.  We need to see a log with 
> > CONFIG_USB_STORAGE_DEBUG enabled.
> 
> Which patch should I be trying out? 

That's the right patch.  You should use it with 2.6.27-rc1.

I just got one of those JMicron thingies; I'll try it out later today.  
No SATA devices available for testing, though, only PATA.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-04 16:48                         ` Alan Stern
@ 2008-08-04 20:17                           ` Alex Buell
  2008-08-04 20:45                             ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Alex Buell @ 2008-08-04 20:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: linux-kernel, Robert Hancock, Tomas Styblo, linux-usb, usb-storage

On Mon, 4 Aug 2008 12:48:09 -0400 (EDT), I waved a wand and this
message magically appears in front of Alan Stern:

> > Which patch should I be trying out? 
> 
> That's the right patch.  You should use it with 2.6.27-rc1.

I used that with 2.6.25, works perfectly with PATA devices. I think
maybe the SATA device I have is bad, I have another one I'll try out
later tonight. 
-- 
http://www.munted.org.uk

Fearsome grindings.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-04 20:17                           ` Alex Buell
@ 2008-08-04 20:45                             ` Alan Stern
  2008-09-02 12:10                               ` Thiago Galesi
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-08-04 20:45 UTC (permalink / raw)
  To: Alex Buell
  Cc: linux-kernel, Robert Hancock, Tomas Styblo, linux-usb, usb-storage

On Mon, 4 Aug 2008, Alex Buell wrote:

> On Mon, 4 Aug 2008 12:48:09 -0400 (EDT), I waved a wand and this
> message magically appears in front of Alan Stern:
> 
> > > Which patch should I be trying out? 
> > 
> > That's the right patch.  You should use it with 2.6.27-rc1.
> 
> I used that with 2.6.25, works perfectly with PATA devices. I think
> maybe the SATA device I have is bad, I have another one I'll try out
> later tonight. 

I meant what I said about 2.6.27-rc1.  There have been other changes 
which will affect the patch's operation.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-08-04 20:45                             ` Alan Stern
@ 2008-09-02 12:10                               ` Thiago Galesi
  2008-09-02 15:02                                 ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Thiago Galesi @ 2008-09-02 12:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

Hello

Does anyone knows of problems concerning 152d:2338 attached to DVD recorders??

I have one of those here connected to a DVD-RW, it works fine for
reading, but I can't make it work for recording. K3B produce errors
while probing, newer cdrecords go as far as burning a CD halfway.

Here's a bug I opened a couple of months ago
https://bugzilla.redhat.com/show_bug.cgi?id=440289

(It really doesn't seem to matter which kernel I'm using - fedora,
ubuntu or plain vanilla)

I'll take a closer look at these patches here (and in kernel bug
9638), see if they make any difference

-- 
-
Thiago Galesi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-02 12:10                               ` Thiago Galesi
@ 2008-09-02 15:02                                 ` Alan Stern
  2008-09-02 16:07                                   ` Thiago Galesi
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-09-02 15:02 UTC (permalink / raw)
  To: Thiago Galesi
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

On Tue, 2 Sep 2008, Thiago Galesi wrote:

> Hello
> 
> Does anyone knows of problems concerning 152d:2338 attached to DVD recorders??
> 
> I have one of those here connected to a DVD-RW, it works fine for
> reading, but I can't make it work for recording. K3B produce errors
> while probing, newer cdrecords go as far as burning a CD halfway.
> 
> Here's a bug I opened a couple of months ago
> https://bugzilla.redhat.com/show_bug.cgi?id=440289
> 
> (It really doesn't seem to matter which kernel I'm using - fedora,
> ubuntu or plain vanilla)
> 
> I'll take a closer look at these patches here (and in kernel bug
> 9638), see if they make any difference

If nothing works, you should try usbmon (see 
Documentation/usb/usbmon.txt).

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-02 15:02                                 ` Alan Stern
@ 2008-09-02 16:07                                   ` Thiago Galesi
  2008-09-02 19:19                                     ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Thiago Galesi @ 2008-09-02 16:07 UTC (permalink / raw)
  To: Alan Stern
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

>
> If nothing works, you should try usbmon (see
> Documentation/usb/usbmon.txt).
>
> Alan Stern

Ok, I've created a USBMON log, here it is here:
http://duskblue.org/jmicron_log_k3b.log

(I guess it is a little too big - 70k -  to be placed in the message directly)

Using k3b, since it seems it is the program which causes more errors.
DVD is completely unusable after I did this, dmesg has sr 4:0:0:0:
rejecting I/O to offline device

If I reload the drivers manually I have the reader back without
rebooting the machine.

-- 
-
Thiago Galesi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-02 16:07                                   ` Thiago Galesi
@ 2008-09-02 19:19                                     ` Alan Stern
  2008-09-02 20:16                                       ` Thiago Galesi
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-09-02 19:19 UTC (permalink / raw)
  To: Thiago Galesi
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

On Tue, 2 Sep 2008, Thiago Galesi wrote:

> Ok, I've created a USBMON log, here it is here:
> http://duskblue.org/jmicron_log_k3b.log
> 
> (I guess it is a little too big - 70k -  to be placed in the message directly)
> 
> Using k3b, since it seems it is the program which causes more errors.
> DVD is completely unusable after I did this, dmesg has sr 4:0:0:0:
> rejecting I/O to offline device
> 
> If I reload the drivers manually I have the reader back without
> rebooting the machine.

I can't tell exactly what's happening, but your device is sending back 
a lot of error messages (Not Ready - Medium not present).  Maybe it 
doesn't recognize the type of disc you have loaded.

Also, there appears to be some program querying the state of your
device 7 or 8 times every two seconds.  I can't tell what that is; it 
might be hal or it might be your k3b program.

However what finally caused the failure was a command (0xAC) which I 
don't recognize and apparently your drive doesn't recognize either.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-02 19:19                                     ` Alan Stern
@ 2008-09-02 20:16                                       ` Thiago Galesi
  2008-09-02 21:06                                         ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Thiago Galesi @ 2008-09-02 20:16 UTC (permalink / raw)
  To: Alan Stern
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

Hello

>
> I can't tell exactly what's happening, but your device is sending back
> a lot of error messages (Not Ready - Medium not present).  Maybe it
> doesn't recognize the type of disc you have loaded. Also, there appears to be some program querying the state of your
> device 7 or 8 times every two seconds.

Ok, I suspect what this may be, and I'll disable it, hopefully
producing a cleaner log.

> However what finally caused the failure was a command (0xAC) which I
> don't recognize and apparently your drive doesn't recognize either.

Is this it? dfbb5d80 1845367929 C Bi:6:002:1 0 13 = 55534253 acc90100
00000000 00

Any idea who may be sending this? I mean,  can this be send via a
simple IOCTL to /dev/sr0 or it is something else?
Maybe stracing the program would be a good idea.

If I were to place a trace inside a driver, were would it be better?
sr_mod, sd_mod, usb-storage, cdrom? or somewhere else?

Thank you

-- 
-
Thiago Galesi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-02 20:16                                       ` Thiago Galesi
@ 2008-09-02 21:06                                         ` Alan Stern
  2008-09-04 12:09                                           ` Thiago Galesi
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-09-02 21:06 UTC (permalink / raw)
  To: Thiago Galesi
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

On Tue, 2 Sep 2008, Thiago Galesi wrote:

> > However what finally caused the failure was a command (0xAC) which I
> > don't recognize and apparently your drive doesn't recognize either.
> 
> Is this it? dfbb5d80 1845367929 C Bi:6:002:1 0 13 = 55534253 acc90100
> 00000000 00

No, it was this:

dfbb5d80 1845397954 S Bo:6:002:2 -115 31 = 55534243 dac90100 08000000 80000cac 00000000 00000000 01030000 000000

The initial sign of a problem is the -104 error code about five lines 
farther down in the log.

The odd thing is that the device did give an 8-byte response to that 
command (as it was supposed to) but then apparently crashed and didn't 
provide any status.  After that everything failed.

I don't know; maybe it was just a coincidence that the failure occurred 
the first time this command was used.

> Any idea who may be sending this? I mean,  can this be send via a
> simple IOCTL to /dev/sr0 or it is something else?
> Maybe stracing the program would be a good idea.

Maybe.  If you were trying to write a DVD during the test then almost 
certainly the command was sent by an IOCTL like you said.  You could 
probably figure out what the command was supposed to do by looking 
through the source code for the program.

> If I were to place a trace inside a driver, were would it be better?
> sr_mod, sd_mod, usb-storage, cdrom? or somewhere else?

It depends on what you want to trace.  Although in this case it's safe 
to ignore sd_mod since your device uses sr_mod.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-02 21:06                                         ` Alan Stern
@ 2008-09-04 12:09                                           ` Thiago Galesi
  2008-09-04 14:03                                             ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Thiago Galesi @ 2008-09-04 12:09 UTC (permalink / raw)
  To: Alan Stern
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

Ok, I looked into this a little bit further...

K3B provides an interesting log:

first, this is repeated 3 times:

(K3bDevice::ScsiCommand) failed:
                           command:    MODE SELECT (55)
                           errorcode:  70
                           sense key:  ILLEGAL REQUEST (5)
                           asc:        26
                           ascq:       0

then it shows:

(K3bDevice::ScsiCommand) failed:
                           command:    GET PERFORMANCE (ac)
                           errorcode:  0
                           sense key:  NO SENSE (2)
                           asc:        0
                           ascq:       0
(K3bDevice::Device) /dev/scd0: GET PERFORMANCE length det failed.

And then all commands from then on fail

strace indicate this is sent via SG_IO IOCTL (unfortunatelly, strace
doen't show the contents in the ioctl)

I don't know if the device is supposed to stop working when receiving
an unkown command (like it is apparently doing), also because
apparently cdrecord doesn't send this command but fails anyway

-- 
-
Thiago Galesi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-04 12:09                                           ` Thiago Galesi
@ 2008-09-04 14:03                                             ` Alan Stern
  2008-09-04 15:17                                               ` Thiago Galesi
  0 siblings, 1 reply; 39+ messages in thread
From: Alan Stern @ 2008-09-04 14:03 UTC (permalink / raw)
  To: Thiago Galesi
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

On Thu, 4 Sep 2008, Thiago Galesi wrote:

> Ok, I looked into this a little bit further...
> 
> K3B provides an interesting log:
> 
> first, this is repeated 3 times:
> 
> (K3bDevice::ScsiCommand) failed:
>                            command:    MODE SELECT (55)
>                            errorcode:  70
>                            sense key:  ILLEGAL REQUEST (5)
>                            asc:        26
>                            ascq:       0
> 
> then it shows:
> 
> (K3bDevice::ScsiCommand) failed:
>                            command:    GET PERFORMANCE (ac)
>                            errorcode:  0
>                            sense key:  NO SENSE (2)
>                            asc:        0
>                            ascq:       0
> (K3bDevice::Device) /dev/scd0: GET PERFORMANCE length det failed.
> 
> And then all commands from then on fail
> 
> strace indicate this is sent via SG_IO IOCTL (unfortunatelly, strace
> doen't show the contents in the ioctl)
> 
> I don't know if the device is supposed to stop working when receiving
> an unkown command (like it is apparently doing), also because
> apparently cdrecord doesn't send this command but fails anyway

Who would ever design a device that was supposed to stop working when 
it received an unknown command?  Obviously there's a bug either in the 
device or in the JMicron adapter.

What commands does cdrecord send and where does it fail?

Oh yes, one other thing...  Have you tried hooking this drive directly 
to an IDE cable (avoiding the USB-ATA adapter) to see if it works okay 
in that environment?  If it does then you know where the bug is.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-04 14:03                                             ` Alan Stern
@ 2008-09-04 15:17                                               ` Thiago Galesi
  2008-09-04 15:26                                                 ` Alan Stern
  0 siblings, 1 reply; 39+ messages in thread
From: Thiago Galesi @ 2008-09-04 15:17 UTC (permalink / raw)
  To: Alan Stern
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

>
> Who would ever design a device that was supposed to stop working when
> it received an unknown command?  Obviously there's a bug either in the
> device or in the JMicron adapter.

I agree :)

> What commands does cdrecord send and where does it fail?
>

If I remember correctly it was a problem during the actual recording.
I'll redo that, but it will probably mean another coaster in my desk ;)

> Oh yes, one other thing...  Have you tried hooking this drive directly
> to an IDE cable (avoiding the USB-ATA adapter) to see if it works okay
> in that environment?  If it does then you know where the bug is.

I'll try that. My bet is on the USB converter, since reloading the
entire usb modules chain makes it work again
Problem is I'm not using this drive as an 'external drive', rather,
the jmicron device is in the motherboard, providing a 'native IDE
port' in the MB (I know, I know)

Needless to say this works flawlessly in that other proprietary OS...

-- 
-
Thiago Galesi

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
  2008-09-04 15:17                                               ` Thiago Galesi
@ 2008-09-04 15:26                                                 ` Alan Stern
  0 siblings, 0 replies; 39+ messages in thread
From: Alan Stern @ 2008-09-04 15:26 UTC (permalink / raw)
  To: Thiago Galesi
  Cc: Alex Buell, linux-kernel, Robert Hancock, Tomas Styblo,
	linux-usb, usb-storage

On Thu, 4 Sep 2008, Thiago Galesi wrote:

> > Oh yes, one other thing...  Have you tried hooking this drive directly
> > to an IDE cable (avoiding the USB-ATA adapter) to see if it works okay
> > in that environment?  If it does then you know where the bug is.
> 
> I'll try that. My bet is on the USB converter, since reloading the
> entire usb modules chain makes it work again
> Problem is I'm not using this drive as an 'external drive', rather,
> the jmicron device is in the motherboard, providing a 'native IDE
> port' in the MB (I know, I know)
> 
> Needless to say this works flawlessly in that other proprietary OS...

You could try using SnoopyPro to monitor the commands Windows sends to
the drive.

Alan Stern


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338
@ 2008-07-20  5:13 Tomas Styblo
  0 siblings, 0 replies; 39+ messages in thread
From: Tomas Styblo @ 2008-07-20  5:13 UTC (permalink / raw)
  To: linux-kernel


[-- Attachment #1.1: Type: text/plain, Size: 1457 bytes --]



Hello,

this message includes a patch that provides a workaround for
a silent data corruption bug caused by incorrect error handling in
the JMicron JM20337 Hi-Speed USB to SATA & PATA Combo Bridge chipset,
USB device id 152d:2338.


- the problem occurs quite rarely, approx. once for 
  every 20 GB of transfered data during heavy load

- it seems that only read operations are affected

- the problem is accompanied by these messages in syslog each
  time it occurs:

May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] Sense Key : 0x0 [current] 
May 17 15:06:56 kernel: sd 6:0:0:0: [sdb] ASC=0x0 ASCQ=0x0

- the bug is not detected as an error and incorrect data is returned, 
  causing insidious data corruption

- tested with 3 external disk enclosures (Akasa Integral AK-ENP2SATA-BL) 
  with different disks on different computers, with kernel 2.6.24 and 2.6.25

- the patch provides a crude workaround by detecting the error condition
  and retrying the faulty transfer


The fix needs a review as I don't know much about USB and SCSI.  
It's possible that this approach is wrong and that the problem should
be fixed somewhere else.

There are other problems with this chipset that make it necessary 
to disconnect and power off the enclosure from time to time, but at least
there's no data corruption anymore.


-- 
Tomas Styblo <tripie@cpan.org>
PGP: C97EA4B6 / 817E B8A8 1AFD 3256 3181  A730 85CF 7BEB C97E A4B6

[-- Attachment #1.2: linux-kernel-2.6.25.9-usb-storage-jmicron-JM20337-no-sense-bugfix.patch --]
[-- Type: text/plain, Size: 1239 bytes --]

--- linux-2.6.25.9/drivers/usb/storage/transport.c.orig	2008-06-24 23:09:06.000000000 +0200
+++ linux-2.6.25.9/drivers/usb/storage/transport.c	2008-07-20 05:14:32.000000000 +0200
@@ -661,6 +661,21 @@ void usb_stor_invoke_transport(struct sc
 			srb->result = SAM_STAT_GOOD;
 			srb->sense_buffer[0] = 0x0;
 		}
+
+        /* JMicron JM20337 chipset bug workaround - BEGIN */
+		if (us->pusb_dev->descriptor.idVendor == 0x152d && 
+            us->pusb_dev->descriptor.idProduct == 0x2338 && 
+            result == USB_STOR_TRANSPORT_FAILED &&
+			/* Filemark 0, ignore EOM, ILI 0, no sense */
+				(srb->sense_buffer[2] & 0xaf) == 0 &&
+			/* No ASC or ASCQ */
+				srb->sense_buffer[12] == 0 &&
+				srb->sense_buffer[13] == 0) {
+            printk(KERN_WARNING "USB Storage - Working around the JMicron JM20337 chipset bug (idVendor=%04x, idProduct=%04x, NO_SENSE, ASC=0, ASCQ=0) - retrying the read operation\n", us->pusb_dev->descriptor.idVendor, us->pusb_dev->descriptor.idProduct);
+		    srb->result = (DID_ERROR << 16) | (SUGGEST_RETRY << 24);
+            return;
+        }
+        /* JMicron JM20337 chipset bug workaround - END */
 	}
 
 	/* Did we transfer less than the minimum amount required? */

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2008-09-04 15:26 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <fa.eWQqoO1e4Mdhu8SK5jzMFAI/3NU@ifi.uio.no>
2008-07-21 19:37 ` [PATCH] JMicron JM20337 USB-SATA data corruption bugfix - device 152d:2338 Robert Hancock
2008-07-22  2:37   ` Alan Stern
2008-07-22  9:03     ` Tomas Styblo
2008-07-23  2:38       ` Alan Stern
2008-07-23 23:20         ` Robert Hancock
2008-07-24  3:42           ` Alan Stern
2008-07-30 19:55             ` Robert Hancock
2008-07-30 20:00               ` [usb-storage] " Matthew Dharm
2008-07-30 20:46                 ` Robert Hancock
2008-07-30 21:18                 ` Alan Stern
2008-07-30 21:07               ` Alan Stern
2008-08-01 21:15                 ` Alex Buell
2008-08-01 22:22                   ` Alex Buell
2008-08-02  2:32                     ` Robert Hancock
2008-08-02 23:49                     ` Alan Stern
2008-08-03  9:07                       ` Alex Buell
2008-08-04 16:48                         ` Alan Stern
2008-08-04 20:17                           ` Alex Buell
2008-08-04 20:45                             ` Alan Stern
2008-09-02 12:10                               ` Thiago Galesi
2008-09-02 15:02                                 ` Alan Stern
2008-09-02 16:07                                   ` Thiago Galesi
2008-09-02 19:19                                     ` Alan Stern
2008-09-02 20:16                                       ` Thiago Galesi
2008-09-02 21:06                                         ` Alan Stern
2008-09-04 12:09                                           ` Thiago Galesi
2008-09-04 14:03                                             ` Alan Stern
2008-09-04 15:17                                               ` Thiago Galesi
2008-09-04 15:26                                                 ` Alan Stern
2008-07-25  8:44           ` Tomas Styblo
2008-07-25  8:54             ` Robert Hancock
2008-07-22  5:11   ` Tomas Styblo
2008-07-22  5:31     ` Robert Hancock
2008-07-22  6:11       ` Tomas Styblo
2008-07-22  8:45         ` Robert Hancock
2008-07-24  6:15           ` Alex Buell
2008-07-29 21:09           ` Alex Buell
2008-07-29 22:33             ` [usb-storage] " Matthew Dharm
2008-07-20  5:13 Tomas Styblo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).