linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
@ 2015-06-25 20:58 Sergio Callegari
  2015-06-25 21:31 ` Borislav Petkov
  2015-07-02 16:11 ` IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Ondrej Zary
  0 siblings, 2 replies; 13+ messages in thread
From: Sergio Callegari @ 2015-06-25 20:58 UTC (permalink / raw)
  To: linux-kernel

Hi,

I have an Iomega IDE Zip drive as in:

/dev/sda:

 Model=IOMEGA  ZIP 100       ATAPI       Floppy, FwRev=12.A, SerialNo=
 Config={ SpinMotCtl Removeable nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=on/off, tPIO={min:500,w/IORDY:180}
 PIO modes:  pio0 pio1 pio2 pio3 
 AdvancedPM=no

 * signifies the current active mode


ATAPI Direct-access device, with removable media
        Model Number:       IOMEGA  ZIP 100       ATAPI       Floppy
        Serial Number:      
        Firmware Revision:  12.A    
Standards:
        Likely used: 4
Configuration:
        DRQ response: <=10ms with INTRQ
        Packet size: 12 bytes
        cache/buffer size  = unknown
Capabilities:
        LBA, IORDY(can be disabled)
        DMA: not supported
        PIO: pio0 pio1 pio2 pio3 
             Cycle time: no flow control=500ns  IORDY flow control=180ns
                Removable Media Status Notification feature set supported
Security: 
                supported
                enabled
        not     locked
        not     frozen
        not     expired: security count                                    
                                    
                supported: enhanced erase                                  
                                    
        Security level high   

The drive used to work just fine up to 3.16.  After a kernel upgrade, the
drive started giving issues.  Some checks let me be sure that the issue does
not exist with 3.16 and exists with 3.17.

The problem manifests with IOWAIT all of a sudden jumping high (approx 50%)
and staying high, even with the Iomega Zip drive unmounted. When this
happens any activity having to do with the drive hangs (even hdparm -i
/dev/sda). No related message in syslog apart from the indication of hung
processes. Detaching the drive makes the machine behave properly.

What has changed wrt IDE in 3.16->3.17 transition? Why is the issue present
even when the drive is not used (unmounted)?  How can I help diagnosing?



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
  2015-06-25 20:58 IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Sergio Callegari
@ 2015-06-25 21:31 ` Borislav Petkov
  2015-08-18  7:44   ` Sergio Callegari
  2015-07-02 16:11 ` IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Ondrej Zary
  1 sibling, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2015-06-25 21:31 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: linux-kernel

On Thu, Jun 25, 2015 at 08:58:58PM +0000, Sergio Callegari wrote:
> What has changed wrt IDE in 3.16->3.17 transition? Why is the issue present
> even when the drive is not used (unmounted)?  How can I help diagnosing?

Here's what went into ide in 3.17:

$ git log -p v3.16..v3.17 drivers/ide/ 
commit a53dae49b2fea43d8f4ec5aeca0e288bbc8d6895
Author: Christoph Jaeger <christophjaeger@linux.com>
Date:   Wed Apr 9 09:28:01 2014 +0200

    ide: use module_platform_driver()

    Eliminate boilerplate code by using module_platform_driver().

    Signed-off-by: Christoph Jaeger <christophjaeger@linux.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/ide/au1xxx-ide.c b/drivers/ide/au1xxx-ide.c
index 259786ca8b75..07ea58084068 100644
--- a/drivers/ide/au1xxx-ide.c
+++ b/drivers/ide/au1xxx-ide.c
...

and that driver doesn't have anything to do with zip drives.

So problem is either somewhere else in the kernel or maybe something's
missing from the 3.17 config or somewhere completely different.

You could bisect the kernels between 3.16 and 3.17 - unless someone has
a better idea...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
  2015-06-25 20:58 IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Sergio Callegari
  2015-06-25 21:31 ` Borislav Petkov
@ 2015-07-02 16:11 ` Ondrej Zary
  2015-07-03  6:50   ` Sergio Callegari
  1 sibling, 1 reply; 13+ messages in thread
From: Ondrej Zary @ 2015-07-02 16:11 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: linux-kernel

On Thursday 25 June 2015 22:58:58 Sergio Callegari wrote:
> Hi,
>
> I have an Iomega IDE Zip drive as in:
>
> /dev/sda:
>
>  Model=IOMEGA  ZIP 100       ATAPI       Floppy, FwRev=12.A, SerialNo=
>  Config={ SpinMotCtl Removeable nonMagnetic }
>  RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
>  BuffType=unknown, BuffSize=unknown, MaxMultSect=0
>  (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
>  IORDY=on/off, tPIO={min:500,w/IORDY:180}
>  PIO modes:  pio0 pio1 pio2 pio3
>  AdvancedPM=no
>
>  * signifies the current active mode
>
>
> ATAPI Direct-access device, with removable media
>         Model Number:       IOMEGA  ZIP 100       ATAPI       Floppy
>         Serial Number:
>         Firmware Revision:  12.A
> Standards:
>         Likely used: 4
> Configuration:
>         DRQ response: <=10ms with INTRQ
>         Packet size: 12 bytes
>         cache/buffer size  = unknown
> Capabilities:
>         LBA, IORDY(can be disabled)
>         DMA: not supported
>         PIO: pio0 pio1 pio2 pio3
>              Cycle time: no flow control=500ns  IORDY flow control=180ns
>                 Removable Media Status Notification feature set supported
> Security:
>                 supported
>                 enabled
>         not     locked
>         not     frozen
>         not     expired: security count
>
>                 supported: enhanced erase
>
>         Security level high
>
> The drive used to work just fine up to 3.16.  After a kernel upgrade, the
> drive started giving issues.  Some checks let me be sure that the issue
> does not exist with 3.16 and exists with 3.17.
>
> The problem manifests with IOWAIT all of a sudden jumping high (approx 50%)
> and staying high, even with the Iomega Zip drive unmounted. When this
> happens any activity having to do with the drive hangs (even hdparm -i
> /dev/sda). No related message in syslog apart from the indication of hung
> processes. Detaching the drive makes the machine behave properly.
>
> What has changed wrt IDE in 3.16->3.17 transition? Why is the issue present
> even when the drive is not used (unmounted)?  How can I help diagnosing?

You're probably using libata and not the old IDE layer.

Just tested this one:

 Model=IOMEGA  ZIP 100       ATAPI, FwRev=03.H, SerialNo=
 Config={ SpinMotCtl Removeable nonMagnetic }
 RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
 BuffType=unknown, BuffSize=unknown, MaxMultSect=0
 (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
 IORDY=on/off, tPIO={min:500,w/IORDY:180}, tDMA={min:150,rec:150}
 PIO modes:  pio0 pio1 pio2 pio3
 DMA modes:  mdma0 *mdma1
 AdvancedPM=no

 * signifies the current active mode

in 4.0.0-next-20150415 and it seems to work fine. Mounted, unmounted, read 
complete disk using dd without any problems.

-- 
Ondrej Zary

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
  2015-07-02 16:11 ` IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Ondrej Zary
@ 2015-07-03  6:50   ` Sergio Callegari
  0 siblings, 0 replies; 13+ messages in thread
From: Sergio Callegari @ 2015-07-03  6:50 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: linux-kernel

On 02/07/2015 18:11, Ondrej Zary wrote:
> You're probably using libata and not the old IDE layer.
>
> Just tested this one:
>
>   Model=IOMEGA  ZIP 100       ATAPI, FwRev=03.H, SerialNo=
>   Config={ SpinMotCtl Removeable nonMagnetic }
>   RawCHS=0/0/0, TrkSize=0, SectSize=0, ECCbytes=0
>   BuffType=unknown, BuffSize=unknown, MaxMultSect=0
>   (maybe): CurCHS=0/0/0, CurSects=0, LBA=yes, LBAsects=0
>   IORDY=on/off, tPIO={min:500,w/IORDY:180}, tDMA={min:150,rec:150}
>   PIO modes:  pio0 pio1 pio2 pio3
>   DMA modes:  mdma0 *mdma1
>   AdvancedPM=no
>
>   * signifies the current active mode
>
> in 4.0.0-next-20150415 and it seems to work fine. Mounted, unmounted, read
> complete disk using dd without any problems.
>
Also mine 'seems' to work fine.  I can boot the machine, mount, unmount, read 
the disk without problems *for some time*.

The issue is that, if I boot *with no disk in the drive* after about 2-3 hours I 
use the machine the iowait jumps high, the kernel starts to complain about hung 
processes, any command related to the zip drive starts hanging, the machine 
cannot be shut down cleanly, any kernel install process starts to hang at the 
ramdisk generation or grub install.
If I detach the drive or downgrade to 3.16, all is fine.

Clearly, the items above make 'uninformed' bisecting quite hard, particularly 
because it is unclear to me at this point if the machine load has any influence 
on triggering the issue.

Now, it is obvious that one can live without a zip drive today, but if this is 
the case, it would be better to blacklist it from the kernel altogether, because 
something has surely got wrong in the 3.16->3.17 transition and there is a 
serious regression with regard to this now rarely used piece of hardware. Also I 
wonder if there is around some other "ide-floppy" device that may be affected.

Regards,

Sergio


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
  2015-06-25 21:31 ` Borislav Petkov
@ 2015-08-18  7:44   ` Sergio Callegari
  2015-08-20  8:08     ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Sergio Callegari @ 2015-08-18  7:44 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: hch, linux-kernel

Hi,

I have bisected the issue down to

[045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem

Bisecting has been a painful job due to the fact that the bug may show only many 
hours after the system boot.

The commit above in fact is not the culprit, but a fix to an issue that was 
hiding the real bug on my system.  See

http://marc.info/?l=linux-kernel&m=143973820612978&w=2

The real issue is with sata host lock and seems to be biting a few other people 
as well

https://bbs.archlinux.org/viewtopic.php?id=189324

A patch fixing the issue was sent to the LKML back in Nov 2014 by Christoph 
Hellwig (who is reading in CC)

https://lkml.org/lkml/2014/11/20/581

I have tested the patch and it works for me.

What is expected to happen now?

The original patch appeared to have been dropped and did not make it to the 
mainline kernel (is there any reason why, some contraindication?), but seems to 
be important because without it the linux kernel is broken on some systems.

Is there anything that can be done for the patch be accepted in mainline and 
stabilization kernels?

Best regards,

Sergio



On 25/06/2015 23:31, Borislav Petkov wrote:
> On Thu, Jun 25, 2015 at 08:58:58PM +0000, Sergio Callegari wrote:
>> What has changed wrt IDE in 3.16->3.17 transition? Why is the issue present
>> even when the drive is not used (unmounted)?  How can I help diagnosing?
>
> Here's what went into ide in 3.17:
>
> $ git log -p v3.16..v3.17 drivers/ide/
> commit a53dae49b2fea43d8f4ec5aeca0e288bbc8d6895
> Author: Christoph Jaeger <christophjaeger@linux.com>
> Date:   Wed Apr 9 09:28:01 2014 +0200
>
>      ide: use module_platform_driver()
>
>      Eliminate boilerplate code by using module_platform_driver().
>
>      Signed-off-by: Christoph Jaeger <christophjaeger@linux.com>
>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/drivers/ide/au1xxx-ide.c b/drivers/ide/au1xxx-ide.c
> index 259786ca8b75..07ea58084068 100644
> --- a/drivers/ide/au1xxx-ide.c
> +++ b/drivers/ide/au1xxx-ide.c
> ...
>
> and that driver doesn't have anything to do with zip drives.
>
> So problem is either somewhere else in the kernel or maybe something's
> missing from the 3.17 config or somewhere completely different.
>
> You could bisect the kernels between 3.16 and 3.17 - unless someone has
> a better idea...
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
  2015-08-18  7:44   ` Sergio Callegari
@ 2015-08-20  8:08     ` Christoph Hellwig
  2015-08-20 15:08       ` Sergio Callegari
  2015-08-24 17:07       ` "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations Sergio Callegari
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2015-08-20  8:08 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: Borislav Petkov, linux-kernel

Hi Sergio,

On Tue, Aug 18, 2015 at 09:44:28AM +0200, Sergio Callegari wrote:
> Hi,
> 
> I have bisected the issue down to
> 
> [045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem
> 
> Bisecting has been a painful job due to the fact that the bug may show only
> many hours after the system boot.
> 
> The commit above in fact is not the culprit, but a fix to an issue that was
> hiding the real bug on my system.  See
> 
> http://marc.info/?l=linux-kernel&m=143973820612978&w=2
> 
> The real issue is with sata host lock and seems to be biting a few other
> people as well
> 
> https://bbs.archlinux.org/viewtopic.php?id=189324
> 
> A patch fixing the issue was sent to the LKML back in Nov 2014 by Christoph
> Hellwig (who is reading in CC)
> 
> https://lkml.org/lkml/2014/11/20/581
> 
> I have tested the patch and it works for me.
> 
> What is expected to happen now?

As mentioned in that thread we need more input from the libata people
on what kind of race this is papering over.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition
  2015-08-20  8:08     ` Christoph Hellwig
@ 2015-08-20 15:08       ` Sergio Callegari
  2015-08-24 17:07       ` "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations Sergio Callegari
  1 sibling, 0 replies; 13+ messages in thread
From: Sergio Callegari @ 2015-08-20 15:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Borislav Petkov, linux-kernel

Hi Christoph,

thanks for your answer!

Any clue on how to help/speed up this? Was there any exchange with them when you 
first provided your patch back in 2014? Can it be restarted?

My feeling is that the issue fixed with 045065d8a300a37218c548e9aa7becd581c6a0e8 
was in fact the real paper over the bug. As distros start picking kernels with 
045065d8a300a37218c548e9aa7becd581c6a0e8 the issue is like to bite more and more.

And it is a subtle bug. In some cases takes ages before revealing, making one 
feel confident on a system that is in fact broken. And takes ages to bisect 
(took me almost 7 days).

Best,

Sergio



On 20/08/2015 10:08, Christoph Hellwig wrote:
> Hi Sergio,
>
> On Tue, Aug 18, 2015 at 09:44:28AM +0200, Sergio Callegari wrote:
>> Hi,
>>
>> I have bisected the issue down to
>>
>> [045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem
>>
>> Bisecting has been a painful job due to the fact that the bug may show only
>> many hours after the system boot.
>>
>> The commit above in fact is not the culprit, but a fix to an issue that was
>> hiding the real bug on my system.  See
>>
>> http://marc.info/?l=linux-kernel&m=143973820612978&w=2
>>
>> The real issue is with sata host lock and seems to be biting a few other
>> people as well
>>
>> https://bbs.archlinux.org/viewtopic.php?id=189324
>>
>> A patch fixing the issue was sent to the LKML back in Nov 2014 by Christoph
>> Hellwig (who is reading in CC)
>>
>> https://lkml.org/lkml/2014/11/20/581
>>
>> I have tested the patch and it works for me.
>>
>> What is expected to happen now?
> As mentioned in that thread we need more input from the libata people
> on what kind of race this is papering over.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations
  2015-08-20  8:08     ` Christoph Hellwig
  2015-08-20 15:08       ` Sergio Callegari
@ 2015-08-24 17:07       ` Sergio Callegari
  2015-08-25 11:00         ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Sergio Callegari @ 2015-08-24 17:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Borislav Petkov, linux-kernel

Thanks Christoph for the answer!

Apparently I missed a piece of the thread where the test patch was originally 
proposed . Now, I have gone through it and I see how the patch was not meant to 
be a final correction.

My (possibly naive) understanding is that:

- Even if this might be due to hardware that not fully conforms to the standard 
(but we do not know right now), commit 74665016086615bbaa3fa6f83af410a0a4e029ee 
( scsi: convert host_busy to atomic_t ) certainly breaks the kernel for some 
hardware configurations causing a regression.

- If the regression was immediately spotted, the patch would probably have been 
revised right after proposal. Unfortunately, another bug - that got fixed only 
much later with 045065d8a300a37218c - hid the original issue for a long time.

- Now that a lot of time has passed with the "scsi: convert host_busy to 
atomic_t" series in the kernel, going back to look into it is much more 
difficult. Libata people might not be very interested as they moved to other 
topics and might need a lot of time to go through it (it has been known since 
November 2014 - 9 months ago), possibly due to the race like nature of the issue 
and the fact that the bug might not be reproducible on their hardware...

Is this correct?

Aren't commits that cause regressions confirmed by multiple users expected (at 
least in principle) to be reverted?

If reverting is too costy, wouldn't your "papering over" or making the scsi 
delay configurable be an acceptable solution?

Even better: can in some way the libata-people be helped find the real culprit, 
given that there are at least two hardware setups that are known to trigger the 
regression (mine and Barto's)?

I have tried the linux-ide mailing list, but got silence.

Best,

Sergio



On 20/08/2015 10:08, Christoph Hellwig wrote:
> Hi Sergio,
>
> On Tue, Aug 18, 2015 at 09:44:28AM +0200, Sergio Callegari wrote:
>> Hi,
>>
>> I have bisected the issue down to
>>
>> [045065d8a300a37218c548e9aa7becd581c6a0e8] [SCSI] fix qemu boot hang problem
>>
>> Bisecting has been a painful job due to the fact that the bug may show only
>> many hours after the system boot.
>>
>> The commit above in fact is not the culprit, but a fix to an issue that was
>> hiding the real bug on my system.  See
>>
>> http://marc.info/?l=linux-kernel&m=143973820612978&w=2
>>
>> The real issue is with sata host lock and seems to be biting a few other
>> people as well
>>
>> https://bbs.archlinux.org/viewtopic.php?id=189324
>>
>> A patch fixing the issue was sent to the LKML back in Nov 2014 by Christoph
>> Hellwig (who is reading in CC)
>>
>> https://lkml.org/lkml/2014/11/20/581
>>
>> I have tested the patch and it works for me.
>>
>> What is expected to happen now?
> As mentioned in that thread we need more input from the libata people
> on what kind of race this is papering over.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations
  2015-08-24 17:07       ` "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations Sergio Callegari
@ 2015-08-25 11:00         ` Christoph Hellwig
  2015-08-26  7:14           ` Sergio Callegari
  2015-08-30 10:54           ` Sergio Callegari
  0 siblings, 2 replies; 13+ messages in thread
From: Christoph Hellwig @ 2015-08-25 11:00 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: Borislav Petkov, linux-kernel

Hi Sergio,

can you give the patch below a try?

libata currently completes the SCSI command before freeing the internal
command structure, which could lead to various races that mess with
the ATA command state, which might cause issues like the one you see.

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 641a61a..92cc156 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1772,6 +1772,15 @@ nothing_to_do:
 	return 1;
 }
 
+static void ata_qc_done(struct ata_queued_cmd *qc)
+{
+	struct scsi_cmnd *cmd = qc->scsicmd;
+	void (*done)(struct scsi_cmnd *) = qc->scsidone;
+
+	ata_qc_free(qc);
+	done(cmd);
+}
+
 static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 {
 	struct ata_port *ap = qc->ap;
@@ -1810,9 +1819,7 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
 	if (need_sense && !ap->ops->error_handler)
 		ata_dump_status(ap->print_id, &qc->result_tf);
 
-	qc->scsidone(cmd);
-
-	ata_qc_free(qc);
+	ata_qc_done(qc);
 }
 
 /**
@@ -2611,8 +2618,7 @@ static void atapi_sense_complete(struct ata_queued_cmd *qc)
 		ata_gen_passthru_sense(qc);
 	}
 
-	qc->scsidone(qc->scsicmd);
-	ata_qc_free(qc);
+	ata_qc_done(qc);
 }
 
 /* is it pointless to prefer PIO for "safety reasons"? */
@@ -2707,8 +2713,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
 			qc->dev->sdev->locked = 0;
 
 		qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
-		qc->scsidone(cmd);
-		ata_qc_free(qc);
+		ata_qc_done(qc);
 		return;
 	}
 
@@ -2752,8 +2757,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
 		cmd->result = SAM_STAT_GOOD;
 	}
 
-	qc->scsidone(cmd);
-	ata_qc_free(qc);
+	ata_qc_done(qc);
 }
 /**
  *	atapi_xlat - Initialize PACKET taskfile

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations
  2015-08-25 11:00         ` Christoph Hellwig
@ 2015-08-26  7:14           ` Sergio Callegari
  2015-08-30 10:54           ` Sergio Callegari
  1 sibling, 0 replies; 13+ messages in thread
From: Sergio Callegari @ 2015-08-26  7:14 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Borislav Petkov, linux-kernel

Sure, thanks!

I'll test this weekend.

Best,

Sergio

On 25/08/2015 13:00, Christoph Hellwig wrote:
> Hi Sergio,
>
> can you give the patch below a try?
>
> libata currently completes the SCSI command before freeing the internal
> command structure, which could lead to various races that mess with
> the ATA command state, which might cause issues like the one you see.
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 641a61a..92cc156 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1772,6 +1772,15 @@ nothing_to_do:
>   	return 1;
>   }
>   
> +static void ata_qc_done(struct ata_queued_cmd *qc)
> +{
> +	struct scsi_cmnd *cmd = qc->scsicmd;
> +	void (*done)(struct scsi_cmnd *) = qc->scsidone;
> +
> +	ata_qc_free(qc);
> +	done(cmd);
> +}
> +
>   static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   {
>   	struct ata_port *ap = qc->ap;
> @@ -1810,9 +1819,7 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   	if (need_sense && !ap->ops->error_handler)
>   		ata_dump_status(ap->print_id, &qc->result_tf);
>   
> -	qc->scsidone(cmd);
> -
> -	ata_qc_free(qc);
> +	ata_qc_done(qc);
>   }
>   
>   /**
> @@ -2611,8 +2618,7 @@ static void atapi_sense_complete(struct ata_queued_cmd *qc)
>   		ata_gen_passthru_sense(qc);
>   	}
>   
> -	qc->scsidone(qc->scsicmd);
> -	ata_qc_free(qc);
> +	ata_qc_done(qc);
>   }
>   
>   /* is it pointless to prefer PIO for "safety reasons"? */
> @@ -2707,8 +2713,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>   			qc->dev->sdev->locked = 0;
>   
>   		qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
> -		qc->scsidone(cmd);
> -		ata_qc_free(qc);
> +		ata_qc_done(qc);
>   		return;
>   	}
>   
> @@ -2752,8 +2757,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>   		cmd->result = SAM_STAT_GOOD;
>   	}
>   
> -	qc->scsidone(cmd);
> -	ata_qc_free(qc);
> +	ata_qc_done(qc);
>   }
>   /**
>    *	atapi_xlat - Initialize PACKET taskfile


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations
  2015-08-25 11:00         ` Christoph Hellwig
  2015-08-26  7:14           ` Sergio Callegari
@ 2015-08-30 10:54           ` Sergio Callegari
  2015-09-07  7:31             ` Sergio Callegari
  1 sibling, 1 reply; 13+ messages in thread
From: Sergio Callegari @ 2015-08-30 10:54 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Borislav Petkov, linux-kernel

Hi Christoph,

just checked.

Unfortunately, the patch below, applied on top of Linus' v3.17 (which I 
am using as a test kernel) *does not fix the issue*.

Best regards,

Sergio

On 25/08/2015 13:00, Christoph Hellwig wrote:
> Hi Sergio,
>
> can you give the patch below a try?
>
> libata currently completes the SCSI command before freeing the internal
> command structure, which could lead to various races that mess with
> the ATA command state, which might cause issues like the one you see.
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 641a61a..92cc156 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1772,6 +1772,15 @@ nothing_to_do:
>   	return 1;
>   }
>   
> +static void ata_qc_done(struct ata_queued_cmd *qc)
> +{
> +	struct scsi_cmnd *cmd = qc->scsicmd;
> +	void (*done)(struct scsi_cmnd *) = qc->scsidone;
> +
> +	ata_qc_free(qc);
> +	done(cmd);
> +}
> +
>   static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   {
>   	struct ata_port *ap = qc->ap;
> @@ -1810,9 +1819,7 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>   	if (need_sense && !ap->ops->error_handler)
>   		ata_dump_status(ap->print_id, &qc->result_tf);
>   
> -	qc->scsidone(cmd);
> -
> -	ata_qc_free(qc);
> +	ata_qc_done(qc);
>   }
>   
>   /**
> @@ -2611,8 +2618,7 @@ static void atapi_sense_complete(struct ata_queued_cmd *qc)
>   		ata_gen_passthru_sense(qc);
>   	}
>   
> -	qc->scsidone(qc->scsicmd);
> -	ata_qc_free(qc);
> +	ata_qc_done(qc);
>   }
>   
>   /* is it pointless to prefer PIO for "safety reasons"? */
> @@ -2707,8 +2713,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>   			qc->dev->sdev->locked = 0;
>   
>   		qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
> -		qc->scsidone(cmd);
> -		ata_qc_free(qc);
> +		ata_qc_done(qc);
>   		return;
>   	}
>   
> @@ -2752,8 +2757,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>   		cmd->result = SAM_STAT_GOOD;
>   	}
>   
> -	qc->scsidone(cmd);
> -	ata_qc_free(qc);
> +	ata_qc_done(qc);
>   }
>   /**
>    *	atapi_xlat - Initialize PACKET taskfile


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations
  2015-08-30 10:54           ` Sergio Callegari
@ 2015-09-07  7:31             ` Sergio Callegari
  2015-09-07 17:51               ` Christoph Hellwig
  0 siblings, 1 reply; 13+ messages in thread
From: Sergio Callegari @ 2015-09-07  7:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Borislav Petkov, linux-kernel

Hi Christoph (and those reading!),

I wonder if there might be any update or, most important, anything else for me 
to test in order to provide info to address this issue...

I really would like to find out whether it is a bug in my hardware (which would 
be OK, as I know already how to work around it) or something that may happen on 
any hardware as soon as an unfortunate combination of storage equipment is adopted.

Thanks for the help so far,

Best regards,

Sergio

On 30/08/2015 12:54, Sergio Callegari wrote:
> Hi Christoph,
>
> just checked.
>
> Unfortunately, the patch below, applied on top of Linus' v3.17 (which I am 
> using as a test kernel) *does not fix the issue*.
>
> Best regards,
>
> Sergio
>
> On 25/08/2015 13:00, Christoph Hellwig wrote:
>> Hi Sergio,
>>
>> can you give the patch below a try?
>>
>> libata currently completes the SCSI command before freeing the internal
>> command structure, which could lead to various races that mess with
>> the ATA command state, which might cause issues like the one you see.
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 641a61a..92cc156 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1772,6 +1772,15 @@ nothing_to_do:
>>       return 1;
>>   }
>>   +static void ata_qc_done(struct ata_queued_cmd *qc)
>> +{
>> +    struct scsi_cmnd *cmd = qc->scsicmd;
>> +    void (*done)(struct scsi_cmnd *) = qc->scsidone;
>> +
>> +    ata_qc_free(qc);
>> +    done(cmd);
>> +}
>> +
>>   static void ata_scsi_qc_complete(struct ata_queued_cmd *qc)
>>   {
>>       struct ata_port *ap = qc->ap;
>> @@ -1810,9 +1819,7 @@ static void ata_scsi_qc_complete(struct ata_queued_cmd 
>> *qc)
>>       if (need_sense && !ap->ops->error_handler)
>>           ata_dump_status(ap->print_id, &qc->result_tf);
>>   -    qc->scsidone(cmd);
>> -
>> -    ata_qc_free(qc);
>> +    ata_qc_done(qc);
>>   }
>>     /**
>> @@ -2611,8 +2618,7 @@ static void atapi_sense_complete(struct ata_queued_cmd 
>> *qc)
>>           ata_gen_passthru_sense(qc);
>>       }
>>   -    qc->scsidone(qc->scsicmd);
>> -    ata_qc_free(qc);
>> +    ata_qc_done(qc);
>>   }
>>     /* is it pointless to prefer PIO for "safety reasons"? */
>> @@ -2707,8 +2713,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>>               qc->dev->sdev->locked = 0;
>>             qc->scsicmd->result = SAM_STAT_CHECK_CONDITION;
>> -        qc->scsidone(cmd);
>> -        ata_qc_free(qc);
>> +        ata_qc_done(qc);
>>           return;
>>       }
>>   @@ -2752,8 +2757,7 @@ static void atapi_qc_complete(struct ata_queued_cmd *qc)
>>           cmd->result = SAM_STAT_GOOD;
>>       }
>>   -    qc->scsidone(cmd);
>> -    ata_qc_free(qc);
>> +    ata_qc_done(qc);
>>   }
>>   /**
>>    *    atapi_xlat - Initialize PACKET taskfile
>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations
  2015-09-07  7:31             ` Sergio Callegari
@ 2015-09-07 17:51               ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2015-09-07 17:51 UTC (permalink / raw)
  To: Sergio Callegari; +Cc: Borislav Petkov, linux-kernel

Hi Sergio,

sorry for the delay.  I'm fighting deadlines at the moment, so even if
this is my highest spare time priority right now I can't find enough
quiet time to dig into the problem.  I'd suggest you resend a report
to linux-ide (with a Cc to linux-scsi) so that the people who know the
libata code can take look.  I strongly suspect the root cause for it is
in libata anyway, which is a code base I need to get up to speed with
first.  

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-09-07 17:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-25 20:58 IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Sergio Callegari
2015-06-25 21:31 ` Borislav Petkov
2015-08-18  7:44   ` Sergio Callegari
2015-08-20  8:08     ` Christoph Hellwig
2015-08-20 15:08       ` Sergio Callegari
2015-08-24 17:07       ` "scsi: convert host_busy to atomic_t" series causes regressions for some hardware configurations Sergio Callegari
2015-08-25 11:00         ` Christoph Hellwig
2015-08-26  7:14           ` Sergio Callegari
2015-08-30 10:54           ` Sergio Callegari
2015-09-07  7:31             ` Sergio Callegari
2015-09-07 17:51               ` Christoph Hellwig
2015-07-02 16:11 ` IDE Floppy support for IOMEGA Zip Drive broken in 3.16 -> 3.17 transition Ondrej Zary
2015-07-03  6:50   ` Sergio Callegari

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).