* [PATCH] ahci: Improve error handling
From: Stefan Fritsch @ 2018-01-19 13:13 UTC
  To: The development of GNU GRUB; +Cc: Daniel Kiper

From: Stefan Fritsch <fritsch@genua.de>

Check the error bits in the interrupt status register. According to the
AHCI 1.2 spec, "Interrupt sources that are disabled (‘0’) are still
reflected in the status registers.", so this should work even though
grub uses polling.

This fixes the following problem on a Fujitsu E744 laptop:

Sometimes there is a very long delay (up to several minutes) when
booting from hard disk. It seems accessing the DVD drive (which has no
disk inserted) sometimes fails with some errors, which leads to each
access being stalled until the 20s timeout triggers. This seems to
happen when grub is trying to read filesystem/partition data.

The problem is that the command_issue bit that is checked in the loop is
only reset if the "HBA receives a FIS which clears the BSY, DRQ, and ERR
bits for the command", but the ERR bit is never cleared. Therefore
command_issue is never reset and grub waits for the timeout.

The relevant bit in our case is the Task File Error Status (TFES), which
mirrors the ERR bit (bit 0) of the task file data (tfd) register. This
patch also checks the other error bits, except for the "Interface
non-fatal error status" bit.

Signed-off-by: Stefan Fritsch <fritsch@genua.de>
---
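Note (not part of the commit message): the early exit described above can
be sketched as a standalone fragment. This is only an illustration under
stated assumptions, not the code in ahci.c. The names struct fake_port,
wait_for_slot0 and PORT_IS_* are hypothetical, the registers are plain
memory rather than real MMIO, and a poll-count limit stands in for the
20s deadline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fatal error bits of the port interrupt status register, at the same
   bit positions the patch uses (27..30).  Names are hypothetical. */
#define PORT_IS_IFS   (UINT32_C (1) << 27)
#define PORT_IS_HBDS  (UINT32_C (1) << 28)
#define PORT_IS_HBFS  (UINT32_C (1) << 29)
#define PORT_IS_TFES  (UINT32_C (1) << 30)
#define PORT_IS_FATAL_MASK \
  (PORT_IS_IFS | PORT_IS_HBDS | PORT_IS_HBFS | PORT_IS_TFES)

/* Hypothetical stand-in for the memory-mapped port registers. */
struct fake_port
{
  volatile uint32_t command_issue;
  volatile uint32_t intstatus;
};

enum wait_result
{
  WAIT_DONE,     /* command_issue bit 0 was cleared */
  WAIT_ERROR,    /* a fatal error bit showed up in intstatus */
  WAIT_TIMEOUT   /* neither happened within max_polls iterations */
};

/* Poll command slot 0.  Without the intstatus check, a command that
   fails with ERR set would keep command_issue at 1 and the loop would
   only end at the timeout. */
enum wait_result
wait_for_slot0 (struct fake_port *port, unsigned max_polls)
{
  unsigned polls;

  assert (port != NULL);
  for (polls = 0; port->command_issue & 1; polls++)
    {
      if (port->intstatus & PORT_IS_FATAL_MASK)
        return WAIT_ERROR;
      if (polls >= max_polls)
        return WAIT_TIMEOUT;
    }
  return WAIT_DONE;
}
```

With command_issue stuck at 1 and TFES set, wait_for_slot0() returns
WAIT_ERROR on the first poll instead of spinning until the limit. With no
error bit set it still falls back to WAIT_TIMEOUT.
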
 grub-core/disk/ahci.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/grub-core/disk/ahci.c b/grub-core/disk/ahci.c
index 494a1b7..8321fd9 100644
--- a/grub-core/disk/ahci.c
+++ b/grub-core/disk/ahci.c
@@ -82,6 +82,19 @@ enum grub_ahci_hba_port_command
     GRUB_AHCI_HBA_PORT_CMD_FR = 0x4000,
   };
 
+enum grub_ahci_hba_port_int_status
+  {
+	  GRUB_AHCI_HBA_PORT_IS_IFS  = (1UL << 27),
+	  GRUB_AHCI_HBA_PORT_IS_HBDS = (1UL << 28),
+	  GRUB_AHCI_HBA_PORT_IS_HBFS = (1UL << 29),
+	  GRUB_AHCI_HBA_PORT_IS_TFES = (1UL << 30),
+  };
+#define GRUB_AHCI_HBA_PORT_IS_FATAL_MASK (\
+	GRUB_AHCI_HBA_PORT_IS_IFS| \
+	GRUB_AHCI_HBA_PORT_IS_HBDS|\
+	GRUB_AHCI_HBA_PORT_IS_HBFS|\
+	GRUB_AHCI_HBA_PORT_IS_TFES)
+
 struct grub_ahci_hba
 {
   grub_uint32_t cap;
@@ -1026,7 +1039,8 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev,
 
   endtime = grub_get_time_ms () + (spinup ? 20000 : 20000);
   while ((dev->hba->ports[dev->port].command_issue & 1))
-    if (grub_get_time_ms () > endtime)
+    if (grub_get_time_ms () > endtime ||
+	(dev->hba->ports[dev->port].intstatus & GRUB_AHCI_HBA_PORT_IS_FATAL_MASK))
       {
 	grub_dprintf ("ahci", "AHCI status <%x %x %x %x>\n",
 		      dev->hba->ports[dev->port].command_issue,
@@ -1034,7 +1048,10 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev,
 		      dev->hba->ports[dev->port].intstatus,
 		      dev->hba->ports[dev->port].task_file_data);
 	dev->hba->ports[dev->port].command_issue = 0;
-	err = grub_error (GRUB_ERR_IO, "AHCI transfer timed out");
+	if (dev->hba->ports[dev->port].intstatus & GRUB_AHCI_HBA_PORT_IS_FATAL_MASK)
+	  err = grub_error (GRUB_ERR_IO, "AHCI transfer error");
+	else
+	  err = grub_error (GRUB_ERR_IO, "AHCI transfer timed out");
 	if (!reset)
 	  grub_ahci_reset_port (dev, 1);
 	break;
-- 
2.1.4

* Re: [PATCH] ahci: Improve error handling
From: Daniel Kiper @ 2018-01-23 12:00 UTC
  To: Stefan Fritsch; +Cc: The development of GNU GRUB, Daniel Kiper, daniel.kiper

On Fri, Jan 19, 2018 at 02:13:29PM +0100, Stefan Fritsch wrote:
> From: Stefan Fritsch <fritsch@genua.de>
>
> Check the error bits in the interrupt status register. According to the
> AHCI 1.2 spec, "Interrupt sources that are disabled (‘0’) are still
> reflected in the status registers.", so this should work even though
> grub uses polling
>
> This fixes the following problem on a Fujitsu E744 laptop:
>
> Sometimes there is a very long delay (up to several minutes) when
> booting from hard disk. It seems accessing the DVD drive (which has no
> disk inserted) sometimes fails with some errors, which leads to each
> access being stalled until the 20s timeout triggers. This seems to
> happen when grub is trying to read filesystem/partition data.
>
> The problem is that the command_issue bit that is checked in the loop is
> only reset if the "HBA receives a FIS which clears the BSY, DRQ, and ERR
> bits for the command", but the ERR bit is never cleared. Therefore
> command_issue is never reset and grub waits for the timeout.
>
> The relevant bit in our case is the Task File Error Status (TFES), which
> is equivalent to the ERR bit 0 in tfd. But this patch also checks
> the other error bits except for the "Interface non-fatal error status"
> bit.
>
> Signed-off-by: Stefan Fritsch <fritsch@genua.de>

Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>

If there are no objections I will apply this by the end of this week.

Daniel


* Re: [PATCH] ahci: Improve error handling
From: Daniel Kiper @ 2018-01-29 12:16 UTC
  To: Daniel Kiper; +Cc: Stefan Fritsch, The development of GNU GRUB, daniel.kiper

On Tue, Jan 23, 2018 at 01:00:56PM +0100, Daniel Kiper wrote:
> On Fri, Jan 19, 2018 at 02:13:29PM +0100, Stefan Fritsch wrote:
> > From: Stefan Fritsch <fritsch@genua.de>
> >
> > Check the error bits in the interrupt status register. According to the
> > AHCI 1.2 spec, "Interrupt sources that are disabled (‘0’) are still
> > reflected in the status registers.", so this should work even though
> > grub uses polling
> >
> > This fixes the following problem on a Fujitsu E744 laptop:
> >
> > Sometimes there is a very long delay (up to several minutes) when
> > booting from hard disk. It seems accessing the DVD drive (which has no
> > disk inserted) sometimes fails with some errors, which leads to each
> > access being stalled until the 20s timeout triggers. This seems to
> > happen when grub is trying to read filesystem/partition data.
> >
> > The problem is that the command_issue bit that is checked in the loop is
> > only reset if the "HBA receives a FIS which clears the BSY, DRQ, and ERR
> > bits for the command", but the ERR bit is never cleared. Therefore
> > command_issue is never reset and grub waits for the timeout.
> >
> > The relevant bit in our case is the Task File Error Status (TFES), which
> > is equivalent to the ERR bit 0 in tfd. But this patch also checks
> > the other error bits except for the "Interface non-fatal error status"
> > bit.
> >
> > Signed-off-by: Stefan Fritsch <fritsch@genua.de>
>
> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
>
> If there are no objections I will apply this by the end of this week.

Applied!

Thanks,

Daniel


* Re: [PATCH] ahci: Improve error handling
From: Paul Menzel @ 2018-03-08 11:07 UTC
  To: Daniel Kiper, Stefan Fritsch; +Cc: Daniel Kiper, grub-devel

Dear Stefan,


On Monday, 2018-01-29 at 13:16 +0100, Daniel Kiper wrote:
> On Tue, Jan 23, 2018 at 01:00:56PM +0100, Daniel Kiper wrote:
> > On Fri, Jan 19, 2018 at 02:13:29PM +0100, Stefan Fritsch wrote:
> > > From: Stefan Fritsch <fritsch@genua.de>
> > > 
> > > Check the error bits in the interrupt status register. According to the
> > > AHCI 1.2 spec, "Interrupt sources that are disabled (‘0’) are still
> > > reflected in the status registers.", so this should work even though
> > > grub uses polling
> > > 
> > > This fixes the following problem on a Fujitsu E744 laptop:
> > > 
> > > Sometimes there is a very long delay (up to several minutes) when
> > > booting from hard disk. It seems accessing the DVD drive (which has no
> > > disk inserted) sometimes fails with some errors, which leads to each
> > > access being stalled until the 20s timeout triggers. This seems to
> > > happen when grub is trying to read filesystem/partition data.
> > > 
> > > The problem is that the command_issue bit that is checked in the loop is
> > > only reset if the "HBA receives a FIS which clears the BSY, DRQ, and ERR
> > > bits for the command", but the ERR bit is never cleared. Therefore
> > > command_issue is never reset and grub waits for the timeout.
> > > 
> > > The relevant bit in our case is the Task File Error Status (TFES), which
> > > is equivalent to the ERR bit 0 in tfd. But this patch also checks
> > > the other error bits except for the "Interface non-fatal error status"
> > > bit.
> > > 
> > > Signed-off-by: Stefan Fritsch <fritsch@genua.de>
> > 
> > Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
> > 
> > If there are no objections I will apply this by the end of this week.
> 
> Applied!

Just a note: in #coreboot@freenode.net, a user running coreboot on a
Lenovo T420 reported this error with GRUB 2.02 (due to the DVD drive),
and building the master branch, which includes your commit, solved the
issue. Thank you very much.


Thanks,

Paul
