* next-20130709 DMAR issues
@ 2013-07-10 17:01 Valdis Kletnieks
  2013-07-12 12:14 ` Ingo Molnar
  0 siblings, 1 reply; 6+ messages in thread
From: Valdis Kletnieks @ 2013-07-10 17:01 UTC (permalink / raw)
  To: David Woodhouse, Ingo Molnar; +Cc: linux-kernel


Dell Latitude E6530.

I'm seeing a new error message crop up in next-0709 that wasn't there with 0703.
Particularly odd: the last change to dmar.c was on 05/20/2013, so no smoking
guns there...

egrep -i 'dmar|linux ver' /var/log/messages  gives me:

Jul  9 21:47:15 turing-police kernel: [    0.000000] Linux version 3.10.0-next-20130703 (valdis@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #99 SMP PREEMPT Wed Jul 3 17:40:09 EDT 2013
Jul  9 21:47:15 turing-police kernel: [    0.000000] ACPI: DMAR 00000000cb7fd298 00080 (v01 INTEL      SNB  00000001 INTL 00000001)
Jul  9 21:47:15 turing-police kernel: [    0.021530] dmar: Host address width 36
Jul  9 21:47:15 turing-police kernel: [    0.021536] dmar: DRHD base: 0x000000fed90000 flags: 0x1
Jul  9 21:47:15 turing-police kernel: [    0.021569] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f0105a
Jul  9 21:47:15 turing-police kernel: [    0.021576] dmar: RMRR base: 0x000000ce761000 end: 0x000000ce780fff
Jul  9 21:47:15 turing-police kernel: [    1.023235] DMAR: No ATSR found

That's what it usually says.  But now I have:

Jul 10 12:20:19 turing-police kernel: [    0.000000] Linux version 3.10.0-next-20130709 (valdis@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #100 SMP PREEMPT Tue Jul 9 16:01:59 EDT 2013
Jul 10 12:20:19 turing-police kernel: [    0.000000] ACPI: DMAR 00000000cb7fd298 00080 (v01 INTEL      SNB  00000001 INTL 00000001)
Jul 10 12:20:19 turing-police kernel: [    0.021453] dmar: Host address width 36
Jul 10 12:20:19 turing-police kernel: [    0.021456] dmar: DRHD base: 0x000000fed90000 flags: 0x1
Jul 10 12:20:19 turing-police kernel: [    0.021485] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f0105a
Jul 10 12:20:19 turing-police kernel: [    0.021487] dmar: RMRR base: 0x000000ce761000 end: 0x000000ce780fff
Jul 10 12:20:19 turing-police kernel: [    0.021575] dmar: DRHD: handling fault status reg 2
Jul 10 12:20:19 turing-police kernel: [    0.021583] dmar: DMAR:[DMA Read] Request device [00:1f.2] fault addr ce71d000
Jul 10 12:20:19 turing-police kernel: [    0.021583] DMAR:[fault reason 06] PTE Read access is not set
Jul 10 12:20:19 turing-police kernel: [    1.034643] DMAR: No ATSR found

Now I have 3 extra messages talking about handling a fault status.  lspci says 00:1f.2 is:

00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
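
For anyone who wants to pull the same fields out of a log mechanically, here is a minimal sketch. The regular expressions are modeled only on the two fault lines quoted above, so other kernel versions may word these messages differently, and `parse_fault` is a hypothetical helper name, not anything from the kernel tree.

```python
import re

# Patterns modeled on the two fault lines in the log above (an assumption:
# other kernels may format these messages differently).
REQUEST_RE = re.compile(
    r"DMAR:\[DMA (?P<kind>Read|Write)\] Request device "
    r"\[(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+)"
)
REASON_RE = re.compile(r"DMAR:\[fault reason (?P<code>\d+)\] (?P<text>.+)")


def parse_fault(lines):
    """Return a dict describing the first DMAR fault in lines, or None."""
    fault = None
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            fault = {
                "kind": m["kind"],
                "bdf": f"{m['bus']}:{m['dev']}.{m['fn']}",
                "addr": int(m["addr"], 16),  # fault address is printed in hex
            }
            continue
        m = REASON_RE.search(line)
        if m and fault is not None:
            fault["reason_code"] = int(m["code"])
            fault["reason"] = m["text"].strip()
            return fault
    return fault


# The three extra lines from the next-20130709 boot, timestamps trimmed.
SAMPLE = [
    "dmar: DRHD: handling fault status reg 2",
    "dmar: DMAR:[DMA Read] Request device [00:1f.2] fault addr ce71d000",
    "DMAR:[fault reason 06] PTE Read access is not set",
]

fault = parse_fault(SAMPLE)
```

The `bdf` field is the same bus:device.function string that lspci prints, so it can be matched against lspci output directly.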

I found something similar in a thread here: http://lkml.indiana.edu/hypermail/linux/kernel/1212.1/02319.html
which ended with Don Dutile saying:

"DMAR table does not have an entry for this device to this region.
Once the driver reconfigs/resets the device to stop polling bios-boot
cmd rings and use (new) OS (dma-mapped) rings, there's a period of time
during this transition that the hw is babbling away to an area that is no
longer mapped."
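
As a quick sanity check on the "no entry for this device to this region" part, the faulting address can be compared against the one RMRR the log prints. This is only a sketch under assumptions: it treats the printed end address as inclusive, and the log does not say which device that RMRR belongs to, so this shows only that the address is outside the reported range.

```python
# Values copied from the boot log above.
RMRR_BASE = 0xCE761000
RMRR_END = 0xCE780FFF    # assumed inclusive, as the trailing 'fff' suggests
FAULT_ADDR = 0xCE71D000


def in_range(addr, base, end):
    """True if addr lies in the inclusive range [base, end]."""
    return base <= addr <= end


covered = in_range(FAULT_ADDR, RMRR_BASE, RMRR_END)
gap = RMRR_BASE - FAULT_ADDR  # distance from the fault address up to the RMRR base

print(f"covered by RMRR: {covered}, gap below base: {gap:#x}")
```

With these numbers the fault address sits below the RMRR base, so the reported RMRR does not cover it.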

But I'm not convinced this is the same issue: why did it change between 0703 and 0709,
when I haven't updated the firmware? There are no relevant .config changes between the two, either.

If this doesn't ring any bells, I'll go do the bisect thing...


* Re: next-20130709 DMAR issues
  2013-07-10 17:01 next-20130709 DMAR issues Valdis Kletnieks
@ 2013-07-12 12:14 ` Ingo Molnar
  2013-07-12 14:31   ` Joerg Roedel
  2013-07-16 22:07   ` Valdis.Kletnieks
  0 siblings, 2 replies; 6+ messages in thread
From: Ingo Molnar @ 2013-07-12 12:14 UTC (permalink / raw)
  To: Valdis Kletnieks, Li, Zhen-Hua, Joerg Roedel
  Cc: David Woodhouse, linux-kernel


(Cc:-ed a few DMAR people.)


* Re: next-20130709 DMAR issues
  2013-07-12 12:14 ` Ingo Molnar
@ 2013-07-12 14:31   ` Joerg Roedel
  2013-07-15  2:32     ` Li, Zhen-Hua
  2013-07-16 22:07   ` Valdis.Kletnieks
  1 sibling, 1 reply; 6+ messages in thread
From: Joerg Roedel @ 2013-07-12 14:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Valdis Kletnieks, Li, Zhen-Hua, David Woodhouse, linux-kernel

Thanks for the heads-up, Ingo.

On Fri, Jul 12, 2013 at 02:14:20PM +0200, Ingo Molnar wrote:
> > Jul 10 12:20:19 turing-police kernel: [    0.021583] DMAR:[fault reason 06] PTE Read access is not set
> > 
> > Now I have 3 extra messages talking about handling a fault status.  lspci says 00:1f.2 is:
> > 
> > 00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)

This could be caused by some change in the SATA driver stack. Maybe a
DMA buffer is used after unmap or something.


> > If this doesn't ring any bells, I'll go do the bisect thing...

Yes, a bisection would help here, thanks for doing this.


	Joerg




* Re: next-20130709 DMAR issues
  2013-07-12 14:31   ` Joerg Roedel
@ 2013-07-15  2:32     ` Li, Zhen-Hua
  2013-07-16 22:16       ` Valdis.Kletnieks
  0 siblings, 1 reply; 6+ messages in thread
From: Li, Zhen-Hua @ 2013-07-15  2:32 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Ingo Molnar, Valdis Kletnieks, David Woodhouse, linux-kernel

I have hit a bug with the same error message. Its cause was that the BIOS
did not allocate an RMRR/DRHD (I can't remember which one) for the device.

Thanks
ZhenHua



* Re: next-20130709 DMAR issues
  2013-07-12 12:14 ` Ingo Molnar
  2013-07-12 14:31   ` Joerg Roedel
@ 2013-07-16 22:07   ` Valdis.Kletnieks
  1 sibling, 0 replies; 6+ messages in thread
From: Valdis.Kletnieks @ 2013-07-16 22:07 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Li, Zhen-Hua, Joerg Roedel, David Woodhouse, linux-kernel


On Fri, 12 Jul 2013 14:14:20 +0200, Ingo Molnar said:
>
> (Cc:-ed a few DMAR people.)

Sorry for the slow reply, missed this in the lkml firehose.

For whatever reason, the damned problem seems to have evaporated:

% egrep -i 'dmar|Linux vers' /var/log/messages-20130714
....
Jul 11 18:54:15 turing-police kernel: [    0.000000] Linux version 3.10.0-next-20130709 (valdis@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #100 SMP PREEMPT Tue Jul 9 16:01:59 EDT 2013
Jul 11 18:54:15 turing-police kernel: [    0.000000] ACPI: DMAR 00000000cb7fd298 00080 (v01 INTEL      SNB  00000001 INTL 00000001)
Jul 11 18:54:15 turing-police kernel: [    0.021632] dmar: Host address width 36
Jul 11 18:54:15 turing-police kernel: [    0.021638] dmar: DRHD base: 0x000000fed90000 flags: 0x1
Jul 11 18:54:15 turing-police kernel: [    0.021669] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f0105a
Jul 11 18:54:15 turing-police kernel: [    0.021677] dmar: RMRR base: 0x000000ce761000 end: 0x000000ce780fff
Jul 11 18:54:15 turing-police kernel: [    0.021775] dmar: DRHD: handling fault status reg 2
Jul 11 18:54:15 turing-police kernel: [    0.021782] dmar: DMAR:[DMA Read] Request device [00:1f.2] fault addr ce71d000
Jul 11 18:54:15 turing-police kernel: [    0.021782] DMAR:[fault reason 06] PTE Read access is not set
Jul 11 18:54:15 turing-police kernel: [    1.002171] DMAR: No ATSR found
Jul 11 22:12:39 turing-police kernel: [    0.000000] Linux version 3.10.0-next-20130709 (valdis@turing-police.cc.vt.edu) (gcc version 4.8.1 20130612 (Red Hat 4.8.1-2) (GCC) ) #100 SMP PREEMPT Tue Jul 9 16:01:59 EDT 2013
Jul 11 22:12:39 turing-police kernel: [    0.000000] ACPI: DMAR 00000000cb7fd298 00080 (v01 INTEL      SNB  00000001 INTL 00000001)
Jul 11 22:12:39 turing-police kernel: [    0.021376] dmar: Host address width 36
Jul 11 22:12:39 turing-police kernel: [    0.021382] dmar: DRHD base: 0x000000fed90000 flags: 0x1
Jul 11 22:12:39 turing-police kernel: [    0.021414] dmar: IOMMU 0: reg_base_addr fed90000 ver 1:0 cap c9008020660262 ecap f0105a
Jul 11 22:12:39 turing-police kernel: [    0.021422] dmar: RMRR base: 0x000000ce761000 end: 0x000000ce780fff
Jul 11 22:12:39 turing-police kernel: [    1.034107] DMAR: No ATSR found

Damned if I know what changed - same kernel booted at 18:54 hit the issue, but
at 22:12 it had gone into hiding, and I haven't seen it since.  (And the
laptop gets booted twice a day most days - once at arrival at office, and
once when I get home, and it had been doing it consistently at both locations
for several days.)

Definitely well into the "things that go bump in the night" category.
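
To keep track of which boots hit the fault without eyeballing the egrep output, something like the following sketch could work. It assumes the classic syslog layout shown above (a 15-character 'Mon DD HH:MM:SS' prefix) and treats every 'Linux version' line as the start of a new boot; `boots_with_faults` is a hypothetical helper name.

```python
import re

BOOT_RE = re.compile(r"Linux version (?P<version>\S+)")
FAULT_RE = re.compile(r"DMAR:\[fault reason \d+\]")


def boots_with_faults(lines):
    """Split syslog lines into boots and flag the ones that logged a DMAR fault.

    Assumes a new boot starts at every 'Linux version' line.
    Returns a list of (timestamp, version, had_fault) tuples.
    """
    boots = []
    for line in lines:
        m = BOOT_RE.search(line)
        if m:
            # The first 15 characters of a classic syslog line are the timestamp.
            boots.append([line[:15], m["version"], False])
        elif boots and FAULT_RE.search(line):
            boots[-1][2] = True
    return [tuple(b) for b in boots]


# Abbreviated lines from the two Jul 11 boots quoted above.
SAMPLE = [
    "Jul 11 18:54:15 turing-police kernel: [    0.000000] Linux version 3.10.0-next-20130709 (valdis@turing-police.cc.vt.edu)",
    "Jul 11 18:54:15 turing-police kernel: [    0.021782] DMAR:[fault reason 06] PTE Read access is not set",
    "Jul 11 18:54:15 turing-police kernel: [    1.002171] DMAR: No ATSR found",
    "Jul 11 22:12:39 turing-police kernel: [    0.000000] Linux version 3.10.0-next-20130709 (valdis@turing-police.cc.vt.edu)",
    "Jul 11 22:12:39 turing-police kernel: [    1.034107] DMAR: No ATSR found",
]

for stamp, version, had_fault in boots_with_faults(SAMPLE):
    print(stamp, version, "FAULT" if had_fault else "clean")
```

Run over a whole /var/log/messages file, this gives one line per boot, which makes an intermittent fault on the same kernel build easier to spot.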


* Re: next-20130709 DMAR issues
  2013-07-15  2:32     ` Li, Zhen-Hua
@ 2013-07-16 22:16       ` Valdis.Kletnieks
  0 siblings, 0 replies; 6+ messages in thread
From: Valdis.Kletnieks @ 2013-07-16 22:16 UTC (permalink / raw)
  To: Li, Zhen-Hua; +Cc: Joerg Roedel, Ingo Molnar, David Woodhouse, linux-kernel


On Mon, 15 Jul 2013 10:32:17 +0800, "Li, Zhen-Hua" said:
> I have met a bug with the same error message, its cause was that the bios
> did not allocate RMRR/DRHD(can't remember which one) for the device.

I think I posted a link to that same bug report.  The problem is that
if the BIOS wasn't allocating it before, it isn't allocating it now,
because it's still the same A11 BIOS that Dell shipped it with.  And now
I'm not sure which is more confusing: that it was OK in -0703 and borked
in -0709, or that between two successive boots of the same -0709 kernel
it cleared itself up....

