* grub.pxe, ARP-after-boot, DMA, and trouble
@ 2012-04-07  5:36 Daniel Kahn Gillmor
  2012-04-11 18:16 ` Daniel Kahn Gillmor
  0 siblings, 1 reply; 2+ messages in thread
From: Daniel Kahn Gillmor @ 2012-04-07  5:36 UTC
  To: Grub 2 Development List

I've recently been using grub.pxe (from debian's version 1.99-17)
according to the instructions at [0] to boot memtest86+ [1] (from
debian's version 4.20-1.1) over the network on x86 machines.  Due to the
problems described below, i'm using a serial console.

The grub configuration is very simple:

--------------------------
serial --speed=115200
terminal_input console serial
terminal_output console serial
menuentry 'memtest86+ serial console' {
  set root='(pxe)'
  echo 'loading memory tester...'
  linux16 /memtest86+.bin console=ttyS0,115200n8
}
--------------------------

On some of the machines i've tried this on, memtest86+ reports memory
failures very early in the run, and the failures happen even with brand
new sticks of RAM, placed in any combination and order in the hardware.
The errors are transient -- sometimes i'd get as many as ~300 32-bit
words of RAM failing, other times memtest would complete a full pass
with no errors.

The failures came during an early test where memtest86+ writes each
address's value to its own memory location, and then re-reads the memory
to verify.
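
To make the failure mode concrete, here is a rough Python simulation of
that kind of own-address test.  It is only a sketch, not memtest86+'s
actual code: the function names, the region base and the 0x55 filler
are made up for illustration, and the 64-byte write at 0x95d30 stands
in for whatever touches RAM between the write pass and the verify pass.

```python
import struct

WORD = 4  # bytes per 32-bit word


def fill_own_address(mem, base):
    """Write each word's own address into that word (little-endian)."""
    for off in range(0, len(mem), WORD):
        struct.pack_into("<I", mem, off, base + off)


def find_failures(mem, base):
    """Re-read every word; return addresses whose value no longer matches."""
    bad = []
    for off in range(0, len(mem), WORD):
        (value,) = struct.unpack_from("<I", mem, off)
        if value != base + off:
            bad.append(base + off)
    return bad


base = 0x95D00                # hypothetical start of the simulated region
mem = bytearray(0x100)        # 256 bytes standing in for RAM
fill_own_address(mem, base)

# Assumption: something other than the tester writes 64 bytes at 0x95d30
# between the write pass and the verify pass.  The filler is arbitrary.
frame = bytes([0x55]) * 64
start = 0x95D30 - base
mem[start:start + len(frame)] = frame

failures = find_failures(mem, base)
print(len(failures), hex(failures[0]), hex(failures[-1]))
```

In this simulation a single 64-byte write shows up as 16 failing 32-bit
words, even though nothing is wrong with the memory itself.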

Using the serial line, i was able to record the memory failures from a
run that had 24 words fail.  I was able to transcribe them and convert
them to a hexdump format.  These are the 24 words that failed (the
memory address indices are in the left-hand column):

*
00095d30  9c c8 e3 71 dc ff 00 65  32 4b a0 29 08 06 00 01
00095d40  08 00 06 04 00 01 00 65  32 4b a0 29 c0 a8 17 54
00095d50  00 00 00 00 00 00 c0 a8  17 86 55 55 55 55 55 55
00095d60  55 55 55 55 55 55 55 55  55 55 55 55 9a a2 8c 53
*
00097590  00 00 00 00 30 5d 09 00  40 00 00 00 04 00 00 00
000975a0  4d 4d 00 00 00 00 00 00  00 00 00 00 10 38 6a 94
*

The first block (of 16 words, i.e. 64 bytes) appears to be an ARP
request packet from the local network's DHCP server to the failing
machine (the MAC addresses have been obfuscated here, and i didn't
bother updating the checksum to match).
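
As a cross-check, the 64 bytes at 0x95d30 can be unpacked as an
Ethernet frame carrying an ARP payload.  This is a minimal Python
sketch: the field layout is the standard Ethernet/ARP one, the variable
names are chosen for illustration, and reading the trailing 0x55 bytes
as padding and the last four bytes as the frame checksum is an
assumption based on the 64-byte length.

```python
import ipaddress
import struct

# The first failing block, copied from the hexdump (addresses dropped).
frame = bytes.fromhex("""
    9c c8 e3 71 dc ff 00 65  32 4b a0 29 08 06 00 01
    08 00 06 04 00 01 00 65  32 4b a0 29 c0 a8 17 54
    00 00 00 00 00 00 c0 a8  17 86 55 55 55 55 55 55
    55 55 55 55 55 55 55 55  55 55 55 55 9a a2 8c 53
""")


def mac(raw):
    """Format 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)


# Ethernet header: destination MAC, source MAC, EtherType (network order).
dst, src, ethertype = struct.unpack_from("!6s6sH", frame, 0)

# ARP payload (28 bytes for Ethernet/IPv4), starting after the header.
htype, ptype, hlen, plen, oper, sha, spa, tha, tpa = struct.unpack_from(
    "!HHBBH6s4s6s4s", frame, 14
)

padding = frame[42:60]   # assumed: pad bytes up to the 60-byte minimum
fcs = frame[60:64]       # assumed: frame check sequence

print("dst", mac(dst), "src", mac(src), "ethertype", hex(ethertype))
print("oper", oper, "(1 = request)")
print("sender", mac(sha), ipaddress.IPv4Address(spa))
print("target", mac(tha), ipaddress.IPv4Address(tpa))
print("padding", padding.hex(), "fcs", fcs.hex())
```

The EtherType decodes to 0x0806 (ARP) and the opcode to 1 (request),
with sender 192.168.23.84 asking for 192.168.23.134.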

The second block (of 8 words) appears to contain a pointer to the
first block, a size indicator, and some other stuff i don't recognize.
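
The pointer and size can be read off by treating the 32 bytes at
0x97590 as eight little-endian 32-bit words.  This is a short Python
sketch; the little-endian reading is an assumption (reasonable on x86),
and only the second and third words are interpreted here.  The meaning
of the remaining words is not known.

```python
import struct

# The second failing block, copied from the hexdump (addresses dropped).
block = bytes.fromhex("""
    00 00 00 00 30 5d 09 00  40 00 00 00 04 00 00 00
    4d 4d 00 00 00 00 00 00  00 00 00 00 10 38 6a 94
""")

# Eight unsigned 32-bit words, little-endian.
words = struct.unpack("<8I", block)

for index, word in enumerate(words):
    print(f"word {index}: 0x{word:08x}")

pointer = words[1]   # matches the address of the first block
size = words[2]      # matches the length of the first block in bytes
print(hex(pointer), size)
```

Word 1 is 0x00095d30, the address of the first block, and word 2 is
0x40, i.e. 64 bytes, the length of that block.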

So i think what's happening is something like Matthew Garrett describes
in his recent work with UEFI [2], although i'm using BIOS and not UEFI.

In particular, i suspect that *after* the bootloader has turned over
control to the kernel (memtest in this case), the PXE-driven NIC is
continuing to DMA received packets into active RAM.

This seems pretty dangerous!

Would using pxe_unload before the close of the stanza prevent this
situation from happening?  (I regret that i haven't been able to test
it myself, because i haven't had access to the failing hardware since i
completed this diagnosis.)  If so, it seems like that should be clearly
documented and strongly recommended in grub.texi.

Or, should grub be marking certain sections of memory as unavailable
somehow before handoff to the kernel?

Or is there some other way to avoid this sort of corruption?

I've seen similar failures now on pretty different hardware (a fairly
old Dell Optiplex GX260 SFF and a new Lenovo ThinkCentre M77).

Any ideas?

        --dkg

[0] https://www.gnu.org/software/grub/manual/grub.html#Network
[1] http://www.memtest.org/
[2] http://mjg59.dreamwidth.org/11235.html

* Re: grub.pxe, ARP-after-boot, DMA, and trouble
  2012-04-07  5:36 grub.pxe, ARP-after-boot, DMA, and trouble Daniel Kahn Gillmor
@ 2012-04-11 18:16 ` Daniel Kahn Gillmor
  0 siblings, 0 replies; 2+ messages in thread
From: Daniel Kahn Gillmor @ 2012-04-11 18:16 UTC
  To: Grub 2 Development List


On Sat, 07 Apr 2012 01:36:44 -0400, Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
> In particular, i suspect that *after* the bootloader has turned over
> control to the kernel (memtest in this case), the PXE-driven NIC is
> continuing to DMA received packets into active RAM.

phcoder sent me the patch below to address the PXE DMA issue.

I tested it against the current bzr head.

My testing grub.cfg is just:

----------------------
serial --speed=115200
terminal_input console serial
terminal_output console serial

linux16 (pxe)/memtest86+.bin console=ttyS0,115200n8
boot
----------------------

Using grub bzr head without this patch, i get these RAM "failures" on
the net-booted ThinkCentre M70 within ~15 seconds on roughly every other
boot (i haven't found the right packets to inject to force the DMA to
happen consistently, alas).

Using grub bzr head with the patch, i have been unable to reproduce
a single RAM failure in more than 20 trials.

I think this patch resolves the issue, and should go into the mainline.

Thanks, phcoder!

        --dkg


[-- Attachment #1.2: phcoder's fix --]
[-- Type: text/x-diff, Size: 723 bytes --]

=== modified file 'grub-core/net/drivers/i386/pc/pxe.c'
--- grub-core/net/drivers/i386/pc/pxe.c	2012-02-08 18:26:01 +0000
+++ grub-core/net/drivers/i386/pc/pxe.c	2012-04-11 12:53:17 +0000
@@ -278,9 +278,14 @@
 grub_pxe_close (const struct grub_net_card *dev __attribute__ ((unused)))
 {
   if (pxe_rm_entry)
-    grub_pxe_call (GRUB_PXENV_UNDI_CLOSE,
-		   (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR,
-		   pxe_rm_entry);
+    {
+      grub_pxe_call (GRUB_PXENV_UNDI_CLOSE,
+		     (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR,
+		     pxe_rm_entry);
+      grub_pxe_call (GRUB_PXENV_UNDI_SHUTDOWN,
+		     (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR,
+		     pxe_rm_entry);
+    }
 }
 
 static grub_err_t

