* [PATCH] BadRAM for 2.5.73-mm2
@ 2003-07-02 3:33 steven.newbury1
2003-07-02 6:23 ` Oleg Drokin
0 siblings, 1 reply; 4+ messages in thread
From: steven.newbury1 @ 2003-07-02 3:33 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 372 bytes --]
Arghhhh! Third attempt. Yahoo!Mail has currupted the patch twice previously so I have given up with it. :-(
I have ported the BadRAM patch to 2.5.73-mm2.
It will probably apply cleanly to 2.5.73 as well but I haven't checked it
yet...
I am running it now, it seems to work for me.
It is based it on
http://rick.vanrein.org/linux/badram/software/BadRAM-2.4.20.1.patch
[-- Attachment #2: BadRAM-2.5.73-mm2.diff --]
[-- Type: application/octet-stream, Size: 24048 bytes --]
diff -urN linux-2.5.73-mm2/CREDITS linux-2.5/CREDITS
--- linux-2.5.73-mm2/CREDITS 2003-06-17 05:20:07.000000000 +0100
+++ linux-2.5/CREDITS 2003-07-02 01:58:52.146197970 +0100
@@ -2365,6 +2365,12 @@
S: 55127 Mainz
S: Germany
+N: Steven Newbury
+E: s_j_newbury@yahoo.co.uk
+D: Forward ported BadRAM patch to 2.5.73 from 2.4.20
+S: Derby
+S: United Kingdom
+
N: David C. Niemi
E: niemi@tux.org
W: http://www.tux.org/~niemi/
@@ -2601,6 +2607,14 @@
S: Malvern, Pennsylvania 19355
S: USA
+N: Rick van Rein
+E: vanrein@cs.utwente.nl
+W: http://www.cs.utwente.nl/~vanrein
+D: Memory, the BadRAM subsystem dealing with statically challanged RAM modules.
+S: Binnenes 67
+S: 9407 CX Assen
+S: The Netherlands
+
N: Stefan Reinauer
E: stepan@linux.de
W: http://www.freiburg.linux.de/~stepan/
@@ -2819,6 +2833,13 @@
E:
D: Macintosh IDE Driver
+N: Nico Schmoigl
+E: nico@writemail.com
+W: http://webrum.uni-mannheim.de/math/schmoigl/linux/
+D: Migration of BadRAM patch to 2.4.4 series (with Rick van Rein)
+S: Mannheim, BW, Germany
+P: 2047/38FC9E03 5D DB 09 E4 3F F3 CD 09 75 59 - 11 17 9C 03 46 E3 38 FC 9E 03
+
N: Peter De Schrijver
E: stud11@cc4.kuleuven.ac.be
D: Mitsumi CD-ROM driver patches March version
diff -urN linux-2.5.73-mm2/Documentation/badram.txt linux-2.5/Documentation/badram.txt
--- linux-2.5.73-mm2/Documentation/badram.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.5/Documentation/badram.txt 2003-07-01 20:02:27.000000000 +0100
@@ -0,0 +1,266 @@
+INFORMATION ON USING BAD RAM MODULES
+====================================
+
+Introduction
+ RAM is getting smaller and smaller, and as a result, also more and more
+ vulnerable. This makes the manufacturing of hardware more expensive,
+ since an excessive amount of RAM chips must be discarded on account of
+ a single cell that is wrong. Similarly, static discharge may damage a
+ RAM module forever, which is usually remedied by replacing it
+ entirely.
+
+ This is not necessary, as the BadRAM code shows: By informing the Linux
+ kernel which addresses in a RAM are damaged, the kernel simply avoids
+ ever allocating such addresses but makes all the rest available.
+
+Reasons for this feature
+ There are many reasons why this kernel feature is useful:
+ - Chip manufacture is resource intensive; waste less and sleep better
+ - It's another chance to promote Linux as "the flexible OS"
+ - Some laptops have their RAM soldered in... and then it fails!
+ - It's plain cool ;-)
+
+Running example
+ To run this project, I was given two DIMMs, 32 MB each. One, that we
+ shall use as a running example in this text, contained 512 faulty bits,
+ spread over 1/4 of the address range in a regular pattern. Some tricks
+ with a RAM tester and a few binary calculations were sufficient to
+ write these faults down in 2 longword numbers.
+
+ The kernel recognised the correct number of pages with faults and did
+ not give them out for allocation. The allocation routines could
+ therefore progress as normally, without any adaption.
+ So, I gained 30 MB of DIMM which would otherwise have been thrown
+ away. After booting the kernel, the kernel behaved exactly as it
+ always had.
+
+Initial checks
+ If you experience RAM trouble, first read /usr/src/linux/memory.txt
+ and try out the mem=4M trick to see if at least some initial parts
+ of your RAM work well. The BadRAM routines halt the kernel in panic
+ if the reserved area of memory (containing kernel stuff) contains
+ a faulty address.
+
+Running a RAM checker
+ The memory checker is not built into the kernel, to avoid delays at
+ runtime. If you experience problems that may be caused by RAM, run
+ a good RAM checker, such as
+ http://reality.sgi.com/cbrady_denver/memtest86
+ The output of a RAM checker provides addresses that went wrong. In
+ the 32 MB chip with 512 faulty bits mentioned above, the errors were
+ found in the 8MB-16MB range (the DIMM was in slot #0) at addresses
+ xxx42f4
+ xxx62f4
+ xxxc2f4
+ xxxe2f4
+ and the error was a "sticky 1 bit", a memory bit that stayed "1" no
+ matter what was written to it. The regularity of this pattern
+ suggests the death of a buffer at the output stages of a row on one of
+ the chips. I expect such regularity to be commonplace. Finding this
+ regularity currently is human effort, but it should not be hard to
+ alter a RAM checker to capture it in some sort of pattern, possibly
+ the BadRAM patterns described below.
+
+ By the way, if you manage to get hold of memtest86 version 2.3 or
+ beyond, you can configure the printing mode to produce BadRAM patterns,
+ which find out exactly what you must enter on the LILO: commandline,
+ except that you shouldn't mention the added spacing. That means that
+ you can skip the following step, which saves you a *lot* of work.
+
+ Also by the way, if your machine has the ISA memory gap in the 15M-16M
+ range unstoppable, Linux can get in trouble. One way of handling that
+ situation is by specifying the total memory size to Linux with a boot
+ parameter mem=... and then to tell it to treat the 15M-16M range as
+ faulty with an additional boot parameter, for instance:
+ mem=24M badram=0x00f00000,0xfff00000
+ if you installed 24MB of RAM in total.
+
+Capturing errors in a pattern
+ Instead of manually providing all 512 errors to the kernel, it's nicer
+ to generate a pattern. Since the regularity is based on address decoding
+ software, which generally takes certain bits into account and ignores
+ others, we shall provide a faulty address F, together with a bit mask M
+ that specifies which bits must be equal to F. In C code, an address A
+ is faulty if and only if
+ (F & M) == (A & M)
+ or alternately (closer to a hardware implementation):
+ ~((F ^ A) & M)
+ In the example 32 MB chip, we had the faulty addresses in 8MB-16MB:
+ xxx42f4 ....0100....
+ xxx62f4 ....0110....
+ xxxc2f4 ....1100....
+ xxxe2f4 ....1110....
+ The second column represents the alternating hex digit in binary form.
+ Apperantly, the first and one-but last binary digit can be anything,
+ so the binary mask for that part is 0101. The mask for the part after
+ this is 0xfff, and the part before should select anything in the range
+ 8MB-16MB, or 0x00800000-0x01000000; this is done with a bitmask
+ 0xff80xxxx. Combining these partial masks, we get:
+ F=0x008042f4 M=0xff805fff
+ That covers everything for this DIMM; for more complicated failing
+ DIMMs, or for a combination of multiple failing DIMMs, it can be
+ necessary to set up a number of such F/M pairs.
+
+Rebooting Linux
+ Now that these patterns are known (and double-checked, the calculations
+ are highly error-prone... it would be neat to test them in the RAM
+ checker...) we simply restart Linux with these F/M pairs as a parameter.
+ If you normally boot as follows:
+ LILO: linux
+ you should now boot with
+ LILO: linux badram=0x008042f4,0xff805fff
+ or perhaps by mentioning more F/M pairs in an order F0,M0,F1,M1,...
+ When you provide an odd number of arguments to badram, the default mask
+ 0xffffffff (only one address matched) is applied to the pattern.
+
+ Beware of the commandline length. At least up to LILO version 0.21,
+ the commandline is cut off after the 78th character; later versions
+ may go as far as the kernel goes, namely 255 characters. In no way is
+ it possible to enter more than 10 numbers to the badram boot option.
+
+ When the kernel now boots, it should not give any trouble with RAM.
+ Mind you, this is under the assumption that the kernel and its data
+ storage do not overlap an erroneous part. If this happens, and the
+ kernel does not choke on it right away, it will stop with a panic.
+ You will need to provide a RAM where the initial, say 2MB, is faultless.
+
+ Now look up your memory status with
+ dmesg | grep ^Memory:
+ which prints a single line with information like
+ Memory: 158524k/163840k available
+ (940k kernel code,
+ 412k reserved,
+ 1856k data,
+ 60k init,
+ 0k highmem,
+ 2048k BadRAM)
+ The latter entry, the badram, is 2048k to represent the loss of 2MB
+ of general purpose RAM due to the errors. Or, positively rephrased,
+ instead of throwing out 32MB as useless, you only throw out 2MB.
+
+ If the system is stable (try compiling a few kernels, and do a few
+ finds in / or so) you may add the boot parameter to /etc/lilo.conf
+ as a line to _all_ the kernels that handle this trouble with a line
+ append="badram=0x008042f4,0xff805fff"
+ after which you run "lilo".
+ Warning: Don't experiment with these settings on your only boot image.
+ If the BadRAM overlays kernel code, data, init, or other reserved
+ memory, the kernel will halt in panic. Try settings on a test boot
+ image first, and if you get a panic you should change the order of
+ your DIMMs [which may involve buying a new one just to be able to
+ change the order].
+
+ You are allowed to enter any number of BadRAM patterns in all the
+ places documented in this file. They will all apply. It is even
+ possible to mention several BadRAM patterns in a single place. The
+ completion of an odd number of arguments with the default mask is
+ done separately for each badram=... option.
+
+Kernel Customisation
+ Some people prefer to enter their badram patterns in the kernel, and
+ this is also possible. In mm/page_alloc.c there is an array of unsigned
+ long integers into which the parameters can be entered, prefixed with
+ the number of integers (twice the number of patterns). The array is
+ named badram_custom and it will be added to the BadRAM list whenever an
+ option 'badram' is provided on the commandline when booting, either
+ with or without additional patterns.
+
+ For the previous example, the code would become
+
+ static unsigned long __init badram_custom[] = {
+ 2, // Number of longwords that follow, as F/M pairs
+ 0x008042f4L, 0xff805fffL,
+ };
+
+ Even on this place you may assume the default mask to be filled in
+ when you enter an odd number of longwords. Specify the number of
+ longwords to be 0 to avoid influence of this custom BadRAM list.
+
+BadRAM classification
+ This technique may start a lively market for "dead" RAM. It is important
+ to realise that some RAMs are more dead than others. So, instead of
+ just providing a RAM size, it is also important to know the BadRAM
+ class, which is defined as follows:
+
+ A BadRAM class N means that at most 2^N bytes have a problem,
+ and that all problems with the RAMs are persistent: They
+ are predictable and always show up.
+
+ The DIMM that serves as an example here was of class 9, since 512=2^9
+ errors were found. Higher classes are worse, "correct" RAM is of class
+ -1 (or even less, at your choice).
+ Class N also means that the bitmask for your chip (if there's just one,
+ that is) counts N bits "0" and it means that (if no faults fall in the
+ same page) an amount of 2^N*PAGESIZE memory is lost, in the example on
+ an i386 architecture that would be 2^9*4k=2MB, which accounts for the
+ initial claim of 30MB RAM gained with this DIMM.
+
+ Note that this scheme has deliberately been defined to be independent
+ of memory technology and of computer architecture.
+
+Known Bugs
+ LILO is known to cut off commandlines which are too long. For the
+ lilo-0.21 distribution, a commandline may not exceed 78 characters,
+ while actually, 255 would be possible [on i386, kernel 2.2.16].
+ LILO does _not_ report too-long commandlines, but the error will
+ show up as either a panic at boot time, stating
+ panic: BadRAM page in initial area
+ or the dmesg line starting with Memory: will mention an unpredicted
+ number of kilobytes. (Note that the latter number only includes
+ errors in accessed memory.)
+
+Future Possibilities
+ It would be possible to use even more of the faulty RAMs by employing
+ them for slabs. The smaller allocation granularity of slabs makes it
+ possible to throw out just, say, 32 bytes surrounding an error. This
+ would mean that the example DIMM only looses 16kB instead of 2MB.
+ It might even be possible to allocate the slabs in such a way that,
+ where possible, the remaining bytes in a slab structure are allocated
+ around the error, reducing the RAM loss to 0 in the optimal situation!
+
+ However, this yield is somewhat faked: It is possible to provide 512
+ pages of 32-byte slabs, but it is not certain that anyone would use
+ that many 32-byte slabs at any time.
+
+ A better solution might be to alter the page allocation for a slab to
+ have a preference for BadRAM pages, and given those a special treatment.
+ This way, the BadRAM would be spread over all the slabs, which seems
+ more likely to be a `true' pay-off. This would yield more overhead at
+ slab allocation time, but on the other hand, by the nature of slabs,
+ such allocations are made as rare as possible, so it might not matter
+ that much. I am uncertain where to go.
+
+ Many suggestions have been made to insert a RAM checker at boot time;
+ since this would leave the time to do only very meager checking, it
+ is not a reasonable option; we already have a BIOS doing that in most
+ systems!
+
+ It would be interesting to integrate this functionality with the
+ self-verifying nature of ECC RAM. These memories can even distinguish
+ between recorable and unrecoverable errors! Such memory has been
+ handled in older operating systems by `testing' once-failed memory
+ blocks for a while, by placing only (reloadable) program code in it.
+ Unfortunately, I possess no faulty ECC modules to work this out.
+
+Names and Places
+ The home page of this project is on
+ http://rick.vanrein.org/linux/badram
+ This page also links to Nico Schmoigl's experimental extensions to
+ this patch (with debugging and a few other fancy things).
+
+ In case you have experiences with the BadRAM software which differ from
+ the test reportings on that site, I hope you will mail me with that
+ new information.
+
+ The BadRAM project is an idea and implementation by
+ Rick van Rein
+ Binnenes 67
+ 9407 CX Assen
+ The Netherlands
+ vanrein@cs.utwente.nl
+ If you like it, a postcard would be much appreciated ;-)
+
+
+ Enjoy,
+ -Rick.
+
diff -urN linux-2.5.73-mm2/Documentation/kernel-parameters.txt linux-2.5/Documentation/kernel-parameters.txt
--- linux-2.5.73-mm2/Documentation/kernel-parameters.txt 2003-07-01 19:59:40.000000000 +0100
+++ linux-2.5/Documentation/kernel-parameters.txt 2003-07-02 01:52:10.921698985 +0100
@@ -15,6 +15,7 @@
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
AX25 Appropriate AX.25 support is enabled.
+ BADRAM Support for faulty RAM chips is enabled.
CD Appropriate CD support is enabled.
DEVFS devfs support is enabled.
DRM Direct Rendering Management support is enabled.
@@ -162,6 +163,9 @@
aztcd= [HW,CD] Aztech CD268 CDROM driver
Format: <io>,0x79 (?)
+ badram= [BADRAM] Avoid allocating faulty RAM addresses.
+
+
baycom_epp= [HW,AX25]
Format: <io>,<mode>
diff -urN linux-2.5.73-mm2/Documentation/memory.txt linux-2.5/Documentation/memory.txt
--- linux-2.5.73-mm2/Documentation/memory.txt 2003-06-17 05:20:28.000000000 +0100
+++ linux-2.5/Documentation/memory.txt 2003-07-01 20:02:27.000000000 +0100
@@ -18,6 +18,14 @@
as you add more memory. Consider exchanging your
motherboard.
+ 4) A static discharge or production fault causes a RAM module
+ to have (predictable) errors, usually meaning that certain
+ bits cannot be set or reset. Instead of throwing away your
+ RAM module, you may read /usr/src/linux/Documentation/badram.txt
+ to learn how to detect, locate and circuimvent such errors
+ in your RAM module.
+
+
All of these problems can be addressed with the "mem=XXXM" boot option
(where XXX is the size of RAM to use in megabytes).
It can also tell Linux to use less memory than is actually installed.
@@ -45,6 +53,8 @@
* Try passing the "mem=4M" option to the kernel to limit
Linux to using a very small amount of memory.
+ If this helps, read /usr/src/linux/Documentation/badram.txt
+ to learn how to find and circuimvent memory errors.
Other tricks:
diff -urN linux-2.5.73-mm2/arch/i386/defconfig linux-2.5/arch/i386/defconfig
--- linux-2.5.73-mm2/arch/i386/defconfig 2003-06-17 05:20:07.000000000 +0100
+++ linux-2.5/arch/i386/defconfig 2003-07-01 20:02:28.000000000 +0100
@@ -132,6 +132,7 @@
# CONFIG_EISA is not set
# CONFIG_MCA is not set
CONFIG_HOTPLUG=y
+CONFIG_BADRAM=y
#
# PCMCIA/CardBus support
diff -urN linux-2.5.73-mm2/arch/i386/mm/init.c linux-2.5/arch/i386/mm/init.c
--- linux-2.5.73-mm2/arch/i386/mm/init.c 2003-07-01 19:59:28.000000000 +0100
+++ linux-2.5/arch/i386/mm/init.c 2003-07-02 00:40:31.000000000 +0100
@@ -223,13 +223,24 @@
pkmap_page_table = pte;
}
-void __init one_highpage_init(struct page *page, int pfn, int bad_ppro)
+/**
+ * @param bad set on return to whether the page is bad RAM
+ */
+void __init one_highpage_init(struct page *page, int pfn, int bad_ppro,
+ _Bool *bad)
{
+ *bad = 0;
if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
ClearPageReserved(page);
set_bit(PG_highmem, &page->flags);
set_page_count(page, 1);
- __free_page(page);
+#ifdef CONFIG_BADRAM
+ if (PageBad(page))
+ *bad = 1;
+ else
+#else
+ __free_page(page);
+#endif
totalhigh_pages++;
} else
SetPageReserved(page);
@@ -239,8 +250,10 @@
void __init set_highmem_pages_init(int bad_ppro)
{
int pfn;
- for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
- one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+ for (pfn = highstart_pfn; pfn < highend_pfn; pfn++) {
+ one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro, &bad);
+ if (bad) badpages++;
+ }
totalram_pages += totalhigh_pages;
}
#else
@@ -431,7 +444,7 @@
void __init mem_init(void)
{
extern int ppro_with_ram_bug(void);
- int codesize, reservedpages, datasize, initsize;
+ int codesize, reservedpages, badpages, datasize, initsize;
int tmp;
int bad_ppro;
@@ -467,13 +480,17 @@
totalram_pages += __free_all_bootmem();
reservedpages = 0;
+ badpages = 0;
for (tmp = 0; tmp < max_low_pfn; tmp++)
/*
- * Only count reserved RAM pages
+ * Only count reserved and bad RAM pages
*/
if (page_is_ram(tmp) && PageReserved(pfn_to_page(tmp)))
reservedpages++;
-
+#ifdef CONFIG_BADRAM
+ if (page_is_ram(tmp) && PageBad(pfn_to_page(tmp)))
+ badpages++;
+#endif
set_highmem_pages_init(bad_ppro);
codesize = (unsigned long) &_etext - (unsigned long) &_text;
@@ -484,6 +501,18 @@
kclist_add(&kcore_vmalloc, (void *)VMALLOC_STA);
return 0;
}
+
+
+#ifdef CONFIG_BADRAM
+
+/* Given a pointed-at address and a mask, increment the page so that the
+ * mask hides the increment. Return 0 if no increment is possible.
+ */
+static int __init next_masked_address (unsigned long *addrp, unsigned long mask)
+{
+ unsigned long inc=1;
+ unsigned long newval = *addrp;
+ while (inc & mask)
+ inc += inc;
+ while (inc != 0) {
+ newval += inc;
+ newval &= ~mask;
+ newval |= ((*addrp) & mask);
+ if (newval > *addrp) {
+ *addrp = newval;
+ return 1;
+ }
+ do {
+ inc += inc;
+ } while (inc & ~mask);
+ while (inc & mask)
+ inc += inc;
+ }
+ return 0;
+}
+
+
+void __init badram_markpages (int argc, unsigned long *argv) {
+ unsigned long addr, mask;
+ while (argc-- > 0) {
+ addr = *argv++;
+ mask = (argc-- > 0) ? *argv++ : ~0L;
+ mask |= ~PAGE_MASK; // Optimalisation
+ addr &= mask; // Normalisation
+ do {
+ struct page *pg = phys_to_page(addr);
+printk ("%05lx ", __pa(__va(addr)) >> PAGE_SHIFT);
+printk ("=%05lx/%05lx ", pg-mem_map, max_mapnr);
+ // if (VALID_PAGE(pg)) {
+ if (PageTestandSetBad (pg)) {
+ reserve_bootmem (addr, PAGE_SIZE);
+printk ("BAD ");
+ }
+else printk ("BFR ");
+ // }
+// else printk ("INV ");
+ } while (next_masked_address (&addr,mask));
+ }
+}
+
+
+
+/****rN linux-2.5.73-mm2/mm/bootmem.c l);
return 0;
}
+
+
+#ifdef CONFIG_BADRAM
+
+/* Given a pointed-at address and a mask, increment the page so that the
+ * mask hides the increment. Return 0 if no increment is possible.
+ */
+static int __init next_masked_address (unsigned long *addrp, unsigned long mask)
+{
+ unsigned long inc=1;
+ unsigned long newval = *addrp;
+ while (inc & mask)
+ inc += inc;
+ while (inc != 0) {
+ newval += inc;
+ newval &= ~mask;
+ newval |= ((*addrp) & mask);
+ if (newval > *addrp) {
+ *addrp = newval;
+ return 1;
+ }
+ do {
+ inc += inc;
+ } while (inc & ~mask);
+ while (inc & mask)
+ inc += inc;
+ }
+ return 0;
+}
+
+
+void __init badram_markpages (int argc, unsigned long *argv) {
+ unsigned long addr, mask;
+ while (argc-- > 0) {
+ addr = *argv++;
+ mask = (argc-- > 0) ? *argv++ : ~0L;
+ mask |= ~PAGE_MASK; // Optimalisation
+ addr &= mask; // Normalisation
+ do {
+ struct page *pg = phys_to_page(addr);
+printk ("%05lx ", __pa(__va(addr)) >> PAGE_SHIFT);
+printk ("=%05lx/%05lx ", pg-mem_map, max_mapnr);
+ // if (VALID_PAGE(pg)) {
+ if (PageTestandSetBad (pg)) {
+ reserve_bootmem (addr, PAGE_SIZE);
+printk ("BAD ");
+ }
+else printk ("BFR ");
+ // }
+// else printk ("INV ");
+ } while (next_masked_address (&addr,mask));
+ }
+}
+
+
+
+/****,96 @@
setup_per_zone_pages_min();
return 0;
}
+
+
+#ifdef CONFIG_BADRAM
+
+/* Given a pointed-at address and a mask, increment the page so that the
+ * mask hides the increment. Return 0 if no increment is possible.
+ */
+static int __init next_masked_address (unsigned long *addrp, unsigned long mask)
+{
+ unsigned long inc=1;
+ unsigned long newval = *addrp;
+ while (inc & mask)
+ inc += inc;
+ while (inc != 0) {
+ newval += inc;
+ newval &= ~mask;
+ newval |= ((*addrp) & mask);
+ if (newval > *addrp) {
+ *addrp = newval;
+ return 1;
+ }
+ do {
+ inc += inc;
+ } while (inc & ~mask);
+ while (inc & mask)
+ inc += inc;
+ }
+ return 0;
+}
+
+
+void __init badram_markpages (int argc, unsigned long *argv) {
+ unsigned long addr, mask;
+ while (argc-- > 0) {
+ addr = *argv++;
+ mask = (argc-- > 0) ? *argv++ : ~0L;
+ mask |= ~PAGE_MASK; // Optimalisation
+ addr &= mask; // Normalisation
+ do {
+ struct page *pg = phys_to_page(addr);
+printk ("%05lx ", __pa(__va(addr)) >> PAGE_SHIFT);
+printk ("=%05lx/%05lx ", pg-mem_map, max_mapnr);
+ // if (VALID_PAGE(pg)) {
+ if (PageTestandSetBad (pg)) {
+ reserve_bootmem (addr, PAGE_SIZE);
+printk ("BAD ");
+ }
+else printk ("BFR ");
+ // }
+// else printk ("INV ");
+ } while (next_masked_address (&addr,mask));
+ }
+}
+
+
+
+/*********** CONFIG_BADRAM: CUSTOMISABLE SECTION STARTS HERE ******************/
+
+
+// Enter your custom BadRAM patterns here as pairs of unsigned long integers.
+// For more information on these F/M pairs, refer to Documentation/badram.txt
+
+
+static unsigned long __devinitdata badram_custom[] = {
+ 0, // Number of longwords that follow, as F/M pairs
+};
+
+
+/*********** CONFIG_BADRAM: CUSTOMISABLE SECTION ENDS HERE ********************/
+
+
+
+static int __init badram_setup (char *str)
+{
+ unsigned long opts[3];
+ if (!mem_map) BUG();
+printk ("PAGE_OFFSET=0x%08lx\n", PAGE_OFFSET);
+printk ("BadRAM option is %s\n", str);
+ if (*str++ == '=')
+ while (str=get_options (str, 3, (int *) opts), *opts) {
+printk (" --> marking 0x%08lx, 0x%08lx [%ld]\n", opts[1], opts[2], opts[0]);
+ badram_markpages (*opts, opts+1);
+ if (*opts==1)
+ break;
+ };
+ badram_markpages (*badram_custom, badram_custom+1);
+ return 0;
+}
+
+__setup("badram", badram_setup);
+
+#endif /* CONFIG_BADRAM */
+
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] BadRAM for 2.5.73-mm2
2003-07-02 3:33 [PATCH] BadRAM for 2.5.73-mm2 steven.newbury1
@ 2003-07-02 6:23 ` Oleg Drokin
0 siblings, 0 replies; 4+ messages in thread
From: Oleg Drokin @ 2003-07-02 6:23 UTC (permalink / raw)
To: steven.newbury1; +Cc: linux-kernel
Hello!
On Wed, Jul 02, 2003 at 03:33:26AM +0000, steven.newbury1@ntlworld.com wrote:
> It will probably apply cleanly to 2.5.73 as well but I haven't checked it
> yet...
This
> It is based it on
> http://rick.vanrein.org/linux/badram/software/BadRAM-2.4.20.1.patch
and this patches both have a bug that effectively disables all highmem (if present)
when badram patch is activated. I contacted Rick, but never got any answer back.
(in fact I got an auto reply from some Debian bugtracking system, but that's about it).
Corrected 2.4.21 version of the patch is available at
http://linuxhacker.ru/patches/2.4.21-badram.diff
The arch/i386/mm/init.c part:
> +void __init one_highpage_init(struct page *page, int pfn, int bad_ppro,
> + _Bool *bad)
> {
> + *bad = 0;
> if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
> ClearPageReserved(page);
> set_bit(PG_highmem, &page->flags);
> set_page_count(page, 1);
> - __free_page(page);
> +#ifdef CONFIG_BADRAM
> + if (PageBad(page))
> + *bad = 1;
> + else
> +#else
> + __free_page(page);
> +#endif
> totalhigh_pages++;
should obviously get "#else" replaced with "#endif" and original "#endif" should be removed
because otherwise all of the highmem pages are either bad or used right from the initialisation.
(Yes, I have highmem (2G RAM) system with 2 bits bad and I am too lazy to to go to the store to
replace the RAM modules, so I given a patch a try and discovered this problem (which is there
for a long time it seems). I suspect that I am the only one who runs this patch on highmem system,
though ;) )
Bye,
Oleg
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] BadRAM for 2.5.73-mm2
@ 2003-07-02 8:56 Etienne Lorrain
0 siblings, 0 replies; 4+ messages in thread
From: Etienne Lorrain @ 2003-07-02 8:56 UTC (permalink / raw)
To: linux-kernel
Nowdays the biggest problem seems to be slightly incompatible timings
in between motherboard and physical RAM that hurts the most, not so
much non-working bits in RAM devices.
Is there someone who has a patch so that, in case of a kernel OOPS
(or maybe SIGSEGV / SIGILL outside kernel) is checking at least
the code page where IP register points to check if a bit has flipped?
Could be done by checking the page on the Hard Disk or by a CRC method.
Having a bad BIOS parameter so that _one_ bit changes randomly every
hours is not an easy thing to detect - a message on the screen would
be nice...
Etienne.
___________________________________________________________
Do You Yahoo!? -- Une adresse @yahoo.fr gratuite et en français !
Yahoo! Mail : http://fr.mail.yahoo.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] BadRAM for 2.5.73-mm2
@ 2003-07-02 3:18 Steven Newbury
0 siblings, 0 replies; 4+ messages in thread
From: Steven Newbury @ 2003-07-02 3:18 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 502 bytes --]
Second attempt. Yahoo!Mail seems to have currupted the patch previously... :-/
I have ported the BadRAM patch to 2.5.73-mm2.
It will probably apply cleanly to 2.5.73 as well but I haven't checked it
yet...
I am running it now, it seems to work for me.
It is based it on
http://rick.vanrein.org/linux/badram/software/BadRAM-2.4.20.1.patch
=====
Steve
__________________________________________________
Yahoo! Plus - For a better Internet experience
http://uk.promotions.yahoo.com/yplus/yoffer.html
[-- Attachment #2: BadRAM-2.5.73-mm2.diff --]
[-- Type: application/octet-stream, Size: 24048 bytes --]
diff -urN linux-2.5.73-mm2/CREDITS linux-2.5/CREDITS
--- linux-2.5.73-mm2/CREDITS 2003-06-17 05:20:07.000000000 +0100
+++ linux-2.5/CREDITS 2003-07-02 01:58:52.146197970 +0100
@@ -2365,6 +2365,12 @@
S: 55127 Mainz
S: Germany
+N: Steven Newbury
+E: s_j_newbury@yahoo.co.uk
+D: Forward ported BadRAM patch to 2.5.73 from 2.4.20
+S: Derby
+S: United Kingdom
+
N: David C. Niemi
E: niemi@tux.org
W: http://www.tux.org/~niemi/
@@ -2601,6 +2607,14 @@
S: Malvern, Pennsylvania 19355
S: USA
+N: Rick van Rein
+E: vanrein@cs.utwente.nl
+W: http://www.cs.utwente.nl/~vanrein
+D: Memory, the BadRAM subsystem dealing with statically challanged RAM modules.
+S: Binnenes 67
+S: 9407 CX Assen
+S: The Netherlands
+
N: Stefan Reinauer
E: stepan@linux.de
W: http://www.freiburg.linux.de/~stepan/
@@ -2819,6 +2833,13 @@
E:
D: Macintosh IDE Driver
+N: Nico Schmoigl
+E: nico@writemail.com
+W: http://webrum.uni-mannheim.de/math/schmoigl/linux/
+D: Migration of BadRAM patch to 2.4.4 series (with Rick van Rein)
+S: Mannheim, BW, Germany
+P: 2047/38FC9E03 5D DB 09 E4 3F F3 CD 09 75 59 - 11 17 9C 03 46 E3 38 FC 9E 03
+
N: Peter De Schrijver
E: stud11@cc4.kuleuven.ac.be
D: Mitsumi CD-ROM driver patches March version
diff -urN linux-2.5.73-mm2/Documentation/badram.txt linux-2.5/Documentation/badram.txt
--- linux-2.5.73-mm2/Documentation/badram.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.5/Documentation/badram.txt 2003-07-01 20:02:27.000000000 +0100
@@ -0,0 +1,266 @@
+INFORMATION ON USING BAD RAM MODULES
+====================================
+
+Introduction
+ RAM is getting smaller and smaller, and as a result, also more and more
+ vulnerable. This makes the manufacturing of hardware more expensive,
+ since an excessive amount of RAM chips must be discarded on account of
+ a single cell that is wrong. Similarly, static discharge may damage a
+ RAM module forever, which is usually remedied by replacing it
+ entirely.
+
+ This is not necessary, as the BadRAM code shows: By informing the Linux
+ kernel which addresses in a RAM are damaged, the kernel simply avoids
+ ever allocating such addresses but makes all the rest available.
+
+Reasons for this feature
+ There are many reasons why this kernel feature is useful:
+ - Chip manufacture is resource intensive; waste less and sleep better
+ - It's another chance to promote Linux as "the flexible OS"
+ - Some laptops have their RAM soldered in... and then it fails!
+ - It's plain cool ;-)
+
+Running example
+ To run this project, I was given two DIMMs, 32 MB each. One, that we
+ shall use as a running example in this text, contained 512 faulty bits,
+ spread over 1/4 of the address range in a regular pattern. Some tricks
+ with a RAM tester and a few binary calculations were sufficient to
+ write these faults down in 2 longword numbers.
+
+ The kernel recognised the correct number of pages with faults and did
+ not give them out for allocation. The allocation routines could
+ therefore progress as normally, without any adaption.
+ So, I gained 30 MB of DIMM which would otherwise have been thrown
+ away. After booting the kernel, the kernel behaved exactly as it
+ always had.
+
+Initial checks
+ If you experience RAM trouble, first read /usr/src/linux/memory.txt
+ and try out the mem=4M trick to see if at least some initial parts
+ of your RAM work well. The BadRAM routines halt the kernel in panic
+ if the reserved area of memory (containing kernel stuff) contains
+ a faulty address.
+
+Running a RAM checker
+ The memory checker is not built into the kernel, to avoid delays at
+ runtime. If you experience problems that may be caused by RAM, run
+ a good RAM checker, such as
+ http://reality.sgi.com/cbrady_denver/memtest86
+ The output of a RAM checker provides addresses that went wrong. In
+ the 32 MB chip with 512 faulty bits mentioned above, the errors were
+ found in the 8MB-16MB range (the DIMM was in slot #0) at addresses
+ xxx42f4
+ xxx62f4
+ xxxc2f4
+ xxxe2f4
+ and the error was a "sticky 1 bit", a memory bit that stayed "1" no
+ matter what was written to it. The regularity of this pattern
+ suggests the death of a buffer at the output stages of a row on one of
+ the chips. I expect such regularity to be commonplace. Finding this
+ regularity currently is human effort, but it should not be hard to
+ alter a RAM checker to capture it in some sort of pattern, possibly
+ the BadRAM patterns described below.
+
+ By the way, if you manage to get hold of memtest86 version 2.3 or
+ beyond, you can configure the printing mode to produce BadRAM patterns,
+ which find out exactly what you must enter on the LILO: commandline,
+ except that you shouldn't mention the added spacing. That means that
+ you can skip the following step, which saves you a *lot* of work.
+
+ Also by the way, if your machine has the ISA memory gap in the 15M-16M
+ range unstoppable, Linux can get in trouble. One way of handling that
+ situation is by specifying the total memory size to Linux with a boot
+ parameter mem=... and then to tell it to treat the 15M-16M range as
+ faulty with an additional boot parameter, for instance:
+ mem=24M badram=0x00f00000,0xfff00000
+ if you installed 24MB of RAM in total.
+
+Capturing errors in a pattern
+ Instead of manually providing all 512 errors to the kernel, it's nicer
+ to generate a pattern. Since the regularity is based on address decoding
+ software, which generally takes certain bits into account and ignores
+ others, we shall provide a faulty address F, together with a bit mask M
+ that specifies which bits must be equal to F. In C code, an address A
+ is faulty if and only if
+ (F & M) == (A & M)
+ or alternately (closer to a hardware implementation):
+ ~((F ^ A) & M)
+ In the example 32 MB chip, we had the faulty addresses in 8MB-16MB:
+ xxx42f4 ....0100....
+ xxx62f4 ....0110....
+ xxxc2f4 ....1100....
+ xxxe2f4 ....1110....
+ The second column represents the alternating hex digit in binary form.
+ Apperantly, the first and one-but last binary digit can be anything,
+ so the binary mask for that part is 0101. The mask for the part after
+ this is 0xfff, and the part before should select anything in the range
+ 8MB-16MB, or 0x00800000-0x01000000; this is done with a bitmask
+ 0xff80xxxx. Combining these partial masks, we get:
+ F=0x008042f4 M=0xff805fff
+ That covers everything for this DIMM; for more complicated failing
+ DIMMs, or for a combination of multiple failing DIMMs, it can be
+ necessary to set up a number of such F/M pairs.
+
+Rebooting Linux
+ Now that these patterns are known (and double-checked, the calculations
+ are highly error-prone... it would be neat to test them in the RAM
+ checker...) we simply restart Linux with these F/M pairs as a parameter.
+ If you normally boot as follows:
+ LILO: linux
+ you should now boot with
+ LILO: linux badram=0x008042f4,0xff805fff
+ or perhaps by mentioning more F/M pairs in an order F0,M0,F1,M1,...
+ When you provide an odd number of arguments to badram, the default mask
+ 0xffffffff (only one address matched) is applied to the pattern.
+
+ Beware of the commandline length. At least up to LILO version 0.21,
+ the commandline is cut off after the 78th character; later versions
+ may go as far as the kernel goes, namely 255 characters. In no way is
+ it possible to enter more than 10 numbers to the badram boot option.
+
+ When the kernel now boots, it should not give any trouble with RAM.
+ Mind you, this is under the assumption that the kernel and its data
+ storage do not overlap an erroneous part. If this happens, and the
+ kernel does not choke on it right away, it will stop with a panic.
+ You will need to provide a RAM where the initial, say 2MB, is faultless.
+
+ Now look up your memory status with
+ dmesg | grep ^Memory:
+ which prints a single line with information like
+ Memory: 158524k/163840k available
+ (940k kernel code,
+ 412k reserved,
+ 1856k data,
+ 60k init,
+ 0k highmem,
+ 2048k BadRAM)
+ The latter entry, the badram, is 2048k to represent the loss of 2MB
+ of general purpose RAM due to the errors. Or, positively rephrased,
+ instead of throwing out 32MB as useless, you only throw out 2MB.
+
+ If the system is stable (try compiling a few kernels, and do a few
+ finds in / or so) you may add the boot parameter to /etc/lilo.conf
+ as a line to _all_ the kernels that handle this trouble with a line
+ append="badram=0x008042f4,0xff805fff"
+ after which you run "lilo".
+ Warning: Don't experiment with these settings on your only boot image.
+ If the BadRAM overlays kernel code, data, init, or other reserved
+ memory, the kernel will halt in panic. Try settings on a test boot
+ image first, and if you get a panic you should change the order of
+ your DIMMs [which may involve buying a new one just to be able to
+ change the order].
+
+ You are allowed to enter any number of BadRAM patterns in all the
+ places documented in this file. They will all apply. It is even
+ possible to mention several BadRAM patterns in a single place. The
+ completion of an odd number of arguments with the default mask is
+ done separately for each badram=... option.
+
+Kernel Customisation
+ Some people prefer to enter their badram patterns in the kernel, and
+ this is also possible. In mm/page_alloc.c there is an array of unsigned
+ long integers into which the parameters can be entered, prefixed with
+ the number of integers (twice the number of patterns). The array is
+ named badram_custom and it will be added to the BadRAM list whenever an
+ option 'badram' is provided on the commandline when booting, either
+ with or without additional patterns.
+
+ For the previous example, the code would become
+
+ static unsigned long __init badram_custom[] = {
+ 2, // Number of longwords that follow, as F/M pairs
+ 0x008042f4L, 0xff805fffL,
+ };
+
+ Even on this place you may assume the default mask to be filled in
+ when you enter an odd number of longwords. Specify the number of
+ longwords to be 0 to avoid influence of this custom BadRAM list.
+
+BadRAM classification
+ This technique may start a lively market for "dead" RAM. It is important
+ to realise that some RAMs are more dead than others. So, instead of
+ just providing a RAM size, it is also important to know the BadRAM
+ class, which is defined as follows:
+
+ A BadRAM class N means that at most 2^N bytes have a problem,
+ and that all problems with the RAMs are persistent: They
+ are predictable and always show up.
+
+ The DIMM that serves as an example here was of class 9, since 512=2^9
+ errors were found. Higher classes are worse, "correct" RAM is of class
+ -1 (or even less, at your choice).
+ Class N also means that the bitmask for your chip (if there's just one,
+ that is) counts N bits "0" and it means that (if no faults fall in the
+ same page) an amount of 2^N*PAGESIZE memory is lost, in the example on
+ an i386 architecture that would be 2^9*4k=2MB, which accounts for the
+ initial claim of 30MB RAM gained with this DIMM.
+
+ Note that this scheme has deliberately been defined to be independent
+ of memory technology and of computer architecture.
+
+Known Bugs
+ LILO is known to cut off commandlines which are too long. For the
+ lilo-0.21 distribution, a commandline may not exceed 78 characters,
+ while actually, 255 would be possible [on i386, kernel 2.2.16].
+ LILO does _not_ report too-long commandlines, but the error will
+ show up as either a panic at boot time, stating
+ panic: BadRAM page in initial area
+ or the dmesg line starting with Memory: will mention an unpredicted
+ number of kilobytes. (Note that the latter number only includes
+ errors in accessed memory.)
+
+Future Possibilities
+ It would be possible to use even more of the faulty RAMs by employing
+ them for slabs. The smaller allocation granularity of slabs makes it
+ possible to throw out just, say, 32 bytes surrounding an error. This
+ would mean that the example DIMM only looses 16kB instead of 2MB.
+ It might even be possible to allocate the slabs in such a way that,
+ where possible, the remaining bytes in a slab structure are allocated
+ around the error, reducing the RAM loss to 0 in the optimal situation!
+
+ However, this yield is somewhat faked: It is possible to provide 512
+ pages of 32-byte slabs, but it is not certain that anyone would use
+ that many 32-byte slabs at any time.
+
+ A better solution might be to alter the page allocation for a slab to
+ have a preference for BadRAM pages, and given those a special treatment.
+ This way, the BadRAM would be spread over all the slabs, which seems
+ more likely to be a `true' pay-off. This would yield more overhead at
+ slab allocation time, but on the other hand, by the nature of slabs,
+ such allocations are made as rare as possible, so it might not matter
+ that much. I am uncertain where to go.
+
+ Many suggestions have been made to insert a RAM checker at boot time;
+ since this would leave the time to do only very meager checking, it
+ is not a reasonable option; we already have a BIOS doing that in most
+ systems!
+
+ It would be interesting to integrate this functionality with the
+ self-verifying nature of ECC RAM. These memories can even distinguish
+ between recorable and unrecoverable errors! Such memory has been
+ handled in older operating systems by `testing' once-failed memory
+ blocks for a while, by placing only (reloadable) program code in it.
+ Unfortunately, I possess no faulty ECC modules to work this out.
+
+Names and Places
+ The home page of this project is on
+ http://rick.vanrein.org/linux/badram
+ This page also links to Nico Schmoigl's experimental extensions to
+ this patch (with debugging and a few other fancy things).
+
+ In case you have experiences with the BadRAM software which differ from
+ the test reportings on that site, I hope you will mail me with that
+ new information.
+
+ The BadRAM project is an idea and implementation by
+ Rick van Rein
+ Binnenes 67
+ 9407 CX Assen
+ The Netherlands
+ vanrein@cs.utwente.nl
+ If you like it, a postcard would be much appreciated ;-)
+
+
+ Enjoy,
+ -Rick.
+
diff -urN linux-2.5.73-mm2/Documentation/kernel-parameters.txt linux-2.5/Documentation/kernel-parameters.txt
--- linux-2.5.73-mm2/Documentation/kernel-parameters.txt 2003-07-01 19:59:40.000000000 +0100
+++ linux-2.5/Documentation/kernel-parameters.txt 2003-07-02 01:52:10.921698985 +0100
@@ -15,6 +15,7 @@
APIC APIC support is enabled.
APM Advanced Power Management support is enabled.
AX25 Appropriate AX.25 support is enabled.
+ BADRAM Support for faulty RAM chips is enabled.
CD Appropriate CD support is enabled.
DEVFS devfs support is enabled.
DRM Direct Rendering Management support is enabled.
@@ -162,6 +163,9 @@
aztcd= [HW,CD] Aztech CD268 CDROM driver
Format: <io>,0x79 (?)
+ badram= [BADRAM] Avoid allocating faulty RAM addresses.
+
+
baycom_epp= [HW,AX25]
Format: <io>,<mode>
diff -urN linux-2.5.73-mm2/Documentation/memory.txt linux-2.5/Documentation/memory.txt
--- linux-2.5.73-mm2/Documentation/memory.txt 2003-06-17 05:20:28.000000000 +0100
+++ linux-2.5/Documentation/memory.txt 2003-07-01 20:02:27.000000000 +0100
@@ -18,6 +18,14 @@
as you add more memory. Consider exchanging your
motherboard.
+ 4) A static discharge or production fault causes a RAM module
+ to have (predictable) errors, usually meaning that certain
+ bits cannot be set or reset. Instead of throwing away your
+ RAM module, you may read /usr/src/linux/Documentation/badram.txt
+ to learn how to detect, locate and circuimvent such errors
+ in your RAM module.
+
+
All of these problems can be addressed with the "mem=XXXM" boot option
(where XXX is the size of RAM to use in megabytes).
It can also tell Linux to use less memory than is actually installed.
@@ -45,6 +53,8 @@
* Try passing the "mem=4M" option to the kernel to limit
Linux to using a very small amount of memory.
+ If this helps, read /usr/src/linux/Documentation/badram.txt
+ to learn how to find and circuimvent memory errors.
Other tricks:
diff -urN linux-2.5.73-mm2/arch/i386/defconfig linux-2.5/arch/i386/defconfig
--- linux-2.5.73-mm2/arch/i386/defconfig 2003-06-17 05:20:07.000000000 +0100
+++ linux-2.5/arch/i386/defconfig 2003-07-01 20:02:28.000000000 +0100
@@ -132,6 +132,7 @@
# CONFIG_EISA is not set
# CONFIG_MCA is not set
CONFIG_HOTPLUG=y
+CONFIG_BADRAM=y
#
# PCMCIA/CardBus support
diff -urN linux-2.5.73-mm2/arch/i386/mm/init.c linux-2.5/arch/i386/mm/init.c
--- linux-2.5.73-mm2/arch/i386/mm/init.c 2003-07-01 19:59:28.000000000 +0100
+++ linux-2.5/arch/i386/mm/init.c 2003-07-02 00:40:31.000000000 +0100
@@ -223,13 +223,24 @@
pkmap_page_table = pte;
}
-void __init one_highpage_init(struct page *page, int pfn, int bad_ppro)
+/**
+ * @param bad set on return to whether the page is bad RAM
+ */
+void __init one_highpage_init(struct page *page, int pfn, int bad_ppro,
+ _Bool *bad)
{
+ *bad = 0;
if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) {
ClearPageReserved(page);
set_bit(PG_highmem, &page->flags);
set_page_count(page, 1);
- __free_page(page);
+#ifdef CONFIG_BADRAM
+ if (PageBad(page))
+ *bad = 1;
+ else
+#else
+ __free_page(page);
+#endif
totalhigh_pages++;
} else
SetPageReserved(page);
@@ -239,8 +250,10 @@
void __init set_highmem_pages_init(int bad_ppro)
{
int pfn;
- for (pfn = highstart_pfn; pfn < highend_pfn; pfn++)
- one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro);
+ for (pfn = highstart_pfn; pfn < highend_pfn; pfn++) {
+ one_highpage_init(pfn_to_page(pfn), pfn, bad_ppro, &bad);
+ if (bad) badpages++;
+ }
totalram_pages += totalhigh_pages;
}
#else
@@ -431,7 +444,7 @@
void __init mem_init(void)
{
extern int ppro_with_ram_bug(void);
- int codesize, reservedpages, datasize, initsize;
+ int codesize, reservedpages, badpages, datasize, initsize;
int tmp;
int bad_ppro;
@@ -467,13 +480,17 @@
totalram_pages += __free_all_bootmem();
reservedpages = 0;
+ badpages = 0;
for (tmp = 0; tmp < max_low_pfn; tmp++)
/*
- * Only count reserved RAM pages
+ * Only count reserved and b02 01:12:09.000000000 +0100
@@ -12,6 +12,7 @@
* Zone balancing, Kanoj Sarcar, SGI, Jan 2000
* Per cpu hot/cold page lists, bulk allocation, Martin J. Bligh, Sept 2002
* (lots of bits borrowed from Ingo Molnar & Andrew Morton)
+ * BadRAM handling, Rick van Rein, Feb 2001
*/
#include <linux/config.h>
@@ -1615,3 +1616,96 @@
setup_per_zone_pages_min();
return 0;
}
+
+
+#ifdef CONFIG_BADRAM
+
+/* Given a pointed-at address and a mask, increment the page so that the
+ * mask hides the increment. Return 0 if no increment is possible.
+ */
+static int __init next_masked_address (unsigned long *addrp, unsigned long mask)
+{
+ unsigned long inc=1;
+ unsigned long newval = *addrp;
+ while (inc & mask)
+ inc += inc;
+ while (inc != 0) {
+ newval += inc;
+ newval &= ~mask;
+ newval |= ((*addrp) & mask);
+ if (newval > *addrp) {
+ *addrp = newval;
+ return 1;
+ }
+ do {
+ inc += inc;
+ } while (inc & ~mask);
+ while (inc & mask)
+ inc += inc;
+ }
+ return 0;
+}
+
+
+void __init badram_markpages (int argc, unsigned long *argv) {
+ unsigned long addr, mask;
+ while (argc-- > 0) {
+ addr = *argv++;
+ mask = (argc-- > 0) ? *argv++ : ~0L;
+ mask |= ~PAGE_MASK; // Optimalisation
+ addr &= mask; // Normalisation
+ do {
+ struct page *pg = phys_to_page(addr);
+priasm-i386/page.h 2003-07-01 20:49:202 01:12:09.000000000 +0100
@@ -12,6 +12,7 @@
* Zone balancing, Kanoj Sarcar, SGI, Jan 2000
* Per cpu hot/cold page lists, bulk allocation, Martin J. Bligh, Sept 2002
* (lots of bits borrowed from Ingo Molnar & Andrew Morton)
+ * BadRAM handling, Rick van Rein, Feb 2001
*/
#include <linux/config.h>
@@ -1615,3 +1616,96 @@
setup_per_zone_pages_min();
return 0;
}
+
+
+#ifdef CONFIG_BADRAM
+
+/* Given a pointed-at address and a mask, increment the page so that the
+ * mask hides the increment. Return 0 if no increment is possible.
+ */
+static int __init next_masked_address (unsigned long *addrp, unsigned long mask)
+{
+ unsigned long inc=1;
+ unsigned long newval = *addrp;
+ while (inc & mask)
+ inc += inc;
+ while (inc != 0) {
+ newval += inc;
+ newval &= ~mask;
+ newval |= ((*addrp) & mask);
+ if (newval > *addrp) {
+ *addrp = newval;
+ return 1;
+ }
+ do {
+ inc += inc;
+ } while (inc & ~mask);
+ while (inc & mask)
+ inc += inc;
+ }
+ return 0;
+}
+
+
+void __init badram_markpages (int argc, unsigned long *argv) {
+ unsigned long addr, mask;
+ while (argc-- > 0) {
+ addr = *argv++;
+ mask = (argc-- > 0) ? *argv++ : ~0L;
+ mask |= ~PAGE_MASK; // Optimalisation
+ addr &= mask; // Normalisation
+ do {
+ struct page *pg = phys_to_page(addr);
+prilinux-2.5/mm/page_alloc.c 2003-07-02 01:12:09.000000000 +0100
@@ -12,6 +12,7 @@
* Zone balancing, Kanoj Sarcar, SGI, Jan 2000
* Per cpu hot/cold page lists, bulk allocation, Martin J. Bligh, Sept 2002
* (lots of bits borrowed from Ingo Molnar & Andrew Morton)
+ * BadRAM handling, Rick van Rein, Feb 2001
*/
#include <linux/config.h>
@@ -1615,3 +1616,96 @@
setup_per_zone_pages_min();
return 0;
}
+
+
+#ifdef CONFIG_BADRAM
+
+/* Given a pointed-at address and a mask, increment the page so that the
+ * mask hides the increment. Return 0 if no increment is possible.
+ */
+static int __init next_masked_address (unsigned long *addrp, unsigned long mask)
+{
+ unsigned long inc=1;
+ unsigned long newval = *addrp;
+ while (inc & mask)
+ inc += inc;
+ while (inc != 0) {
+ newval += inc;
+ newval &= ~mask;
+ newval |= ((*addrp) & mask);
+ if (newval > *addrp) {
+ *addrp = newval;
+ return 1;
+ }
+ do {
+ inc += inc;
+ } while (inc & ~mask);
+ while (inc & mask)
+ inc += inc;
+ }
+ return 0;
+}
+
+
+void __init badram_markpages (int argc, unsigned long *argv) {
+ unsigned long addr, mask;
+ while (argc-- > 0) {
+ addr = *argv++;
+ mask = (argc-- > 0) ? *argv++ : ~0L;
+ mask |= ~PAGE_MASK; // Optimalisation
+ addr &= mask; // Normalisation
+ do {
+ struct page *pg = phys_to_page(addr);
+printk ("%05lx ", __pa(__va(addr)) >> PAGE_SHIFT);
+printk ("=%05lx/%05lx ", pg-mem_map, max_mapnr);
+ // if (VALID_PAGE(pg)) {
+ if (PageTestandSetBad (pg)) {
+ reserve_bootmem (addr, PAGE_SIZE);
+printk ("BAD ");
+ }
+else printk ("BFR ");
+ // }
+// else printk ("INV ");
+ } while (next_masked_address (&addr,mask));
+ }
+}
+
+
+
+/*********** CONFIG_BADRAM: CUSTOMISABLE SECTION STARTS HERE ******************/
+
+
+// Enter your custom BadRAM patterns here as pairs of unsigned long integers.
+// For more information on these F/M pairs, refer to Documentation/badram.txt
+
+
+static unsigned long __devinitdata badram_custom[] = {
+ 0, // Number of longwords that follow, as F/M pairs
+};
+
+
+/*********** CONFIG_BADRAM: CUSTOMISABLE SECTION ENDS HERE ********************/
+
+
+
+static int __init badram_setup (char *str)
+{
+ unsigned long opts[3];
+ if (!mem_map) BUG();
+printk ("PAGE_OFFSET=0x%08lx\n", PAGE_OFFSET);
+printk ("BadRAM option is %s\n", str);
+ if (*str++ == '=')
+ while (str=get_options (str, 3, (int *) opts), *opts) {
+printk (" --> marking 0x%08lx, 0x%08lx [%ld]\n", opts[1], opts[2], opts[0]);
+ badram_markpages (*opts, opts+1);
+ if (*opts==1)
+ break;
+ };
+ badram_markpages (*badram_custom, badram_custom+1);
+ return 0;
+}
+
+__setup("badram", badram_setup);
+
+#endif /* CONFIG_BADRAM */
+
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2003-07-02 8:42 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-07-02 3:33 [PATCH] BadRAM for 2.5.73-mm2 steven.newbury1
2003-07-02 6:23 ` Oleg Drokin
-- strict thread matches above, loose matches on Subject: below --
2003-07-02 8:56 Etienne Lorrain
2003-07-02 3:18 Steven Newbury
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).