linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] [v3] powerpc/4xx: work around CHIP11 errata in a more PAGE_SIZE-friendly way
@ 2008-11-12  0:06 Hollis Blanchard
  2008-11-12  0:09 ` David Gibson
  2008-11-12  4:37 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 19+ messages in thread
From: Hollis Blanchard @ 2008-11-12  0:06 UTC (permalink / raw)
  To: dwg, jwboyer; +Cc: linuxppc-dev, kvm-ppc, yanok

The current CHIP11 errata truncates the device tree memory node, and subtracts
(hardcoded) 4096 bytes. This breaks kernels with larger PAGE_SIZE, since the
bootmem allocator assumes that total memory is a multiple of PAGE_SIZE.

Instead, use a device tree memory reservation to reserve only the 256 bytes
actually affected by the errata, leaving the total memory size unaltered.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>

---

Changes from v2:
- David pointed out I'd duplicated the fdt_add_mem_rsv() prototype, and that
  4xx.c should directly include libfdt/libfdt.h instead.

Using large pages results in a huge performance improvement for KVM, and this
patch is required to make Ilya's large page patch work. David and/or Josh,
please apply.

diff --git a/arch/powerpc/boot/4xx.c b/arch/powerpc/boot/4xx.c
--- a/arch/powerpc/boot/4xx.c
+++ b/arch/powerpc/boot/4xx.c
@@ -20,8 +20,9 @@
 #include "ops.h"
 #include "reg.h"
 #include "dcr.h"
+#include "libfdt/libfdt.h"
 
-static unsigned long chip_11_errata(unsigned long memsize)
+static void chip_11_errata(unsigned long memsize)
 {
 	unsigned long pvr;
 
@@ -31,13 +32,11 @@ static unsigned long chip_11_errata(unsi
 		case 0x40000850:
 		case 0x400008d0:
 		case 0x200008d0:
-			memsize -= 4096;
+			fdt_add_mem_rsv(fdt, memsize - 256, 256);
 			break;
 		default:
 			break;
 	}
-
-	return memsize;
 }
 
 /* Read the 4xx SDRAM controller to get size of system memory. */
@@ -53,7 +52,7 @@ void ibm4xx_sdram_fixup_memsize(void)
 			memsize += SDRAM_CONFIG_BANK_SIZE(bank_config);
 	}
 
-	memsize = chip_11_errata(memsize);
+	chip_11_errata(memsize);
 	dt_fixup_memory(0, memsize);
 }
 
@@ -219,7 +218,7 @@ void ibm4xx_denali_fixup_memsize(void)
 		bank = 4; /* 4 banks */
 
 	memsize = cs * (1 << (col+row)) * bank * dpath;
-	memsize = chip_11_errata(memsize);
+	chip_11_errata(memsize);
 	dt_fixup_memory(0, memsize);
 }
 
diff --git a/arch/powerpc/boot/libfdt-wrapper.c b/arch/powerpc/boot/libfdt-wrapper.c
--- a/arch/powerpc/boot/libfdt-wrapper.c
+++ b/arch/powerpc/boot/libfdt-wrapper.c
@@ -51,7 +51,7 @@
 #define devp_offset_find(devp)	(((int)(devp))-1)
 #define devp_offset(devp)	(devp ? ((int)(devp))-1 : 0)
 
-static void *fdt;
+void *fdt;
 static void *buf; /* = NULL */
 
 #define EXPAND_GRANULARITY	1024
diff --git a/arch/powerpc/boot/ops.h b/arch/powerpc/boot/ops.h
--- a/arch/powerpc/boot/ops.h
+++ b/arch/powerpc/boot/ops.h
@@ -14,6 +14,7 @@
 #include <stddef.h>
 #include "types.h"
 #include "string.h"
+#include "libfdt_env.h"
 
 #define	COMMAND_LINE_SIZE	512
 #define	MAX_PATH_LEN		256
@@ -32,6 +33,9 @@ struct platform_ops {
 	void *	(*vmlinux_alloc)(unsigned long size);
 };
 extern struct platform_ops platform_ops;
+
+/* The device tree itself. Should almost always be accessed via dt_ops. */
+extern void *fdt;
 
 /* Device Tree operations */
 struct dt_ops {

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH] [v3] powerpc/4xx: work around CHIP11 errata in a more PAGE_SIZE-friendly way
@ 2008-11-14 17:25 Milton Miller
  2008-11-14 17:29 ` Milton Miller
  0 siblings, 1 reply; 19+ messages in thread
From: Milton Miller @ 2008-11-14 17:25 UTC (permalink / raw)
  To: Ben Herrenschmidt, Hollis Blanchard, Josh Boyer; +Cc: linux-ppc

In-Reply-To: <1226606055.5339.31.camel@localhost.localdomain>


On Fri Nov 14 at 06:54:15 EST in 2008, Hollis Blanchard wrote:
> On Thu, 2008-11-13 at 07:44 +1100, Benjamin Herrenschmidt wrote:
>> Again, why can't we just stick something in the kernel code that
>> reserves the last page ? It could be in prom.c or it could be called 
>> by
>> affected 4xx platforms by the platform code, whatever, but the reserve
>> map isn't really meant for that and will not be passed over from 
>> kernel
>> to kernel by kexec.
>
> Reserving a page is overkill; only the last 256 bytes are affected. We
> need to intercept at the LMB level, because allocations are already 
> done
> there, so by the time we hit bootmem it's way too late.

I agree with Ben we need to have something in the tree to tell kexec 
and or the kernel of this errata, unless we adapt the kernel to not 
require the memory node be page size aligned.

I instigated a discussion with Josh and Hollis on irc.

> I simply don't see a good place to do this in the kernel. It would have
> to be before the first lmb_alloc() call, which for safety would put it
> inside early_init_devtree() -- along with the other lmb_reserve()
> calls.[1]
>
> [1] This is exactly where flat device tree reservations are done, and
> that's why the patch I submitted works.


> However, ppc_md.probe() hasn't even been called yet, so there's no way
> of knowing if we're on an affected system, unless you want to add a
> special of_scan_flat_dt() call here.

I think we decided a property is the right way to go, but am not sure 
we decided if it should be a specific property in the /cpus/cpu@* nodes 
or a general property that describes a base and length ... in which 
case it is either a property in /memory (cpus nodes are not part of the 
system address space, with an independent size 0 address space).   It 
was also noted if we go the property route. that kexec tools would need 
to know about it since it allocates destination pages based on reading 
/memory reg ranges, although it also has a hardcoded 768M limit which 
might hide this.

> I'm open to suggestions, but I don't see a better way than what I
> already sent. I think the important part is to call lmb_add() for all
> memory, but lmb_reserve() the last 256 bytes before lmb_alloc() 
> happens.
>
> It sounds like kexec must have some knowledge of the platform and 
> device
> tree already, so is this really a big deal? At any rate, this
> conversation is somewhat academic, since there is no kexec on 44x... so
> maybe this can be re-addressed when that becomes a real issue.

As discussed, kexec userspace has some ideas of platforms, but its very 
general and should not have lists of which cpus have an errata but 
should base all its decisions off the device tree.

Alternatives to adding a property include just trimming the memory node 
(and fixing the kernel to handle memory size not being page aligned), 
and adding an additional node that says this memory is in use.  We 
should handle the memory size not some big power of 2 anyways, and if 
we just create a new node it should not overlap the memory node 
anyways.  Although we did note that due to current kexec implementation 
we can name a node starting with /rtas and use linux,rtas-base and 
rtas-size to reserve any 32 bit chunk of memory even to kexec, although 
that is considered beyond acceptable for this errata fix (some else 
might want to join me in using that to reserve memory for log buffers 
across boot).

It has been described to me that the bug affects any access to the 256 
bytes, so it would be accurate to describe the memory as not existing 
or as this cpu has an errata tnd the dram is really here.  I just say 
it needs to be described in the device tree.  Trimming the memory node 
has the advantage that kexec userspace will not need a patch, adding 
the cpu has errata property would only require a patch for platofrms 
with <768MB (or manual override of the usable memory size via the 
command line).


milton

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-11-25 23:43 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-12  0:06 [PATCH] [v3] powerpc/4xx: work around CHIP11 errata in a more PAGE_SIZE-friendly way Hollis Blanchard
2008-11-12  0:09 ` David Gibson
2008-11-12  4:37 ` Benjamin Herrenschmidt
2008-11-12 11:31   ` Josh Boyer
2008-11-12 11:52     ` Benjamin Herrenschmidt
2008-11-12 15:11       ` Hollis Blanchard
2008-11-12 20:44         ` Benjamin Herrenschmidt
2008-11-12 20:53           ` Josh Boyer
2008-11-13 19:54           ` Hollis Blanchard
2008-11-14 17:25 Milton Miller
2008-11-14 17:29 ` Milton Miller
2008-11-14 22:09   ` Hollis Blanchard
2008-11-18 20:33     ` Hollis Blanchard
2008-11-24 20:07     ` Hollis Blanchard
2008-11-25  0:10       ` Michael Ellerman
2008-11-25 17:10         ` Milton Miller
2008-11-25 21:17           ` Hollis Blanchard
2008-11-25 21:53         ` Hollis Blanchard
2008-11-25 23:43           ` Michael Ellerman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).