All of lore.kernel.org
* [patch] fix per-CPU MCA mess and make UP kernels work again
@ 2005-01-26  2:47 David Mosberger
  2005-01-26 16:25 ` Jesse Barnes
                   ` (27 more replies)
  0 siblings, 28 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-26  2:47 UTC (permalink / raw)
  To: linux-ia64

The per-CPU MCA support was in a mess: there was confusion as to
whether ar.k3 points to cpuinfo or the per-CPU area.  Apart from not
allocating the per-CPU MCA area for UP, I _think_ that's the main
reason UP didn't work.  The reason I say _think_ is that once I
started to look at the code more closely, there were so many things
wrong with it that it wasn't even funny.  Attached patch tries to
clean things up.

The patch has been compile- and boot-tested for zx1 UP and SMP.  I
think it should be OK for discontig configs, too, but I haven't tested
that (and if anybody wanted to build discontig for UP, then
discontig.c:per_cpu_init() would have to be updated like the contig.c
version).

Also, I verified that the kernel can still do an INIT dump.  Other
than that, I can't really test the MCA-path.

	--david

This patch cleans up the per-CPU MCA mess with the following changes
(and yields a UP kernel that actually boots again):

 - In percpu.h, make per_cpu_init() a function-call even for the
   UP case.
 - In contig.c, enable per_cpu_init() even for UP since we need to
   allocate the per-CPU MCA data in that case as well.
 - Move the MCA-related stuff out of the cpuinfo structure into
   per-CPU variables defined by mca.c.
 - Rename IA64_KR_PA_CPU_INFO to IA64_KR_PER_CPU_DATA, since it really
   is a per-CPU pointer now.
 - In mca.h, move IA64_MCA_STACK_SIZE early enough so it gets defined
   for assembly-code, too.  Tidy up struct ia64_mca_cpu.  Add declaration
   of ia64_mca_cpu_init().
 - In mca_asm.[hS], replace various GET_*() macros with a single
   GET_PERCPU_ADDR() which loads the physical address of an
   arbitrary per-CPU variable.  Remove all dependencies on the
   layout of the cpuinfo structure.  Replace hardcoded stack-size
   with IA64_MCA_STACK_SIZE constant.  Replace hardcoded references
   to ar.k3 with IA64_KR(PER_CPU_DATA).
 - In setup.c:cpu_init(), initialize ar.k3 to be the physical equivalent
   of the per-CPU data pointer.
 - Nuke silly ia64_mca_cpu_t typedef and just use struct ia64_mca_cpu instead.
 - Move __per_cpu_mca[] from setup.c to mca.c.
 - Rename set_mca_pointer() to ia64_mca_cpu_init() and sanitize it.
 - Rename efi.c:pal_code_memdesc() to efi_get_pal_addr() and make it
   return the PAL address, rather than a memory-descriptor.
 - Make efi_map_pal_code() use efi_get_pal_addr().

Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>

=== arch/ia64/kernel/asm-offsets.c 1.9 vs edited ===
--- 1.9/arch/ia64/kernel/asm-offsets.c	2005-01-06 16:19:39 -08:00
+++ edited/arch/ia64/kernel/asm-offsets.c	2005-01-25 17:43:43 -08:00
@@ -193,9 +193,17 @@
 	DEFINE(IA64_CLONE_VM, CLONE_VM);
 
 	BLANK();
-	DEFINE(IA64_CPUINFO_NSEC_PER_CYC_OFFSET, offsetof (struct cpuinfo_ia64, nsec_per_cyc));
-	DEFINE(IA64_TIMESPEC_TV_NSEC_OFFSET, offsetof (struct timespec, tv_nsec));
-
+	DEFINE(IA64_CPUINFO_NSEC_PER_CYC_OFFSET,
+	       offsetof (struct cpuinfo_ia64, nsec_per_cyc));
+	DEFINE(IA64_CPUINFO_PTCE_BASE_OFFSET,
+	       offsetof (struct cpuinfo_ia64, ptce_base));
+	DEFINE(IA64_CPUINFO_PTCE_COUNT_OFFSET,
+	       offsetof (struct cpuinfo_ia64, ptce_count));
+	DEFINE(IA64_CPUINFO_PTCE_STRIDE_OFFSET,
+	       offsetof (struct cpuinfo_ia64, ptce_stride));
+	BLANK();
+	DEFINE(IA64_TIMESPEC_TV_NSEC_OFFSET,
+	       offsetof (struct timespec, tv_nsec));
 
 	DEFINE(CLONE_SETTLS_BIT, 19);
 #if CLONE_SETTLS != (1<<19)
@@ -203,19 +211,16 @@
 #endif
 
 	BLANK();
-	/* used by arch/ia64/kernel/mca_asm.S */
-	DEFINE(IA64_CPUINFO_PERCPU_PADDR, offsetof (struct cpuinfo_ia64, percpu_paddr));
-	DEFINE(IA64_CPUINFO_PAL_PADDR, offsetof (struct cpuinfo_ia64, pal_paddr));
-	DEFINE(IA64_CPUINFO_PA_MCA_INFO, offsetof (struct cpuinfo_ia64, ia64_pa_mca_data));
-	DEFINE(IA64_MCA_PROC_STATE_DUMP, offsetof (struct ia64_mca_cpu_s, ia64_mca_proc_state_dump));
-	DEFINE(IA64_MCA_STACK, offsetof (struct ia64_mca_cpu_s, ia64_mca_stack));
-	DEFINE(IA64_MCA_STACKFRAME, offsetof (struct ia64_mca_cpu_s, ia64_mca_stackframe));
-	DEFINE(IA64_MCA_BSPSTORE, offsetof (struct ia64_mca_cpu_s, ia64_mca_bspstore));
-	DEFINE(IA64_INIT_STACK, offsetof (struct ia64_mca_cpu_s, ia64_init_stack));
-
-	/* used by head.S */
-	DEFINE(IA64_CPUINFO_NSEC_PER_CYC_OFFSET, offsetof (struct cpuinfo_ia64, nsec_per_cyc));
-
+	DEFINE(IA64_MCA_CPU_PROC_STATE_DUMP_OFFSET,
+	       offsetof (struct ia64_mca_cpu, proc_state_dump));
+	DEFINE(IA64_MCA_CPU_STACK_OFFSET,
+	       offsetof (struct ia64_mca_cpu, stack));
+	DEFINE(IA64_MCA_CPU_STACKFRAME_OFFSET,
+	       offsetof (struct ia64_mca_cpu, stackframe));
+	DEFINE(IA64_MCA_CPU_RBSTORE_OFFSET,
+	       offsetof (struct ia64_mca_cpu, rbstore));
+	DEFINE(IA64_MCA_CPU_INIT_STACK_OFFSET,
+	       offsetof (struct ia64_mca_cpu, init_stack));
 	BLANK();
 	/* used by fsys_gettimeofday in arch/ia64/kernel/fsys.S */
 	DEFINE(IA64_TIME_INTERPOLATOR_ADDRESS_OFFSET, offsetof (struct time_interpolator, addr));
=== arch/ia64/kernel/efi.c 1.42 vs edited ===
--- 1.42/arch/ia64/kernel/efi.c	2005-01-22 15:59:24 -08:00
+++ edited/arch/ia64/kernel/efi.c	2005-01-25 17:51:59 -08:00
@@ -415,8 +415,8 @@
  * Abstraction Layer chapter 11 in ADAG
  */
 
-static efi_memory_desc_t *
-pal_code_memdesc (void)
+void *
+efi_get_pal_addr (void)
 {
 	void *efi_map_start, *efi_map_end, *p;
 	efi_memory_desc_t *md;
@@ -474,51 +474,31 @@
 			md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT),
 			vaddr & mask, (vaddr & mask) + IA64_GRANULE_SIZE);
 #endif
-		return md;
+		return __va(md->phys_addr);
 	}
-
+	printk(KERN_WARNING "%s: no PAL-code memory-descriptor found",
+	       __FUNCTION__);
 	return NULL;
 }
 
 void
-efi_get_pal_addr (void)
-{
-	efi_memory_desc_t *md = pal_code_memdesc();
-	u64 vaddr, mask;
-	struct cpuinfo_ia64 *cpuinfo;
-
-	if (md != NULL) {
-
-		vaddr = PAGE_OFFSET + md->phys_addr;
-		mask  = ~((1 << IA64_GRANULE_SHIFT) - 1);
-
-		cpuinfo = (struct cpuinfo_ia64 *)__va(ia64_get_kr(IA64_KR_PA_CPU_INFO));
-		cpuinfo->pal_base = vaddr & mask;
-		cpuinfo->pal_paddr = pte_val(mk_pte_phys(md->phys_addr, PAGE_KERNEL));
-	}
-}
-
-void
 efi_map_pal_code (void)
 {
-	efi_memory_desc_t *md = pal_code_memdesc();
-	u64 vaddr, mask, psr;
-
-	if (md != NULL) {
+	void *pal_vaddr = efi_get_pal_addr ();
+	u64 psr;
 
-		vaddr = PAGE_OFFSET + md->phys_addr;
-		mask  = ~((1 << IA64_GRANULE_SHIFT) - 1);
+	if (!pal_vaddr)
+		return;
 
-		/*
-		 * Cannot write to CRx with PSR.ic=1
-		 */
-		psr = ia64_clear_ic();
-		ia64_itr(0x1, IA64_TR_PALCODE, vaddr & mask,
-			pte_val(pfn_pte(md->phys_addr >> PAGE_SHIFT, PAGE_KERNEL)),
-			IA64_GRANULE_SHIFT);
-		ia64_set_psr(psr);		/* restore psr */
-		ia64_srlz_i();
-	}
+	/*
+	 * Cannot write to CRx with PSR.ic=1
+	 */
+	psr = ia64_clear_ic();
+	ia64_itr(0x1, IA64_TR_PALCODE, GRANULEROUNDDOWN((unsigned long) pal_vaddr),
+		 pte_val(pfn_pte(__pa(pal_vaddr) >> PAGE_SHIFT, PAGE_KERNEL)),
+		 IA64_GRANULE_SHIFT);
+	ia64_set_psr(psr);		/* restore psr */
+	ia64_srlz_i();
 }
 
 void __init
=== arch/ia64/kernel/mca.c 1.76 vs edited ===
--- 1.76/arch/ia64/kernel/mca.c	2005-01-22 13:48:51 -08:00
+++ edited/arch/ia64/kernel/mca.c	2005-01-25 17:40:29 -08:00
@@ -67,6 +67,7 @@
 
 #include <asm/delay.h>
 #include <asm/machvec.h>
+#include <asm/meminit.h>
 #include <asm/page.h>
 #include <asm/ptrace.h>
 #include <asm/system.h>
@@ -86,6 +87,12 @@
 ia64_mca_sal_to_os_state_t	ia64_sal_to_os_handoff_state;
 ia64_mca_os_to_sal_state_t	ia64_os_to_sal_handoff_state;
 u64				ia64_mca_serialize;
+DEFINE_PER_CPU(u64, ia64_mca_data); /* = __per_cpu_mca[smp_processor_id()] */
+DEFINE_PER_CPU(u64, ia64_mca_per_cpu_pte); /* PTE to map per-CPU area */
+DEFINE_PER_CPU(u64, ia64_mca_pal_pte);	    /* PTE to map PAL code */
+DEFINE_PER_CPU(u64, ia64_mca_pal_base);    /* vaddr PAL code granule */
+
+unsigned long __per_cpu_mca[NR_CPUS];
 
 /* In mca_asm.S */
 extern void			ia64_monarch_init_handler (void);
@@ -1194,6 +1201,41 @@
 	.name =		"cpe_poll"
 };
 #endif /* CONFIG_ACPI */
+
+/* Do per-CPU MCA-related initialization.  */
+
+void __init
+ia64_mca_cpu_init(void *cpu_data)
+{
+	void *pal_vaddr;
+
+        /*
+         * The MCA info structure was allocated earlier and its
+         * physical address saved in __per_cpu_mca[cpu].  Copy that
+         * address to ia64_mca_data so we can access it as a per-CPU
+         * variable.
+         */
+	__get_cpu_var(ia64_mca_data) = __per_cpu_mca[smp_processor_id()];
+
+	/*
+	 * Stash away a copy of the PTE needed to map the per-CPU page.
+	 * We may need it during MCA recovery.
+	 */
+	__get_cpu_var(ia64_mca_per_cpu_pte) =
+		pte_val(mk_pte_phys(__pa(cpu_data), PAGE_KERNEL));
+
+        /*
+         * Also, stash away a copy of the PAL address and the PTE
+         * needed to map it.
+         */
+        pal_vaddr = efi_get_pal_addr();
+	if (!pal_vaddr)
+		return;
+	__get_cpu_var(ia64_mca_pal_base) =
+		GRANULEROUNDDOWN((unsigned long) pal_vaddr);
+	__get_cpu_var(ia64_mca_pal_pte) = pte_val(mk_pte_phys(__pa(pal_vaddr),
+							      PAGE_KERNEL));
+}
 
 /*
  * ia64_mca_init
=== arch/ia64/kernel/mca_asm.S 1.17 vs edited ===
--- 1.17/arch/ia64/kernel/mca_asm.S	2005-01-06 16:20:34 -08:00
+++ edited/arch/ia64/kernel/mca_asm.S	2005-01-25 17:29:10 -08:00
@@ -144,24 +144,26 @@
 	// The following code purges TC and TR entries. Then reload all TC entries.
 	// Purge percpu data TC entries.
 begin_tlb_purge_and_reload:
-	GET_PERCPU_PADDR(r2)	// paddr of percpu_paddr in cpuinfo struct
-	;;
-	mov	r17=r2
-	;;
-	adds r17=8,r17
+
+#define O(member)	IA64_CPUINFO_##member##_OFFSET
+
+	GET_THIS_PADDR(r2, cpu_info)	// load phys addr of cpu_info into r2
 	;;
-	ld8 r18=[r17],8		// r18=ptce_base
-  	;;
-	ld4 r19=[r17],4		// r19=ptce_count[0]
+	addl r17=O(PTCE_STRIDE),r2
+	addl r2=O(PTCE_BASE),r2
 	;;
-	ld4 r20=[r17],4		// r20=ptce_count[1]
+	ld8 r18=[r2],(O(PTCE_COUNT)-O(PTCE_BASE));;	// r18=ptce_base
+	ld4 r19=[r2],4					// r19=ptce_count[0]
+	ld4 r21=[r17],4					// r21=ptce_stride[0]
 	;;
-	ld4 r21=[r17],4		// r21=ptce_stride[0]
+	ld4 r20=[r2]					// r20=ptce_count[1]
+	ld4 r22=[r17]					// r22=ptce_stride[1]
 	mov r24=0
 	;;
-	ld4 r22=[r17],4		// r22=ptce_stride[1]
 	adds r20=-1,r20
 	;;
+#undef O
+
 2:
 	cmp.ltu p6,p7=r24,r19
 (p7)	br.cond.dpnt.few 4f
@@ -246,16 +248,15 @@
 	srlz.d
 	;;
 	// 2. Reload DTR register for PERCPU data.
-	GET_PERCPU_PADDR(r2)		// paddr of percpu_paddr in cpuinfo struct
+	GET_THIS_PADDR(r2, ia64_mca_per_cpu_pte)
 	;;
-	mov r17=r2
 	movl r16=PERCPU_ADDR		// vaddr
 	movl r18=PERCPU_PAGE_SHIFT<<2
 	;;
 	mov cr.itir=r18
 	mov cr.ifa=r16
 	;;
-	ld8 r18=[r17]			// pte
+	ld8 r18=[r2]			// load per-CPU PTE
 	mov r16=IA64_TR_PERCPU_DATA;
 	;;
 	itr.d dtr[r16]=r18
@@ -263,13 +264,13 @@
 	srlz.d
 	;;
 	// 3. Reload ITR for PAL code.
-	GET_CPUINFO_PAL_PADDR(r2)	// paddr of pal_paddr in cpuinfo struct
+	GET_THIS_PADDR(r2, ia64_mca_pal_pte)
 	;;
-	mov r17=r2
+	ld8 r18=[r2]			// load PAL PTE
 	;;
-	ld8 r18=[r17],8			// pte
+	GET_THIS_PADDR(r2, ia64_mca_pal_base)
 	;;
-	ld8 r16=[r17]			// vaddr
+	ld8 r16=[r2]			// load PAL vaddr
 	mov r19=IA64_GRANULE_SHIFT<<2
 	;;
 	mov cr.itir=r19
@@ -308,14 +309,18 @@
 done_tlb_purge_and_reload:
 
 	// Setup new stack frame for OS_MCA handling
-	GET_MCA_BSPSTORE(r2)		// paddr of bspstore save area
-	GET_MCA_STACKFRAME(r3);;	// paddr of stack frame save area
+	GET_THIS_PADDR(r2, ia64_mca_data)
+	;;
+	add r3 = IA64_MCA_CPU_STACKFRAME_OFFSET, r2
+	add r2 = IA64_MCA_CPU_RBSTORE_OFFSET, r2
+	;;
 	rse_switch_context(r6,r3,r2);;	// RSC management in this new context
-	GET_MCA_STACK(r2);;		// paddr of stack save area
-					// stack size must be same as C array
-	addl	r2=8*1024-16,r2;;	// stack base @ bottom of array
-	mov	r12=r2			// allow 16 bytes of scratch
-					// (C calling convention)
+
+	GET_THIS_PADDR(r2, ia64_mca_data)
+	;;
+	add r2 = IA64_MCA_CPU_STACK_OFFSET+IA64_MCA_STACK_SIZE-16, r2
+	;;
+	mov r12=r2		// establish new stack-pointer
 
         // Enter virtual mode from physical mode
 	VIRTUAL_MODE_ENTER(r2, r3, ia64_os_mca_virtual_begin, r4)
@@ -331,7 +336,10 @@
 ia64_os_mca_virtual_end:
 
 	// restore the original stack frame here
-	GET_MCA_STACKFRAME(r2);;	// phys addr of MCA save area
+	GET_THIS_PADDR(r2, ia64_mca_data)
+	;;
+	add r2 = IA64_MCA_CPU_STACKFRAME_OFFSET, r2
+	;;
 	movl    r4=IA64_PSR_MC
 	;;
 	rse_return_context(r4,r3,r2)	// switch from interrupt context for RSE
@@ -372,8 +380,10 @@
 ia64_os_mca_proc_state_dump:
 // Save bank 1 GRs 16-31 which will be used by c-language code when we switch
 //  to virtual addressing mode.
-	GET_MCA_DUMP_PADDR(r2);;  // phys addr of MCA save area
-
+	GET_THIS_PADDR(r2, ia64_mca_data)
+	;;
+	add r2 = IA64_MCA_CPU_PROC_STATE_DUMP_OFFSET, r2
+	;;
 // save ar.NaT
 	mov		r5=ar.unat                  // ar.unat
 
@@ -603,7 +613,9 @@
 ia64_os_mca_proc_state_restore:
 
 // Restore bank1 GR16-31
-	GET_MCA_DUMP_PADDR(r2);;		// phys addr of proc state dump area
+	GET_THIS_PADDR(r2, ia64_mca_data)
+	;;
+	add r2 = IA64_MCA_CPU_PROC_STATE_DUMP_OFFSET, r2
 
 restore_GRs:                                    // restore bank-1 GRs 16-31
 	bsw.1;;
=== arch/ia64/kernel/minstate.h 1.18 vs edited ===
--- 1.18/arch/ia64/kernel/minstate.h	2004-12-10 13:31:49 -08:00
+++ edited/arch/ia64/kernel/minstate.h	2005-01-25 17:51:22 -08:00
@@ -37,10 +37,10 @@
  * go virtual and don't want to destroy the iip or ipsr.
  */
 #define MINSTATE_START_SAVE_MIN_PHYS								\
-(pKStk) mov r3=ar.k3;;										\
-(pKStk) addl r3=IA64_CPUINFO_PA_MCA_INFO,r3;;							\
+(pKStk) mov r3=IA64_KR(PER_CPU_DATA);;								\
+(pKStk) addl r3=THIS_CPU(ia64_mca_data),r3;;							\
 (pKStk) ld8 r3 = [r3];;										\
-(pKStk) addl r3=IA64_INIT_STACK,r3;;								\
+(pKStk) addl r3=IA64_MCA_CPU_INIT_STACK_OFFSET,r3;;						\
 (pKStk) addl sp=IA64_STK_OFFSET-IA64_PT_REGS_SIZE,r3;						\
 (pUStk)	mov ar.rsc=0;		/* set enforced lazy mode, pl 0, little-endian, loadrs=0 */	\
 (pUStk)	addl r22=IA64_RBS_OFFSET,r1;		/* compute base of register backing store */	\
=== arch/ia64/kernel/setup.c 1.86 vs edited ===
--- 1.86/arch/ia64/kernel/setup.c	2005-01-24 13:41:13 -08:00
+++ edited/arch/ia64/kernel/setup.c	2005-01-25 18:12:20 -08:00
@@ -60,7 +60,6 @@
 unsigned long __per_cpu_offset[NR_CPUS];
 EXPORT_SYMBOL(__per_cpu_offset);
 #endif
-unsigned long __per_cpu_mca[NR_CPUS];
 
 DEFINE_PER_CPU(struct cpuinfo_ia64, cpu_info);
 DEFINE_PER_CPU(unsigned long, local_per_cpu_offset);
@@ -388,7 +387,7 @@
 	/* enable IA-64 Machine Check Abort Handling unless disabled */
 	if (!strstr(saved_command_line, "nomca"))
 		ia64_mca_init();
-	
+
 	platform_setup(cmdline_p);
 	paging_init();
 }
@@ -602,7 +601,6 @@
 cpu_init (void)
 {
 	extern void __devinit ia64_mmu_init (void *);
-	extern void set_mca_pointer (struct cpuinfo_ia64 *, void *);
 	unsigned long num_phys_stacked;
 	pal_vm_info_2_u_t vmi;
 	unsigned int max_ctx;
@@ -611,6 +609,8 @@
 
 	cpu_data = per_cpu_init();
 
+	ia64_set_kr(IA64_KR_PER_CPU_DATA, __pa(cpu_data - (void *) __per_cpu_start));
+
 	get_max_cacheline_size();
 
 	/*
@@ -657,7 +657,7 @@
 		BUG();
 
 	ia64_mmu_init(ia64_imva(cpu_data));
-	set_mca_pointer(cpu_info, cpu_data);
+	ia64_mca_cpu_init(ia64_imva(cpu_data));
 
 #ifdef CONFIG_IA32_SUPPORT
 	ia32_cpu_init();
=== arch/ia64/mm/contig.c 1.12 vs edited ===
--- 1.12/arch/ia64/mm/contig.c	2005-01-18 15:21:57 -08:00
+++ edited/arch/ia64/mm/contig.c	2005-01-25 18:06:34 -08:00
@@ -169,7 +169,6 @@
 	find_initrd();
 }
 
-#ifdef CONFIG_SMP
 /**
  * per_cpu_init - setup per-cpu variables
  *
@@ -178,30 +177,41 @@
 void *
 per_cpu_init (void)
 {
-	void *cpu_data, *mca_data;
+	void *mca_data, *my_data;
 	int cpu;
 
+#ifdef CONFIG_SMP
 	/*
 	 * get_free_pages() cannot be used before cpu_init() done.  BSP
 	 * allocates "NR_CPUS" pages for all CPUs to avoid that AP calls
 	 * get_zeroed_page().
 	 */
 	if (smp_processor_id() == 0) {
+		void *cpu_data;
+
 		cpu_data = __alloc_bootmem(PERCPU_PAGE_SIZE * NR_CPUS,
 					   PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
-		mca_data = alloc_bootmem(PERCPU_MCA_SIZE * NR_CPUS);
 		for (cpu = 0; cpu < NR_CPUS; cpu++) {
 			memcpy(cpu_data, __phys_per_cpu_start, __per_cpu_end - __per_cpu_start);
 			__per_cpu_offset[cpu] = (char *) cpu_data - __per_cpu_start;
 			cpu_data += PERCPU_PAGE_SIZE;
 			per_cpu(local_per_cpu_offset, cpu) = __per_cpu_offset[cpu];
-			__per_cpu_mca[cpu] = (unsigned long)__pa(mca_data);
-			mca_data += PERCPU_MCA_SIZE;
 		}
 	}
-	return __per_cpu_start + __per_cpu_offset[smp_processor_id()];
+	my_data = __per_cpu_start + __per_cpu_offset[smp_processor_id()];
+#else
+	my_data = (void *) __phys_per_cpu_start;
+#endif
+
+	if (smp_processor_id() == 0) {
+		mca_data = alloc_bootmem(sizeof (struct ia64_mca_cpu) * NR_CPUS);
+		for (cpu = 0; cpu < NR_CPUS; cpu++) {
+			__per_cpu_mca[cpu] = __pa(mca_data);
+			mca_data += sizeof (struct ia64_mca_cpu);
+		}
+	}
+	return my_data;
 }
-#endif /* CONFIG_SMP */
 
 static int
 count_pages (u64 start, u64 end, void *arg)
=== arch/ia64/mm/discontig.c 1.28 vs edited ===
--- 1.28/arch/ia64/mm/discontig.c	2005-01-18 12:06:18 -08:00
+++ edited/arch/ia64/mm/discontig.c	2005-01-25 16:06:58 -08:00
@@ -339,7 +339,7 @@
 	pernodesize += node * L1_CACHE_BYTES;
 	pernodesize += L1_CACHE_ALIGN(sizeof(pg_data_t));
 	pernodesize += L1_CACHE_ALIGN(sizeof(struct ia64_node_data));
-	pernodesize += L1_CACHE_ALIGN(sizeof(ia64_mca_cpu_t)) * phys_cpus;
+	pernodesize += L1_CACHE_ALIGN(sizeof(struct ia64_mca_cpu)) * phys_cpus;
 	pernodesize = PAGE_ALIGN(pernodesize);
 	pernode = NODEDATA_ALIGN(start, node);
 
@@ -363,7 +363,7 @@
 		pernode += L1_CACHE_ALIGN(sizeof(pg_data_t));
 
 		mca_data_phys = (void *)pernode;
-		pernode += L1_CACHE_ALIGN(sizeof(ia64_mca_cpu_t)) * phys_cpus;
+		pernode += L1_CACHE_ALIGN(sizeof(struct ia64_mca_cpu)) * phys_cpus;
 
 		/*
 		 * Copy the static per-cpu data into the region we
@@ -384,7 +384,7 @@
 					 * will be put in the cpuinfo structure.
 					 */
 					__per_cpu_mca[cpu] = __pa(mca_data_phys);
-					mca_data_phys += L1_CACHE_ALIGN(sizeof(ia64_mca_cpu_t));
+					mca_data_phys += L1_CACHE_ALIGN(sizeof(struct ia64_mca_cpu));
 				}
 				__per_cpu_offset[cpu] = (char*)__va(cpu_data) -
 					__per_cpu_start;
=== arch/ia64/mm/init.c 1.76 vs edited ===
--- 1.76/arch/ia64/mm/init.c	2005-01-18 12:06:18 -08:00
+++ edited/arch/ia64/mm/init.c	2005-01-25 15:52:22 -08:00
@@ -40,7 +40,6 @@
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
 extern void ia64_tlb_init (void);
-extern void efi_get_pal_addr (void);
 
 unsigned long MAX_DMA_ADDRESS = PAGE_OFFSET + 0x100000000UL;
 
@@ -290,27 +289,6 @@
 	put_kernel_page(page, GATE_ADDR + PERCPU_PAGE_SIZE, PAGE_GATE);
 #endif
 	ia64_patch_gate();
-}
-
-void
-set_mca_pointer(struct cpuinfo_ia64 *cpuinfo, void *cpu_data)
-{
-	void *my_cpu_data = ia64_imva(cpu_data);
-
-        /*
-         * The MCA info structure was allocated earlier and a physical address pointer
-         * saved in __per_cpu_mca[cpu].  Move that pointer into the cpuinfo structure.
-         */
-
-        cpuinfo->ia64_pa_mca_data = (__u64 *)__per_cpu_mca[smp_processor_id()];
-
-        cpuinfo->percpu_paddr = pte_val(mk_pte_phys(__pa(my_cpu_data), PAGE_KERNEL));
-        ia64_set_kr(IA64_KR_PA_CPU_INFO, __pa(cpuinfo));
-
-        /*
-         * Set pal_base and pal_paddr in cpuinfo structure.
-         */
-        efi_get_pal_addr();
 }
 
 void __devinit
=== include/asm-ia64/kregs.h 1.7 vs edited ===
--- 1.7/include/asm-ia64/kregs.h	2004-12-10 13:25:43 -08:00
+++ edited/include/asm-ia64/kregs.h	2005-01-25 16:52:05 -08:00
@@ -14,7 +14,7 @@
  */
 #define IA64_KR_IO_BASE		0	/* ar.k0: legacy I/O base address */
 #define IA64_KR_TSSD		1	/* ar.k1: IVE uses this as the TSSD */
-#define IA64_KR_PA_CPU_INFO	3	/* ar.k3: phys addr of this cpu's cpu_info struct */
+#define IA64_KR_PER_CPU_DATA	3	/* ar.k3: physical per-CPU base */
 #define IA64_KR_CURRENT_STACK	4	/* ar.k4: what's mapped in IA64_TR_CURRENT_STACK */
 #define IA64_KR_FPU_OWNER	5	/* ar.k5: fpu-owner (UP only, at the moment) */
 #define IA64_KR_CURRENT		6	/* ar.k6: "current" task pointer */
=== include/asm-ia64/mca.h 1.22 vs edited ===
--- 1.22/include/asm-ia64/mca.h	2005-01-07 13:54:29 -08:00
+++ edited/include/asm-ia64/mca.h	2005-01-25 17:23:08 -08:00
@@ -11,6 +11,8 @@
 #ifndef _ASM_IA64_MCA_H
 #define _ASM_IA64_MCA_H
 
+#define IA64_MCA_STACK_SIZE	8192
+
 #if !defined(__ASSEMBLY__)
 
 #include <linux/interrupt.h>
@@ -102,21 +104,21 @@
 						 */
 } ia64_mca_os_to_sal_state_t;
 
-#define IA64_MCA_STACK_SIZE 	1024
-#define IA64_MCA_STACK_SIZE_BYTES 	(1024 * 8)
-#define IA64_MCA_BSPSTORE_SIZE 	1024
-
-typedef struct ia64_mca_cpu_s {
-	u64	ia64_mca_stack[IA64_MCA_STACK_SIZE] 		__attribute__((aligned(16)));
-	u64	ia64_mca_proc_state_dump[512]			__attribute__((aligned(16)));
-	u64	ia64_mca_stackframe[32]				__attribute__((aligned(16)));
-	u64	ia64_mca_bspstore[IA64_MCA_BSPSTORE_SIZE]	__attribute__((aligned(16)));
-	u64	ia64_init_stack[KERNEL_STACK_SIZE/8] 		__attribute__((aligned(16)));
-} ia64_mca_cpu_t;
+/* Per-CPU MCA state that is too big for normal per-CPU variables.  */
+
+struct ia64_mca_cpu {
+	u64 stack[IA64_MCA_STACK_SIZE/8];	/* MCA memory-stack */
+	u64 proc_state_dump[512];
+	u64 stackframe[32];
+	u64 rbstore[IA64_MCA_STACK_SIZE/8];	/* MCA reg.-backing store */
+	u64 init_stack[KERNEL_STACK_SIZE/8];
+} __attribute__ ((aligned(16)));
 
-#define PERCPU_MCA_SIZE sizeof(ia64_mca_cpu_t)
+/* Array of physical addresses of each CPU's MCA area.  */
+extern unsigned long __per_cpu_mca[NR_CPUS];
 
 extern void ia64_mca_init(void);
+extern void ia64_mca_cpu_init(void *);
 extern void ia64_os_mca_dispatch(void);
 extern void ia64_os_mca_dispatch_end(void);
 extern void ia64_mca_ucmc_handler(void);
=== include/asm-ia64/mca_asm.h 1.11 vs edited ===
--- 1.11/include/asm-ia64/mca_asm.h	2005-01-06 16:22:36 -08:00
+++ edited/include/asm-ia64/mca_asm.h	2005-01-25 17:42:30 -08:00
@@ -46,40 +46,9 @@
 	mov	temp	= 0x7	;;							\
 	dep	addr	= temp, addr, 61, 3
 
-/*
- * This macro gets the physical address of this cpu's cpuinfo structure.
- */
-#define GET_PERCPU_PADDR(reg)							\
-	mov	reg	= ar.k3;;						\
-	addl	reg	= IA64_CPUINFO_PERCPU_PADDR,reg
-
-#define GET_CPUINFO_PAL_PADDR(reg)						\
-	mov	reg	= ar.k3;;						\
-	addl	reg	= IA64_CPUINFO_PAL_PADDR,reg
-
-/*
- * This macro gets the physical address of this cpu's MCA save structure.
- */
-#define GET_CPUINFO_MCA_PADDR(reg)						\
-	mov	reg	= ar.k3;;						\
-	addl	reg	= IA64_CPUINFO_PA_MCA_INFO,reg;;			\
-	ld8	reg	= [reg]
-
-#define	GET_MCA_BSPSTORE(reg)							\
-	GET_CPUINFO_MCA_PADDR(reg);;						\
-	addl	reg	= IA64_MCA_BSPSTORE,reg
-
-#define	GET_MCA_STACKFRAME(reg)							\
-	GET_CPUINFO_MCA_PADDR(reg);;						\
-	addl	reg	= IA64_MCA_STACKFRAME,reg
-
-#define	GET_MCA_STACK(reg)							\
-	GET_CPUINFO_MCA_PADDR(reg);;						\
-	addl	reg	= IA64_MCA_STACK,reg
-
-#define	GET_MCA_DUMP_PADDR(reg)							\
-	GET_CPUINFO_MCA_PADDR(reg);;						\
-	addl	reg	= IA64_MCA_PROC_STATE_DUMP,reg
+#define GET_THIS_PADDR(reg, var)		\
+	mov	reg = IA64_KR(PER_CPU_DATA);;	\
+        addl	reg = THIS_CPU(var), reg
 
 /*
  * This macro jumps to the instruction at the given virtual address
=== include/asm-ia64/percpu.h 1.15 vs edited ===
--- 1.15/include/asm-ia64/percpu.h	2005-01-06 16:23:02 -08:00
+++ edited/include/asm-ia64/percpu.h	2005-01-25 16:34:57 -08:00
@@ -46,18 +46,14 @@
 
 extern void percpu_modcopy(void *pcpudst, const void *src, unsigned long size);
 extern void setup_per_cpu_areas (void);
-extern void *per_cpu_init(void);
 
 #else /* ! SMP */
 
 #define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
 #define __get_cpu_var(var)			per_cpu__##var
-#define per_cpu_init()				(__phys_per_cpu_start)
 
 #endif	/* SMP */
 
-extern unsigned long __per_cpu_mca[NR_CPUS];
-
 #define EXPORT_PER_CPU_SYMBOL(var)		EXPORT_SYMBOL(per_cpu__##var)
 #define EXPORT_PER_CPU_SYMBOL_GPL(var)		EXPORT_SYMBOL_GPL(per_cpu__##var)
 
@@ -68,6 +64,8 @@
  * more efficient.
  */
 #define __ia64_per_cpu_var(var)	(per_cpu__##var)
+
+extern void *per_cpu_init(void);
 
 #endif /* !__ASSEMBLY__ */
 
=== include/asm-ia64/processor.h 1.71 vs edited ===
--- 1.71/include/asm-ia64/processor.h	2004-12-10 13:24:20 -08:00
+++ edited/include/asm-ia64/processor.h	2005-01-25 17:10:56 -08:00
@@ -151,12 +151,9 @@
 	__u64 itc_freq;		/* frequency of ITC counter */
 	__u64 proc_freq;	/* frequency of processor */
 	__u64 cyc_per_usec;	/* itc_freq/1000000 */
-	__u64 percpu_paddr;
 	__u64 ptce_base;
 	__u32 ptce_count[2];
 	__u32 ptce_stride[2];
-	__u64 pal_paddr;
-	__u64 pal_base;
 	struct task_struct *ksoftirqd;	/* kernel softirq daemon for this CPU */
 
 #ifdef CONFIG_SMP
@@ -177,7 +174,6 @@
 #ifdef CONFIG_NUMA
 	struct ia64_node_data *node_data;
 #endif
-	__u64 *ia64_pa_mca_data;	/* prt to MCA/INIT processor state */
 };
 
 DECLARE_PER_CPU(struct cpuinfo_ia64, cpu_info);
=== include/linux/efi.h 1.13 vs edited ===
--- 1.13/include/linux/efi.h	2005-01-22 14:43:57 -08:00
+++ edited/include/linux/efi.h	2005-01-25 15:51:11 -08:00
@@ -289,6 +289,7 @@
 }
 
 extern void efi_init (void);
+extern void *efi_get_pal_addr (void);
 extern void efi_map_pal_code (void);
 extern void efi_map_memmap(void);
 extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
@ 2005-01-26 16:25 ` Jesse Barnes
  2005-01-26 17:13 ` Russ Anderson
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Jesse Barnes @ 2005-01-26 16:25 UTC (permalink / raw)
  To: linux-ia64

On Tuesday, January 25, 2005 6:47 pm, David Mosberger wrote:
> The patch has been compile- and boot-tested for zx1 UP and SMP.  I
> think it should be OK for discontig configs, too, but I haven't tested
> that (and if anybody wanted to build discontig for UP, then
> discontig.c:per_cpu_init() would have to be updated like the contig.c
> version.

Did you see the last patch I posted for UP+generic support?  I *think* it 
fixes things in discontig.c somewhat correctly, but I've asked Russ to take a 
look to make sure.

Jesse


* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
  2005-01-26 16:25 ` Jesse Barnes
@ 2005-01-26 17:13 ` Russ Anderson
  2005-01-26 17:48 ` David Mosberger
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-01-26 17:13 UTC (permalink / raw)
  To: linux-ia64

Jesse Barnes wrote:
> 
> On Tuesday, January 25, 2005 6:47 pm, David Mosberger wrote:
> > The patch has been compile- and boot-tested for zx1 UP and SMP.  I
> > think it should be OK for discontig configs, too, but I haven't tested
> > that (and if anybody wanted to build discontig for UP, then
> > discontig.c:per_cpu_init() would have to be updated like the contig.c
> > version.
> 
> Did you see the last patch I posted for UP+generic support?  I *think* it 
> fixes things in discontig.c somewhat correctly, but I've asked Russ to take a 
> look to make sure.

I'm trying David's changes on Altix right now with
the error injection (memory uncorrectable) test.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com


* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
  2005-01-26 16:25 ` Jesse Barnes
  2005-01-26 17:13 ` Russ Anderson
@ 2005-01-26 17:48 ` David Mosberger
  2005-01-26 17:53 ` Jesse Barnes
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-26 17:48 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 Jan 2005 08:25:38 -0800, Jesse Barnes <jbarnes@engr.sgi.com> said:

  Jesse> On Tuesday, January 25, 2005 6:47 pm, David Mosberger wrote:

  >> The patch has been compile- and boot-tested for zx1 UP and SMP.
  >> I think it should be OK for discontig configs, too, but I haven't
  >> tested that (and if anybody wanted to build discontig for UP,
  >> then discontig.c:per_cpu_init() would have to be updated like the
  >> contig.c version.

  Jesse> Did you see the last patch I posted for UP+generic support?
  Jesse> I *think* it fixes things in discontig.c somewhat correctly,
  Jesse> but I've asked Russ to take a look to make sure.

The patch just removes per_cpu_init() in the non-SMP-case.  That would
have to be changed to be in sync with the contig.c per_cpu_init().

Perhaps a better solution would be to disassociate the MCA allocations
from per_cpu_init().  For example, we could have a separate
alloc_per_cpu_mca_data() in {dis,}contig.c.

BTW: can you remind me why you want node-local MCA data?  Performance
is probably not an issue.  Are you concerned about error-containment,
hot-swap, or something else?

	--david



* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (2 preceding siblings ...)
  2005-01-26 17:48 ` David Mosberger
@ 2005-01-26 17:53 ` Jesse Barnes
  2005-01-26 18:05 ` David Mosberger
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Jesse Barnes @ 2005-01-26 17:53 UTC (permalink / raw)
  To: linux-ia64

On Wednesday, January 26, 2005 9:48 am, David Mosberger wrote:
> The patch just removes per_cpu_init() in the non-SMP-case.  That would
> have to be changed to be in sync with the contig.c per_cpu_init().

It doesn't totally remove it, I tried to keep the MCA initialization.  
per_cpu_init for discontig is different than contig, since the memory has 
already been allocated.  All we need to do is assign the pointers.

> Perhaps a better solution would be to disassociate the MCA allocations
> from per_cpu_init().  For example, we could have a separate
> alloc_per_cpu_mca_data() in {dis,}contig.c.

That might be clearer.

Jesse


* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (3 preceding siblings ...)
  2005-01-26 17:53 ` Jesse Barnes
@ 2005-01-26 18:05 ` David Mosberger
  2005-01-26 18:11 ` Jesse Barnes
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-26 18:05 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 Jan 2005 09:53:55 -0800, Jesse Barnes <jbarnes@engr.sgi.com> said:

  Jesse> On Wednesday, January 26, 2005 9:48 am, David Mosberger
  Jesse> wrote:
  >> The patch just removes per_cpu_init() in the non-SMP-case.  That
  >> would have to be changed to be in sync with the contig.c
  >> per_cpu_init().

  Jesse> It doesn't totally remove it, I tried to keep the MCA
  Jesse> initialization.  per_cpu_init for discontig is different than
  Jesse> contig, since the memory has already been allocated.  All we
  Jesse> need to do is assign the pointers.

Hmmh, I probably misread the patch.  I was looking at this hunk:

+#ifdef CONFIG_SMP
 /**
  * per_cpu_init - setup per-cpu variables
  *
@@ -558,6 +586,7 @@

 	return __per_cpu_start + __per_cpu_offset[smp_processor_id()];
 }
+#endif /* CONFIG_SMP */

  >> Perhaps a better solution would be to disassociate the MCA
  >> allocations from per_cpu_init().  For example, we could have a
  >> separate alloc_per_cpu_mca_data() in {dis,}contig.c.

  Jesse> That might be clearer.

Yes, I think we should do that.  I'm still interested in an answer to
"why node-local MCA data". ;-)

	--david

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (4 preceding siblings ...)
  2005-01-26 18:05 ` David Mosberger
@ 2005-01-26 18:11 ` Jesse Barnes
  2005-01-26 19:01 ` Russ Anderson
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Jesse Barnes @ 2005-01-26 18:11 UTC (permalink / raw)
  To: linux-ia64

On Wednesday, January 26, 2005 10:05 am, David Mosberger wrote:
> Hmmh, I probably misread the patch.  I was looking at this hunk:
>
> +#ifdef CONFIG_SMP
>  /**
>   * per_cpu_init - setup per-cpu variables
>   *
> @@ -558,6 +586,7 @@
>
>   return __per_cpu_start + __per_cpu_offset[smp_processor_id()];
>  }
> +#endif /* CONFIG_SMP */

Oh, you're right.  My patch just defaults to using __phys_per_cpu_start.  
That's probably why the MCA stuff didn't work.

Jesse

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (5 preceding siblings ...)
  2005-01-26 18:11 ` Jesse Barnes
@ 2005-01-26 19:01 ` Russ Anderson
  2005-01-26 19:23 ` Luck, Tony
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-01-26 19:01 UTC (permalink / raw)
  To: linux-ia64

David Mosberger wrote:
> 
> BTW: can you remind me why you want node-local MCA data?  Performance
> is probably not an issue.  Are you concerned about error-containment,
> hot-swap, or something else?

I'm not aware of a compelling functional reason that the MCA data 
has to be node-local.  The key feature is that each CPU have a unique
MCA data area.  If all the MCA data areas are on one node, that
should be OK.  If a CPU cannot access MCA data memory on a remote 
node, odds are we will die anyway.

It seems logical to me that MCA data be close to the node/cpu using
it, but workable if it is not.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (6 preceding siblings ...)
  2005-01-26 19:01 ` Russ Anderson
@ 2005-01-26 19:23 ` Luck, Tony
  2005-01-26 20:07 ` David Mosberger
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-01-26 19:23 UTC (permalink / raw)
  To: linux-ia64


>It seems logical to me that MCA data be close to the node/cpu using
>it, but workable if it is not.

Since performance is not an issue here, and there isn't a functional
reason for allocating per-node, I'd vote for whatever is the cleanest
code implementation.  Part of the pit I've dug myself into is caused by
the complexity of the contig/discontig code ... if we can make it simpler,
perhaps I won't break it all again.

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (7 preceding siblings ...)
  2005-01-26 19:23 ` Luck, Tony
@ 2005-01-26 20:07 ` David Mosberger
  2005-01-26 21:40 ` Russ Anderson
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-26 20:07 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 Jan 2005 11:23:06 -0800, "Luck, Tony" <tony.luck@intel.com> said:

  >> It seems logical to me that MCA data be close to the node/cpu
  >> using it, but workable if it is not.

  Tony> Since performance is not an issue here, and there isn't a
  Tony> functional reason for allocating per-node, I'd vote for
  Tony> whatever is the cleanest code implementation.  Part of the pit
  Tony> I've dug myself into is caused by the complexity of the
  Tony> contig/discontig code ... if we can make it simpler, perhaps I
  Tony> won't break it all again.

I'm all for simplification, but with a better API, we could share code
and still have node-local data (which does feel intuitively right).

Can't we have a simple

	alloc_bootmem_for_cpu(cpu, size, align, goal)

which would take care of the difference?

(Yes, I know about alloc_bootmem_node(), but it wants a pgdat pointer,
which I couldn't care less about in the non-NUMA case.)

	--david
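The `alloc_bootmem_for_cpu(cpu, size, align, goal)` interface David proposes could be sketched as below.  This is only an illustration: the boot-memory pool is mocked with a static bump allocator, the `goal` argument is dropped to keep the mock short, and the non-NUMA behaviour (ignore `cpu`, carve from the one pool) is an assumption about how such a wrapper would fall back; none of this is real kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mock boot-memory pool standing in for the real bootmem allocator. */
#define POOL_SIZE (1 << 16)
static uint8_t bootmem_pool[POOL_SIZE];
static size_t bootmem_next;

/*
 * Proposed interface: in a NUMA build, `cpu` would select the node
 * whose memory backs the allocation; in the non-NUMA case there is
 * only one pool, so `cpu` is ignored.  `align` must be a power of two.
 */
void *alloc_bootmem_for_cpu(int cpu, size_t size, size_t align)
{
	uintptr_t p = (uintptr_t)bootmem_pool + bootmem_next;
	uintptr_t aligned = (p + align - 1) & ~(uintptr_t)(align - 1);
	size_t base = aligned - (uintptr_t)bootmem_pool;

	(void)cpu;		/* non-NUMA fallback: one pool for everybody */
	if (base + size > POOL_SIZE)
		return NULL;
	bootmem_next = base + size;
	memset(bootmem_pool + base, 0, size);	/* bootmem hands back zeroed memory */
	return bootmem_pool + base;
}
```

With something of this shape, both contig.c and discontig.c could fill `__per_cpu_mca[]` with one loop over NR_CPUS, which is the code sharing being discussed.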

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (8 preceding siblings ...)
  2005-01-26 20:07 ` David Mosberger
@ 2005-01-26 21:40 ` Russ Anderson
  2005-01-26 21:50 ` David Mosberger
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-01-26 21:40 UTC (permalink / raw)
  To: linux-ia64

David Mosberger-Tang wrote:
>
> The patch has been compile- and boot-tested for zx1 UP and SMP.  I
> think it should be OK for discontig configs, too, but I haven't tested
> that (and if anybody wanted to build discontig for UP, then
> discontig.c:per_cpu_init() would have to be updated like the contig.c
> version.
>
> Also, I verified that the kernel can still do an INIT dump.  Other
> than that, I can't really test the MCA-path.

There is one small problem.  In mca_asm.S, r23 was used without being set 
and the hardcoded value 40 is no longer valid (patch below).

With linux-ia64-test-2.6.11 plus David's patch plus the patch
below, 1024 uncorrectable memory errors were injected and successfully
recovered on an SGI Altix test machine.  1024 is the number of entries 
in the page_isolate[] array in arch/ia64/kernel/mca_drv.c.  When the 
array is full, the recovery code says the error is not recoverable 
and the system reboots.

Test output:
------------------------------------
./test.script: line 10: 17343 Killed                  ./errit -d 3
ERR_INJ: type = 2, addr = 6000000000002480, bits = 3, paddr = 0x00000b300abda480
OS_MCA: process [pid: 17343](errit) encounters MCA.
Page isolation: ( b300abda480 ) success.
pass 1024
------------------------------------


Signed-off-by: Russ Anderson <rja@sgi.com>

----------------------------------------------------------------
Index: linux/arch/ia64/kernel/mca_asm.S
===================================================================
--- linux.orig/arch/ia64/kernel/mca_asm.S	2005-01-26 10:20:55.140112553 -0600
+++ linux/arch/ia64/kernel/mca_asm.S	2005-01-26 14:47:19.878566832 -0600
@@ -203,9 +203,9 @@
 	srlz.d
 	;;
 	// 3. Purge ITR for PAL code.
-	adds r17=40,r23
+	GET_THIS_PADDR(r2, ia64_mca_pal_base)
 	;;
-	ld8 r16=[r17]
+	ld8 r16=[r2]
 	mov r18=IA64_GRANULE_SHIFT<<2
 	;;
 	ptr.i r16,r18
-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (9 preceding siblings ...)
  2005-01-26 21:40 ` Russ Anderson
@ 2005-01-26 21:50 ` David Mosberger
  2005-01-26 22:13 ` Luck, Tony
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-26 21:50 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 Jan 2005 15:40:47 -0600 (CST), Russ Anderson <rja@sgi.com> said:

  Russ> There is one small problem.  In mca_asm.S, r23 was used
  Russ> without being set and the hardcoded value 40 is no longer
  Russ> valid (patch below).

Ah, I missed that.

  Russ> With linux-ia64-test-2.6.11 plus David's patch plus the patch
  Russ> below, 1024 uncorrectable memory errors were injected and
  Russ> successfully recovered on an SGI Altix test machine.  1024 is
  Russ> the number of entries in the page_isolate[] array in
  Russ> arch/ia64/kernel/mca_drv.c.  When the array is full, the
  Russ> recovery code says the error is not recoverable and the system
  Russ> reboots.

Cool!

Tony, will you take the patches?  (Yes, there is still a question of
how to cleanup/unify the MCA state allocation, but I think that's best
done separately.)

	--david

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (10 preceding siblings ...)
  2005-01-26 21:50 ` David Mosberger
@ 2005-01-26 22:13 ` Luck, Tony
  2005-01-26 22:16 ` David Mosberger
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-01-26 22:13 UTC (permalink / raw)
  To: linux-ia64


>Tony, will you take the patches?  (Yes, there is still a question of
>how to cleanup/unify the MCA state allocation, but I think that's best
>done separately.)

I've been playing with the patch on a few configurations (tiger with
DIG and generic kernel still builds and boots SMP, my zx2000 is now
running UP again) and haven't seen any regressions.  So yes, I'll
apply your big patch and Russ's addendum.

Jesse's generic up-build patch still needs some work, right?

-Tony 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (11 preceding siblings ...)
  2005-01-26 22:13 ` Luck, Tony
@ 2005-01-26 22:16 ` David Mosberger
  2005-01-26 22:19 ` Jesse Barnes
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-26 22:16 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 Jan 2005 14:13:50 -0800, "Luck, Tony" <tony.luck@intel.com> said:

  >> Tony, will you take the patches?  (Yes, there is still a question
  >> of how to cleanup/unify the MCA state allocation, but I think
  >> that's best done separately.)

  Tony> I've been playing with the patch on a few configurations
  Tony> (tiger with DIG and generic kernel still builds and boots SMP,
  Tony> my zx2000 is now running UP again) and haven't seen any
  Tony> regressions.  So yes, I'll apply your big patch and Russ's
  Tony> addendum.

Great!

  Tony> Jesse's generic up-build patch still needs some work, right?

Not if we change per_cpu_init() to _not_ allocate the MCA data.  I can
cook up a patch for doing this once the above two patches show up in
your release-2.6.11 bk tree.

	--david

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (12 preceding siblings ...)
  2005-01-26 22:16 ` David Mosberger
@ 2005-01-26 22:19 ` Jesse Barnes
  2005-01-26 22:33 ` Luck, Tony
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Jesse Barnes @ 2005-01-26 22:19 UTC (permalink / raw)
  To: linux-ia64

On Wednesday, January 26, 2005 2:13 pm, Luck, Tony wrote:
> Jesse's generic up-build patch still needs some work, right?

It just needs to be fixed in light of this patch, I think.  It booted for me UP 
and SMP with both CONFIG_IA64_GENERIC and the sn2 specific config.

Jesse

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (13 preceding siblings ...)
  2005-01-26 22:19 ` Jesse Barnes
@ 2005-01-26 22:33 ` Luck, Tony
  2005-01-27  0:40 ` David Mosberger
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-01-26 22:33 UTC (permalink / raw)
  To: linux-ia64

>  Tony> Jesse's generic up-build patch still needs some work, right?
>
>Not if we change per_cpu_init() to _not_ allocate the MCA data.  I can
>cook up a patch for doing this once the above two patches show up in
>your release-2.6.11 bk tree.

Ok.  Your big fix-it patch and Russ's addendum are in the release tree.

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (14 preceding siblings ...)
  2005-01-26 22:33 ` Luck, Tony
@ 2005-01-27  0:40 ` David Mosberger
  2005-01-27  0:55 ` Luck, Tony
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: David Mosberger @ 2005-01-27  0:40 UTC (permalink / raw)
  To: linux-ia64

>>>>> On Wed, 26 Jan 2005 14:33:00 -0800, "Luck, Tony" <tony.luck@intel.com> said:

  Tony> Jesse's generic up-build patch still needs some work, right?
  >>  Not if we change per_cpu_init() to _not_ allocate the MCA data.
  >> I can cook up a patch for doing this once the above two patches
  >> show up in your release-2.6.11 bk tree.

  Tony> Ok.  Your big fix-it patch and Russ's addendum are in the
  Tony> release tree.

OK, how about this patch?  Compile-tested and boot-tested for zx1 UP
and SMP and GENERIC.  I also verified that INIT dumps work for all
three kernels.

The discontig stuff is untested but I hope I got it right.

Note that with this patch applied, the per-CPU MCA memory will be
allocated from one node only.  As per our earlier discussion, that
should be OK.  I wouldn't object to doing per-node allocations, if
only there was a sane interface to allocate boot-memory for a
particular CPU (along the lines of the alloc_boot_mem_for_cpu() I
suggested), but I'm not familiar enough with the NUMA code to do this
myself.

	--david

--------------------------------------------------------------------
ia64: Move allocation of per-CPU MCA data out of per_cpu_init()

This patch moves the per-CPU MCA data allocation out of per_cpu_init()
so the code can be shared for contig and discontig memory
architectures.  Also, it means we can revert back to the old way
of doing per_cpu_init() on UP.

Signed-off-by: David Mosberger-Tang <davidm@hpl.hp.com>

===== arch/ia64/kernel/mca.c 1.77 vs edited =====
--- 1.77/arch/ia64/kernel/mca.c	2005-01-26 10:01:28 -08:00
+++ edited/arch/ia64/kernel/mca.c	2005-01-26 15:43:02 -08:00
@@ -1209,6 +1209,18 @@
 {
 	void *pal_vaddr;
 
+	if (smp_processor_id() == 0) {
+		void *mca_data;
+		int cpu;
+
+		mca_data = alloc_bootmem(sizeof(struct ia64_mca_cpu)
+					 * NR_CPUS);
+		for (cpu = 0; cpu < NR_CPUS; cpu++) {
+			__per_cpu_mca[cpu] = __pa(mca_data);
+			mca_data += sizeof(struct ia64_mca_cpu);
+		}
+	}
+
         /*
          * The MCA info structure was allocated earlier and its
          * physical address saved in __per_cpu_mca[cpu].  Copy that
===== arch/ia64/mm/contig.c 1.13 vs edited =====
--- 1.13/arch/ia64/mm/contig.c	2005-01-26 10:01:33 -08:00
+++ edited/arch/ia64/mm/contig.c	2005-01-26 15:39:07 -08:00
@@ -169,6 +169,7 @@
 	find_initrd();
 }
 
+#ifdef CONFIG_SMP
 /**
  * per_cpu_init - setup per-cpu variables
  *
@@ -177,18 +178,15 @@
 void *
 per_cpu_init (void)
 {
-	void *mca_data, *my_data;
+	void *cpu_data;
 	int cpu;
 
-#ifdef CONFIG_SMP
 	/*
 	 * get_free_pages() cannot be used before cpu_init() done.  BSP
 	 * allocates "NR_CPUS" pages for all CPUs to avoid that AP calls
 	 * get_zeroed_page().
 	 */
 	if (smp_processor_id() == 0) {
-		void *cpu_data;
-
 		cpu_data = __alloc_bootmem(PERCPU_PAGE_SIZE * NR_CPUS,
 					   PERCPU_PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 		for (cpu = 0; cpu < NR_CPUS; cpu++) {
@@ -198,20 +196,9 @@
 			per_cpu(local_per_cpu_offset, cpu) = __per_cpu_offset[cpu];
 		}
 	}
-	my_data = __per_cpu_start + __per_cpu_offset[smp_processor_id()];
-#else
-	my_data = (void *) __phys_per_cpu_start;
-#endif
-
-	if (smp_processor_id() == 0) {
-		mca_data = alloc_bootmem(sizeof (struct ia64_mca_cpu) * NR_CPUS);
-		for (cpu = 0; cpu < NR_CPUS; cpu++) {
-			__per_cpu_mca[cpu] = __pa(mca_data);
-			mca_data += sizeof (struct ia64_mca_cpu);
-		}
-	}
-	return my_data;
+	return __per_cpu_start + __per_cpu_offset[smp_processor_id()];
 }
+#endif /* CONFIG_SMP */
 
 static int
 count_pages (u64 start, u64 end, void *arg)
===== arch/ia64/mm/discontig.c 1.29 vs edited =====
--- 1.29/arch/ia64/mm/discontig.c	2005-01-26 10:01:34 -08:00
+++ edited/arch/ia64/mm/discontig.c	2005-01-26 15:39:48 -08:00
@@ -26,7 +26,6 @@
 #include <asm/meminit.h>
 #include <asm/numa.h>
 #include <asm/sections.h>
-#include <asm/mca.h>
 
 /*
  * Track per-node information needed to setup the boot memory allocator, the
@@ -294,9 +293,6 @@
  *   |------------------------|
  *   |  local ia64_node_data  |
  *   |------------------------|
- *   |    MCA/INIT data *     |
- *   |    cpus_on_this_node   |
- *   |------------------------|
  *   |          ???           |
  *   |________________________|
  *
@@ -310,7 +306,7 @@
 {
 	unsigned long epfn, cpu, cpus, phys_cpus;
 	unsigned long pernodesize = 0, pernode, pages, mapsize;
-	void *cpu_data, *mca_data_phys;
+	void *cpu_data;
 	struct bootmem_data *bdp = &mem_data[node].bootmem_data;
 
 	epfn = (start + len) >> PAGE_SHIFT;
@@ -339,7 +335,6 @@
 	pernodesize += node * L1_CACHE_BYTES;
 	pernodesize += L1_CACHE_ALIGN(sizeof(pg_data_t));
 	pernodesize += L1_CACHE_ALIGN(sizeof(struct ia64_node_data));
-	pernodesize += L1_CACHE_ALIGN(sizeof(struct ia64_mca_cpu)) * phys_cpus;
 	pernodesize = PAGE_ALIGN(pernodesize);
 	pernode = NODEDATA_ALIGN(start, node);
 
@@ -362,9 +357,6 @@
 		mem_data[node].pgdat->bdata = bdp;
 		pernode += L1_CACHE_ALIGN(sizeof(pg_data_t));
 
-		mca_data_phys = (void *)pernode;
-		pernode += L1_CACHE_ALIGN(sizeof(struct ia64_mca_cpu)) * phys_cpus;
-
 		/*
 		 * Copy the static per-cpu data into the region we
 		 * just set aside and then setup __per_cpu_offset
@@ -374,18 +366,6 @@
 			if (node == node_cpuid[cpu].nid) {
 				memcpy(__va(cpu_data), __phys_per_cpu_start,
 				       __per_cpu_end - __per_cpu_start);
-				if ((cpu == 0) || (node_cpuid[cpu].phys_id > 0)) {
-					/* 
-					 * The memory for the cpuinfo structure is allocated
-					 * here, but the data in the structure is initialized
-					 * later.  Save the physical address of the MCA save
-					 * area in __per_cpu_mca[cpu].  When the cpuinfo struct 
-					 * is initialized, the value in __per_cpu_mca[cpu]
-					 * will be put in the cpuinfo structure.
-					 */
-					__per_cpu_mca[cpu] = __pa(mca_data_phys);
-					mca_data_phys += L1_CACHE_ALIGN(sizeof(struct ia64_mca_cpu));
-				}
 				__per_cpu_offset[cpu] = (char*)__va(cpu_data) -
 					__per_cpu_start;
 				cpu_data += PERCPU_PAGE_SIZE;
===== include/asm-ia64/percpu.h 1.16 vs edited =====
--- 1.16/include/asm-ia64/percpu.h	2005-01-26 10:01:39 -08:00
+++ edited/include/asm-ia64/percpu.h	2005-01-26 15:35:12 -08:00
@@ -46,11 +46,13 @@
 
 extern void percpu_modcopy(void *pcpudst, const void *src, unsigned long size);
 extern void setup_per_cpu_areas (void);
+extern void *per_cpu_init(void);
 
 #else /* ! SMP */
 
 #define per_cpu(var, cpu)			(*((void)cpu, &per_cpu__##var))
 #define __get_cpu_var(var)			per_cpu__##var
+#define per_cpu_init()				(__phys_per_cpu_start)
 
 #endif	/* SMP */
 
@@ -64,8 +66,6 @@
  * more efficient.
  */
 #define __ia64_per_cpu_var(var)	(per_cpu__##var)
-
-extern void *per_cpu_init(void);
 
 #endif /* !__ASSEMBLY__ */
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (15 preceding siblings ...)
  2005-01-27  0:40 ` David Mosberger
@ 2005-01-27  0:55 ` Luck, Tony
  2005-01-28 22:54 ` Russ Anderson
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-01-27  0:55 UTC (permalink / raw)
  To: linux-ia64

>OK, how about this patch?  Compile-tested and boot-tested for zx1 UP
>and SMP and GENERIC.  I also verified that INIT dumps work for all
>three kernels.
>
>The discontig stuff is untested but I hope I got it right.

Russ ... Jesse ... can you ack when you test this.

Thanks

-Tony

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (16 preceding siblings ...)
  2005-01-27  0:55 ` Luck, Tony
@ 2005-01-28 22:54 ` Russ Anderson
  2005-02-02  1:04 ` Luck, Tony
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-01-28 22:54 UTC (permalink / raw)
  To: linux-ia64

Tony Luck wrote:
> 
> >OK, how about this patch?  Compile-tested and boot-tested for zx1 UP
> >and SMP and GENERIC.  I also verified that INIT dumps work for all
> >three kernels.
> >
> >The discontig stuff is untested but I hope I got it right.
> 
> Russ ... Jesse ... can you ack when you test this.

The system does not seem to recover as often with this patch, but
I'm not sure why.  At first I thought it may be just reasonable
variance, but after backing out the patch and re-running more times,
then re-applying and re-re-running, it certainly looks like the
test fails quicker with the patch.

When it fails, it Oops's with an ip of 0, as if something was
not restored properly.  But I have to dig deeper to find out 
what is going on.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (17 preceding siblings ...)
  2005-01-28 22:54 ` Russ Anderson
@ 2005-02-02  1:04 ` Luck, Tony
  2005-02-02 20:25 ` Russ Anderson
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-02-02  1:04 UTC (permalink / raw)
  To: linux-ia64

>> Russ ... Jesse ... can you ack when you test this.
>
>The system does not seem to recover as often with this patch, but
>I'm not sure why.  At first I thought it may be just reasonable
>variance, but after backing out the patch and re-running more times,
>then re-applying and re-re-running, it certainly looks like the
>test fails quicker with the patch.
>
>When it fails, it Oops's with an ip of 0, as if something was
>not restored properly.  But I have to dig deeper to find out 
>what is going on.

Any progress?

The only visible effect of this patch should be that the
ia64_mca_cpu structures for all cpus will be in one big block,
instead of scattered across nodes.  Perhaps there is an overrun
problem so one cpu is corrupting data that belongs to an adjacent
cpu.  When the allocations are scattered you might notice this
less often?

-Tony
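The overrun hypothesis above is easy to state as a sketch: pack fixed-size per-CPU save areas back to back with a guard word after each, and a write past the end of one CPU's area shows up as a clobbered guard before it silently corrupts the neighbour.  The names, sizes, and guard pattern below are illustrative, not the kernel's ia64_mca_cpu layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NCPUS      4
#define AREA_SIZE  256
#define GUARD      0xdeadbeefdeadbeefULL

/* Per-CPU save area followed by a guard word; areas[] packs them
 * back to back, as the single-block allocation in the patch does. */
struct mca_area {
	uint8_t  data[AREA_SIZE];
	uint64_t guard;
};

static struct mca_area areas[NCPUS];

static void init_areas(void)
{
	for (int i = 0; i < NCPUS; i++) {
		memset(areas[i].data, 0, AREA_SIZE);
		areas[i].guard = GUARD;
	}
}

/* Returns the index of the first clobbered guard, or -1 if all intact. */
static int check_guards(void)
{
	for (int i = 0; i < NCPUS; i++)
		if (areas[i].guard != GUARD)
			return i;
	return -1;
}

/* Simulate `cpu` dumping `len` bytes from the start of its area;
 * len > AREA_SIZE models the suspected overrun. */
static void cpu_write(int cpu, size_t len)
{
	memset((uint8_t *)&areas[cpu], 0x5a, len);
}
```

Sprinkling a check like this into the MCA path would distinguish "one big block" overruns from the scattered per-node case, where the same bug lands in unrelated memory and is noticed less often.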

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (18 preceding siblings ...)
  2005-02-02  1:04 ` Luck, Tony
@ 2005-02-02 20:25 ` Russ Anderson
  2005-02-03 22:48 ` Luck, Tony
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-02-02 20:25 UTC (permalink / raw)
  To: linux-ia64

Tony Luck wrote:
> 
> Any progress?
> 
> The only visible effect of this patch should be that the
> ia64_mca_cpu structures for all cpus will be in one big block,
> instead of scattered across nodes.  Perhaps there is an overrun
> problem so one cpu is corrupting data that belongs to an adjacent
> cpu.  When the allocations are scattered you might notice this
> less often?

Yes, I suspect something along those lines.
I'm actively digging into it.

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (19 preceding siblings ...)
  2005-02-02 20:25 ` Russ Anderson
@ 2005-02-03 22:48 ` Luck, Tony
  2005-02-03 23:48 ` Russ Anderson
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-02-03 22:48 UTC (permalink / raw)
  To: linux-ia64

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

This patch helps some (all uses of ia64_mca_data in
mca_asm.S need to deref the pointer to get at the
actual space, instead of clobbering things in the
percpu area).

Also I noticed that ia64_sal_to_os_handoff_state
hasn't been replicated ... won't we need that too?

-Tony

[-- Attachment #2: forruss.patch --]
[-- Type: application/octet-stream, Size: 1152 bytes --]

===== arch/ia64/kernel/mca_asm.S 1.19 vs edited =====
--- 1.19/arch/ia64/kernel/mca_asm.S	2005-01-26 14:26:40 -08:00
+++ edited/arch/ia64/kernel/mca_asm.S	2005-02-03 14:02:33 -08:00
@@ -311,6 +311,8 @@
 	// Setup new stack frame for OS_MCA handling
 	GET_THIS_PADDR(r2, ia64_mca_data)
 	;;
+	ld8 r2 = [r2]
+	;;
 	add r3 = IA64_MCA_CPU_STACKFRAME_OFFSET, r2
 	add r2 = IA64_MCA_CPU_RBSTORE_OFFSET, r2
 	;;
@@ -318,6 +320,8 @@
 
 	GET_THIS_PADDR(r2, ia64_mca_data)
 	;;
+	ld8 r2 = [r2]
+	;;
 	add r2 = IA64_MCA_CPU_STACK_OFFSET+IA64_MCA_STACK_SIZE-16, r2
 	;;
 	mov r12=r2		// establish new stack-pointer
@@ -340,6 +344,8 @@
 	;;
 	add r2 = IA64_MCA_CPU_STACKFRAME_OFFSET, r2
 	;;
+	ld8 r2 = [r2]
+	;;
 	movl    r4=IA64_PSR_MC
 	;;
 	rse_return_context(r4,r3,r2)	// switch from interrupt context for RSE
@@ -382,6 +388,8 @@
 //  to virtual addressing mode.
 	GET_THIS_PADDR(r2, ia64_mca_data)
 	;;
+	ld8 r2 = [r2]
+	;;
 	add r2 = IA64_MCA_CPU_PROC_STATE_DUMP_OFFSET, r2
 	;;
 // save ar.NaT
@@ -614,6 +622,8 @@
 
 // Restore bank1 GR16-31
 	GET_THIS_PADDR(r2, ia64_mca_data)
+	;;
+	ld8 r2 = [r2]
 	;;
 	add r2 = IA64_MCA_CPU_PROC_STATE_DUMP_OFFSET, r2
 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (20 preceding siblings ...)
  2005-02-03 22:48 ` Luck, Tony
@ 2005-02-03 23:48 ` Russ Anderson
  2005-02-04  2:09 ` Jack Steiner
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-02-03 23:48 UTC (permalink / raw)
  To: linux-ia64

Tony Luck wrote:
> 
> Also I noticed that ia64_sal_to_os_handoff_state
> hasn't been replicated ... won't we need that too?

According to the SAL Spec, MCAs are supposed to be handled
one at a time.  

I would like that to change, so that multiple independent
MCAs could be handled in parallel, but that would require
a number of Spec changes (i.e., the way rendezvous works).

We should go down that path...

-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (21 preceding siblings ...)
  2005-02-03 23:48 ` Russ Anderson
@ 2005-02-04  2:09 ` Jack Steiner
  2005-02-04  3:00 ` Keith Owens
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Jack Steiner @ 2005-02-04  2:09 UTC (permalink / raw)
  To: linux-ia64

On Thu, Feb 03, 2005 at 05:48:26PM -0600, Russ Anderson wrote:
> Tony Luck wrote:
> > 
> > Also I noticed that ia64_sal_to_os_handoff_state
> > hasn't been replicated ... won't we need that too?
> 
> According to the SAL Spec, MCAs are supposed to be handled
> one at a time.  

It has been a long time since I looked, but I thought the
spec allowed either implementation, i.e. serialize OR all-at-once.

Maybe I'm remembering the error handling guide but I know
I have seen this somewhere.....


> 
> I would like that to change, so that multiple independent
> MCAs could be handled in parallel, but that would require
> a number of Spec changes (ie the way rendezvous works).
> 
> We should go down that path...
> 
> -- 
> Russ Anderson, OS RAS/Partitioning Project Lead  
> SGI - Silicon Graphics Inc          rja@sgi.com
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ia64" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (22 preceding siblings ...)
  2005-02-04  2:09 ` Jack Steiner
@ 2005-02-04  3:00 ` Keith Owens
  2005-02-04 16:24 ` Jack Steiner
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Keith Owens @ 2005-02-04  3:00 UTC (permalink / raw)
  To: linux-ia64

On Thu, 3 Feb 2005 20:09:57 -0600, 
Jack Steiner <steiner@sgi.com> wrote:
>On Thu, Feb 03, 2005 at 05:48:26PM -0600, Russ Anderson wrote:
>> According to the SAL Spec, MCAs are supposed to be handled
>> one at a time.  
>
>It has been a long time since I looked, but I thought the
>spec allowed either implemention, ie. serialize OR all-at-once.
>
>Maybe I'm remembering the error handling guide but I know
>I have seen this somewhere.....

It is ambiguous.  Extracts from SAL spec.

4.1.1 says only one processor gets OS_MCA.

  When multiple processors experience machine checks simultaneously,
  SAL selects a "monarch" machine check processor to accumulate all the
  error records at the platform level and continue with the machine
  check processing. "Monarch" status is relevant only for the current
  MCA error event.

4.7.2 (5) also says only one processor.

  5. SAL selects a monarch for handling the error. All slaves
     processors in SAL_MC_RENDEZ check in their status with the SAL on
     the monarch.

But the last sentence of 4.7.2 (8) refers to multiple processors in OS
MCA.

  8. SAL finishes the MCA handling on all the processors that are in
     MCA and waits for all the processors in MCA to synchronize before
     branching to OS MCA for further processing.  Note that the
     hand-off to OS MCA from SAL MCA occurs simultaneously on all
     processors executing in SAL MCA handler.

4.7.2 (9) lets the OS choose the monarch, which implies that more than
one cpu can be in OS MCA handler.

  9. OS_MCA may choose a monarch processor to continue with error
     handling. After OS_MCA completes the error handling, the monarch
     processor wakes up all the slaves through a wake-up message as
     shown by (9) in Figure 4-4

The end of 4.7.3 also implies that OS MCA handler can be running on
multiple cpus. Note 'on all the processors'.

  When multiple processors experience machine checks simultaneously,
  SAL selects a monarch machine check processor to accumulate all the
  error records at the platform level. Once this is done, the OS_MCA
  procedure will take control of further error handling on all the
  processors that experienced the machine checks. The OS_MCA layer may
  need to implement a similar monarch processor selection for the error
  recovery phase. The operating system will be aware of which
  processors invoked the SAL_MC_RENDEZ procedure in response to the
  MC_rendezvous interrupt or the INIT signal and shall wake up those
  processors.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (23 preceding siblings ...)
  2005-02-04  3:00 ` Keith Owens
@ 2005-02-04 16:24 ` Jack Steiner
  2005-02-04 16:34 ` Russ Anderson
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 29+ messages in thread
From: Jack Steiner @ 2005-02-04 16:24 UTC (permalink / raw)
  To: linux-ia64

On Fri, Feb 04, 2005 at 02:00:15PM +1100, Keith Owens wrote:
> On Thu, 3 Feb 2005 20:09:57 -0600, 
> Jack Steiner <steiner@sgi.com> wrote:
> >On Thu, Feb 03, 2005 at 05:48:26PM -0600, Russ Anderson wrote:
> >> According to the SAL Spec, MCAs are supposed to be handled
> >> one at a time.  
> >
> >It has been a long time since I looked, but I thought the
> >spec allowed either implementation, i.e. serialize OR all-at-once.
> >
> >Maybe I'm remembering the error handling guide but I know
> >I have seen this somewhere.....
> 
> It is ambiguous.  Extracts from SAL spec.
> 
> 4.1.1 says only one processor gets OS_MCA.
> 
>   When multiple processors experience machine checks simultaneously,
>   SAL selects a "monarch" machine check processor to accumulate all the
>   error records at the platform level and continue with the machine
>   check processing. "Monarch" status is relevant only for the current
>   MCA error event.
> 
> 4.7.2 (5) also says only one processor.
> 
>   5. SAL selects a monarch for handling the error. All slaves
>      processors in SAL_MC_RENDEZ check in their status with the SAL on
>      the monarch.
> 
> But the last sentence of 4.7.2 (8) refers to multiple processors in OS
> MCA.
> 
>   8. SAL finishes the MCA handling on all the processors that are in
>      MCA and waits for all the processors in MCA to synchronize before
>      branching to OS MCA for further processing.  Note that the
>      hand-off to OS MCA from SAL MCA occurs simultaneously on all
>      processors executing in SAL MCA handler.
> 
> 4.7.2 (9) lets the OS choose the monarch, which implies that more than
> one cpu can be in OS MCA handler.
> 
>   9. OS_MCA may choose a monarch processor to continue with error
>      handling. After OS_MCA completes the error handling, the monarch
>      processor wakes up all the slaves through a wake-up message as
>      shown by (9) in Figure 4-4
> 
> The end of 4.7.3 also implies that OS MCA handler can be running on
> multiple cpus. Note 'on all the processors'.
> 
>   When multiple processors experience machine checks simultaneously,
>   SAL selects a monarch machine check processor to accumulate all the
>   error records at the platform level. Once this is done, the OS_MCA
>   procedure will take control of further error handling on all the
>   processors that experienced the machine checks. The OS_MCA layer may
>   need to implement a similar monarch processor selection for the error
>   recovery phase. The operating system will be aware of which
>   processors invoked the SAL_MC_RENDEZ procedure in response to the
>   MC_rendezvous interrupt or the INIT signal and shall wake up those
>   processors.

To further muddy the waters, it looks like the latest Error Handling Guide
has addressed the issue:

>> Intel® Itanium® Processor Family Error Handling Guide April 2004
>>
>> Document Number: 249278-003
>>
>> 2.7.1
>>
>> ...
>> The MCA error information is provided to the OS_MCA layer. The MCA
>> error record is logged to the NVM.  To simplify SAL implementation, it
>> is strongly recommended that SAL process all MCAs by handing off to the
>> OS as soon as possible to prevent some OSes from experiencing time-outs
>> and potentially crashing the system. >>>> The SAL may maintain a variable in
>> the SAL data area that indicates whether SAL, on one of the processors,
>> is already handling an MCA. If so, MCA processing on other processors will
>> wait within the SAL MCA handler until the current MCA is processed. This
>> situation may arise when local MCAs are experienced on multiple
>> processors. <<<<<<<


However, it says "may maintain a variable...".  Should I interpret this as 
allowing but not requiring serialization?


-- 
Thanks

Jack Steiner (steiner@sgi.com)          651-683-5302
Principal Engineer                      SGI - Silicon Graphics, Inc.




* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (24 preceding siblings ...)
  2005-02-04 16:24 ` Jack Steiner
@ 2005-02-04 16:34 ` Russ Anderson
  2005-02-06 15:58 ` Russ Anderson
  2005-02-07 22:58 ` Luck, Tony
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-02-04 16:34 UTC (permalink / raw)
  To: linux-ia64

Jack Steiner wrote:
> On Fri, Feb 04, 2005 at 02:00:15PM +1100, Keith Owens wrote:
> 
> To further muddy the waters, it looks like the latest Error Handling Guide
> has addressed the issue:
> 
> >> Intel® Itanium® Processor Family Error Handling Guide April 2004
> >>
> >> Document Number: 249278-003
> >>
> >> 2.7.1
> >>
> >> ...
> >> The MCA error information is provided to the OS_MCA layer. The MCA
> >> error record is logged to the NVM.  To simplify SAL implementation, it
> >> is strongly recommended that SAL process all MCAs by handing off to the
> >> OS as soon as possible to prevent some OSes from experiencing time-outs
> >> and potentially crashing the system. >>>> The SAL may maintain a variable in
> >> the SAL data area that indicates whether SAL, on one of the processors,
> >> is already handling an MCA. If so, MCA processing on other processors will
> >> wait within the SAL MCA handler until the current MCA is processed. This
> >> situation may arise when local MCAs are experienced on multiple
> >> processors. <<<<<<<
> 
> 
> However, it says "may maintain a variable...".  Should I interpret this as 
> allowing but not requiring serialization?

I vote for that interpretation.  IMHO, Linux needs to continue to support
SALs that single-thread MCAs, but should also allow concurrent MCA
handling for SALs that support it.


-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com


* Re: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (25 preceding siblings ...)
  2005-02-04 16:34 ` Russ Anderson
@ 2005-02-06 15:58 ` Russ Anderson
  2005-02-07 22:58 ` Luck, Tony
  27 siblings, 0 replies; 29+ messages in thread
From: Russ Anderson @ 2005-02-06 15:58 UTC (permalink / raw)
  To: linux-ia64

Tony Luck wrote:
> 
> This patch helps some (all uses of ia64_mca_data in
> mca_asm.S need to deref the pointer to get at the
> actual space, instead of clobbering things in the
> percpu area).

That patch does not work on Altix (discontig & SMP).
The system wedges in the MCA code.

The patch (below) helps in that the system gets through the
MCA code and back to the error-injection app, but then
the system dies.

I'm confused about how the offsets are now being computed.
Keith's explanation made sense.  How does the current code
get the correct link-time offsets?

Keith Owens wrote:
> Russ Anderson <rja@sgi.com> wrote:
> >OK, just to make sure I understand, it is not practical to have ar.k3
> >store a pointer to __per_cpu_start, due to the difficulty of computing
> >the offset to per_cpu__cpu_info.  So ar.k3 should point at the
> >start of per_cpu__cpu_info (the cpuinfo_ia64 structure).
> 
> Right.  We can compute the offset from __per_cpu_start to
> per_cpu__cpu_info, but that offset can only be computed at link time.
> That rules out the use of asm-offsets; asm-offsets.h is built at
> compile time, not link time.  The extra code required to compute the
> offset at link time and apply that offset to ar.k3 to get from
> __per_cpu_start to per_cpu__cpu_info is ugly and is not worth the
> effort.
> 
> Put all the MCA related fields in struct cpuinfo_ia64, point ar.k3 at
> struct cpuinfo_ia64 and we can use compile time offsets for the fields
> that the MCA handler cares about.  We just need to document that all
> MCA related fields must be in struct cpuinfo_ia64 or accessed via a
> pointer that is in struct cpuinfo_ia64.



-----------------------------------------------------
Index: linux/arch/ia64/kernel/mca_asm.S
===================================================================
--- linux.orig/arch/ia64/kernel/mca_asm.S	2005-02-04 09:58:01.241029555 -0600
+++ linux/arch/ia64/kernel/mca_asm.S	2005-02-05 10:11:46.126571337 -0600
@@ -342,10 +342,10 @@
 	// restore the original stack frame here
 	GET_THIS_PADDR(r2, ia64_mca_data)
 	;;
-	add r2 = IA64_MCA_CPU_STACKFRAME_OFFSET, r2
-	;;
 	ld8 r2 = [r2]
 	;;
+	add r2 = IA64_MCA_CPU_STACKFRAME_OFFSET, r2
+	;;
 	movl    r4=IA64_PSR_MC
 	;;
 	rse_return_context(r4,r3,r2)	// switch from interrupt context for RSE
-----------------------------------------------------



-- 
Russ Anderson, OS RAS/Partitioning Project Lead  
SGI - Silicon Graphics Inc          rja@sgi.com


* RE: [patch] fix per-CPU MCA mess and make UP kernels work again
  2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
                   ` (26 preceding siblings ...)
  2005-02-06 15:58 ` Russ Anderson
@ 2005-02-07 22:58 ` Luck, Tony
  27 siblings, 0 replies; 29+ messages in thread
From: Luck, Tony @ 2005-02-07 22:58 UTC (permalink / raw)
  To: linux-ia64

>I'm confused about how the offsets are now being computed.
>Keith's explanation made sense.  How does the current code
>get the correct link-time offsets?

Initially I'd imagined that ar.k3 would point at the physical
address of the base of the per-cpu area ... and then we'd be
able to access any per-cpu variable with:

	phys(per_cpu_X) = ar.k3 + (virt_per_cpu_X - PERCPU_ADDR);

As Keith has pointed out, it's hard to do this because of the
allocation of addresses to percpu variables at link time.  But
David observed that we could get around this by assigning ar.k3
a value so that:

	phys(per_cpu_X) = ar.k3 + virt_per_cpu_X;

re-arranging (algebra 101):

	ar.k3 = phys(per_cpu_X) - virt_per_cpu_X;

This clearly gives the same value for ar.k3 for any per-cpu
variable (the offsets within the virtual page are the same
as the offsets within the physical page).  So we can use the
base of the page itself in setup.c:

        ia64_set_kr(IA64_KR_PER_CPU_DATA,
                    ia64_tpa(cpu_data) - (long) __per_cpu_start);

Interesting numerical aside ... When I first looked at the values in
ar.k3 on each cpu, I thought there was an off-by-one error in
the initialization ... I saw that ar.k3 on cpu(N) pointed to the
physical address of the percpu area of cpu(N+1).  But if you think for
a little while about the addresses that we use for per-cpu variables,
you realize that they all look like small negative numbers (since we
anchor the top of the per-cpu area at 0xffffffffffffffff).  So in fact
the value we end up with in ar.k3 is the physical address of the END
of that cpu's per-cpu area.  E.g. "__per_cpu_start" in the statement above
has the value "-64K" ... so if ar.k3 points at the end of the structure,
then ar.k3 - 64K points at the start.

-Tony


end of thread, other threads:[~2005-02-07 22:58 UTC | newest]

Thread overview: 29+ messages
2005-01-26  2:47 [patch] fix per-CPU MCA mess and make UP kernels work again David Mosberger
2005-01-26 16:25 ` Jesse Barnes
2005-01-26 17:13 ` Russ Anderson
2005-01-26 17:48 ` David Mosberger
2005-01-26 17:53 ` Jesse Barnes
2005-01-26 18:05 ` David Mosberger
2005-01-26 18:11 ` Jesse Barnes
2005-01-26 19:01 ` Russ Anderson
2005-01-26 19:23 ` Luck, Tony
2005-01-26 20:07 ` David Mosberger
2005-01-26 21:40 ` Russ Anderson
2005-01-26 21:50 ` David Mosberger
2005-01-26 22:13 ` Luck, Tony
2005-01-26 22:16 ` David Mosberger
2005-01-26 22:19 ` Jesse Barnes
2005-01-26 22:33 ` Luck, Tony
2005-01-27  0:40 ` David Mosberger
2005-01-27  0:55 ` Luck, Tony
2005-01-28 22:54 ` Russ Anderson
2005-02-02  1:04 ` Luck, Tony
2005-02-02 20:25 ` Russ Anderson
2005-02-03 22:48 ` Luck, Tony
2005-02-03 23:48 ` Russ Anderson
2005-02-04  2:09 ` Jack Steiner
2005-02-04  3:00 ` Keith Owens
2005-02-04 16:24 ` Jack Steiner
2005-02-04 16:34 ` Russ Anderson
2005-02-06 15:58 ` Russ Anderson
2005-02-07 22:58 ` Luck, Tony
