* [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock
@ 2003-03-22  4:51 Keith Owens
  2003-03-22 13:03 ` Matthew Wilcox
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Keith Owens @ 2003-03-22  4:51 UTC
  To: linux-ia64

No cpu will return from ia64_sal_mc_rendez() until all cpus have
entered rendezvous and the monarch cpu sends the wake up ipi.  All cpus
try to call ia64_sal_mc_rendez() but only the first one makes it, the
others all block on the spinlock and eventually SAL hits them with an
INIT.

Why do I get the feeling that I am the first person to really use this
code?

With this patch and my previous patch to set SAL_MC_PARAM_RZ_ALWAYS,
kdb v4.0 gets backtrace on _ALL_ cpus when an MCA occurs.  Well, almost
all, if any of the cpus are spinning disabled then the MCA rendezvous
interrupt does not get through, SAL sends INIT and that cpu drops into
INIT processing.  kdb processing for INIT handlers is not complete yet,
work in progress.

Index: 20.5/include/asm-ia64/sal.h
--- 20.5/include/asm-ia64/sal.h Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/s/47_sal.h 1.1.3.2.3.1.1.1.1.3 644)
+++ 20.5(w)/include/asm-ia64/sal.h Sat, 22 Mar 2003 15:35:13 +1100 kaos (linux-2.4/s/47_sal.h 1.1.3.2.3.1.1.1.1.3 644)
@@ -46,6 +46,16 @@ extern spinlock_t sal_lock;
 	ia64_load_scratch_fpregs(fr);                   \
 } while (0)
 
+# define SAL_CALL_NOLOCK(result,args...) do {		\
+	unsigned long flags;				\
+	struct ia64_fpreg fr[6];			\
+	ia64_save_scratch_fpregs(fr);                   \
+	local_irq_save(flags);				\
+	__SAL_CALL(result,args);			\
+	local_irq_restore(flags);			\
+	ia64_load_scratch_fpregs(fr);                   \
+} while (0)
+
 #define SAL_SET_VECTORS			0x01000000
 #define SAL_GET_STATE_INFO		0x01000001
 #define SAL_GET_STATE_INFO_SIZE		0x01000002
@@ -700,13 +710,14 @@ ia64_sal_get_state_info_size (u64 sal_in
 }
 
 /* Causes the processor to go into a spin loop within SAL where SAL awaits a wakeup
- * from the monarch processor.
+ * from the monarch processor.  Must not lock, this will not return on any cpu until
+ * the monarch processor sends a wake up.
  */
 static inline s64
 ia64_sal_mc_rendez (void)
 {
 	struct ia64_sal_retval isrv;
-	SAL_CALL(isrv, SAL_MC_RENDEZ, 0, 0, 0, 0, 0, 0, 0);
+	SAL_CALL_NOLOCK(isrv, SAL_MC_RENDEZ, 0, 0, 0, 0, 0, 0, 0);
 	return isrv.status;
 }
 


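The lock-up described in this message can be modelled in user space.
The sketch below is not kernel code: model_sal_lock, model_enter_rendez(),
model_count_rendez() and NR_CPUS are hypothetical stand-ins, and it
assumes that SAL_CALL holds sal_lock for the whole SAL call, as the need
for a NOLOCK variant suggests.  It only counts how many cpus would reach
the rendezvous with and without the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical cpu count for the model */

/* Stand-in for sal_lock; a set flag means "held". */
static atomic_flag model_sal_lock = ATOMIC_FLAG_INIT;

/*
 * Model one cpu trying to enter rendezvous.  Returns true if the cpu
 * reaches the (modelled) SAL spin loop.  With use_lock set, the lock is
 * taken and never released, because the real call does not return
 * until the monarch sends the wake up.
 */
static bool model_enter_rendez(bool use_lock)
{
	if (use_lock && atomic_flag_test_and_set(&model_sal_lock))
		return false;	/* lock already held: would spin here */
	return true;		/* parked inside SAL */
}

/* Count how many of NR_CPUS cpus reach rendezvous. */
static int model_count_rendez(bool use_lock)
{
	int cpu, reached = 0;

	atomic_flag_clear(&model_sal_lock);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (model_enter_rendez(use_lock))
			reached++;
	assert(reached <= NR_CPUS);
	return reached;
}
```

Under these assumptions model_count_rendez(true) lets a single cpu
through, while model_count_rendez(false) lets all NR_CPUS through, which
is the behaviour the patch is after.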


* Re: [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock
  2003-03-22  4:51 [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock Keith Owens
@ 2003-03-22 13:03 ` Matthew Wilcox
  2003-03-22 15:43 ` Keith Owens
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Matthew Wilcox @ 2003-03-22 13:03 UTC
  To: linux-ia64

On Sat, Mar 22, 2003 at 03:51:54PM +1100, Keith Owens wrote:
> No cpu will return from ia64_sal_mc_rendez() until all cpus have
> entered rendezvous and the monarch cpu sends the wake up ipi.  All cpus
> try to call ia64_sal_mc_rendez() but only the first one makes it, the
> others all block on the spinlock and eventually SAL hits them with an
> INIT.
> 
> Why do I get the feeling that I am the first person to really use this
> code?

Maybe this is (in part) what's causing
https://lists.linuxia64.org/archives/linux-ia64/2002-August/003876.html

Apparently the reason it hangs is that I leave my console at the default
9600 baud (you know, like a customer would ..) and when dumping state
it takes so long that the system assumes the CPU has hung.  Bag o' shite.

-- 
"It's not Hollywood.  War is real, war is primarily not about defeat or
victory, it is about death.  I've seen thousands and thousands of dead bodies.
Do you think I want to have an academic debate on this subject?" -- Robert Fisk



* Re: [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock
  2003-03-22  4:51 [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock Keith Owens
  2003-03-22 13:03 ` Matthew Wilcox
@ 2003-03-22 15:43 ` Keith Owens
  2003-03-24 19:53 ` David Mosberger
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Keith Owens @ 2003-03-22 15:43 UTC
  To: linux-ia64

On Sat, 22 Mar 2003 13:03:57 +0000, 
Matthew Wilcox <willy@debian.org> wrote:
>Maybe this is (in part) what's causing
>https://lists.linuxia64.org/archives/linux-ia64/2002-August/003876.html
>
>Apparently the reason it hangs is that I leave my console at the default
>9600 baud (you know, like a customer would ..) and when dumping state
>it takes so long that the system assumes the CPU has hung.  Bag o' shite.

I doubt that your hang has anything to do with MCA rendezvous code;
rendezvous and timeouts are not done at boot time.  Also, my testing
with SGI SAL shows that the rendezvous cpus are driven first.  Only
after they all rendezvous or time out is the monarch cpu passed the MCA.

More likely you are hitting the mismatch between the SAL record format
(which is variable length and has misaligned fields) and the C
definitions for those records.  In theory, that can cause an oops while
processing the record.

Try this extract from kdb v4.0.  The patch is against 2.4.20-ia64-021210
and should already be in the latest ia64 (bk) trees.  I did the original
patch, David Mosberger rewrote it.

Index: 20.5/include/asm-ia64/sal.h
--- 20.5/include/asm-ia64/sal.h Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/s/47_sal.h 1.1.3.2.3.1.1.1.1.3 644)
+++ 20.5(w)/include/asm-ia64/sal.h Sun, 23 Mar 2003 02:39:23 +1100 kaos (linux-2.4/s/47_sal.h 1.1.3.2.3.1.1.1.1.3 644)
@@ -353,6 +353,11 @@ typedef struct sal_processor_static_info
     struct ia64_fpreg       fr[128];
 } sal_processor_static_info_t;
 
+struct sal_cpuid_info {
+	u64 regs[5];
+	u64 reserved;
+};
+
 typedef struct sal_log_processor_info
 {
     sal_log_section_hdr_t       header;
@@ -373,19 +378,34 @@ typedef struct sal_log_processor_info
     u64                         proc_error_map;
     u64                         proc_state_parameter;
     u64                         proc_cr_lid;
-    sal_log_mod_error_info_t    cache_check_info[16];
-    sal_log_mod_error_info_t    tlb_check_info[16];
-    sal_log_mod_error_info_t    bus_check_info[16];
-    sal_log_mod_error_info_t    reg_file_check_info[16];
-    sal_log_mod_error_info_t    ms_check_info[16];
-    struct
-    {
-        u64 regs[5];
-        u64 reserved;
-    } cpuid_info;
-    sal_processor_static_info_t processor_static_info;
+	/*
+	 * The rest of this structure consists of variable-length arrays, which can't be
+	 * expressed in C.
+	 */
+	sal_log_mod_error_info_t info[0];
+	/*
+	 * This is what the rest looked like if C supported variable-length arrays:
+	 *
+	 * sal_log_mod_error_info_t cache_check_info[.valid.num_cache_check];
+	 * sal_log_mod_error_info_t tlb_check_info[.valid.num_tlb_check];
+	 * sal_log_mod_error_info_t bus_check_info[.valid.num_bus_check];
+	 * sal_log_mod_error_info_t reg_file_check_info[.valid.num_reg_file_check];
+	 * sal_log_mod_error_info_t ms_check_info[.valid.num_ms_check];
+	 * struct sal_cpuid_info cpuid_info;
+	 * sal_processor_static_info_t processor_static_info;
+	 */
 } sal_log_processor_info_t;
 
+/* Given a sal_log_processor_info_t pointer, return a pointer to the processor_static_info: */
+#define SAL_LPI_PSI_INFO(l)								\
+({	sal_log_processor_info_t *_l = (l);						\
+	((sal_processor_static_info_t *)						\
+	 ((char *) _l->info + ((_l->valid.num_cache_check + _l->valid.num_tlb_check	\
+			  + _l->valid.num_bus_check + _l->valid.num_reg_file_check	\
+			  + _l->valid.num_ms_check) * sizeof(sal_log_mod_error_info_t)	\
+			 + sizeof(struct sal_cpuid_info))));				\
+})
+
 /* platform error log structures */
 
 typedef struct sal_log_mem_dev_err_info
Index: 20.5/arch/ia64/kernel/mca.c
--- 20.5/arch/ia64/kernel/mca.c Wed, 11 Dec 2002 20:58:53 +1100 kaos (linux-2.4/s/c/5_mca.c 1.1.3.2.3.1.1.1.1.3 644)
+++ 20.5(w)/arch/ia64/kernel/mca.c Sun, 23 Mar 2003 02:38:05 +1100 kaos (linux-2.4/s/c/5_mca.c 1.1.3.2.3.1.1.1.1.3 644)
@@ -910,7 +910,7 @@ ia64_init_handler (struct pt_regs *regs)
 	plog_ptr=(ia64_err_rec_t *)IA64_LOG_CURR_BUFFER(SAL_INFO_TYPE_INIT);
 	proc_ptr = &plog_ptr->proc_err;
 
-	ia64_process_min_state_save(&proc_ptr->processor_static_info.min_state_area);
+	ia64_process_min_state_save(&SAL_LPI_PSI_INFO(proc_ptr)->min_state_area);
 
 	/* Clear the INIT SAL logs now that they have been saved in the OS buffer */
 	ia64_sal_clear_state_info(SAL_INFO_TYPE_INIT);
@@ -1704,7 +1704,7 @@ ia64_log_proc_dev_err_info_print (sal_lo
 	 *  absent. Also, current implementations only allocate space for number of
 	 *  elements used.  So we walk the data pointer from here on.
 	 */
-	p_data = &slpi->cache_check_info[0];
+	p_data = &slpi->info[0];
 
 	/* Print the cache check information if any*/
 	for (i = 0 ; i < slpi->valid.num_cache_check; i++, p_data++)


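The offset arithmetic in SAL_LPI_PSI_INFO() can be tried outside the
kernel.  The sketch below uses simplified, hypothetical stand-ins for
the SAL types (every model_* name and the field sizes are assumptions,
not the real definitions); only the shape of the calculation follows
the patch.  It compares the offset of the static info when the counts
from .valid are used with the offset implied by the old fixed [16]
arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_CHECKS 16	/* size of each old fixed array */

/* Hypothetical, simplified record pieces; the real layouts differ. */
struct model_mod_error_info { uint64_t words[2]; };
struct model_cpuid_info { uint64_t regs[5]; uint64_t reserved; };

struct model_valid {
	uint8_t num_cache_check;
	uint8_t num_tlb_check;
	uint8_t num_bus_check;
	uint8_t num_reg_file_check;
	uint8_t num_ms_check;
};

struct model_proc_info {
	struct model_valid valid;
	uint64_t proc_cr_lid;
	/* check entries, cpuid info and static info follow back to back */
	struct model_mod_error_info info[];
};

/* Offset of the static info when only the present entries are stored. */
static size_t model_psi_offset(const struct model_valid *v)
{
	size_t n = (size_t)v->num_cache_check + v->num_tlb_check
		 + v->num_bus_check + v->num_reg_file_check
		 + v->num_ms_check;

	assert(n <= 5 * MODEL_MAX_CHECKS);
	return offsetof(struct model_proc_info, info)
	     + n * sizeof(struct model_mod_error_info)
	     + sizeof(struct model_cpuid_info);
}

/* Offset the old definition assumed: five arrays of 16 entries each. */
static size_t model_fixed_psi_offset(void)
{
	return offsetof(struct model_proc_info, info)
	     + 5 * MODEL_MAX_CHECKS * sizeof(struct model_mod_error_info)
	     + sizeof(struct model_cpuid_info);
}
```

For a record with one cache check and two bus checks, model_psi_offset()
lands well before model_fixed_psi_offset(), so code using the fixed
layout would read the min state area from the wrong place.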


* Re: [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock
  2003-03-22  4:51 [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock Keith Owens
  2003-03-22 13:03 ` Matthew Wilcox
  2003-03-22 15:43 ` Keith Owens
@ 2003-03-24 19:53 ` David Mosberger
  2003-03-24 20:06 ` Luck, Tony
  2003-03-24 21:54 ` David Mosberger
  4 siblings, 0 replies; 6+ messages in thread
From: David Mosberger @ 2003-03-24 19:53 UTC
  To: linux-ia64

Tony,

Will you let us know what you think of Keith's changes?  They look
fine to me and I think we need the same/similar fixes in 2.5.

	--david



* RE: [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock
  2003-03-22  4:51 [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock Keith Owens
                   ` (2 preceding siblings ...)
  2003-03-24 19:53 ` David Mosberger
@ 2003-03-24 20:06 ` Luck, Tony
  2003-03-24 21:54 ` David Mosberger
  4 siblings, 0 replies; 6+ messages in thread
From: Luck, Tony @ 2003-03-24 20:06 UTC
  To: linux-ia64

David,

Yes ... the patch looks good to me too.

-Tony

-----Original Message-----
From: David Mosberger [mailto:davidm@napali.hpl.hp.com]
Sent: Monday, March 24, 2003 11:54 AM
To: Luck, Tony
Cc: Keith Owens; linux-ia64@linuxia64.org
Subject: Re: [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not
lock


Tony,

Will you let us know what you think of Keith's changes?  They look
fine to me and I think we need the same/similar fixes in 2.5.

	--david



* Re: [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock
  2003-03-22  4:51 [Linux-ia64] [patch] 2.4.20 ia64_sal_mc_rendez must not lock Keith Owens
                   ` (3 preceding siblings ...)
  2003-03-24 20:06 ` Luck, Tony
@ 2003-03-24 21:54 ` David Mosberger
  4 siblings, 0 replies; 6+ messages in thread
From: David Mosberger @ 2003-03-24 21:54 UTC
  To: linux-ia64

I made the analogous change in 2.5 (patch didn't apply due to some
other changes).  Also, I changed the local variable names in the
macros to avoid name collisions.

	--david

>>>>> On Sat, 22 Mar 2003 15:51:54 +1100, Keith Owens <kaos@sgi.com> said:

  Keith> No cpu will return from ia64_sal_mc_rendez() until all cpus have
  Keith> entered rendezvous and the monarch cpu sends the wake up ipi.  All cpus
  Keith> try to call ia64_sal_mc_rendez() but only the first one makes it, the
  Keith> others all block on the spinlock and eventually SAL hits them with an
  Keith> INIT.

  Keith> Why do I get the feeling that I am the first person to really use this
  Keith> code?

  Keith> With this patch and my previous patch to set
  Keith> SAL_MC_PARAM_RZ_ALWAYS, kdb v4.0 gets backtrace on _ALL_ cpus
  Keith> when an MCA occurs.  Well, almost all, if any of the cpus are
  Keith> spinning disabled then the MCA rendezvous interrupt does not
  Keith> get through, SAL sends INIT and that cpu drops into INIT
  Keith> processing.  kdb processing for INIT handlers is not complete
  Keith> yet, work in progress.



