From: Jan Kiszka <jan.kiszka@siemens.com>
To: Jeroen Van den Keybus <jeroen.vandenkeybus@gmail.com>,
	Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org>
Cc: "xenomai@xenomai.org" <xenomai@xenomai.org>
Subject: Re: [Xenomai] xeno3_rc3 - Watchdog detected hard LOCKUP
Date: Thu, 09 Apr 2015 11:14:54 +0200
Message-ID: <5526430E.8030808@siemens.com>
In-Reply-To: <55264097.2010203@siemens.com>

On 2015-04-09 11:04, Jan Kiszka wrote:
> On 2015-04-08 23:02, Jeroen Van den Keybus wrote:
>> It took a while, but a hard lockup occurred on Xenomai 3.0-rc4 with
>> Linux 3.16.7 running dohell. This time, I believe I have a trace of
>> the locked up CPU. It's listed below and for completeness, the first
>> part of the dmesg log is attached as well.
>>
>> [419215.683857] Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 3
>> [419215.683886] CPU: 3 PID: 18835 Comm: dohell Not tainted 3.16.7-cobalt #1
>> [419215.683903] Hardware name: Supermicro X10SAE/X10SAE, BIOS 2.0a 05/09/2014
>> [419215.683920]  0000000000000000 ffff88021fb86c38 ffffffff8175761d ffffffff81a8a1e8
>> [419215.683945]  ffff88021fb86cb0 ffffffff81752c0e 0000000000000010 ffff88021fb86cc0
>> [419215.683968]  ffff88021fb86c60 0000000000000000 0000000000000003 000000000001999e
>> [419215.684095] Call Trace:
>> [419215.684103]  <NMI>  [<ffffffff8175761d>] dump_stack+0x45/0x56
>> [419215.684125]  [<ffffffff81752c0e>] panic+0xd8/0x20a
>> [419215.684141]  [<ffffffff81103f02>] watchdog_overflow_callback+0xc2/0xd0
>> [419215.684158]  [<ffffffff8114257d>] __perf_event_overflow+0x8d/0x230
>> [419215.684174]  [<ffffffff81143024>] perf_event_overflow+0x14/0x20
>> [419215.684190]  [<ffffffff81020326>] intel_pmu_handle_irq+0x1e6/0x400
>> [419215.684259]  [<ffffffff811cb501>] ? unmap_kernel_range_noflush+0x11/0x20
>> [419215.684277]  [<ffffffff81017f2b>] perf_event_nmi_handler+0x2b/0x50
>> [419215.684293]  [<ffffffff81006f68>] nmi_handle+0x88/0x120
>> [419215.684308]  [<ffffffff8100755e>] default_do_nmi+0xce/0x130
>> [419215.684373]  [<ffffffff81007690>] do_nmi+0xd0/0xf0
>> [419215.684387]  [<ffffffff8176175a>] end_repeat_nmi+0x1e/0x2e
>> [419215.684402]  [<ffffffff8175ea4a>] ? _raw_spin_lock+0x2a/0x40
>> [419215.684417]  [<ffffffff8175ea4a>] ? _raw_spin_lock+0x2a/0x40
>> [419215.684431]  [<ffffffff8175ea4a>] ? _raw_spin_lock+0x2a/0x40
>> [419215.684445]  <<EOE>>  [<ffffffff81046bac>] __ipipe_pin_range_globally+0x7c/0x2b0
>> [419215.684468]  [<ffffffff8139efe6>] ioremap_page_range+0x226/0x300
>> [419215.684485]  [<ffffffff8114e90a>] ? xnintr_core_clock_handler+0x2ea/0x310
>> [419215.684553]  [<ffffffff81093eb0>] ? update_curr+0x80/0x180
>> [419215.684568]  [<ffffffff81455e09>] ghes_copy_tofrom_phys+0x1e9/0x200
> 
> OK, maybe it is related to ACPI APEI, maybe that is just triggering an
> I-pipe bug. But could you try to disable that feature and see if the
> issue still appears?
> 
> I'll meanwhile dig deeper and try to understand what could cause a lockup.

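Oh, the bug is obvious (and would have been reported with
CONFIG_PROVE_LOCKING enabled): we are calling __ipipe_pin_range_globally
from IRQ context here, but that function takes pgd_lock with a plain
spin_lock, i.e. without disabling interrupts. If the interrupt arrives
while the same CPU already holds pgd_lock, the handler spins on it forever.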

Here is a quick fix for testing purposes (the function requires some
consolidating cleanup):

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 10abc67..0aba29c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1318,10 +1318,11 @@ void __ipipe_pin_range_globally(unsigned long start, unsigned long end)
 	int ret = 0;
 
 	do {
+		unsigned long flags;
 		struct page *page;
 
 		next = pgd_addr_end(addr, end);
-		spin_lock(&pgd_lock);
+		spin_lock_irqsave(&pgd_lock, flags);
 		list_for_each_entry(page, &pgd_list, lru) {
 			pgd_t *pgd;
 			pgd = (pgd_t *)page_address(page) + pgd_index(addr);
@@ -1329,7 +1330,7 @@ void __ipipe_pin_range_globally(unsigned long start, unsigned long end)
 			if (ret)
 				break;
 		}
-		spin_unlock(&pgd_lock);
+		spin_unlock_irqrestore(&pgd_lock, flags);
 		addr = next;
 	} while (!ret && addr != end);
 #endif

Interestingly, legacy X86_32 was already using irqsave/restore.
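To illustrate why plain spin_lock self-deadlocks when the same lock is also taken from IRQ context, here is a minimal userspace analogy. It is not kernel code and not part of the patch. It assumes a single-threaded POSIX process and maps one thread to one CPU, SIGUSR1 to the interrupt, a C11 atomic_flag to pgd_lock, and sigprocmask() to local_irq_save/restore. All names (lock_analog, fake_irq, install_fake_irq, run_critical_section) are hypothetical. The handler uses a single test-and-set instead of spinning so that the sketch reports the deadlock rather than hanging.

```c
/* Userspace analogy only: one thread = one CPU, SIGUSR1 = an IRQ,
 * atomic_flag = pgd_lock, sigprocmask() = local_irq_save/restore. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag lock_analog = ATOMIC_FLAG_INIT; /* hypothetical stand-in for pgd_lock */
static volatile sig_atomic_t would_deadlock;

/* Hypothetical "IRQ handler" that needs the same lock. A real spin_lock
 * would spin here forever: the lock holder is the code this handler
 * interrupted, and it cannot resume until the handler returns. We only
 * record that situation instead of spinning. */
static void fake_irq(int sig)
{
    (void)sig;
    if (atomic_flag_test_and_set(&lock_analog)) {
        would_deadlock = 1;
        return;
    }
    atomic_flag_clear(&lock_analog);
}

static void install_fake_irq(void)
{
    struct sigaction sa = {0};
    sa.sa_handler = fake_irq;
    sigemptyset(&sa.sa_mask);
    int rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    (void)rc;
}

/* Takes the lock, lets an "IRQ" arrive inside the critical section, then
 * releases the lock. mask_irqs == 0 mimics spin_lock()/spin_unlock(),
 * mask_irqs == 1 mimics spin_lock_irqsave()/spin_unlock_irqrestore().
 * Returns 1 if the handler found the lock already held. */
static int run_critical_section(int mask_irqs)
{
    sigset_t irq, saved;
    sigemptyset(&irq);
    sigaddset(&irq, SIGUSR1);
    sigemptyset(&saved);
    would_deadlock = 0;

    if (mask_irqs)
        sigprocmask(SIG_BLOCK, &irq, &saved);   /* "local_irq_save" */
    while (atomic_flag_test_and_set(&lock_analog))
        ;                                        /* acquire */
    raise(SIGUSR1);                              /* IRQ hits this "CPU" */
    atomic_flag_clear(&lock_analog);             /* release */
    if (mask_irqs)
        sigprocmask(SIG_SETMASK, &saved, NULL);  /* deferred IRQ runs here */
    return would_deadlock;
}
```

After install_fake_irq(), run_critical_section(0) should return 1 (the "IRQ" finds the lock held by the code it interrupted), while run_critical_section(1) should return 0 because the "IRQ" is deferred until after the release, which is what the irqsave/irqrestore pair in the patch achieves. In the real kernel the picture is more involved (the I-pipe virtualizes interrupt masking for Linux), so treat this only as an analogy.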

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


