From: rick@microway.com
To: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: linux-kernel@vger.kernel.org,
	Richard Houghton <rhoughton@microway.com>,
	ACPI Devel Mailing List <linux-acpi@vger.kernel.org>,
	Len Brown <lenb@kernel.org>,
	Matthew Garrett <mjg59@srcf.ucam.org>
Subject: Re: kernel oops and panic in acpi_atomic_read under 2.6.39.3.   call trace included
Date: Tue, 23 Aug 2011 13:16:03 -0400	[thread overview]
Message-ID: <6ab7a83c84d6398ffc089f925da89658.squirrel@www.microway.com> (raw)
In-Reply-To: <201108222313.26769.rjw@sisk.pl>

Hi,

> Hi,
>
> On Monday, August 22, 2011, Rick Warner wrote:
> ...
>> Hi Rafael,
>>
>> Thanks for the off-list help in getting you this info.
>>
>> I had already rebuilt the kernel using the change I mentioned earlier
>> (test on !&g->error_status_address) since the call trace I got.
>>
>> I luckily still had a copy of the kernel and modules I built previously
>> using just your patch, so I undid my change to the ghes.c source, leaving
>> just your patch but not mine so it would match the ghes.ko module I ran
>> on.  This is the output of gdb on that ghes.ko now:
>>
>> (gdb) l *ghes_read_estatus+0x38
>> 0x258 is in ghes_read_estatus (drivers/acpi/apei/ghes.c:296).
>> warning: Source file is more recent than executable.
>> 291             int rc;
>> 292             if (!g)
>> 293                     return -EINVAL;
>> 294
>> 295             rc = acpi_atomic_read(&buf_paddr, &g->error_status_address);
>> 296             if (rc) {
>> 297                     if (!silent && printk_ratelimit())
>> 298                             pr_warning(FW_WARN GHES_PFX
>> 299     "Failed to read error status block address for hardware error source: %d.\n",
>> 300                                        g->header.source_id);
>>
>> The warning about the source being newer is because of the reverted
>> change in the ghes.c source mentioned above.
>
> OK, since &buf_paddr cannot be NULL, perhaps ghes is.  Please check if the
> appended patch makes a difference.
>
> Thanks,
> Rafael
>
> ---
>  drivers/acpi/apei/ghes.c |    7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> Index: linux/drivers/acpi/apei/ghes.c
> ===================================================================
> --- linux.orig/drivers/acpi/apei/ghes.c
> +++ linux/drivers/acpi/apei/ghes.c
> @@ -393,11 +393,16 @@ static void ghes_copy_tofrom_phys(void *
>
>  static int ghes_read_estatus(struct ghes *ghes, int silent)
>  {
> -	struct acpi_hest_generic *g = ghes->generic;
> +	struct acpi_hest_generic *g;
>  	u64 buf_paddr;
>  	u32 len;
>  	int rc;
>
> +	if (!ghes || !ghes->generic)
> +		return -EINVAL;
> +
> +	g = ghes->generic;
> +
>  	rc = acpi_atomic_read(&buf_paddr, &g->error_status_address);
>  	if (rc) {
>  		if (!silent && printk_ratelimit())
>

Unfortunately, the system panicked again with this patch in place.  Here is
the latest call trace:

[64614.937968] BUG: unable to handle kernel NULL pointer dereference at           (null)
[64614.945851] IP: [<ffffffff812a211d>] acpi_atomic_read+0x8d/0xcb
[64614.951817] PGD 2f8d40067 PUD 2f8cf8067 PMD 0
[64614.956346] Oops: 0000 [#1] PREEMPT SMP
[64614.960344] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
[64614.968265] CPU 14
[64614.970203] Modules linked in: md5 nfsd lockd nfs_acl auth_rpcgss sunrpc ipt_MASQUERADE iptable_mangle iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables af_packet edd cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf xfs dm_mod igb joydev sr_mod cdrom pcspkr sg ioatdma button iTCO_wdt iTCO_vendor_support dca ghes hed i2c_i801 i7core_edac edac_core ext4 jbd2 crc16 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid10 raid1 raid0 fan processor thermal thermal_sys ata_generic pata_atiixp arcmsr
[64615.024806]
[64615.026305] Pid: 10723, comm: cluster Not tainted 2.6.39.3-microwaycustom #5 Supermicro X8DTH-i/6/iF/6F/X8DTH
[64615.036291] RIP: 0010:[<ffffffff812a211d>]  [<ffffffff812a211d>] acpi_atomic_read+0x8d/0xcb
[64615.044671] RSP: 0000:ffff88063fcc7da8  EFLAGS: 00010046
[64615.049994] RAX: 0000000000000000 RBX: ffff88063fcc7df0 RCX: 00000000bf7b6000
[64615.057132] RDX: 0000000000000000 RSI: 00000000bf7b6010 RDI: 00000000bf7b5ff0
[64615.064271] RBP: ffff88063fcc7dd8 R08: 00000000bf7b7000 R09: 0000000000000002
[64615.071411] R10: 0000000000000000 R11: 0000000000000000 R12: ffffc90003044c20
[64615.078549] R13: 0000000000000000 R14: 00000000bf7b5ff0 R15: 0000000000000000
[64615.085688] FS:  0000000000000000(0000) GS:ffff88063fcc0000(0000) knlGS:0000000000000000
[64615.093771] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[64615.099517] CR2: 0000000000000000 CR3: 00000003015b1000 CR4: 00000000000006e0
[64615.106658] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[64615.113795] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[64615.120928] Process cluster (pid: 10723, threadinfo ffff8802fb3b6000, task ffff880301534640)
[64615.129361] Stack:
[64615.131386]  0000000000000000 00000000bf7b5ff0 00000000ffffffea ffff88032b1c3d40
[64615.138871]  0000000000000001 ffffc90003044ca8 ffff88063fcc7e18 ffffffffa01b7245
[64615.146354]  0000000000000000 0000000000000000 ffff88032b1c3d40 0000000000000000
[64615.153840] Call Trace:
[64615.156293]  <NMI>
[64615.158442]  [<ffffffffa01b7245>] ghes_read_estatus+0x55/0x180 [ghes]
[64615.164900]  [<ffffffffa01b760c>] ghes_notify_nmi+0xbc/0x190 [ghes]
[64615.171182]  [<ffffffff8150ddfd>] notifier_call_chain+0x4d/0x70
[64615.177116]  [<ffffffff8150de63>] __atomic_notifier_call_chain+0x43/0x60
[64615.183824]  [<ffffffff8150de91>] atomic_notifier_call_chain+0x11/0x20
[64615.190373]  [<ffffffff8150dece>] notify_die+0x2e/0x30
[64615.195535]  [<ffffffff8150b4f2>] do_nmi+0xa2/0x260
[64615.200430]  [<ffffffff8150b150>] nmi+0x20/0x30
[64615.204981]  [<ffffffff81029f6a>] ? native_write_msr_safe+0xa/0x10
[64615.211170]  <<EOE>>
[64615.213276]  <IRQ>
[64615.215609]  [<ffffffff81011568>] intel_pmu_disable_all+0x38/0xb0
[64615.221710]  [<ffffffff81010efa>] x86_pmu_disable+0x4a/0x50
[64615.227306]  [<ffffffff810ea842>] perf_event_task_tick+0x1a2/0x2a0
[64615.233495]  [<ffffffff81050750>] scheduler_tick+0x1b0/0x290
[64615.239165]  [<ffffffff81066c29>] update_process_times+0x69/0x80
[64615.245193]  [<ffffffff81088098>] tick_sched_timer+0x58/0x150
[64615.250956]  [<ffffffff8107b7ef>] __run_hrtimer+0x6f/0x250
[64615.256459]  [<ffffffff81088040>] ? tick_init_highres+0x20/0x20
[64615.262393]  [<ffffffff8107bf7a>] hrtimer_interrupt+0xda/0x230
[64615.268244]  [<ffffffff8101f5c6>] smp_apic_timer_interrupt+0x66/0xa0
[64615.274622]  [<ffffffff815120f3>] apic_timer_interrupt+0x13/0x20
[64615.280633]  <EOI>
[64615.282570] Code: fc 10 74 1f 77 08 41 80 fc 08 75 49 eb 0e 41 80 fc 20 74 17 41 80 fc 40 75 3b eb 15 8a 00 0f b6 c0 eb 11 66 8b 00 0f b7 c0 eb 09 <8b> 00 89 c0 eb 03 48 8b 00 48 89 03 e8 62 55 e2 ff eb 1d 41 0f
[64615.303108] RIP  [<ffffffff812a211d>] acpi_atomic_read+0x8d/0xcb
[64615.309163]  RSP <ffff88063fcc7da8>
[64615.312668] CR2: 0000000000000000
[64615.316007] ---[ end trace 3ab5dd3ba3391edf ]---
[64615.320637] Kernel panic - not syncing: Fatal exception in interrupt
[64615.326999] Pid: 10723, comm: cluster Tainted: G      D     2.6.39.3-microwaycustom #5
[64615.334914] Call Trace:
[64615.337371]  <NMI>  [<ffffffff815071ee>] panic+0x9b/0x1b0
[64615.342837]  [<ffffffff8150bb4a>] oops_end+0xea/0xf0
[64615.347828]  [<ffffffff81031dc3>] no_context+0xf3/0x260
[64615.353081]  [<ffffffff81032055>] __bad_area_nosemaphore+0x125/0x1e0
[64615.359456]  [<ffffffff8103211e>] bad_area_nosemaphore+0xe/0x10
[64615.365389]  [<ffffffff8150dd10>] do_page_fault+0x500/0x5a0
[64615.370985]  [<ffffffff810eb839>] ? __perf_event_overflow+0x99/0x210
[64615.377357]  [<ffffffff8150ae95>] page_fault+0x25/0x30
[64615.382516]  [<ffffffff812a211d>] ? acpi_atomic_read+0x8d/0xcb
[64615.388365]  [<ffffffff812a20f0>] ? acpi_atomic_read+0x60/0xcb
[64615.394224]  [<ffffffffa01b7245>] ghes_read_estatus+0x55/0x180 [ghes]
[64615.400685]  [<ffffffffa01b760c>] ghes_notify_nmi+0xbc/0x190 [ghes]
[64615.406959]  [<ffffffff8150ddfd>] notifier_call_chain+0x4d/0x70
[64615.412887]  [<ffffffff8150de63>] __atomic_notifier_call_chain+0x43/0x60
[64615.419594]  [<ffffffff8150de91>] atomic_notifier_call_chain+0x11/0x20
[64615.426138]  [<ffffffff8150dece>] notify_die+0x2e/0x30
[64615.431292]  [<ffffffff8150b4f2>] do_nmi+0xa2/0x260
[64615.436180]  [<ffffffff8150b150>] nmi+0x20/0x30
[64615.440730]  [<ffffffff81029f6a>] ? native_write_msr_safe+0xa/0x10
[64615.446911]  <<EOE>>  <IRQ>  [<ffffffff81011568>] intel_pmu_disable_all+0x38/0xb0
[64615.454467]  [<ffffffff81010efa>] x86_pmu_disable+0x4a/0x50
[64615.460050]  [<ffffffff810ea842>] perf_event_task_tick+0x1a2/0x2a0
[64615.466233]  [<ffffffff81050750>] scheduler_tick+0x1b0/0x290
[64615.471908]  [<ffffffff81066c29>] update_process_times+0x69/0x80
[64615.477933]  [<ffffffff81088098>] tick_sched_timer+0x58/0x150
[64615.483691]  [<ffffffff8107b7ef>] __run_hrtimer+0x6f/0x250
[64615.489202]  [<ffffffff81088040>] ? tick_init_highres+0x20/0x20
[64615.495138]  [<ffffffff8107bf7a>] hrtimer_interrupt+0xda/0x230
[64615.500989]  [<ffffffff8101f5c6>] smp_apic_timer_interrupt+0x66/0xa0
[64615.507362]  [<ffffffff815120f3>] apic_timer_interrupt+0x13/0x20
[64615.513375]  <EOI>
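
One possible reading of the trace above, offered as a guess rather than a
diagnosis: the byte marked at RIP in the Code: line is <8b> 00, which on
x86-64 decodes to mov (%rax),%eax, and RAX is 0 in the register dump.  That
would mean the fault is a 32-bit load through a NULL virtual address inside
acpi_atomic_read, which the checks on ghes and ghes->generic cannot catch.
The userspace sketch below only illustrates that idea; sketch_read_width()
and its NULL guard are hypothetical and are not the kernel's code.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a width-dispatched read.
 * sketch_read_width() is an invented name, not acpi_atomic_read().
 */
static int sketch_read_width(uint64_t *val, const void *addr,
			     unsigned int width)
{
	uint8_t v8;
	uint16_t v16;
	uint32_t v32;
	uint64_t v64;

	/* Assumed guard: refuse a NULL source instead of dereferencing it. */
	if (!val || !addr)
		return -EINVAL;

	switch (width) {
	case 8:
		memcpy(&v8, addr, sizeof(v8));
		*val = v8;
		break;
	case 16:
		memcpy(&v16, addr, sizeof(v16));
		*val = v16;
		break;
	case 32:
		/* The width that appears to fault in the oops (<8b> 00). */
		memcpy(&v32, addr, sizeof(v32));
		*val = v32;
		break;
	case 64:
		memcpy(&v64, addr, sizeof(v64));
		*val = v64;
		break;
	default:
		/* Unsupported bit width. */
		return -EINVAL;
	}

	return 0;
}
```

With a guard like this, a NULL source address would come back as -EINVAL to
the caller instead of faulting in NMI context.  Whether that is the right
place for the check in the real code is an open question.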

What should I try next?

Thanks,
Rick

