linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Acpi deadlocks with 3.7.0-rc4
@ 2012-11-15 16:09 Zdenek Kabelac
  2012-11-28 16:01 ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Kabelac @ 2012-11-15 16:09 UTC (permalink / raw)
  To: LKML

Hello


I've already seen twice this oops after resuming my Lenovo T61 in docking station.

Since for some reason currently the serial line doesn't work correctly after 
resume
(while I'm pretty sure it used to work in past) here is at least hand-written oops
message from mobile camera picture.

 From the trace it seem os_wait semaphore is accessed twice.
Unsure which device is behind it - but it seem docking station is need to hit 
this issue.


kernel 3.7.0-rc4

Pid:  pm-suspend

RIP:   acpi_ns_lookup  + 0xa1/0x5b9

Call Trace:

? acpi_os_wait_semaphore + 0x136/0x149
acpi_ns_get_mode + 0x96/0x102
? __lock_is_held +0x5f/0x90
acpi_ns_evaluate +0x47/0x2de
? _raw_spin_lock_irqsave
? acpi_ut_evaluate_object
? sub_preempt_count
? pnpacpi_can_wakeup
acpi_rs_get_method_data
? acpi_os_signal_semaphore
acpi_walk_resources
? acpi_ut_release_mutex
pnpacpi_build_resource_template
? acpi_bus_get_device
pnpacpi_set_resources
? pnp_device_shutdown
pnp_start_dev
pnp_bus_resume
dpm_run_callback
device_resume
dpm_resume
dpm_resume_end
? acpi_suspend_begin_old
suspend_devices_and_enter
pm_suspend
state_store
kobj_attr_store
sysfs_write_file
vgs_write
sys_write
system_call_fastpath

Zdenek


PS: jpg on request

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-15 16:09 Acpi deadlocks with 3.7.0-rc4 Zdenek Kabelac
@ 2012-11-28 16:01 ` Linus Torvalds
  2012-11-28 16:21   ` Zdenek Kabelac
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2012-11-28 16:01 UTC (permalink / raw)
  To: Zdenek Kabelac, Len Brown, Rafael J. Wysocki, Linux ACPI; +Cc: LKML

Adding more people (and the acpi list) to this report.

I'm seeing *very* few changes to the core suspend/resume path in 3.7,
and while there are some acpia updates, they seem to be pretty mild
too.

I think the acpi_os_wait_semaphore thing is a red herring - that's
just stale on the stack.

Do you have the register state from the oops? Or at least the "Code:"
line? It would be nice to see exactly where the oops happens, and I
cannot line up your "acpi_ns_lookup  + 0xa1/0x5b9" with any code due
to different compilers (and configurations etc).

               Linus


On Thu, Nov 15, 2012 at 8:09 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> Hello
>
>
> I've already seen twice this oops after resuming my Lenovo T61 in docking
> station.
>
> Since for some reason currently the serial line doesn't work correctly after
> resume
> (while I'm pretty sure it used to work in past) here is at least
> hand-written oops
> message from mobile camera picture.
>
> From the trace it seem os_wait semaphore is accessed twice.
> Unsure which device is behind it - but it seem docking station is need to
> hit this issue.
>
>
> kernel 3.7.0-rc4
>
> Pid:  pm-suspend
>
> RIP:   acpi_ns_lookup  + 0xa1/0x5b9
>
> Call Trace:
>
> ? acpi_os_wait_semaphore + 0x136/0x149
> acpi_ns_get_mode + 0x96/0x102
> ? __lock_is_held +0x5f/0x90
> acpi_ns_evaluate +0x47/0x2de
> ? _raw_spin_lock_irqsave
> ? acpi_ut_evaluate_object
> ? sub_preempt_count
> ? pnpacpi_can_wakeup
> acpi_rs_get_method_data
> ? acpi_os_signal_semaphore
> acpi_walk_resources
> ? acpi_ut_release_mutex
> pnpacpi_build_resource_template
> ? acpi_bus_get_device
> pnpacpi_set_resources
> ? pnp_device_shutdown
> pnp_start_dev
> pnp_bus_resume
> dpm_run_callback
> device_resume
> dpm_resume
> dpm_resume_end
> ? acpi_suspend_begin_old
> suspend_devices_and_enter
> pm_suspend
> state_store
> kobj_attr_store
> sysfs_write_file
> vgs_write
> sys_write
> system_call_fastpath
>
> Zdenek
>
>
> PS: jpg on request
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 16:01 ` Linus Torvalds
@ 2012-11-28 16:21   ` Zdenek Kabelac
  2012-11-28 17:02     ` Linus Torvalds
  2012-11-28 18:35     ` Rafael J. Wysocki
  0 siblings, 2 replies; 15+ messages in thread
From: Zdenek Kabelac @ 2012-11-28 16:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Len Brown, Rafael J. Wysocki, Linux ACPI, LKML

Dne 28.11.2012 17:01, Linus Torvalds napsal(a):
> Adding more people (and the acpi list) to this report.
>
> I'm seeing *very* few changes to the core suspend/resume path in 3.7,
> and while there are some acpia updates, they seem to be pretty mild
> too.
>
> I think the acpi_os_wait_semaphore thing is a red herring - that's
> just stale on the stack.
>
> Do you have the register state from the oops? Or at least the "Code:"
> line? It would be nice to see exactly where the oops happens, and I
> cannot line up your "acpi_ns_lookup  + 0xa1/0x5b9" with any code due
> to different compilers (and configurations etc).
>
>                 Linus
>


I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
and attached picture there which is all I have.

I'll try to decode exact code line.


In all cases - I've played even with 3.4 kernel - which also does not survive 
multiple resumes in docking station - though it just leaves black screen - so 
this oops is rather 'progress' and it also could be false path.

It's probably not a regression from 3.6 - since this problem was there for 
much longer - but now it has just become much more visible.

As I usually do not have reason to make multiple suspend/resumes in dock I've 
not noticed it until now when I've tried to capture this ACPI trace.

Zdenek


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 16:21   ` Zdenek Kabelac
@ 2012-11-28 17:02     ` Linus Torvalds
  2012-11-28 17:27       ` Zdenek Kabelac
  2012-11-28 18:35     ` Rafael J. Wysocki
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2012-11-28 17:02 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Len Brown, Rafael J. Wysocki, Linux ACPI, LKML

On Wed, Nov 28, 2012 at 8:21 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>
> I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
> and attached picture there which is all I have.
>
> I'll try to decode exact code line.

Uhhuh. It's missing much of the relevant parts of the code line, in
particular the actual oopsing instruction. But what is there decodes
to

        41 b8 10 00 00 00       mov    $0x10,%r8d
        48 c7 c1 88 52 64 81    mov    $0xffffffff81645288,%rcx
        31 c0                   xor    %eax,%eax
        48 c7 c2 98 52 64 81    mov    $0xffffffff81645298,%rdx
        bf 00 04 00 0.          mov    $0x0.00400,%edi

    .. oops in here ..

        74 33                   je     0x50
        48 89 df                mov    %rbx,%rdi
        e8 4d c9 00 00          callq  ? <offset 0xc972>
        48 89 d9                mov    %rbx,%rcx
        48 c7 c2 0a .. .. ..    mov    $0x......0a,%rdx

which isn't really very obvious. Do you have that kernel around (or at
least the same compiler and configuration)? Doing a

  objdump --disassemble  drivers/acpi/acpica/nsaccess.o

might help pinpoint where that is..

> It's probably not a regression from 3.6 - since this problem was there for
> much longer - but now it has just become much more visible.

Ok.

           Linus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 17:02     ` Linus Torvalds
@ 2012-11-28 17:27       ` Zdenek Kabelac
  2012-11-28 19:07         ` Linus Torvalds
  2012-11-28 20:31         ` Rafael J. Wysocki
  0 siblings, 2 replies; 15+ messages in thread
From: Zdenek Kabelac @ 2012-11-28 17:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Len Brown, Rafael J. Wysocki, Linux ACPI, LKML

Dne 28.11.2012 18:02, Linus Torvalds napsal(a):
> On Wed, Nov 28, 2012 at 8:21 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>>
>> I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
>> and attached picture there which is all I have.
>>
>> I'll try to decode exact code line.
>
> Uhhuh. It's missing much of the relevant parts of the code line, in
> particular the actual oopsing instruction. But what is there decodes
> to
>
>          41 b8 10 00 00 00       mov    $0x10,%r8d
>          48 c7 c1 88 52 64 81    mov    $0xffffffff81645288,%rcx
>          31 c0                   xor    %eax,%eax
>          48 c7 c2 98 52 64 81    mov    $0xffffffff81645298,%rdx
>          bf 00 04 00 0.          mov    $0x0.00400,%edi
>
>      .. oops in here ..
>
>          74 33                   je     0x50
>          48 89 df                mov    %rbx,%rdi
>          e8 4d c9 00 00          callq  ? <offset 0xc972>
>          48 89 d9                mov    %rbx,%rcx
>          48 c7 c2 0a .. .. ..    mov    $0x......0a,%rdx
>
> which isn't really very obvious. Do you have that kernel around (or at
> least the same compiler and configuration)? Doing a
>
>    objdump --disassemble  drivers/acpi/acpica/nsaccess.o

I've attached bigger disasfun script output to BZ 51071.
https://bugzilla.kernel.org/show_bug.cgi?id=51071#c1


         if (ACPI_GET_DESCRIPTOR_TYPE(prefix_node) !=
00000000000000a1 <acpi_ns_lookup+0xa1> cmpb   $0xf,0x8(%rbx)
00000000000000a5 <acpi_ns_lookup+0xa5> je   0da  <acpi_ns_lookup+0xda>

seems to be going out of bounds.

Zdenek


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 16:21   ` Zdenek Kabelac
  2012-11-28 17:02     ` Linus Torvalds
@ 2012-11-28 18:35     ` Rafael J. Wysocki
  1 sibling, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 18:35 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Linus Torvalds, Len Brown, Linux ACPI, LKML

On Wednesday, November 28, 2012 05:21:24 PM Zdenek Kabelac wrote:
> Dne 28.11.2012 17:01, Linus Torvalds napsal(a):
> > Adding more people (and the acpi list) to this report.
> >
> > I'm seeing *very* few changes to the core suspend/resume path in 3.7,
> > and while there are some acpia updates, they seem to be pretty mild
> > too.
> >
> > I think the acpi_os_wait_semaphore thing is a red herring - that's
> > just stale on the stack.
> >
> > Do you have the register state from the oops? Or at least the "Code:"
> > line? It would be nice to see exactly where the oops happens, and I
> > cannot line up your "acpi_ns_lookup  + 0xa1/0x5b9" with any code due
> > to different compilers (and configurations etc).
> >
> >                 Linus
> >
> 
> 
> I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
> and attached picture there which is all I have.
> 
> I'll try to decode exact code line.
> 
> 
> In all cases - I've played even with 3.4 kernel - which also does not survive 
> multiple resumes in docking station - though it just leaves black screen - so 
> this oops is rather 'progress' and it also could be false path.
> 
> It's probably not a regression from 3.6 - since this problem was there for 
> much longer - but now it has just become much more visible.
> 
> As I usually do not have reason to make multiple suspend/resumes in dock I've 
> not noticed it until now when I've tried to capture this ACPI trace.

Well, if ACPI PNP devices are involved, and it looks like they are, that
definitely is not a new problem.

We're now in the process of reworking this area quite a bit and we'll do our
best to address this issue too in the process.

Thanks,
Rafael 


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 17:27       ` Zdenek Kabelac
@ 2012-11-28 19:07         ` Linus Torvalds
  2012-11-28 20:05           ` Rafael J. Wysocki
  2012-11-29 10:13           ` Zdenek Kabelac
  2012-11-28 20:31         ` Rafael J. Wysocki
  1 sibling, 2 replies; 15+ messages in thread
From: Linus Torvalds @ 2012-11-28 19:07 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Len Brown, Rafael J. Wysocki, Linux ACPI, LKML

[-- Attachment #1: Type: text/plain, Size: 913 bytes --]

On Wed, Nov 28, 2012 at 9:27 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>
> I've attached bigger disasfun script output to BZ 51071.
> https://bugzilla.kernel.org/show_bug.cgi?id=51071#c1
>
>
>         if (ACPI_GET_DESCRIPTOR_TYPE(prefix_node) !=
> 00000000000000a1 <acpi_ns_lookup+0xa1> cmpb   $0xf,0x8(%rbx)
> 00000000000000a5 <acpi_ns_lookup+0xa5> je   0da  <acpi_ns_lookup+0xda>
>
> seems to be going out of bounds.

The whole "prefix_node" pointer is bogus. It seems to have the value 0x1000.

I wonder how that happened. It's loaded from 'scope_info->scope.node',
and it *should* be a valid pointer.

Can you add a print-out of

  scope_info->common.descriptor_type

and check that it is ACPI_DESC_TYPE_STATE_WSCOPE (== 8). If it is not,
return early.

Or just something like the attatched, which just uses the root node
(and warns once) if it's not a valid WSCOPE thing.

                       Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 638 bytes --]

 drivers/acpi/acpica/nsaccess.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/acpi/acpica/nsaccess.c b/drivers/acpi/acpica/nsaccess.c
index 23db53ce2293..3238cabba61a 100644
--- a/drivers/acpi/acpica/nsaccess.c
+++ b/drivers/acpi/acpica/nsaccess.c
@@ -320,6 +320,8 @@ acpi_ns_lookup(union acpi_generic_state *scope_info,
 				  acpi_gbl_root_node));
 
 		prefix_node = acpi_gbl_root_node;
+	} else if (WARN_ON_ONCE(scope_info->common.descriptor_type != ACPI_DESC_TYPE_STATE_WSCOPE)) {
+		prefix_node = acpi_gbl_root_node;
 	} else {
 		prefix_node = scope_info->scope.node;
 		if (ACPI_GET_DESCRIPTOR_TYPE(prefix_node) !=

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 19:07         ` Linus Torvalds
@ 2012-11-28 20:05           ` Rafael J. Wysocki
  2012-11-29 10:13           ` Zdenek Kabelac
  1 sibling, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 20:05 UTC (permalink / raw)
  To: linux-acpi; +Cc: Linus Torvalds, Zdenek Kabelac, Len Brown, LKML

On Wednesday, November 28, 2012 11:07:32 AM Linus Torvalds wrote:
> On Wed, Nov 28, 2012 at 9:27 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> >
> > I've attached bigger disasfun script output to BZ 51071.
> > https://bugzilla.kernel.org/show_bug.cgi?id=51071#c1
> >
> >
> >         if (ACPI_GET_DESCRIPTOR_TYPE(prefix_node) !=
> > 00000000000000a1 <acpi_ns_lookup+0xa1> cmpb   $0xf,0x8(%rbx)
> > 00000000000000a5 <acpi_ns_lookup+0xa5> je   0da  <acpi_ns_lookup+0xda>
> >
> > seems to be going out of bounds.
> 
> The whole "prefix_node" pointer is bogus. It seems to have the value 0x1000.
> 
> I wonder how that happened. It's loaded from 'scope_info->scope.node',
> and it *should* be a valid pointer.

Well, suppose that pnpacpi_build_resource_template() passes a handle that's
not a valid pointer to acpi_walk_resources().  What happens then is that
it is passed directly to acpi_rs_get_method_data() and from there to
acpi_ut_evaluate_object() - without validation (acpi_rs_get_method_data()
even has a comment about the parameters validity guaranteed by the caller,
heh, heh).  Then it becomes the prefix_node and is written into
info->prefix_node.  acpi_ns_evaluate() takes that and passes it to
acpi_ns_get_node() along with info->pathname that is just the name of the
method to evaluate, which is a valid string, so the "if (!pathname)" block in
acpi_ns_get_node() is not executed and we get scope_info.scope.node = prefix_node,
which is our bad pointer.  A pointer to that scope_info is passed to
acpi_ns_lookup() and we get the above.

So the code in pnpacpi_build_resource_template() is at fault by passing a
wrong pointer to acpi_walk_resources().  And the pointer is wrong probably
because the struct acpi_device pointed to by dev->data in there has been
removed during a previous suspend or resume (I'm not sure which one does
that), but the PNP layer has no idea about that.  And that bug has been there
for quite a while (like forever?).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 17:27       ` Zdenek Kabelac
  2012-11-28 19:07         ` Linus Torvalds
@ 2012-11-28 20:31         ` Rafael J. Wysocki
  2012-11-29  9:03           ` Zdenek Kabelac
  1 sibling, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 20:31 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: linux-acpi, Linus Torvalds, Len Brown, LKML

On Wednesday, November 28, 2012 06:27:50 PM Zdenek Kabelac wrote:
> Dne 28.11.2012 18:02, Linus Torvalds napsal(a):
> > On Wed, Nov 28, 2012 at 8:21 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> >>
> >> I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
> >> and attached picture there which is all I have.

I wonder if you can try to apply the patch below and see if that makes any
difference?

Rafael


---
 drivers/pnp/pnpacpi/rsparser.c |    8 ++++++++
 1 file changed, 8 insertions(+)

Index: linux/drivers/pnp/pnpacpi/rsparser.c
===================================================================
--- linux.orig/drivers/pnp/pnpacpi/rsparser.c
+++ linux/drivers/pnp/pnpacpi/rsparser.c
@@ -610,6 +610,14 @@ int pnpacpi_build_resource_template(stru
 	struct acpi_resource *resource;
 	int res_cnt = 0;
 	acpi_status status;
+	int ret;
+
+	/* Sanity check. */
+	ret = acpi_bus_get_device(handle, &acpi_dev);
+	if (ret) {
+		dev_err(&dev->dev, "ACPI node is invalid in %s\n", __func__);
+		return ret;
+	}
 
 	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
 				     pnpacpi_count_resources, &res_cnt);



-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 20:31         ` Rafael J. Wysocki
@ 2012-11-29  9:03           ` Zdenek Kabelac
  2012-11-29 10:09             ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Kabelac @ 2012-11-29  9:03 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-acpi, Linus Torvalds, Len Brown, LKML

Dne 28.11.2012 21:31, Rafael J. Wysocki napsal(a):
> On Wednesday, November 28, 2012 06:27:50 PM Zdenek Kabelac wrote:
>> Dne 28.11.2012 18:02, Linus Torvalds napsal(a):
>>> On Wed, Nov 28, 2012 at 8:21 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>>>>
>>>> I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
>>>> and attached picture there which is all I have.
>
> I wonder if you can try to apply the patch below and see if that makes any
> difference?
>

Yep - extended BZ with 2 new pictures - it's now crashing earlier within
the call of acpi_bus_get_device().

Zdenek


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-29  9:03           ` Zdenek Kabelac
@ 2012-11-29 10:09             ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 10:09 UTC (permalink / raw)
  To: linux-acpi; +Cc: Zdenek Kabelac, Linus Torvalds, Len Brown, LKML

On Thursday, November 29, 2012 10:03:53 AM Zdenek Kabelac wrote:
> Dne 28.11.2012 21:31, Rafael J. Wysocki napsal(a):
> > On Wednesday, November 28, 2012 06:27:50 PM Zdenek Kabelac wrote:
> >> Dne 28.11.2012 18:02, Linus Torvalds napsal(a):
> >>> On Wed, Nov 28, 2012 at 8:21 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> >>>>
> >>>> I've opened  https://bugzilla.kernel.org/show_bug.cgi?id=51071
> >>>> and attached picture there which is all I have.
> >
> > I wonder if you can try to apply the patch below and see if that makes any
> > difference?
> >
> 
> Yep - extended BZ with 2 new pictures - it's now crashing earlier within
> the call of acpi_bus_get_device().

Pretty much as expected.

So this is the problem I was thinking it was: the previous suspend-resume
cycle unregistered the struct acpi_device pointed to by dev->data in
pnpacpi_build_resource_template(), so the handle pointer in there is now
invalid.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-28 19:07         ` Linus Torvalds
  2012-11-28 20:05           ` Rafael J. Wysocki
@ 2012-11-29 10:13           ` Zdenek Kabelac
  2012-11-29 10:59             ` Rafael J. Wysocki
  1 sibling, 1 reply; 15+ messages in thread
From: Zdenek Kabelac @ 2012-11-29 10:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Len Brown, Rafael J. Wysocki, Linux ACPI, LKML

Dne 28.11.2012 20:07, Linus Torvalds napsal(a):
> whole "prefix_node" pointer is bogus. It seems to have the value 0x1000.

Tested also this patch with this result:

https://bugzilla.kernel.org/show_bug.cgi?id=51071#c8

So while it's made it pass suspend/resume, it's not really usable
as docking then.

Zdenek


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-29 10:13           ` Zdenek Kabelac
@ 2012-11-29 10:59             ` Rafael J. Wysocki
  2012-11-29 12:26               ` Zdenek Kabelac
  0 siblings, 1 reply; 15+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 10:59 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Linus Torvalds, Len Brown, Linux ACPI, LKML

On Thursday, November 29, 2012 11:13:10 AM Zdenek Kabelac wrote:
> Dne 28.11.2012 20:07, Linus Torvalds napsal(a):
> > whole "prefix_node" pointer is bogus. It seems to have the value 0x1000.
> 
> Tested also this patch with this result:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=51071#c8
> 
> So while it's made it pass suspend/resume, it's not really usable
> as docking then.

This just makes acpi_ns_lookup() always return acpi_gbl_root_node
for things looked up by acpi_ns_get_node() as far as I can say.

Hmm.

If my theory correct, the patch below should catch the bug.  Can you please
test it?

Rafael


---
 drivers/acpi/scan.c            |    1 +
 drivers/pnp/pnpacpi/rsparser.c |    3 +++
 2 files changed, 4 insertions(+)

Index: linux/drivers/acpi/scan.c
===================================================================
--- linux.orig/drivers/acpi/scan.c
+++ linux/drivers/acpi/scan.c
@@ -704,6 +704,7 @@ static void acpi_device_unregister(struc
 	mutex_unlock(&acpi_device_lock);
 
 	acpi_detach_data(device->handle, acpi_bus_data_handler);
+	device->handle = ERR_PTR(-ENODEV);
 
 	acpi_device_remove_files(device);
 	device_unregister(&device->dev);
Index: linux/drivers/pnp/pnpacpi/rsparser.c
===================================================================
--- linux.orig/drivers/pnp/pnpacpi/rsparser.c
+++ linux/drivers/pnp/pnpacpi/rsparser.c
@@ -611,6 +611,9 @@ int pnpacpi_build_resource_template(stru
 	int res_cnt = 0;
 	acpi_status status;
 
+	if (WARN_ON_ONCE(IS_ERR(handle)))
+		return PTR_ERR(handle);
+
 	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
 				     pnpacpi_count_resources, &res_cnt);
 	if (ACPI_FAILURE(status)) {



-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-29 10:59             ` Rafael J. Wysocki
@ 2012-11-29 12:26               ` Zdenek Kabelac
  2012-11-29 16:59                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 15+ messages in thread
From: Zdenek Kabelac @ 2012-11-29 12:26 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Linus Torvalds, Len Brown, Linux ACPI, LKML

Dne 29.11.2012 11:59, Rafael J. Wysocki napsal(a):
> On Thursday, November 29, 2012 11:13:10 AM Zdenek Kabelac wrote:
>> Dne 28.11.2012 20:07, Linus Torvalds napsal(a):
>>> whole "prefix_node" pointer is bogus. It seems to have the value 0x1000.
>>
>> Tested also this patch with this result:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=51071#c8
>>
>> So while it's made it pass suspend/resume, it's not really usable
>> as docking then.
>
> This just makes acpi_ns_lookup() always return acpi_gbl_root_node
> for things looked up by acpi_ns_get_node() as far as I can say.
>
> Hmm.
>
> If my theory correct, the patch below should catch the bug.  Can you please
> test it?
>

Ok now crashing right after 'undock' button press:

https://bugzilla.kernel.org/show_bug.cgi?id=51071#c10


Zdenek



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Acpi deadlocks with 3.7.0-rc4
  2012-11-29 12:26               ` Zdenek Kabelac
@ 2012-11-29 16:59                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 15+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 16:59 UTC (permalink / raw)
  To: Zdenek Kabelac; +Cc: Linus Torvalds, Len Brown, Linux ACPI, LKML

On Thursday, November 29, 2012 01:26:48 PM Zdenek Kabelac wrote:
> Dne 29.11.2012 11:59, Rafael J. Wysocki napsal(a):
> > On Thursday, November 29, 2012 11:13:10 AM Zdenek Kabelac wrote:
> >> Dne 28.11.2012 20:07, Linus Torvalds napsal(a):
> >>> whole "prefix_node" pointer is bogus. It seems to have the value 0x1000.
> >>
> >> Tested also this patch with this result:
> >>
> >> https://bugzilla.kernel.org/show_bug.cgi?id=51071#c8
> >>
> >> So while it's made it pass suspend/resume, it's not really usable
> >> as docking then.
> >
> > This just makes acpi_ns_lookup() always return acpi_gbl_root_node
> > for things looked up by acpi_ns_get_node() as far as I can say.
> >
> > Hmm.
> >
> > If my theory correct, the patch below should catch the bug.  Can you please
> > test it?
> >
> 
> Ok now crashing right after 'undock' button press:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=51071#c10

And that's because I made a mistake in the patch.  We're currently testing
the appended one and it's showing that the added WARN_ON_ONCE() actually
triggers, so the theory appears to be correct.

I think we can debug this further in the Bugzilla.

Thanks,
Rafael


---
 drivers/acpi/scan.c            |    2 ++
 drivers/pnp/pnpacpi/rsparser.c |    3 +++
 2 files changed, 5 insertions(+)

Index: linux/drivers/acpi/scan.c
===================================================================
--- linux.orig/drivers/acpi/scan.c
+++ linux/drivers/acpi/scan.c
@@ -707,6 +707,8 @@ static void acpi_device_unregister(struc
 
 	acpi_device_remove_files(device);
 	device_unregister(&device->dev);
+
+	device->handle = ERR_PTR(-ENODEV);
 }
 
 /* --------------------------------------------------------------------------
Index: linux/drivers/pnp/pnpacpi/rsparser.c
===================================================================
--- linux.orig/drivers/pnp/pnpacpi/rsparser.c
+++ linux/drivers/pnp/pnpacpi/rsparser.c
@@ -611,6 +611,9 @@ int pnpacpi_build_resource_template(stru
 	int res_cnt = 0;
 	acpi_status status;
 
+	if (WARN_ON_ONCE(IS_ERR(handle)))
+		return PTR_ERR(handle);
+
 	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
 				     pnpacpi_count_resources, &res_cnt);
 	if (ACPI_FAILURE(status)) {

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2012-11-29 16:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-15 16:09 Acpi deadlocks with 3.7.0-rc4 Zdenek Kabelac
2012-11-28 16:01 ` Linus Torvalds
2012-11-28 16:21   ` Zdenek Kabelac
2012-11-28 17:02     ` Linus Torvalds
2012-11-28 17:27       ` Zdenek Kabelac
2012-11-28 19:07         ` Linus Torvalds
2012-11-28 20:05           ` Rafael J. Wysocki
2012-11-29 10:13           ` Zdenek Kabelac
2012-11-29 10:59             ` Rafael J. Wysocki
2012-11-29 12:26               ` Zdenek Kabelac
2012-11-29 16:59                 ` Rafael J. Wysocki
2012-11-28 20:31         ` Rafael J. Wysocki
2012-11-29  9:03           ` Zdenek Kabelac
2012-11-29 10:09             ` Rafael J. Wysocki
2012-11-28 18:35     ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).