All of lore.kernel.org
 help / color / mirror / Atom feed
* 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
@ 2009-03-15 10:47 Jiri Slaby
  2009-03-15 17:49 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jiri Slaby @ 2009-03-15 10:47 UTC (permalink / raw)
  To: Len Brown; +Cc: linux-acpi, rjw, Linux kernel mailing list, torvalds

Hi,

sometimes, when booting up/resuming from disk, I get an oops[1].

obj_desc->common_field.access_bit_width is zero, but even after the 
loop. Division before the loop is apparently OK.

This is the case:
         /* Mask off any extra bits in the last datum */

         buffer_tail_bits = obj_desc->common_field.bit_length %
             obj_desc->common_field.access_bit_width;

.L39:
         xorl    %edx, %edx      #
         movzbl  37(%rbx), %esi  # 
<variable>.common_field.access_bit_width, 
<variable>.common_field.access_bit_width
         movl    24(%rbx), %eax  # <variable>.common_field.bit_length, 
<variable>.common_field.bit_length
--------------- here:
         divl    %esi    # <variable>.common_field.access_bit_width
         movl    %edx, %ecx      #, tmp121
         testl   %edx, %edx      # tmp121
         je      .L41    #,

[1] http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops.png

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-15 10:47 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero Jiri Slaby
@ 2009-03-15 17:49 ` Linus Torvalds
  2009-03-15 19:08 ` Moore, Robert
       [not found] ` <d3f22a0903152046m891dc0aq6ca01eed32a9eb32@mail.gmail.com>
  2 siblings, 0 replies; 10+ messages in thread
From: Linus Torvalds @ 2009-03-15 17:49 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: Len Brown, linux-acpi, rjw, Linux kernel mailing list



On Sun, 15 Mar 2009, Jiri Slaby wrote:
> 
> sometimes, when booting up/resuming from disk, I get an oops[1].
> 
> obj_desc->common_field.access_bit_width is zero, but even after the loop.
> Division before the loop is apparently OK.
> 
> This is the case:
>         /* Mask off any extra bits in the last datum */
> 
>         buffer_tail_bits = obj_desc->common_field.bit_length %
>             obj_desc->common_field.access_bit_width;

Hmm. "bit_length" is zero too, according to the oops (it's in %eax).

Looks like the whole "obj_desc" is basically uninitialized. I have no idea 
why, and why it is apparently not reliable. ACPI is odd.

Does everything work if you just do something like the appended?

It's insane, but so is ACPI.

		Linus
---
 drivers/acpi/acpica/exfldio.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/acpi/acpica/exfldio.c b/drivers/acpi/acpica/exfldio.c
index ef58ac4..9297adb 100644
--- a/drivers/acpi/acpica/exfldio.c
+++ b/drivers/acpi/acpica/exfldio.c
@@ -698,6 +698,9 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
 	}
 	ACPI_MEMSET(buffer, 0, buffer_length);
 
+	if (!obj_desc->common_field.bit_length)
+		return_ACPI_STATUS(AE_OK);
+
 	/* Compute the number of datums (access width data items) */
 
 	datum_count = ACPI_ROUND_UP_TO(obj_desc->common_field.bit_length,

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-15 10:47 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero Jiri Slaby
  2009-03-15 17:49 ` Linus Torvalds
@ 2009-03-15 19:08 ` Moore, Robert
  2009-03-15 19:32   ` Jiri Slaby
       [not found] ` <d3f22a0903152046m891dc0aq6ca01eed32a9eb32@mail.gmail.com>
  2 siblings, 1 reply; 10+ messages in thread
From: Moore, Robert @ 2009-03-15 19:08 UTC (permalink / raw)
  To: Jiri Slaby, Len Brown
  Cc: linux-acpi, rjw, Linux kernel mailing list, torvalds

Can you please send or post the acpidump for the machine? Thanks.

>-----Original Message-----
>From: linux-acpi-owner@vger.kernel.org [mailto:linux-acpi-
>owner@vger.kernel.org] On Behalf Of Jiri Slaby
>Sent: Sunday, March 15, 2009 3:47 AM
>To: Len Brown
>Cc: linux-acpi@vger.kernel.org; rjw@sisk.pl; Linux kernel mailing list;
>torvalds@linux-foundation.org
>Subject: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
>
>Hi,
>
>sometimes, when booting up/resuming from disk, I get an oops[1].
>
>obj_desc->common_field.access_bit_width is zero, but even after the
>loop. Division before the loop is apparently OK.
>
>This is the case:
>         /* Mask off any extra bits in the last datum */
>
>         buffer_tail_bits = obj_desc->common_field.bit_length %
>             obj_desc->common_field.access_bit_width;
>
>.L39:
>         xorl    %edx, %edx      #
>         movzbl  37(%rbx), %esi  #
><variable>.common_field.access_bit_width,
><variable>.common_field.access_bit_width
>         movl    24(%rbx), %eax  # <variable>.common_field.bit_length,
><variable>.common_field.bit_length
>--------------- here:
>         divl    %esi    # <variable>.common_field.access_bit_width
>         movl    %edx, %ecx      #, tmp121
>         testl   %edx, %edx      # tmp121
>         je      .L41    #,
>
>[1] http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops.png
>--
>To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-15 19:08 ` Moore, Robert
@ 2009-03-15 19:32   ` Jiri Slaby
  0 siblings, 0 replies; 10+ messages in thread
From: Jiri Slaby @ 2009-03-15 19:32 UTC (permalink / raw)
  To: Moore, Robert
  Cc: Len Brown, linux-acpi, rjw, Linux kernel mailing list, torvalds

On 15.3.2009 20:08, Moore, Robert wrote:
> Can you please send or post the acpidump for the machine? Thanks.

Sure:
http://www.fi.muni.cz/~xslaby/sklad/adump

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
       [not found] ` <d3f22a0903152046m891dc0aq6ca01eed32a9eb32@mail.gmail.com>
@ 2009-03-16  3:42   ` Lin Ming
  2009-03-16 16:31     ` Jiri Slaby
  0 siblings, 1 reply; 10+ messages in thread
From: Lin Ming @ 2009-03-16  3:42 UTC (permalink / raw)
  To: jirislaby
  Cc: robert.moore, lenb, linux-acpi, Rafael J. Wysocki, linux-kernel,
	torvalds


> Hi,
> 
> sometimes, when booting up/resuming from disk, I get an oops[1].
> 
> obj_desc->common_field.access_bit_width is zero, but even after the
> loop. Division before the loop is apparently OK.

Would please try below debug patch to see which region filed is
accessed?

diff --git a/drivers/acpi/acpica/exfldio.c b/drivers/acpi/acpica/exfldio.c
index ef58ac4..10ac0c8 100644
--- a/drivers/acpi/acpica/exfldio.c
+++ b/drivers/acpi/acpica/exfldio.c
@@ -683,6 +683,7 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
 	u32 datum_count;
 	u32 field_datum_count;
 	u32 i;
+	struct acpi_buffer name_buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 
 	ACPI_FUNCTION_TRACE(ex_extract_from_field);
 
@@ -765,6 +766,20 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
 		    raw_datum >> obj_desc->common_field.start_field_bit_offset;
 	}
 
+	if (obj_desc->common_field.bit_length == 0 ||
+	    obj_desc->common_field.access_bit_width == 0) {
+		status = acpi_get_name(obj_desc->common_field.node,
+				ACPI_FULL_PATHNAME, &name_buffer);
+		if (ACPI_FAILURE(status)) {
+			printk(KERN_DEBUG "ACPI Debug: %s\n",
+				acpi_format_exception(status));
+		} else {
+			printk(KERN_DEBUG "ACPI Debug: field node path: %s\n",
+				(char *) name_buffer.pointer);
+			kfree(name_buffer.pointer);
+		}
+	}
+
 	/* Mask off any extra bits in the last datum */
 
 	buffer_tail_bits = obj_desc->common_field.bit_length %

---
Lin Ming

> 
> This is the case:
>        /* Mask off any extra bits in the last datum */
> 
>        buffer_tail_bits = obj_desc->common_field.bit_length %
>            obj_desc->common_field.access_bit_width;
> 
> .L39:
>        xorl    %edx, %edx      #
>        movzbl  37(%rbx), %esi  #
> <variable>.common_field.access_bit_width,
> <variable>.common_field.access_bit_width
>        movl    24(%rbx), %eax  # <variable>.common_field.bit_length,
> <variable>.common_field.bit_length
> --------------- here:
>        divl    %esi    # <variable>.common_field.access_bit_width
>        movl    %edx, %ecx      #, tmp121
>        testl   %edx, %edx      # tmp121
>        je      .L41    #,
> 
> [1] http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops.png
> --



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-16  3:42   ` Lin Ming
@ 2009-03-16 16:31     ` Jiri Slaby
  2009-03-18  8:08       ` Jiri Slaby
  0 siblings, 1 reply; 10+ messages in thread
From: Jiri Slaby @ 2009-03-16 16:31 UTC (permalink / raw)
  To: Lin Ming
  Cc: robert.moore, lenb, linux-acpi, Rafael J. Wysocki, linux-kernel,
	torvalds

On 16.3.2009 04:42, Lin Ming wrote:
>> sometimes, when booting up/resuming from disk, I get an oops[1].
>>
>> obj_desc->common_field.access_bit_width is zero, but even after the
>> loop. Division before the loop is apparently OK.
>
> Would please try below debug patch to see which region filed is
> accessed?

Yes, except the fact, we wouldn't see anything :) -- I'll add inside the 
if a return statement or something to disallow the oops to flood screen.

Also, it doesn't happen regularly. Mostly several first tries after cold 
start. I hope this is a regression and not a HW problem, but it emerged 
after switching to 2.6.29-*, I will keep you informed.

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-16 16:31     ` Jiri Slaby
@ 2009-03-18  8:08       ` Jiri Slaby
  2009-03-19 16:06         ` Jiri Slaby
  0 siblings, 1 reply; 10+ messages in thread
From: Jiri Slaby @ 2009-03-18  8:08 UTC (permalink / raw)
  To: Lin Ming
  Cc: robert.moore, lenb, linux-acpi, Rafael J. Wysocki, linux-kernel,
	torvalds

On 16.3.2009 17:31, Jiri Slaby wrote:
> On 16.3.2009 04:42, Lin Ming wrote:
>>> sometimes, when booting up/resuming from disk, I get an oops[1].
>>>
>>> obj_desc->common_field.access_bit_width is zero, but even after the
>>> loop. Division before the loop is apparently OK.
>>
>> Would please try below debug patch to see which region filed is
>> accessed?
>
> Yes, except the fact, we wouldn't see anything :) -- I'll add inside the
> if a return statement or something to disallow the oops to flood screen.
>
> Also, it doesn't happen regularly. Mostly several first tries after cold
> start. I hope this is a regression and not a HW problem, but it emerged
> after switching to 2.6.29-*, I will keep you informed.

Hmm, I didn't see it with rc8 and this patch applied so far (no 'ACPI 
Debug' in dmesg). Also I don't see anything changed within drivers/acpi/ 
et al. between rc7 and rc8. Maybe the timing changed somewhere else... 
I'm confused, but keep trying.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-18  8:08       ` Jiri Slaby
@ 2009-03-19 16:06         ` Jiri Slaby
  2009-03-20  0:48           ` Lin Ming
  2009-03-20  4:53           ` Lin Ming
  0 siblings, 2 replies; 10+ messages in thread
From: Jiri Slaby @ 2009-03-19 16:06 UTC (permalink / raw)
  To: Lin Ming
  Cc: robert.moore, lenb, linux-acpi, Rafael J. Wysocki, linux-kernel,
	torvalds

On 18.3.2009 09:08, Jiri Slaby wrote:
> On 16.3.2009 17:31, Jiri Slaby wrote:
>> On 16.3.2009 04:42, Lin Ming wrote:
>>>> sometimes, when booting up/resuming from disk, I get an oops[1].
>>>>
>>>> obj_desc->common_field.access_bit_width is zero, but even after the
>>>> loop. Division before the loop is apparently OK.
>>>
>>> Would please try below debug patch to see which region filed is
>>> accessed?
>
> I'm confused, but keep trying.

Got it again. obj_desc->common_field.node is martian too (0x4000000), so 
the added acpi_get_name dies:
http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops1.png

Whole common_field seems to be mangled. Ideas?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-19 16:06         ` Jiri Slaby
@ 2009-03-20  0:48           ` Lin Ming
  2009-03-20  4:53           ` Lin Ming
  1 sibling, 0 replies; 10+ messages in thread
From: Lin Ming @ 2009-03-20  0:48 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Moore, Robert, lenb, linux-acpi, Rafael J. Wysocki, linux-kernel,
	torvalds

On Fri, 2009-03-20 at 00:06 +0800, Jiri Slaby wrote:
> On 18.3.2009 09:08, Jiri Slaby wrote:
> > On 16.3.2009 17:31, Jiri Slaby wrote:
> >> On 16.3.2009 04:42, Lin Ming wrote:
> >>>> sometimes, when booting up/resuming from disk, I get an oops[1].
> >>>>
> >>>> obj_desc->common_field.access_bit_width is zero, but even after the
> >>>> loop. Division before the loop is apparently OK.
> >>>
> >>> Would please try below debug patch to see which region filed is
> >>> accessed?
> >
> > I'm confused, but keep trying.
> 
> Got it again. obj_desc->common_field.node is martian too (0x4000000), so 
> the added acpi_get_name dies:
> http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops1.png
> 
> Whole common_field seems to be mangled. Ideas?

Thanks for the info. We are looking at this.

Lin Ming


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero
  2009-03-19 16:06         ` Jiri Slaby
  2009-03-20  0:48           ` Lin Ming
@ 2009-03-20  4:53           ` Lin Ming
  1 sibling, 0 replies; 10+ messages in thread
From: Lin Ming @ 2009-03-20  4:53 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Moore, Robert, lenb, linux-acpi, Rafael J. Wysocki, linux-kernel,
	torvalds

On Fri, 2009-03-20 at 00:06 +0800, Jiri Slaby wrote:
> On 18.3.2009 09:08, Jiri Slaby wrote:
> > On 16.3.2009 17:31, Jiri Slaby wrote:
> >> On 16.3.2009 04:42, Lin Ming wrote:
> >>>> sometimes, when booting up/resuming from disk, I get an oops[1].
> >>>>
> >>>> obj_desc->common_field.access_bit_width is zero, but even after the
> >>>> loop. Division before the loop is apparently OK.
> >>>
> >>> Would please try below debug patch to see which region filed is
> >>> accessed?
> >
> > I'm confused, but keep trying.
> 
> Got it again. obj_desc->common_field.node is martian too (0x4000000), so 
> the added acpi_get_name dies:
> http://www.fi.muni.cz/~xslaby/sklad/panics/acpi_oops1.png
> 
> Whole common_field seems to be mangled. Ideas?

I should save the field object and node path name at the entry of
acpi_ex_extract_from_field.
Would you please try below debug patch?
I also add a return statement before the fault to disallow the oops to
flood screen.

Thanks.

diff --git a/drivers/acpi/acpica/exfldio.c b/drivers/acpi/acpica/exfldio.c
index ef58ac4..5109f5b 100644
--- a/drivers/acpi/acpica/exfldio.c
+++ b/drivers/acpi/acpica/exfldio.c
@@ -670,6 +670,34 @@ acpi_ex_write_with_update_rule(union acpi_operand_object *obj_desc,
  *
  ******************************************************************************/
 
+void debug_dump_field(union acpi_operand_object *obj_desc)
+{
+	struct acpi_object_field_common *obj_field;
+
+	if (!obj_desc) {
+		printk(KERN_DEBUG "ACPI Debug: NULL field object\n");
+		return;
+	}
+
+	printk(KERN_DEBUG "next_object: %p\n", obj_desc->common.next_object);
+	printk(KERN_DEBUG "descriptor_type: %x\n", obj_desc->common.descriptor_type);
+	printk(KERN_DEBUG "type: %x\n", obj_desc->common.type);
+	printk(KERN_DEBUG "reference_count: %x\n", obj_desc->common.reference_count);
+	printk(KERN_DEBUG "flags: %x\n", obj_desc->common.flags);
+
+	obj_field = &obj_desc->common_field;
+
+	printk(KERN_DEBUG "field_flags: %x\n", obj_field->field_flags);
+	printk(KERN_DEBUG "attribute: %x\n", obj_field->attribute);
+	printk(KERN_DEBUG "access_byte_width: %x\n", obj_field->access_byte_width);
+	printk(KERN_DEBUG "node: %p\n", obj_field->node);
+	printk(KERN_DEBUG "bit_length: %x\n", obj_field->bit_length);
+	printk(KERN_DEBUG "base_byte_offset: %x\n", obj_field->base_byte_offset);
+	printk(KERN_DEBUG "value: %x\n", obj_field->value);
+	printk(KERN_DEBUG "start_field_bit_offset: %x\n", obj_field->start_field_bit_offset);
+	printk(KERN_DEBUG "access_bit_width: %x\n", obj_field->access_bit_width);
+}
+
 acpi_status
 acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
 			   void *buffer, u32 buffer_length)
@@ -683,9 +711,24 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
 	u32 datum_count;
 	u32 field_datum_count;
 	u32 i;
+	struct acpi_buffer name_buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+	union acpi_operand_object backup_obj;
 
 	ACPI_FUNCTION_TRACE(ex_extract_from_field);
 
+	/* backup the object */
+	memcpy(&backup_obj, obj_desc, sizeof(*obj_desc));
+
+	name_buffer.pointer = NULL;
+	if (obj_desc->common_field.node) {
+		status = acpi_get_name(obj_desc->common_field.node,
+				ACPI_FULL_PATHNAME, &name_buffer);
+		if (ACPI_FAILURE(status)) {
+			printk(KERN_DEBUG "ACPI Debug: %s\n",
+				acpi_format_exception(status));
+		}
+	}
+
 	/* Validate target buffer and clear it */
 
 	if (buffer_length <
@@ -765,6 +808,29 @@ acpi_ex_extract_from_field(union acpi_operand_object *obj_desc,
 		    raw_datum >> obj_desc->common_field.start_field_bit_offset;
 	}
 
+	if (obj_desc->common_field.bit_length == 0 ||
+	    obj_desc->common_field.access_bit_width == 0) {
+
+		/* oops, dump the object */
+		if (name_buffer.pointer) {
+			printk(KERN_DEBUG "ACPI Debug: field node path: %s\n",
+				(char *) name_buffer.pointer);
+		}
+
+		printk(KERN_DEBUG "ACPI Debug: The original object:\n");
+		debug_dump_field(&backup_obj);
+
+		printk(KERN_DEBUG "ACPI Debug: The bad object:\n");
+		debug_dump_field(obj_desc);
+
+		kfree(name_buffer.pointer);
+
+		/* we return here to disallow the oops to flood screen */
+		return_ACPI_STATUS(AE_ERROR);
+	}
+
+	kfree(name_buffer.pointer);
+
 	/* Mask off any extra bits in the last datum */
 
 	buffer_tail_bits = obj_desc->common_field.bit_length %

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-03-20  4:53 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-15 10:47 2.6.29 acpi regression: acpi_ex_extract_from_field -- div by zero Jiri Slaby
2009-03-15 17:49 ` Linus Torvalds
2009-03-15 19:08 ` Moore, Robert
2009-03-15 19:32   ` Jiri Slaby
     [not found] ` <d3f22a0903152046m891dc0aq6ca01eed32a9eb32@mail.gmail.com>
2009-03-16  3:42   ` Lin Ming
2009-03-16 16:31     ` Jiri Slaby
2009-03-18  8:08       ` Jiri Slaby
2009-03-19 16:06         ` Jiri Slaby
2009-03-20  0:48           ` Lin Ming
2009-03-20  4:53           ` Lin Ming

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.