linux-kernel.vger.kernel.org archive mirror
* Debugging APM - cat /proc/apm produces oops
@ 2006-07-23 14:30 Ondrej Zary
  2006-07-23 14:41 ` Ondrej Zary
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Ondrej Zary @ 2006-07-23 14:30 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Hello,
cat /proc/apm produces oops on my DTK notebook. Using "apm=broken-psr" kernel 
parameter fixes that but I lose the battery info. I'd like to have the 
battery info (and it works fine in Windows) so I want to debug it and 
(hopefully) fix it.

The oops:
# cat /proc/apm
<1>BUG: unable to handle kernel paging request at virtual address 00005e88
 printing eip:
00002f9d
*pre = 00000000
Oops: 0002 [#4]
Modules linked in:
CPU:    0
EIP:    00c0:[<00002f9d>]    Not tainted VLI
EFLAGS: 00010017   (2.6.17-5-dtk #23)
EIP is at 0x2f94
eax: 00000033   ebx: 00000001   ecx: 00000000   edx: 00000000
esi: c10a1000   edi: 00000014   ebp: c4755e8a   esp: c4755e88
ds: 00c8   es: 0000   ss: 0068
Process cat (pid: 1928, threadinfo=c4754000 task=c11240b0)
Stack: 5e948001 5fc75e55 00005e94 000000c8 10000033 5ea800c0 00000001 530a0000
       00000016 00b86017 00000000 0000530a c010830f 00000060 0000530a 00000033
       0000007b 0000007b c0337368 00000000 c10a1000 00000000 00000000 00000282
Call Trace:
 <c010830f> apm_bios_call+0x68/0xba  <c0108728> apm_get_power_status+0x44/0x90
 <c01091a0> apm_get_info+0x34/0xdc  <c01617dc> proc_file_read+0xda/0x22d
 <c013b5a2> vfs_read+0x82/0x10e  <c013b873> sys_read+0x3c/0x62
 <c0102397> syscall_call+0x7/0xb
Code:  Bad EIP value.
EIP: [<00002f9d>] 0x2f9d SS:ESP 0068:c4755e88

So it looks like it dies somewhere in the APM BIOS code. But how do I find
out exactly where and/or why? Maybe with GDB somehow (so far I've used it
only for really simple debugging).
I've tried calling the APM 0x530A function from DOS (real mode, int 15h) and
single-stepping the BIOS APM code (using good old user-friendly Turbo
Debugger). I noticed some OUTs to 0xB1 (or something like that), then some PCI
accesses (0xCF8 and 0xCFC), and then IP ended up in an area of all zeros. When I
step over the int 15h call, it works fine - it returns correct info.

-- 
Ondrej Zary

* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-23 14:30 Debugging APM - cat /proc/apm produces oops Ondrej Zary
@ 2006-07-23 14:41 ` Ondrej Zary
  2006-07-23 15:06 ` Stephen Rothwell
  2006-07-25 19:15 ` Alan Cox
  2 siblings, 0 replies; 9+ messages in thread
From: Ondrej Zary @ 2006-07-23 14:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Sunday 23 July 2006 16:30, Ondrej Zary wrote:
> I've tried calling the APM 0x530A function from DOS (real mode, int 15h)
> and single-stepping the BIOS APM code (using good old user-friendly Turbo
> Debugger). Noticed some OUTs to 0xB1 (or something like that), then some
> PCI accesses (0xCF8 and 0xCFC) and then IP ended in area of all zeros. When
> I step over the int 15h call, it works fine - returns correct info.

Sorry, this was my bad. It works fine even when single stepping. I've made a 
mistake and stepped over the ending int 20h instruction of my .com program...

I'm probably going to write down the complete sequence of instructions that
get executed during the call.

-- 
Ondrej Zary

* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-23 14:30 Debugging APM - cat /proc/apm produces oops Ondrej Zary
  2006-07-23 14:41 ` Ondrej Zary
@ 2006-07-23 15:06 ` Stephen Rothwell
  2006-07-23 16:35   ` Ondrej Zary
                     ` (2 more replies)
  2006-07-25 19:15 ` Alan Cox
  2 siblings, 3 replies; 9+ messages in thread
From: Stephen Rothwell @ 2006-07-23 15:06 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: linux-kernel

On Sun, 23 Jul 2006 16:30:53 +0200 Ondrej Zary <linux@rainbow-software.org> wrote:
>
> cat /proc/apm produces oops on my DTK notebook. Using "apm=broken-psr" kernel 
> parameter fixes that but I lose the battery info. I'd like to have the 
> battery info (and it works fine in Windows) so I want to debug it and 
> (hopefully) fix.

Is there some reason you can't (or don't want to) use ACPI on this machine?

> The oops:
> # cat /proc/apm
> <1>BUG: unable to handle kernel paging request at virtual address 00005e88
                                                                        ^^^^
Looks like it tried to access through the stack pointer while executing 16 bit code.

>  printing eip:
> 00002f9d
> *pre = 00000000
> Oops: 0002 [#4]
> Modules linked in:
> CPU:    0
> EIP:    00c0:[<00002f9d>]    Not tainted VLI
          ^^^^
This is the APM BIOS 16 bit code segment.
 
> EFLAGS: 00010017   (2.6.17-5-dtk #23)
> EIP is at 0x2f94
> eax: 00000033   ebx: 00000001   ecx: 00000000   edx: 00000000
> esi: c10a1000   edi: 00000014   ebp: c4755e8a   esp: c4755e88
                                                           ^^^^
Accessing the stack while executing 16 bit code is never going to work.

> ds: 00c8   es: 0000   ss: 0068
> Process cat (pid: 1928, threadinfo=c4754000 task=c11240b0)
> Stack: 5e948001 5fc75e55 00005e94 000000c8 10000033 5ea800c0 00000001 530a0000
>        00000016 00b86017 00000000 0000530a c010830f 00000060 0000530a 00000033
>        0000007b 0000007b c0337368 00000000 c10a1000 00000000 00000000 00000282
> Call Trace:
>  <c010830f> apm_bios_call+0x68/0xba  <c0108728> apm_get_power_status+0x44/0x90
>  <c01091a0> apm_get_info+0x34/0xdc  <c01617dc> proc_file_read+0xda/0x22d
>  <c013b5a2> vfs_read+0x82/0x10e  <c013b873> sys_read+0x3c/0x62
>  <c0102397> syscall_call+0x7/0xb
> Code:  Bad EIP value.
> EIP: [<00002f9d>] 0x2f9d SS:ESP 0068:c4755e88
> 
> So it looks like it dies somewhere in the APM BIOS code. But how to find 
> exactly where and/or why? Maybe use GDB somehow (I've used it only for really 
> simple debugging yet).
> I've tried calling the APM 0x530A function from DOS (real mode, int 15h) and 
> single-stepping the BIOS APM code (using good old user-friendly Turbo 
> Debugger). Noticed some OUTs to 0xB1 (or something like that), then some PCI 
> accesses (0xCF8 and 0xCFC) and then IP ended in area of all zeros. When I 
> step over the int 15h call, it works fine - returns correct info.

APM BIOSes are often only tested in real mode, as that will suffice for
older Windows systems.  If your machine is less than 6 years old, Windows
is almost certainly using ACPI and not APM.

If you really want this fixed, you will have to talk to your machine
manufacturer (or the BIOS manufacturer).
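
The annotations on the oops above can be reproduced mechanically. The
following is a minimal sketch, not kernel code: the struct and function
names are made up for illustration, and only the bit layouts (the x86
page-fault error code and the segment selector format) are architectural.

```c
#include <stdint.h>

/* x86 page-fault error code, low three bits (architectural layout). */
struct pf_error {
    int present;    /* bit 0: 0 = page not present, 1 = protection fault */
    int write;      /* bit 1: 0 = read access, 1 = write access */
    int user;       /* bit 2: 0 = fault in kernel mode, 1 = user mode */
};

/* x86 segment selector fields (architectural layout). */
struct selector {
    unsigned index; /* index into the descriptor table */
    unsigned ti;    /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level */
};

/* Hypothetical helper: split the "Oops: 0002" value into its bits. */
struct pf_error decode_pf_error(uint32_t code)
{
    struct pf_error e;

    e.present = code & 1;
    e.write = (code >> 1) & 1;
    e.user = (code >> 2) & 1;
    return e;
}

/* Hypothetical helper: split a selector such as CS=0x00c0 or SS=0x0068. */
struct selector decode_selector(uint16_t sel)
{
    struct selector s;

    s.index = sel >> 3;
    s.ti = (sel >> 2) & 1;
    s.rpl = sel & 3;
    return s;
}
```

With the values from the oops, decode_pf_error(0x0002) reports a write to a
page that is not present while in kernel mode, and decode_selector() gives
GDT index 24 for CS=0x00c0, 13 for SS=0x0068 and 25 for DS=0x00c8, all with
RPL 0. That CS and DS are neighbouring entries fits the note above that
0x00c0 is the APM BIOS 16 bit code segment; what the other entries hold is
not stated in this thread.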

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-23 15:06 ` Stephen Rothwell
@ 2006-07-23 16:35   ` Ondrej Zary
  2006-07-24 21:51   ` Ondrej Zary
  2006-07-25 20:04   ` Ondrej Zary
  2 siblings, 0 replies; 9+ messages in thread
From: Ondrej Zary @ 2006-07-23 16:35 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-kernel

On Sunday 23 July 2006 17:06, you wrote:
> On Sun, 23 Jul 2006 16:30:53 +0200 Ondrej Zary <linux@rainbow-software.org> wrote:
> > cat /proc/apm produces oops on my DTK notebook. Using "apm=broken-psr"
> > kernel parameter fixes that but I lose the battery info. I'd like to have
> > the battery info (and it works fine in Windows) so I want to debug it and
> > (hopefully) fix.
>
> Is there some reason you can't (or don't want to) use ACPI on this machine?

The machine is old (1997) and even the latest BIOS (1999) does not support 
ACPI (although the i430TX chipset itself is ACPI-compatible).

>
> > The oops:
> > # cat /proc/apm
> > <1>BUG: unable to handle kernel paging request at virtual address 00005e88
>
>                                                                         ^^^^
> Looks like it tried to access through the stack pointer while executing 16 bit code.

Thanks for explanation.

>
> >  printing eip:
> > 00002f9d
> > *pre = 00000000
> > Oops: 0002 [#4]
> > Modules linked in:
> > CPU:    0
> > EIP:    00c0:[<00002f9d>]    Not tainted VLI
>
>           ^^^^
> This is the APM BIOS 16 bit code segment.
>
> > EFLAGS: 00010017   (2.6.17-5-dtk #23)
> > EIP is at 0x2f94
> > eax: 00000033   ebx: 00000001   ecx: 00000000   edx: 00000000
> > esi: c10a1000   edi: 00000014   ebp: c4755e8a   esp: c4755e88
>
>                                                            ^^^^
> Accessing the stack while executing 16 bit code is never going to work.
>
> > ds: 00c8   es: 0000   ss: 0068
> > Process cat (pid: 1928, threadinfo=c4754000 task=c11240b0)
> > Stack: 5e948001 5fc75e55 00005e94 000000c8 10000033 5ea800c0 00000001 530a0000
> >        00000016 00b86017 00000000 0000530a c010830f 00000060 0000530a 00000033
> >        0000007b 0000007b c0337368 00000000 c10a1000 00000000 00000000 00000282
> > Call Trace:
> >  <c010830f> apm_bios_call+0x68/0xba  <c0108728> apm_get_power_status+0x44/0x90
> >  <c01091a0> apm_get_info+0x34/0xdc  <c01617dc> proc_file_read+0xda/0x22d
> >  <c013b5a2> vfs_read+0x82/0x10e  <c013b873> sys_read+0x3c/0x62
> >  <c0102397> syscall_call+0x7/0xb
> > Code:  Bad EIP value.
> > EIP: [<00002f9d>] 0x2f9d SS:ESP 0068:c4755e88
> >
> > So it looks like it dies somewhere in the APM BIOS code. But how to find
> > exactly where and/or why? Maybe use GDB somehow (I've used it only for
> > really simple debugging yet).
> > I've tried calling the APM 0x530A function from DOS (real mode, int 15h)
> > and single-stepping the BIOS APM code (using good old user-friendly Turbo
> > Debugger). Noticed some OUTs to 0xB1 (or something like that), then some
> > PCI accesses (0xCF8 and 0xCFC) and then IP ended in area of all zeros.
> > When I step over the int 15h call, it works fine - returns correct info.
>
> APM BIOSes are often only tested in real mode, as that will suffice for
> older Windows systems.  If your machine is less than 6 years old, Windows
> is almost certainly using ACPI and not APM.
>
> If you really want this fixed, you will have to talk to your machine
> manufacturer (or the BIOS manufacturer).

This is not going to work as it's not supported anymore :(
Too bad that the BIOS is Phoenix, as I'm able to modify Award BIOSes only.
So I'll try to find out what the APM BIOS does, rewrite it in C and put it in
my kernel.

-- 
Ondrej Zary

* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-23 15:06 ` Stephen Rothwell
  2006-07-23 16:35   ` Ondrej Zary
@ 2006-07-24 21:51   ` Ondrej Zary
  2006-07-25 20:04   ` Ondrej Zary
  2 siblings, 0 replies; 9+ messages in thread
From: Ondrej Zary @ 2006-07-24 21:51 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-kernel

On Sunday 23 July 2006 17:06, Stephen Rothwell wrote:
> On Sun, 23 Jul 2006 16:30:53 +0200 Ondrej Zary <linux@rainbow-software.org> wrote:

> >  printing eip:
> > 00002f9d
> > *pre = 00000000
> > Oops: 0002 [#4]
> > Modules linked in:
> > CPU:    0
> > EIP:    00c0:[<00002f9d>]    Not tainted VLI
>
>           ^^^^
> This is the APM BIOS 16 bit code segment.

Looking at BIOS disassembly:
2F97: push bp
2F98: mov bp,sp
2F9A: add sp,-2
2F9D: mov [bp][-2],bx    <-- it oopses here

I realized that I can modify the BIOS easily as it's stored in shadow RAM. So 
I replaced the offending MOV with three NOPs and tested again. This time it 
oopsed at 0x2FAD:
2FAD: cmp w,[bp][-2],1
2FB1: je 2FCB

That jump was taken during my single stepping, so I NOPped out the CMP and
replaced JE with JMPS. Then I booted Linux and APM seems to work fine - battery
percentage and remaining time are there, as well as AC power status.
There seem to be 4 of these operations:
mov [bp][-2],bx
cmp w,[bp][-2],1
cmp w,[bp][-2],8002
cmp w,[bp][-2],8001
but I've hit only the first two of them. I wonder what that's for (especially
since it works without it).
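
Why both patched instructions fault in the same way can be read off their
encodings. The sketch below decodes a 16-bit ModRM byte; it is illustrative
only, the names are made up, and the byte values (89 5e fe for the MOV, 83
7e fe 01 for the CMP) are the ones given as orig_1 and orig_2 in the quirk
patch posted later in this thread.

```c
#include <stdint.h>

/* Fields of a ModRM byte under 16-bit addressing. */
struct modrm16 {
    unsigned mod;   /* 0 = no disp (rm 6: disp16), 1 = disp8, 2 = disp16, 3 = register */
    unsigned reg;   /* register number, or opcode extension (/7 = CMP for 0x83) */
    unsigned rm;    /* addressing form; 6 with mod != 0 means [bp+disp] */
    int disp;       /* sign-extended disp8, meaningful only when mod == 1 */
    int uses_bp;    /* 1 if the base register is BP, so the default segment is SS */
};

/* Hypothetical helper: decode the ModRM byte and an 8-bit displacement. */
struct modrm16 decode_modrm16(uint8_t modrm, uint8_t disp8)
{
    struct modrm16 m;

    m.mod = modrm >> 6;
    m.reg = (modrm >> 3) & 7;
    m.rm = modrm & 7;
    m.disp = (m.mod == 1) ? (int8_t)disp8 : 0;
    /* rm 2 and 3 are [bp+si] and [bp+di]; rm 6 is [bp+disp] unless mod == 0 */
    m.uses_bp = m.mod != 3 &&
                (m.rm == 2 || m.rm == 3 || (m.rm == 6 && m.mod != 0));
    return m;
}
```

Both decode_modrm16(0x5e, 0xfe) and decode_modrm16(0x7e, 0xfe) come out as
mod 1, rm 6, displacement -2, i.e. [bp-2]. Only the reg field differs: 3
(BX) for the MOV and 7 (the CMP opcode extension) for the CMP. Both are
BP-relative stack accesses using the 16-bit address size.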

-- 
Ondrej Zary

* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-23 14:30 Debugging APM - cat /proc/apm produces oops Ondrej Zary
  2006-07-23 14:41 ` Ondrej Zary
  2006-07-23 15:06 ` Stephen Rothwell
@ 2006-07-25 19:15 ` Alan Cox
  2006-07-25 20:11   ` Ondrej Zary
  2 siblings, 1 reply; 9+ messages in thread
From: Alan Cox @ 2006-07-25 19:15 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: Linux Kernel Mailing List

On Sun, 2006-07-23 at 16:30 +0200, Ondrej Zary wrote:
> Hello,
> cat /proc/apm produces oops on my DTK notebook. Using "apm=broken-psr" kernel 
> parameter fixes that but I lose the battery info. I'd like to have the 
> battery info (and it works fine in Windows) so I want to debug it and 
> (hopefully) fix.

If broken-psr is needed can you also send me a dmidecode and lspci -vxx
dump so I can automate that bit.

Alan


* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-23 15:06 ` Stephen Rothwell
  2006-07-23 16:35   ` Ondrej Zary
  2006-07-24 21:51   ` Ondrej Zary
@ 2006-07-25 20:04   ` Ondrej Zary
  2 siblings, 0 replies; 9+ messages in thread
From: Ondrej Zary @ 2006-07-25 20:04 UTC (permalink / raw)
  To: Stephen Rothwell; +Cc: linux-kernel, Chuck Ebbert

This is my "fix": it patches the BIOS in shadow RAM. It's ugly, but it lets me use the APM battery status.
It's probably not worth including in the kernel, but it might help someone...

--- linux-2.6.17.5-orig/drivers/pci/quirks.c	2006-07-15 04:38:43.000000000 +0200
+++ linux-2.6.17.5/drivers/pci/quirks.c	2006-07-26 18:41:01.000000000 +0200
@@ -1404,6 +1404,36 @@
 }
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_NCR, PCI_DEVICE_ID_NCR_53C810, fixup_rev1_53c810);
 
+#ifdef CONFIG_X86
+/* 
+ * Fix DTK FortisPro TOP-5A APM BIOS bug which causes oops on /proc/apm access
+ * Most probably works only with the latest BIOS rev 2.31
+ */
+static void __devinit quirk_dtk_top5a(struct pci_dev *dev)
+{
+	u8 *patch_addr_1 = __va(0xf2f9d);
+	u8 orig_1[] = { 0x89, 0x5e, 0xfe }; /* mov [bp-2],bx -> this causes oops */
+	u8 patch_1[] = { 0x90, 0x90, 0x90 }; /* 3x nop */
+	u8 *patch_addr_2 = __va(0xf2fad);
+	u8 orig_2[] = { 0x83, 0x7e, 0xfe, 0x01, /* cmp w,[bp-2],1 -> second oops */
+			0x74 }; 		/* je somewhere -> this must be changed to jmps */
+	u8 patch_2[] = { 0x90, 0x90, 0x90, 0x90, 0xeb }; /* 4x nop + jmps */
+	u8 shadow_cfg;
+
+	/* Check if it's the buggy BIOS */
+	if (memcmp(patch_addr_1, &orig_1[0], ARRAY_SIZE(orig_1)) ||
+	    memcmp(patch_addr_2, &orig_2[0], ARRAY_SIZE(orig_2)))
+		return;
+
+	printk(KERN_INFO "Fixing DTK FortisPro TOP-5A APM BIOS bug\n");
+	pci_read_config_byte(dev, 0x59, &shadow_cfg);
+	pci_write_config_byte(dev, 0x59, 0x20);	/* enable shadow BIOS writes */
+	memcpy(patch_addr_1, patch_1, ARRAY_SIZE(patch_1));
+	memcpy(patch_addr_2, patch_2, ARRAY_SIZE(patch_2));
+	pci_write_config_byte(dev, 0x59, shadow_cfg);
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82439TX,	quirk_dtk_top5a);
+#endif /* CONFIG_X86 */
 
 static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f, struct pci_fixup *end)
 {
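
The check-then-patch logic of the quirk above can be tried in user space on
a copy of the BIOS image. This is only a sketch under stated assumptions:
struct bios_patch, patches_match() and apply_patches() are hypothetical
names, the image is an ordinary buffer rather than shadow RAM, and nothing
here touches the chipset register that the quirk uses to enable writes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One byte-level patch: where it goes, what must be there, what to write. */
struct bios_patch {
    size_t offset;          /* offset into the image buffer */
    const uint8_t *orig;    /* bytes expected before patching */
    const uint8_t *repl;    /* replacement bytes */
    size_t len;             /* length of orig and repl */
};

/* Return 1 only if every patch site is in range and holds the expected bytes. */
int patches_match(const uint8_t *image, size_t image_len,
                  const struct bios_patch *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i].offset > image_len || p[i].len > image_len - p[i].offset)
            return 0;
        if (memcmp(image + p[i].offset, p[i].orig, p[i].len) != 0)
            return 0;
    }
    return 1;
}

/* Apply all patches, or none if any site does not match. Returns 1 on success. */
int apply_patches(uint8_t *image, size_t image_len,
                  const struct bios_patch *p, size_t n)
{
    if (!patches_match(image, image_len, p, n))
        return 0;
    for (size_t i = 0; i < n; i++)
        memcpy(image + p[i].offset, p[i].repl, p[i].len);
    return 1;
}
```

For a 64 KiB copy of the segment that starts at physical 0xf0000, the two
sites from the quirk are at offsets 0x2f9d and 0x2fad (0xf0000 + 0x2f9d =
0xf2f9d, the address passed to __va() above). Checking every site before
writing anything means a different BIOS revision is left untouched, which
is what the memcmp() test in the quirk is for.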

-- 
Ondrej Zary

* Re: Debugging APM - cat /proc/apm produces oops
  2006-07-25 19:15 ` Alan Cox
@ 2006-07-25 20:11   ` Ondrej Zary
  0 siblings, 0 replies; 9+ messages in thread
From: Ondrej Zary @ 2006-07-25 20:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On Tuesday 25 July 2006 21:15, Alan Cox wrote:
> On Sun, 2006-07-23 at 16:30 +0200, Ondrej Zary wrote:
> > Hello,
> > cat /proc/apm produces oops on my DTK notebook. Using "apm=broken-psr"
> > kernel parameter fixes that but I lose the battery info. I'd like to have
> > the battery info (and it works fine in Windows) so I want to debug it and
> > (hopefully) fix.
>
> If broken-psr is needed can you also send me a dmidecode and lspci -vxx
> dump so I can automate that bit.

I just found that there is another problem with this machine - DMI :)
Both the kernel and dmidecode say that DMI is not present - so one would say
that the BIOS does not support DMI.
But wait... HWiNFO32 for DOS shows working DMI 2.0 with real data (although 
some data is from Intel Trajan reference board, apparently not changed by 
BIOS engineer...)
Examining the $F000 BIOS segment revealed that the "_DMI_" signature is not
present. But there is "_DMI20_" - and the header does not seem to match. I
can see all the strings (like "Trajan") there...
Anyone interested in BIOS dump?
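
A dump of the segment can be checked for both signatures with a small
scanner. This is a sketch only: find_signature() is a made-up helper, and
the 16-byte step is an assumption about how scanners look for the "_DMI_"
anchor, not something established in this thread. A step of 1 finds a
signature at any alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of sig that starts on a
 * multiple of step bytes, or -1 if there is none.
 */
long find_signature(const uint8_t *buf, size_t len, const char *sig, size_t step)
{
    size_t sig_len = strlen(sig);

    if (step == 0 || sig_len == 0 || sig_len > len)
        return -1;
    for (size_t off = 0; off + sig_len <= len; off += step) {
        if (memcmp(buf + off, sig, sig_len) == 0)
            return (long)off;
    }
    return -1;
}
```

Note that a search for "_DMI_" cannot match a "_DMI20_" header, because the
fifth byte is '2' rather than '_'. That would fit a scanner that only knows
the shorter signature reporting DMI as absent, although the header layout
behind "_DMI20_" would still need to be worked out from the dump.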

-- 
Ondrej Zary

* Re: Debugging APM - cat /proc/apm produces oops
@ 2006-07-25  7:46 Chuck Ebbert
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Ebbert @ 2006-07-25  7:46 UTC (permalink / raw)
  To: Ondrej Zary; +Cc: linux-kernel, Stephen Rothwell

In-Reply-To: <200607242351.37578.linux@rainbow-software.org>

On Mon, 24 Jul 2006 23:51:37 +0200, Ondrej Zary wrote:
>
> > >  printing eip:
> > > 00002f9d
> > > *pre = 00000000
> > > Oops: 0002 [#4]
> > > Modules linked in:
> > > CPU:    0
> > > EIP:    00c0:[<00002f9d>]    Not tainted VLI
> >
> >           ^^^^
> > This is the APM BIOS 16 bit code segment.
>
> Looking at BIOS disassembly:
> 2F97: push bp
> 2F98: mov bp,sp
> 2F9A: add sp,-2
> 2F9D: mov [bp][-2],bx    <-- it oopses here

That's expected.  You can push/pop/call/ret using the kernel stack
because its 32-bit stack-size attribute controls how the stack is
addressed, but an explicit memory operand like [bp-2] is addressed
with 16 bits (the CS address size), so the upper half of the stack
pointer is lost.
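
The arithmetic can be checked against the oops. A minimal sketch, assuming
the stack segment has base 0 so that the offset equals the linear address;
ea16() and ea32() are made-up names for illustration:

```c
#include <stdint.h>

/* Offset for [bp+disp8] with a 16-bit address size: wraps at 64 KiB. */
uint32_t ea16(uint32_t ebp, int8_t disp)
{
    return (uint16_t)((uint16_t)ebp + disp);
}

/* Offset that [ebp+disp8] would give with a 32-bit address size. */
uint32_t ea32(uint32_t ebp, int8_t disp)
{
    return ebp + (uint32_t)(int32_t)disp;
}
```

With ebp = c4755e8a from the oops, ea16(0xc4755e8a, -2) is 0x00005e88, the
faulting virtual address, while ea32(0xc4755e8a, -2) is 0xc4755e88, which
is the ESP value shown in the oops and is where the access was meant to go.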

This could probably be fixed in the kernel but it doesn't look
worth the trouble since the fix could be really ugly.

> I realized that I can modify the BIOS easily as it's stored in shadow RAM. So 
> I replaced the offending MOV with three NOPs and tested again. This time it 
> oopsed at 0x2FAD:
> 2FAD: cmp w,[bp][-2],1
> 2FB1: je 2FCB
> 
> that jump was taken during my single stepping, so I NOPped out the CMP and 
> replaced JE with JMPS. Then booted Linux and APM seems to work fine - battery 
> percentage and remaining time is there as well as AC power status.
> There seems to be 4 these operations:
> mov [bp][-2],bx
> cmp w,[bp][-2],1
> cmp w,[bp][-2],8002
> cmp w,[bp][-2],8001
> but I've hit only the first two of them. I wonder what's that for (especially 
> when it works without that).

Something is calling this after pushing the arg to the function onto the
stack.  I guess it's always calling it with a 1 if that's all you are seeing.

-- 
Chuck

