* [Qemu-devel] RFH: difference in read-only mapped bios.bin - memory corruption?
@ 2017-08-14 14:28 Philipp Hahn
  2017-08-14 18:39 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Philipp Hahn @ 2017-08-14 14:28 UTC
  To: qemu-devel; +Cc: Laszlo Ersek, Dr. David Alan Gilbert

Hello,

I'm currently investigating a problem where a Linux VM does not reboot
and gets stuck in the SeaBIOS reboot code.

I'm using SeaBIOS-1.7 from Debian with a more modern qemu-2.8:

> virsh # qemu-monitor-command --hmp ucs41-414 info roms
> fw=genroms/kvmvapic.bin size=0x002400 name="kvmvapic.bin"
> addr=00000000fffe0000 size=0x020000 mem=rom name="bios.bin"

which (to my understanding) is mapped at two physical locations:
> virsh # qemu-monitor-command --hmp ucs41-414 info mtree
...
> memory-region: system
...
>       00000000000e0000-00000000000fffff (prio 1, R-): alias isa-bios @pc.bios 0000000000000000-000000000001ffff
>       00000000fffe0000-00000000ffffffff (prio 0, R-): pc.bios

If I dump both regions and compare them, I get a difference:
> virsh # qemu-monitor-command --pretty --domain ucs41-414 '{"execute":"pmemsave","arguments":{"val":917504,"size":131072,"filename":"/tmp/bios-low.dump"}}'
> virsh # qemu-monitor-command --pretty --domain ucs41-414 '{"execute":"pmemsave","arguments":{"val":4294836224,"size":131072,"filename":"/tmp/bios-high.dump"}}'
> # diff --suppress-common-lines -y <(od -Ax -tx1 -w1 -v /tmp/bios-low.dump) <(od -Ax -tx1 -w1 -v /tmp/bios-high.dump)
> 00f798 fa                                                     | 00f798 80
> 00f799 7a                                                     | 00f799 89
> 00f79a f4                                                     | 00f79a f2
> 016d40 00                                                     | 016d40 ff
> 016d41 00                                                     | 016d41 ff
> 016d42 00                                                     | 016d42 ff
> 016d43 00                                                     | 016d43 ff

The high address dump is the same as the original:
> # cmp -l /tmp/bios-high.dump /usr/share/seabios/bios.bin
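
The same comparison can be sketched without od and diff. The snippet below
diffs two dumps and translates each differing file offset into the two
guest-physical addresses. The base addresses are the decimal `val` arguments
of the pmemsave calls above (917504 = 0xe0000, 4294836224 = 0xfffe0000); the
function name `diff_dumps` and the paths in the usage comment are only
illustrative.

```python
LOW_BASE = 917504        # 0x000e0000, start of the isa-bios alias
HIGH_BASE = 4294836224   # 0xfffe0000, start of pc.bios
ROM_SIZE = 131072        # 0x20000, i.e. 128 KiB


def diff_dumps(low: bytes, high: bytes):
    """Yield (low_addr, low_byte, high_addr, high_byte) per differing offset."""
    if len(low) != len(high):
        raise ValueError("dumps must have the same length")
    for offset, (a, b) in enumerate(zip(low, high)):
        if a != b:
            yield LOW_BASE + offset, a, HIGH_BASE + offset, b


# Usage sketch (paths as in the pmemsave commands above):
#   low = open("/tmp/bios-low.dump", "rb").read()
#   high = open("/tmp/bios-high.dump", "rb").read()
#   for la, lb, ha, hb in diff_dumps(low, high):
#       print(f"{la:#010x} {lb:02x} | {ha:#010x} {hb:02x}")
```

File offset 0x00f798 therefore corresponds to 0xef798 in the low alias and
0xfffef798 in the high mapping, which is where the disassembly below differs.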

> virsh # qemu-monitor-command --hmp ucs41-414 x/6i 0x00000000000ef78f
> 0x00000000000ef78f:  mov    $0xcf8,%esi
> 0x00000000000ef794:  mov    $0xfa000000,%eax
> 0x00000000000ef799:  jp     0xef78f
                       ^^^^^^^^^^^^^^ BUG: endless loop
> 0x00000000000ef79b:  out    %eax,(%dx)
> 0x00000000000ef79c:  mov    $0xfe,%dl
> 0x00000000000ef79e:  in     (%dx),%ax

> virsh # qemu-monitor-command --hmp ucs41-414 xp/6i 0x00000000fffef78f
> 0x00000000fffef78f:  mov    $0xcf8,%esi
> 0x00000000fffef794:  mov    $0x80000000,%eax
> 0x00000000fffef799:  mov    %esi,%edx
                       ^^^^^^^^^^^^^^^^ CORRECT original code
> 0x00000000fffef79b:  out    %eax,(%dx)
> 0x00000000fffef79c:  mov    $0xfe,%dl
> 0x00000000fffef79e:  in     (%dx),%ax

(That's some code from seabios-1.7.0/src/pci.c)
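
The three differing bytes line up with the two disassemblies. A small sketch
of the arithmetic, assuming the standard x86 encodings `b8 imm32` (mov imm32
into %eax), `89 f2` (mov %esi,%edx) and `7a rel8` (jp); the low three
immediate bytes are taken to be zero, as the disassembly shows, and the
helper names are only illustrative:

```python
import struct


def mov_eax_imm(top_byte: int) -> int:
    """Immediate of 'b8 00 00 00 <top_byte>' (imm32 is little-endian)."""
    return struct.unpack("<I", bytes([0, 0, 0, top_byte]))[0]


def short_jump_target(insn_addr: int, rel8: int) -> int:
    """Target of a 2-byte short jump located at insn_addr."""
    disp = struct.unpack("<b", bytes([rel8]))[0]   # sign-extend rel8
    return insn_addr + 2 + disp


# The mov at 0xef794 is 5 bytes long, so its last immediate byte is at 0xef798.
print(hex(mov_eax_imm(0x80)))                  # intact ROM byte 80
print(hex(mov_eax_imm(0xfa)))                  # modified low copy, byte fa
# Bytes 7a f4 at 0xef799 decode as 'jp' with displacement 0xf4 = -12.
print(hex(short_jump_target(0xef799, 0xf4)))   # jump target in the low copy
```

Since neither mov modifies the flags, the jp back to 0xef78f is either always
or never taken once execution reaches that point, depending on the parity
flag at entry.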

I had exactly the same problem some weeks ago, but with a different
corruption pattern:
> # diff --suppress-common-lines -y <(od -Ax -tx1 -w1 -v /tmp/bios2.dump) <(od -Ax -tx1 -w1 -v bios.bin)
> 00f798 f0                                                     | 00f798 80
> 00f799 8d                                                     | 00f799 89
> 00f79a f3                                                     | 00f79a f2
> 016d40 00                                                     | 016d40 ff
> 016d41 00                                                     | 016d41 ff
> 016d42 00                                                     | 016d42 ff
> 016d43 00                                                     | 016d43 ff

Not all runs lead to reboot problems, but I don't know if any other
corruption happened there.

I had a similar problem with OVMF back in June
<https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg03940.html>,
which I "solved" by upgrading the OVMF version: I have not seen the
problem there since then, but this problem looks very similar.


1. How can it be that the low-mem ROM mapping is modified?

2. Can I tell QEMU or gdb to trap any modification of that 128 KiB area?

I'll try to get http://rr-project.org/ running, but any help is appreciated.

Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@univention.de

http://www.univention.de/
Managing Director: Peter H. Ganten
HRB 20755, Bremen District Court (Amtsgericht Bremen)
Tax no.: 71-597-02876

* Re: [Qemu-devel] RFH: difference in read-only mapped bios.bin - memory corruption?
  2017-08-14 14:28 [Qemu-devel] RFH: difference in read-only mapped bios.bin - memory corruption? Philipp Hahn
@ 2017-08-14 18:39 ` Dr. David Alan Gilbert
  2017-08-15 11:25   ` Laszlo Ersek
  0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-14 18:39 UTC
  To: Philipp Hahn; +Cc: qemu-devel, Laszlo Ersek

* Philipp Hahn (hahn@univention.de) wrote:
> Hello,
> 
> I'm currently investigating a problem where a Linux VM does not reboot
> and gets stuck in the SeaBIOS reboot code:
> 
> I'm using SeaBIOS-1.7 from Debian with a more modern qemu-2.8
> 
> > virsh # qemu-monitor-command --hmp ucs41-414 info roms
> > fw=genroms/kvmvapic.bin size=0x002400 name="kvmvapic.bin"
> > addr=00000000fffe0000 size=0x020000 mem=rom name="bios.bin"
> 
> which (to my understanding) is mapped at two physical locations:
> > virsh # qemu-monitor-command --hmp ucs41-414 info mtree
> ...
> > memory-region: system
> ...
> >       00000000000e0000-00000000000fffff (prio 1, R-): alias isa-bios @pc.bios 0000000000000000-000000000001ffff
> >       00000000fffe0000-00000000ffffffff (prio 0, R-): pc.bios
> 
> If I dump both regions and compare them, I get a difference:
> > virsh # qemu-monitor-command --pretty --domain ucs41-414 '{"execute":"pmemsave","arguments":{"val":917504,"size":131072,"filename":"/tmp/bios-low.dump"}}'
> > virsh # qemu-monitor-command --pretty --domain ucs41-414 '{"execute":"pmemsave","arguments":{"val":4294836224,"size":131072,"filename":"/tmp/bios-high.dump"}}'
> > # diff --suppress-common-lines -y <(od -Ax -tx1 -w1 -v /tmp/bios-low.dump) <(od -Ax -tx1 -w1 -v /tmp/bios-high.dump)
> > 00f798 fa                                                     | 00f798 80
> > 00f799 7a                                                     | 00f799 89
> > 00f79a f4                                                     | 00f79a f2
> > 016d40 00                                                     | 016d40 ff
> > 016d41 00                                                     | 016d41 ff
> > 016d42 00                                                     | 016d42 ff
> > 016d43 00                                                     | 016d43 ff
> 
> The high address dump is the same as the original:

You might want SeaBIOS commits c68aff5 and b837e6, which went in after
I tracked down some reboot hangs - although those were rare, not every
time.  The bug fixed by c68aff5 certainly did cause a corruption, and the
address of that corruption was determined at link time and could overlay
random useful bits of code if you were unlucky.

> > # cmp -l /tmp/bios-high.dump /usr/share/seabios/bios.bin
> 
> > virsh # qemu-monitor-command --hmp ucs41-414 x/6i 0x00000000000ef78f
> > 0x00000000000ef78f:  mov    $0xcf8,%esi
> > 0x00000000000ef794:  mov    $0xfa000000,%eax
> > 0x00000000000ef799:  jp     0xef78f
>                        ^^^^^^^^^^^^^^ BUG: endless loop
> > 0x00000000000ef79b:  out    %eax,(%dx)
> > 0x00000000000ef79c:  mov    $0xfe,%dl
> > 0x00000000000ef79e:  in     (%dx),%ax
> 
> > virsh # qemu-monitor-command --hmp ucs41-414 xp/6i 0x00000000fffef78f
> > 0x00000000fffef78f:  mov    $0xcf8,%esi
> > 0x00000000fffef794:  mov    $0x80000000,%eax
> > 0x00000000fffef799:  mov    %esi,%edx
>                        ^^^^^^^^^^^^^^^^ CORRECT original code
> > 0x00000000fffef79b:  out    %eax,(%dx)
> > 0x00000000fffef79c:  mov    $0xfe,%dl
> > 0x00000000fffef79e:  in     (%dx),%ax
> 
> (That's some code from seabios-1.7.0/src/pci.c)
> 
> I had exactly the same run some weeks ago, but I also get different
> patterns:
> > # diff --suppress-common-lines -y <(od -Ax -tx1 -w1 -v /tmp/bios2.dump) <(od -Ax -tx1 -w1 -v bios.bin)
> > 00f798 f0                                                     | 00f798 80
> > 00f799 8d                                                     | 00f799 89
> > 00f79a f3                                                     | 00f79a f2
> > 016d40 00                                                     | 016d40 ff
> > 016d41 00                                                     | 016d41 ff
> > 016d42 00                                                     | 016d42 ff
> > 016d43 00                                                     | 016d43 ff
> 
> Not all runs lead to reboot problems, but I don't know if any other
> corruption happened there.
> 
> I had a similar problem with OVMF back in June
> <https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg03940.html>,
> which I "solved" by upgrading the OVMF version: I have not seen the
> problem there since then, but this problem looks very similar.
> 
> 
> 1. How can it be that the low-mem ROM mapping is modified?

I can't remember all the details, but PC ROM is shadowed and mapped over
with RAM at various times, and the BIOSes play lots of silly tricks of
being copied and then reusing bits of the copied space as temporaries
and... oh, it's just a mess.

Dave

> 2. Can I tell QEMU or gdb to trap any modification of that 128 KiB area?
> 
> I'll try to get http://rr-project.org/ running, but any help is appreciated.
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [Qemu-devel] RFH: difference in read-only mapped bios.bin - memory corruption?
  2017-08-14 18:39 ` Dr. David Alan Gilbert
@ 2017-08-15 11:25   ` Laszlo Ersek
  2017-08-18  8:41     ` Philipp Hahn
  0 siblings, 1 reply; 5+ messages in thread
From: Laszlo Ersek @ 2017-08-15 11:25 UTC
  To: Dr. David Alan Gilbert, Philipp Hahn; +Cc: qemu-devel

On 08/14/17 20:39, Dr. David Alan Gilbert wrote:
> * Philipp Hahn (hahn@univention.de) wrote:
>> Hello,
>>
>> I'm currently investigating a problem where a Linux VM does not reboot
>> and gets stuck in the SeaBIOS reboot code:
>>
>> I'm using SeaBIOS-1.7 from Debian with a more modern qemu-2.8
>>
>>> virsh # qemu-monitor-command --hmp ucs41-414 info roms
>>> fw=genroms/kvmvapic.bin size=0x002400 name="kvmvapic.bin"
>>> addr=00000000fffe0000 size=0x020000 mem=rom name="bios.bin"
>>
>> which (to my understanding) is mapped at two physical locations:
>>> virsh # qemu-monitor-command --hmp ucs41-414 info mtree
>> ...
>>> memory-region: system
>> ...
>>>       00000000000e0000-00000000000fffff (prio 1, R-): alias isa-bios @pc.bios 0000000000000000-000000000001ffff
>>>       00000000fffe0000-00000000ffffffff (prio 0, R-): pc.bios
>>
>> If I dump both regions and compare them, I get a difference:
>>> virsh # qemu-monitor-command --pretty --domain ucs41-414 '{"execute":"pmemsave","arguments":{"val":917504,"size":131072,"filename":"/tmp/bios-low.dump"}}'
>>> virsh # qemu-monitor-command --pretty --domain ucs41-414 '{"execute":"pmemsave","arguments":{"val":4294836224,"size":131072,"filename":"/tmp/bios-high.dump"}}'
>>> # diff --suppress-common-lines -y <(od -Ax -tx1 -w1 -v /tmp/bios-low.dump) <(od -Ax -tx1 -w1 -v /tmp/bios-high.dump)
>>> 00f798 fa                                                     | 00f798 80
>>> 00f799 7a                                                     | 00f799 89
>>> 00f79a f4                                                     | 00f79a f2
>>> 016d40 00                                                     | 016d40 ff
>>> 016d41 00                                                     | 016d41 ff
>>> 016d42 00                                                     | 016d42 ff
>>> 016d43 00                                                     | 016d43 ff
>>
>> The high address dump is the same as the original:
> 
> You might want seabios commit c68aff5 and b837e6 that got fixed after
> I tracked down some reboot hangs - although they were rare, not every
> time.  c68aff5 did certainly cause a corruption, and the address of that
> corruption was determined at link time and could overlay random useful
> bits of code if you were unlucky.
> 
>>> # cmp -l /tmp/bios-high.dump /usr/share/seabios/bios.bin
>>
>>> virsh # qemu-monitor-command --hmp ucs41-414 x/6i 0x00000000000ef78f
>>> 0x00000000000ef78f:  mov    $0xcf8,%esi
>>> 0x00000000000ef794:  mov    $0xfa000000,%eax
>>> 0x00000000000ef799:  jp     0xef78f
>>                        ^^^^^^^^^^^^^^ BUG: endless loop
>>> 0x00000000000ef79b:  out    %eax,(%dx)
>>> 0x00000000000ef79c:  mov    $0xfe,%dl
>>> 0x00000000000ef79e:  in     (%dx),%ax
>>
>>> virsh # qemu-monitor-command --hmp ucs41-414 xp/6i 0x00000000fffef78f
>>> 0x00000000fffef78f:  mov    $0xcf8,%esi
>>> 0x00000000fffef794:  mov    $0x80000000,%eax
>>> 0x00000000fffef799:  mov    %esi,%edx
>>                        ^^^^^^^^^^^^^^^^ CORRECT original code
>>> 0x00000000fffef79b:  out    %eax,(%dx)
>>> 0x00000000fffef79c:  mov    $0xfe,%dl
>>> 0x00000000fffef79e:  in     (%dx),%ax
>>
>> (That's some code from seabios-1.7.0/src/pci.c)
>>
>> I had exactly the same run some weeks ago, but I also get different
>> patterns:
>>> # diff --suppress-common-lines -y <(od -Ax -tx1 -w1 -v /tmp/bios2.dump) <(od -Ax -tx1 -w1 -v bios.bin)
>>> 00f798 f0                                                     | 00f798 80
>>> 00f799 8d                                                     | 00f799 89
>>> 00f79a f3                                                     | 00f79a f2
>>> 016d40 00                                                     | 016d40 ff
>>> 016d41 00                                                     | 016d41 ff
>>> 016d42 00                                                     | 016d42 ff
>>> 016d43 00                                                     | 016d43 ff
>>
>> Not all runs lead to reboot problems, but I don't know if any other
>> corruption happened there.
>>
>> I had a similar problem with OVMF back in June
>> <https://lists.nongnu.org/archive/html/qemu-devel/2017-06/msg03940.html>,
>> which I "solved" by upgrading the OVMF version: I have not seen the
>> problem there since then, but this problem looks very similar.
>>
>>
>> 1. How can it be that the low-mem ROM mapping is modified?
> 
> I can't remember all the details, but PC ROM is shadowed and mapped over
> with RAM at various times,

Right. I don't remember for sure, but I believe the state of the PAM
registers doesn't only affect what the VCPUs see in that address range,
but also what your monitor commands will dump. (This would be the
logical choice -- make the monitor output what the VCPUs see anyway, at
the moment, dependent on the PAM settings.)
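
To make the idea concrete, here is a toy model. It is not QEMU's actual
implementation: it is just a ROM image with a RAM shadow behind the low
alias, plus PAM-like read-enable and write-enable flags. The class
`ShadowedBios` and its attribute names are made up for this sketch. It shows
how a dump of the low range can differ from the high mapping once reads are
routed to RAM.

```python
class ShadowedBios:
    """Toy model of a BIOS ROM whose low alias can be shadowed by RAM."""

    def __init__(self, rom: bytes):
        self.rom = bytes(rom)            # immutable image, seen at the high mapping
        self.ram = bytearray(len(rom))   # shadow RAM behind the low alias
        self.read_ram = False            # PAM-like "read enable": reads hit RAM
        self.write_ram = False           # PAM-like "write enable": writes hit RAM

    def read_high(self, off: int) -> int:
        return self.rom[off]

    def read_low(self, off: int) -> int:
        return self.ram[off] if self.read_ram else self.rom[off]

    def write_low(self, off: int, value: int) -> None:
        if self.write_ram:
            self.ram[off] = value        # otherwise the write is dropped


bios = ShadowedBios(bytes([0x80, 0x89, 0xf2]))
bios.write_ram = True
for off in range(3):                     # firmware copies itself into the shadow
    bios.write_low(off, bios.read_low(off))
bios.read_ram = True                     # from now on the low alias shows RAM
bios.write_low(0, 0xfa)                  # a stray write while still writable
print(hex(bios.read_low(0)), hex(bios.read_high(0)))
```

In this model the low read returns the modified byte while the high read
still returns the ROM contents, matching the shape of the reported diff.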

Thanks
Laszlo

> and the bioses play lots of silly tricks of
> being copied and then reusing bits of the copied space as temporaries
> and..... oh it's just a mess.
> 
> Dave
> 
>> 2. Can I tell QEMU or gdb to trap any modification of that 128 KiB area?
>>
>> I'll try to get http://rr-project.org/ running, but any help is appreciated.
>>

* Re: [Qemu-devel] RFH: difference in read-only mapped bios.bin - memory corruption?
  2017-08-15 11:25   ` Laszlo Ersek
@ 2017-08-18  8:41     ` Philipp Hahn
  2017-08-18  8:59       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 5+ messages in thread
From: Philipp Hahn @ 2017-08-18  8:41 UTC
  To: Laszlo Ersek, Dr. David Alan Gilbert; +Cc: qemu-devel

Hello,

On 15.08.2017 at 13:25, Laszlo Ersek wrote:
> On 08/14/17 20:39, Dr. David Alan Gilbert wrote:
>> * Philipp Hahn (hahn@univention.de) wrote:
>>> I'm currently investigating a problem where a Linux VM does not reboot
>>> and gets stuck in the SeaBIOS reboot code:
>>>
>>> I'm using SeaBIOS-1.7 from Debian with a more modern qemu-2.8
...
>>> If I dump both regions and compare them, I get a difference:
...
>> You might want seabios commit c68aff5 and b837e6 that got fixed after
>> I tracked down some reboot hangs - although they were rare, not every
>> time.  c68aff5 did certainly cause a corruption, and the address of that
>> corruption was determined at link time and could overlay random useful
>> bits of code if you were unlucky.

Thank you for the commit IDs - to me it looks like they fixed the
problem. Testing with seabios-1.10 has not shown any reboot problem so far.

>>> 1. How can it be that the low-mem ROM mapping is modified?
>>
>> I can't remember all the details, but PC ROM is shadowed and mapped over
>> with RAM at various times,
> 
> Right. I don't remember for sure, but I believe the state of the PAM
> registers doesn't only affect what the VCPUs see in that address range,
> but also what your monitor commands will dump. (This would be the
> logical choice -- make the monitor output what the VCPUs see anyway, at
> the moment, dependent on the PAM settings.)

That makes sense.
Do you know by chance what change in QEMU triggered that bug, as I've
never seen any reboot problem with qemu-1.1.2, but only since switching
to qemu-2.8?

Thanks again for your excellent help.

Philipp

* Re: [Qemu-devel] RFH: difference in read-only mapped bios.bin - memory corruption?
  2017-08-18  8:41     ` Philipp Hahn
@ 2017-08-18  8:59       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2017-08-18  8:59 UTC
  To: Philipp Hahn; +Cc: Laszlo Ersek, qemu-devel

* Philipp Hahn (hahn@univention.de) wrote:
> Hello,
> 
> On 15.08.2017 at 13:25, Laszlo Ersek wrote:
> > On 08/14/17 20:39, Dr. David Alan Gilbert wrote:
> >> * Philipp Hahn (hahn@univention.de) wrote:
> >>> I'm currently investigating a problem where a Linux VM does not reboot
> >>> and gets stuck in the SeaBIOS reboot code:
> >>>
> >>> I'm using SeaBIOS-1.7 from Debian with a more modern qemu-2.8
> ...
> >>> If I dump both regions and compare them, I get a difference:
> ...
> >> You might want seabios commit c68aff5 and b837e6 that got fixed after
> >> I tracked down some reboot hangs - although they were rare, not every
> >> time.  c68aff5 did certainly cause a corruption, and the address of that
> >> corruption was determined at link time and could overlay random useful
> >> bits of code if you were unlucky.
> 
> Thank you for the commit IDs - to me it looks like they fixed the
> problem. Testing with seabios-1.10 has not shown any reboot problem so far.
> 
> >>> 1. How can it be that the low-mem ROM mapping is modified?
> >>
> >> I can't remember all the details, but PC ROM is shadowed and mapped over
> >> with RAM at various times,
> > 
> > Right. I don't remember for sure, but I believe the state of the PAM
> > registers doesn't only affect what the VCPUs see in that address range,
> > but also what your monitor commands will dump. (This would be the
> > logical choice -- make the monitor output what the VCPUs see anyway, at
> > the moment, dependent on the PAM settings.)
> 
> That makes sense.
> Do you know by chance what change in QEMU triggered that bug, as I've
> never seen any reboot problem with qemu-1.1.2, but only since switching
> to qemu-2.8?

I didn't go back as far as 1.1.2, but I tried bisecting around 2.4/2.6
before I understood the failure and the bisect was very flaky; I think
in the end it's a timing race where it comes down to the exact corrupt
value; going back to ancient qemu might be taking some other path
through seabios but I didn't investigate.

Dave

> Thanks again for your excellent help.
> 
> Philipp
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
