* qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
@ 2012-07-09 10:57 Chris Clayton
  2012-07-11  7:09 ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-09 10:57 UTC (permalink / raw)
  To: kvm

Hi,

When I run WinXP SP3 through qemu-kvm-1.1.0 on linux kernel 3.5.0-rc6, I 
get a segmentation fault within 3 or 4 minutes maximum. In dmesg I see:

qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm[860] general protection ip:b6abad77 sp:b52ff09c error:0 in 
libc-2.16.so[b697d000+1b4000]

The crash does not occur with qemu-kvm-1.0.1 on rc6. Nor does it occur 
with qemu-kvm-1.0.1 or qemu-kvm-1.1.0 on kernel 3.4.4. All three combinations 
survive for 15 minutes or more.

When I try to get a backtrace with gdb, the screen on which konsole and 
qemu are running locks up until I kill qemu in another console. 
Consequently I can't get a full BT, but, although probably not very 
helpful, what I did get is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6946b40 (LWP 506)]
0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0  0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e8d6e3 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2  0xb7e8c94c in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
Cannot access memory at address 0xb694610c
(gdb)

Note that the gdb and dmesg outputs above are not from the same crash 
instance.
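
For readers decoding the dmesg line above: the bracketed pair after the library name is the mapping's base address and size, so subtracting the base from the faulting ip gives an offset inside libc. The sketch below does that arithmetic in Python. The regular expression is only an assumption about the shape of this one message, not a kernel-defined format, and `library_offset` is a hypothetical helper name.

```python
import re

# Line copied from the dmesg output above (joined onto one line).
LINE = ("qemu-kvm[860] general protection ip:b6abad77 sp:b52ff09c "
        "error:0 in libc-2.16.so[b697d000+1b4000]")

# Hypothetical pattern for this message shape; not an official format spec.
PATTERN = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) .* in (?P<lib>\S+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def library_offset(line):
    """Return (library name, offset of the faulting ip inside its mapping)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not look like a fault report")
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    size = int(m.group("size"), 16)
    if not base <= ip < base + size:
        raise ValueError("ip is outside the reported mapping")
    return m.group("lib"), ip - base

lib, offset = library_offset(LINE)
print(lib, hex(offset))

# Low 12 bits (offset within a 4 KiB page) of the dmesg ip and the gdb pc.
print(hex(0xb6abad77 & 0xfff), hex(0xb7705d77 & 0xfff))
```

Both addresses share the same page offset even though the two runs loaded libc at different bases. That is consistent with, though not proof of, both crashes hitting the same instruction in __strcmp_sse4_2.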

I'm not subscribed, so please cc me on any reply.

Happy to provide any additional diagnostics (but may need help on how to 
get them) or test patches, etc.

Thanks

Chris Clayton

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-09 10:57 qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6 Chris Clayton
@ 2012-07-11  7:09 ` Chris Clayton
  2012-07-11  7:12   ` Gleb Natapov
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-11  7:09 UTC (permalink / raw)
  To: kvm

Ping.

Have I committed a bug-reporting sin in the mail below or is everyone 
simply too busy to look at this kvm-related crash?

On 07/09/12 11:57, Chris Clayton wrote:
> Hi,
>
> When I run WinXP SP3 through qemu-kvm-1.1.0 on linux kernel 3.5.0-rc6, I
> get a segmentation fault within 3 or 4 minutes maximum. In dmesg I see:
>
> qemu-kvm: sending ioctl 5326 to a partition!
> qemu-kvm: sending ioctl 801c0204 to a partition!
> qemu-kvm: sending ioctl 5326 to a partition!
> qemu-kvm: sending ioctl 801c0204 to a partition!
> qemu-kvm: sending ioctl 5326 to a partition!
> qemu-kvm: sending ioctl 801c0204 to a partition!
> qemu-kvm: sending ioctl 5326 to a partition!
> qemu-kvm: sending ioctl 801c0204 to a partition!
> qemu-kvm[860] general protection ip:b6abad77 sp:b52ff09c error:0 in
> libc-2.16.so[b697d000+1b4000]
>
> The crash does not occur with qemu-kvm-1.0.1 on rc6. Nor does it occur
> qemu-kvm-1.0.1 or qemu-kvm-1.1.0 on kernel 3.4.4. All three combinations
> survive for 15 minutes or more
>
> When I try to get a backtrace with gdb, the screen on which konsole and
> qemu are running locks up until I kill qemu in another console.
> Consequently I can't get a full BT, but, although probably not very
> helpful, what I did get is:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb6946b40 (LWP 506)]
> 0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> (gdb) bt
> #0  0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> #1  0xb7e8d6e3 in g_str_equal () from /usr/lib/libglib-2.0.so.0
> #2  0xb7e8c94c in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
> Cannot access memory at address 0xb694610c
> (gdb)
>
> Note that the gdb and dmesg outputs above are not from the same crash
> instance.
>
> I'm not subscribed,so please cc me on any reply.
>
> Happy to provide any additional diagnostics (but may need help on how to
> get them) or test patches, etc
>
> Thanks
>
> Chris Clayton



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-11  7:09 ` Chris Clayton
@ 2012-07-11  7:12   ` Gleb Natapov
  2012-07-11  7:18     ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Gleb Natapov @ 2012-07-11  7:12 UTC (permalink / raw)
  To: Chris Clayton; +Cc: kvm

On Wed, Jul 11, 2012 at 08:09:42AM +0100, Chris Clayton wrote:
> Ping.
> 
> Have I committed a bug-reporting sin in the mail below or is
> everyone simply too busy to look at this kvm-related crash?
> 
Since you have good and bad points can you bisect the problem?


--
			Gleb.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-11  7:12   ` Gleb Natapov
@ 2012-07-11  7:18     ` Chris Clayton
  2012-07-11  7:22       ` Gleb Natapov
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-11  7:18 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On 07/11/12 08:12, Gleb Natapov wrote:
> On Wed, Jul 11, 2012 at 08:09:42AM +0100, Chris Clayton wrote:
>> Ping.
>>
>> Have I committed a bug-reporting sin in the mail below or is
>> everyone simply too busy to look at this kvm-related crash?
>>
> Since you have good and bad points can you bisect the problem?
>

Yes, I can bisect, but since the crash occurs with only one 
combination of qemu-kvm (1.1.0) and kernel (3.5.0-rc6), I'm not sure 
which of those I should bisect. Any ideas on how I could narrow that down?

Thanks.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-11  7:18     ` Chris Clayton
@ 2012-07-11  7:22       ` Gleb Natapov
  2012-07-15 19:52         ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Gleb Natapov @ 2012-07-11  7:22 UTC (permalink / raw)
  To: Chris Clayton; +Cc: kvm

On Wed, Jul 11, 2012 at 08:18:17AM +0100, Chris Clayton wrote:
> On 07/11/12 08:12, Gleb Natapov wrote:
> >On Wed, Jul 11, 2012 at 08:09:42AM +0100, Chris Clayton wrote:
> >>Ping.
> >>
> >>Have I committed a bug-reporting sin in the mail below or is
> >>everyone simply too busy to look at this kvm-related crash?
> >>
> >Since you have good and bad points can you bisect the problem?
> >
> 
> Yes, I can bisect, but since the crash occurs with only one
> combination of qemu-kvm (1.1.0) and kernel (3.5.0-rc6), I'm not sure
> which of those I should bisect. Any ideas on how I could narrow that
> down?
> 
Bisect qemu between qemu-kvm-1.0.1 & qemu-kvm-1.1.0.
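
For readers unfamiliar with bisection: the idea is a binary search over the revisions between a known-good and a known-bad point. The sketch below shows only that search, on a simplified linear history. The revision names "r1" to "r6" and the position of the regression are placeholders made up for illustration; a real qemu history is not linear, and `first_bad` is a hypothetical helper, not part of any tool.

```python
def first_bad(revisions, is_bad):
    """Binary search for the first bad revision.

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # invariant: lo is good, hi is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]

# Placeholder history between the two releases named above.
history = ["qemu-kvm-1.0.1", "r1", "r2", "r3", "r4", "r5", "r6",
           "qemu-kvm-1.1.0"]
culprit = "r4"  # pretend the regression entered here

result = first_bad(
    history,
    lambda rev: history.index(rev) >= history.index(culprit),
)
print(result)
```

The search needs about log2(N) test builds for N revisions, but it only works if each "good" or "bad" verdict is reliable.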

> Thanks.

--
			Gleb.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-11  7:22       ` Gleb Natapov
@ 2012-07-15 19:52         ` Chris Clayton
  2012-07-19 12:14           ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-15 19:52 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On 07/11/12 08:22, Gleb Natapov wrote:
> On Wed, Jul 11, 2012 at 08:18:17AM +0100, Chris Clayton wrote:
>> On 07/11/12 08:12, Gleb Natapov wrote:
>>> On Wed, Jul 11, 2012 at 08:09:42AM +0100, Chris Clayton wrote:
>>>> Ping.
>>>>
>>>> Have I committed a bug-reporting sin in the mail below or is
>>>> everyone simply too busy to look at this kvm-related crash?
>>>>
>>> Since you have good and bad points can you bisect the problem?
>>>
>>
>> Yes, I can bisect, but since the crash occurs with only one
>> combination of qemu-kvm (1.1.0) and kernel (3.5.0-rc6), I'm not sure
>> which of those I should bisect. Any ideas on how I could narrow that
>> down?
>>
> Bisect qemu between qemu-kvm-1.0.1 & qemu-kvm-1.1.0.
>

Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash 
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many 
times more invocations before the crash occurs with 1.0.1 and I haven't 
used qemu-kvm much in the past few weeks.

I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on 
linux-3.4.4. I'll report back in a day or two.
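
Because the crash is intermittent, a crash-free run proves little on its own. As a rough guide, if each invocation crashed independently with probability p, then n clean runs in a row would happen with probability (1 - p)^n. The sketch below solves that for n. The probabilities used are invented for illustration and were not measured in this thread; `runs_needed` is a hypothetical helper.

```python
import math

def runs_needed(p_crash, confidence):
    """Smallest n such that n crash-free runs in a row would occur with
    probability at most (1 - confidence), assuming each run crashes
    independently with probability p_crash."""
    if not (0 < p_crash < 1 and 0 < confidence < 1):
        raise ValueError("p_crash and confidence must be between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_crash))

# Invented per-run crash probabilities, for illustration only.
print(runs_needed(0.5, 0.95))  # a crash that shows up on about half the runs
print(runs_needed(0.1, 0.95))  # a crash that shows up on about one run in ten
```

The rarer the crash, the more clean runs are needed before a combination can be called good with any confidence.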




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-15 19:52         ` Chris Clayton
@ 2012-07-19 12:14           ` Chris Clayton
  2012-07-19 12:17             ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-19 12:14 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm

On 07/15/12 20:52, Chris Clayton wrote:
> On 07/11/12 08:22, Gleb Natapov wrote:
>> On Wed, Jul 11, 2012 at 08:18:17AM +0100, Chris Clayton wrote:
>>> On 07/11/12 08:12, Gleb Natapov wrote:
>>>> On Wed, Jul 11, 2012 at 08:09:42AM +0100, Chris Clayton wrote:
>>>>> Ping.
>>>>>
>>>>> Have I committed a bug-reporting sin in the mail below or is
>>>>> everyone simply too busy to look at this kvm-related crash?
>>>>>
>>>> Since you have good and bad points can you bisect the problem?
>>>>
>>>
>>> Yes, I can bisect, but since the crash occurs with only one
>>> combination of qemu-kvm (1.1.0) and kernel (3.5.0-rc6), I'm not sure
>>> which of those I should bisect. Any ideas on how I could narrow that
>>> down?
>>>
>> Bisect qemu between qemu-kvm-1.0.1 & qemu-kvm-1.1.0.
>>
>
> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
> times more invocations before the crash occurs with 1.0.1 and I haven't
> used qemu-kvm much in the past few weeks.
>
> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
> linux-3.4.4. I'll report back in a day or two.

I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash. 
That would indicate that the problem is in the kernel. However, I pulled 
the latest and greatest from Linus yesterday evening and I now can't get 
the crash there either, so whatever it was seems to have been fixed. If 
I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly, 
so it's been fixed in the last few days.

Thanks




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-19 12:14           ` Chris Clayton
@ 2012-07-19 12:17             ` Avi Kivity
  2012-07-19 18:23               ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-19 12:17 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Gleb Natapov, kvm

On 07/19/2012 03:14 PM, Chris Clayton wrote:

>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>> times more invocations before the crash occurs with 1.0.1 and I haven't
>> used qemu-kvm much in the past few weeks.
>>
>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>> linux-3.4.4. I'll report back in a day or two.
> 
> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
> That would indicate that the problem is in the kernel. However, I pulled
> the latest and greatest from Linus yesterday evening and I now can't get
> the crash there either, so whatever it was seems to have been fixed. If
> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
> so it's been fixed in the last few days.

There were no kvm changes post-rc7.

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-19 12:17             ` Avi Kivity
@ 2012-07-19 18:23               ` Chris Clayton
  2012-07-26  9:52                 ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-19 18:23 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm

On 07/19/12 13:17, Avi Kivity wrote:
> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>
>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>> used qemu-kvm much in the past few weeks.
>>>
>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>> linux-3.4.4. I'll report back in a day or two.
>>
>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>> That would indicate that the problem is in the kernel. However, I pulled
>> the latest and greatest from Linus yesterday evening and I now can't get
>> the crash there either, so whatever it was seems to have been fixed. If
>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>> so it's been fixed in the last few days.
>
> There were no kvm changes post-rc7.
>
Yes, I'm aware of that, Avi. This thread started because I was getting a 
crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned 
out that the problem was also present in v1.0.1, but much harder to hit. 
However, it only ever happened with 3.5.0 kernels. 3.4.4, with either 
version of qemu-kvm, was stable. So then it seemed that the problem was 
in the kernel (but not necessarily in the kvm code).

Something that's changed since rc7 has either fixed the problem or made 
it much harder to hit. With rc7 and earlier I can recreate the crash 
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With 
rc7+, I haven't been able to get a crash at all.

I'm not inclined to bisect to find out which patch provided the fix, but 
this mail should at least close the mail thread down tidily.

Chris

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-19 18:23               ` Chris Clayton
@ 2012-07-26  9:52                 ` Chris Clayton
  2012-07-26 10:01                   ` Avi Kivity
  2012-07-26 11:10                   ` Xiao Guangrong
  0 siblings, 2 replies; 42+ messages in thread
From: Chris Clayton @ 2012-07-26  9:52 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm

On 07/19/12 19:23, Chris Clayton wrote:
> On 07/19/12 13:17, Avi Kivity wrote:
>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>
>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>> crash
>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>>> used qemu-kvm much in the past few weeks.
>>>>
>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>>> linux-3.4.4. I'll report back in a day or two.
>>>
>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>>> That would indicate that the problem is in the kernel. However, I pulled
>>> the latest and greatest from Linus yesterday evening and I now can't get
>>> the crash there either, so whatever it was seems to have been fixed. If
>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>>> so it's been fixed in the last few days.
>>
>> There were no kvm changes post-rc7.
>>
> Yes, I'm aware of that, Avi. This thread started because I was getting a
> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
> out that the problem was also present in v1.0.1, but much harder to hit.
> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
> version of qemu-kvm, was stable. So then it seemed that the problem was
> in the kernel, (but not necessarily in the kvm code).
>
> Something that's changed since rc7 has either fixed the problem or made
> it much harder to hit. With rc7 and earlier I can recreate the crash
> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
> rc7+, I haven't been able to get a crash at all.
>
Well, I'm getting the crash again, but this time I've managed to get a 
backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at 
qom/object.c:94
#4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at 
qom/object.c:149
#5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, 
typename=typename@entry=0x802b0c50 "apic-common")
     at qom/object.c:416
#6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
#7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, 
run=run@entry=0xb6239000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6

This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built 
against 3.4.4 kernel headers. The glibc, the kernel headers and the 
kernel are vanilla and the only change to the qemu-kvm sources is:

--- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
  }
  EOF
    if ! compile_prog "" "" ; then
-    CFLAGS+="-march=i486"
+    CFLAGS+="-march=i686"
    fi
  fi

Please let me know of anything I can do to help track this down.

Thanks

Chris

> I'm not inclined to bisect to find out which patch provided the fix, but
> this mail should at least close the mail thread down tidily.
>
> Chris


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26  9:52                 ` Chris Clayton
@ 2012-07-26 10:01                   ` Avi Kivity
  2012-07-26 10:29                     ` Jan Kiszka
  2012-07-26 11:58                     ` Chris Clayton
  2012-07-26 11:10                   ` Xiao Guangrong
  1 sibling, 2 replies; 42+ messages in thread
From: Avi Kivity @ 2012-07-26 10:01 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Gleb Natapov, kvm, Jan Kiszka

On 07/26/2012 12:52 PM, Chris Clayton wrote:
> On 07/19/12 19:23, Chris Clayton wrote:
>> On 07/19/12 13:17, Avi Kivity wrote:
>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>
>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>> crash
>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>> haven't
>>>>> used qemu-kvm much in the past few weeks.
>>>>>
>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>> 1.1.0) on
>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>
>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>> crash.
>>>> That would indicate that the problem is in the kernel. However, I
>>>> pulled
>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>> get
>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>> quickly,
>>>> so it's been fixed in the last few days.
>>>
>>> There were no kvm changes post-rc7.
>>>
>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>> out that the problem was also present in v1.0.1, but much harder to hit.
>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>> version of qemu-kvm, was stable. So then it seemed that the problem was
>> in the kernel, (but not necessarily in the kvm code).
>>
>> Something that's changed since rc7 has either fixed the problem or made
>> it much harder to hit. With rc7 and earlier I can recreate the crash
>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>> rc7+, I haven't been able to get a crash at all.
>>
> Well, I'm getting the crash again, but this time I've managed to get a
> backtrace:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 9405)]
> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> (gdb) bt
> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
> qom/object.c:94
> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
> qom/object.c:149
> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
> typename=typename@entry=0x802b0c50 "apic-common")
>     at qom/object.c:416
> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
> run=run@entry=0xb6239000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
> #12 0xb77bbbbe in clone () from /lib/libc.so.6
> 
> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built

It looks like general memory corruption.  Is this repeatable?  What's
the guest uptime when it happens (i.e. is it immediate?)

Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?


-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 10:01                   ` Avi Kivity
@ 2012-07-26 10:29                     ` Jan Kiszka
  2012-07-26 10:45                       ` Avi Kivity
  2012-07-26 11:58                     ` Chris Clayton
  1 sibling, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-07-26 10:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Clayton, Gleb Natapov, kvm

On 2012-07-26 12:01, Avi Kivity wrote:
> On 07/26/2012 12:52 PM, Chris Clayton wrote:
>> On 07/19/12 19:23, Chris Clayton wrote:
>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>
>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>> crash
>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>>> haven't
>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>
>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>>> 1.1.0) on
>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>
>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>>> crash.
>>>>> That would indicate that the problem is in the kernel. However, I
>>>>> pulled
>>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>>> get
>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>>> quickly,
>>>>> so it's been fixed in the last few days.
>>>>
>>>> There were no kvm changes post-rc7.
>>>>
>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>> out the problem was also present in v1.0.1, but much harder to hit.
>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>> in the kernel, (but not necessarily in the kvm code).
>>>
>>> Something that's changed since rc7 has either fixed the problem or made
>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>> rc7+, I haven't been able to get a crash at all.
>>>
>> Well, I'm getting the crash again, but this time I've managed to get a
>> backtrace:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> (gdb) bt
>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
>> qom/object.c:94
>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
>> qom/object.c:149
>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
>> typename=typename@entry=0x802b0c50 "apic-common")
>>     at qom/object.c:416
>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
>> run=run@entry=0xb6239000)
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>
>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
> 
> It looks like general memory corruption.  Is this repeatable?  What's
> the guest uptime when it happens (i.e. is it immediate?)
> 
> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?

To sync the userspace state with what the kernel maintains. Will end up
in kvm_apic_set_tpr which does precisely this. We always did, just the
QOM modeling is new.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 10:29                     ` Jan Kiszka
@ 2012-07-26 10:45                       ` Avi Kivity
  2012-07-26 10:49                         ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-26 10:45 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Chris Clayton, Gleb Natapov, kvm

On 07/26/2012 01:29 PM, Jan Kiszka wrote:

>> It looks like general memory corruption.  Is this repeatable?  What's
>> the guest uptime when it happens (i.e. is it immediate?)
>> 
>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
> 
> To sync the userspace state with what the kernel maintains. Will end up
> in kvm_apic_set_tpr which does precisely this. We always did, just the
> QOM modeling is new.

We should move it to the general register synchronization code, there is
no reason to do this every exit (though the cost is likely minimal).

-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 10:45                       ` Avi Kivity
@ 2012-07-26 10:49                         ` Jan Kiszka
  2012-07-26 11:04                           ` Jan Kiszka
  0 siblings, 1 reply; 42+ messages in thread
From: Jan Kiszka @ 2012-07-26 10:49 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Clayton, Gleb Natapov, kvm

On 2012-07-26 12:45, Avi Kivity wrote:
> On 07/26/2012 01:29 PM, Jan Kiszka wrote:
> 
>>> It looks like general memory corruption.  Is this repeatable?  What's
>>> the guest uptime when it happens (i.e. is it immediate?)
>>>
>>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>>
>> To sync the userspace state with what the kernel maintains. Will end up
>> in kvm_apic_set_tpr which does precisely this. We always did, just the
>> QOM modeling is new.
> 
> We should move it to the general register synchronization code, there is
> no reason to do this every exit (though the cost is likely minimal).

The cost is, well, was close to nothing. But I'm not sure about that QOM
type casting magic (and also its locking requirements, long-term).
However, if that is a problem, it's likely a much bigger one anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 10:49                         ` Jan Kiszka
@ 2012-07-26 11:04                           ` Jan Kiszka
  0 siblings, 0 replies; 42+ messages in thread
From: Jan Kiszka @ 2012-07-26 11:04 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Clayton, Gleb Natapov, kvm

On 2012-07-26 12:49, Jan Kiszka wrote:
> On 2012-07-26 12:45, Avi Kivity wrote:
>> On 07/26/2012 01:29 PM, Jan Kiszka wrote:
>>
>>>> It looks like general memory corruption.  Is this repeatable?  What's
>>>> the guest uptime when it happens (i.e. is it immediate?)
>>>>
>>>> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>>>
>>> To sync the userspace state with what the kernel maintains. Will end up
>>> in kvm_apic_set_tpr which does precisely this. We always did, just the
>>> QOM modeling is new.
>>
>> We should move it to the general register synchronization code, there is
>> no reason to do this every exit (though the cost is likely minimal).
> 
> The cost is, well, was close to nothing. But I'm not sure about that QOM
> type casting magic (and also its locking requirements, long-term).
> However, if that is a problem, it's likely a much bigger one anyway.

But, independent of this, we can likely move the whole kvm_arch_post_run
out of the exit path for kvm_irqchip_in_kernel() == true. The price is
that we create more deviation between both, but that should be
controllable. I will play with a patch.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26  9:52                 ` Chris Clayton
  2012-07-26 10:01                   ` Avi Kivity
@ 2012-07-26 11:10                   ` Xiao Guangrong
  2012-07-26 13:49                     ` Chris Clayton
  1 sibling, 1 reply; 42+ messages in thread
From: Xiao Guangrong @ 2012-07-26 11:10 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Avi Kivity, Gleb Natapov, kvm

Hi Chris,

Could you please try this patch?
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059

On 07/26/2012 05:52 PM, Chris Clayton wrote:
> On 07/19/12 19:23, Chris Clayton wrote:
>> On 07/19/12 13:17, Avi Kivity wrote:
>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>
>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>> crash
>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>>>> used qemu-kvm much in the past few weeks.
>>>>>
>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>
>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>>>> That would indicate that the problem is in the kernel. However, I pulled
>>>> the latest and greatest from Linus yesterday evening and I now can't get
>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>>>> so it's been fixed in the last few days.
>>>
>>> There were no kvm changes post-rc7.
>>>
>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>> out the problem was also present in v1.0.1, but much harder to hit.
>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>> version of qemu-kvm, was stable. So then it seemed that the problem was
>> in the kernel, (but not necessarily in the kvm code).
>>
>> Something that's changed since rc7 has either fixed the problem or made
>> it much harder to hit. With rc7 and earlier I can recreate the crash
>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>> rc7+, I haven't been able to get a crash at all.
>>
> Well, I'm getting the crash again, but this time I've managed to get a backtrace:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 9405)]
> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> (gdb) bt
> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at qom/object.c:149
> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, typename=typename@entry=0x802b0c50 "apic-common")
>     at qom/object.c:416
> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>     typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, run=run@entry=0xb6239000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
> #12 0xb77bbbbe in clone () from /lib/libc.so.6
> 
> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built against 3.4.4 kernel headers. The glibc, the kernel headers and the kernel are vanilla and the only change to the qemu-kvm sources is:
> 
> --- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
> +++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
> @@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
>  }
>  EOF
>    if ! compile_prog "" "" ; then
> -    CFLAGS+="-march=i486"
> +    CFLAGS+="-march=i686"
>    fi
>  fi
> 
> Please let me know of anything I can do to help track this down.
> 
> Thanks
> 
> Chris
> 
>> I'm not inclined to bisect to find out which patch provided the fix, but
>> this mail should at least close the mail thread down tidily.
>>
>> Chris
> 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 10:01                   ` Avi Kivity
  2012-07-26 10:29                     ` Jan Kiszka
@ 2012-07-26 11:58                     ` Chris Clayton
  2012-07-26 12:07                       ` Avi Kivity
  2012-07-26 12:09                       ` Jan Kiszka
  1 sibling, 2 replies; 42+ messages in thread
From: Chris Clayton @ 2012-07-26 11:58 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm, Jan Kiszka

On 07/26/12 11:01, Avi Kivity wrote:
> On 07/26/2012 12:52 PM, Chris Clayton wrote:
>> On 07/19/12 19:23, Chris Clayton wrote:
>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>
>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>> crash
>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>>> haven't
>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>
>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>>> 1.1.0) on
>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>
>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>>> crash.
>>>>> That would indicate that the problem is in the kernel. However, I
>>>>> pulled
>>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>>> get
>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>>> quickly,
>>>>> so it's been fixed in the last few days.
>>>>
>>>> There were no kvm changes post-rc7.
>>>>
>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>> out the problem was also present in v1.0.1, but much harder to hit.
>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>> in the kernel, (but not necessarily in the kvm code).
>>>
>>> Something that's changed since rc7 has either fixed the problem or made
>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>> rc7+, I haven't been able to get a crash at all.
>>>
>> Well, I'm getting the crash again, but this time I've managed to get a
>> backtrace:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> (gdb) bt
>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
>> qom/object.c:94
>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
>> qom/object.c:149
>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
>> typename=typename@entry=0x802b0c50 "apic-common")
>>      at qom/object.c:416
>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>      typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
>> run=run@entry=0xb6239000)
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>
>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
>
> It looks like general memory corruption.  Is this repeatable?  What's
> the guest uptime when it happens (i.e. is it immediate?)

I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed 
early as XP was starting up - well before the desktop would have 
appeared. The other two crashed as XP was closing down, having been 
running for a few minutes (but not doing much).

The error messages seen through dmesg are:

qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in 
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in 
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in 
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in 
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in 
libc-2.16.so[b6b1e000+1b4000]

The other 5 were OK, although I only did a bit of web browsing for a few 
minutes with IE.

>
> Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
>
>


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 11:58                     ` Chris Clayton
@ 2012-07-26 12:07                       ` Avi Kivity
  2012-07-26 23:22                         ` Chris Clayton
  2012-07-26 12:09                       ` Jan Kiszka
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-26 12:07 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Gleb Natapov, kvm, Jan Kiszka

On 07/26/2012 02:58 PM, Chris Clayton wrote:

>> It looks like general memory corruption.  Is this repeatable?  What's
>> the guest uptime when it happens (i.e. is it immediate?)
> 
> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
> early as XP was starting up - well before the desktop would have
> appeared. The other two crashed as XP was closing down, having been
> running for a few minutes (but not doing much).
> 
> The error messages seen through dmesg are:
> 
> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
> libc-2.16.so[b6b06000+1b4000]
> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
> libc-2.16.so[b6ab9000+1b4000]
> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
> libc-2.16.so[b6b96000+1b4000]
> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
> libc-2.16.so[b6b54000+1b4000]
> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
> libc-2.16.so[b6b1e000+1b4000]
> 
> The other 5 were OK, although I only did a bit of web browsing for a few
> minutes with IE.

Failures always in the same place (I'm guessing the variations are due to
PIE -- please configure with --disable-pie for future tests).
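
One way to check the "same place" reading is to subtract the libc mapping
base from the faulting ip in each dmesg line. The following is only a
minimal sketch, assuming the five lines quoted above with the mail line
wrap undone; the regular expression and the helper name fault_offsets are
illustrative and not part of any tool.

```python
import re

# The five "general protection" lines from dmesg, with the mail line wrap undone.
DMESG = """\
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in libc-2.16.so[b6b1e000+1b4000]
"""

# Captures: faulting ip, library name, mapping base, mapping size (all hex).
LINE_RE = re.compile(
    r"ip:([0-9a-f]+) .* in ([^\[\s]+)\[([0-9a-f]+)\+([0-9a-f]+)\]"
)


def fault_offsets(log):
    """Hypothetical helper: return (library, ip - base) for each fault line."""
    offsets = []
    for m in LINE_RE.finditer(log):
        ip = int(m.group(1), 16)
        base = int(m.group(3), 16)
        offsets.append((m.group(2), ip - base))
    return offsets


offsets = fault_offsets(DMESG)
for lib, off in offsets:
    print(f"{lib}+{off:#x}")
```

All five lines come out as libc-2.16.so+0x13dd77, so only the load address
of the library differs between runs, not the faulting instruction.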

Please generate a core and look around, esp. in frame 3
(type_table_lookup).  Also try to dissect type_table (you may need to
install the glib debug symbols for this).



-- 
error compiling committee.c: too many arguments to function



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 11:58                     ` Chris Clayton
  2012-07-26 12:07                       ` Avi Kivity
@ 2012-07-26 12:09                       ` Jan Kiszka
  1 sibling, 0 replies; 42+ messages in thread
From: Jan Kiszka @ 2012-07-26 12:09 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Avi Kivity, Gleb Natapov, kvm

On 2012-07-26 13:58, Chris Clayton wrote:
> On 07/26/12 11:01, Avi Kivity wrote:
>> On 07/26/2012 12:52 PM, Chris Clayton wrote:
>>> On 07/19/12 19:23, Chris Clayton wrote:
>>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>>
>>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>>> crash
>>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>>> times more invocations before the crash occurs with 1.0.1 and I
>>>>>>> haven't
>>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>>
>>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or
>>>>>>> 1.1.0) on
>>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>>
>>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a
>>>>>> crash.
>>>>>> That would indicate that the problem is in the kernel. However, I
>>>>>> pulled
>>>>>> the latest and greatest from Linus yesterday evening and I now can't
>>>>>> get
>>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty
>>>>>> quickly,
>>>>>> so it's been fixed in the last few days.
>>>>>
>>>>> There were no kvm changes post-rc7.
>>>>>
>>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>>> out the problem was also present in v1.0.1, but much harder to hit.
>>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>>> in the kernel, (but not necessarily in the kvm code).
>>>>
>>>> Something that's changed since rc7 has either fixed the problem or made
>>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>>> rc7+, I haven't been able to get a crash at all.
>>>>
>>> Well, I'm getting the crash again, but this time I've managed to get a
>>> backtrace:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>>> (gdb) bt
>>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
>>> qom/object.c:94
>>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
>>> qom/object.c:149
>>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
>>> typename=typename@entry=0x802b0c50 "apic-common")
>>>      at qom/object.c:416
>>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>>      typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
>>> run=run@entry=0xb6239000)
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>>
>>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
>>
>> It looks like general memory corruption.  Is this repeatable?  What's
>> the guest uptime when it happens (i.e. is it immediate?)
> 
> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed 

Hmm, I'm running various XP SP3 here against qemu.git (now widely
equivalent to qemu-kvm), and I saw no crashes at all.

> early as XP was starting up - well before the desktop would have 
> appeared. The other two crashed as XP was closing down, having been 
> running for a few minutes (but not doing much).
> 
> The error messages seen through dmesg are:
> 
> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in 
> libc-2.16.so[b6b06000+1b4000]
> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in 
> libc-2.16.so[b6ab9000+1b4000]
> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in 
> libc-2.16.so[b6b96000+1b4000]
> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in 
> libc-2.16.so[b6b54000+1b4000]
> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in 
> libc-2.16.so[b6b1e000+1b4000]

Oh, you are running 32-bit userland? Also 32-bit kernel? Most of us do
64-on-64.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 11:10                   ` Xiao Guangrong
@ 2012-07-26 13:49                     ` Chris Clayton
  0 siblings, 0 replies; 42+ messages in thread
From: Chris Clayton @ 2012-07-26 13:49 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Gleb Natapov, kvm

On 07/26/12 12:10, Xiao Guangrong wrote:
> Hi Chris,
>
> Could you please try this patch?
> http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059
>

Sorry, that patch does not fix the crashes.

> On 07/26/2012 05:52 PM, Chris Clayton wrote:
>> On 07/19/12 19:23, Chris Clayton wrote:
>>> On 07/19/12 13:17, Avi Kivity wrote:
>>>> On 07/19/2012 03:14 PM, Chris Clayton wrote:
>>>>
>>>>>> Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact,
>>>>>> crash
>>>>>> on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
>>>>>> times more invocations before the crash occurs with 1.0.1 and I haven't
>>>>>> used qemu-kvm much in the past few weeks.
>>>>>>
>>>>>> I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
>>>>>> linux-3.4.4. I'll report back in a day or two.
>>>>>
>>>>> I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
>>>>> That would indicate that the problem is in the kernel. However, I pulled
>>>>> the latest and greatest from Linus yesterday evening and I now can't get
>>>>> the crash there either, so whatever it was seems to have been fixed. If
>>>>> I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
>>>>> so it's been fixed in the last few days.
>>>>
>>>> There were no kvm changes post-rc7.
>>>>
>>> Yes, I'm aware of that, Avi. This thread started because I was getting a
>>> crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
>>> out the problem was also present in v1.0.1, but much harder to hit.
>>> However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
>>> version of qemu-kvm, was stable. So then it seemed that the problem was
>>> in the kernel, (but not necessarily in the kvm code).
>>>
>>> Something that's changed since rc7 has either fixed the problem or made
>>> it much harder to hit. With rc7 and earlier I can recreate the crash
>>> quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
>>> rc7+, I haven't been able to get a crash at all.
>>>
>> Well, I'm getting the crash again, but this time I've managed to get a backtrace:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 9405)]
>> 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> (gdb) bt
>> #0  0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
>> #1  0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
>> #2  0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
>> #3  0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
>> #4  type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at qom/object.c:149
>> #5  0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818, typename=typename@entry=0x802b0c50 "apic-common")
>>      at qom/object.c:416
>> #6  0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
>>      typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
>> #7  0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #8  0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60, run=run@entry=0xb6239000)
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #9  0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
>> #12 0xb77bbbbe in clone () from /lib/libc.so.6
>>
>> This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built against 3.4.4 kernel headers. The glibc, the kernel headers and the kernel are vanilla and the only change to the qemu-kvm sources is:
>>
>> --- qemu-kvm-1.1.0/configure~   2012-07-15 22:38:39.000000000 +0100
>> +++ qemu-kvm-1.1.0/configure    2012-07-15 22:39:09.000000000 +0100
>> @@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
>>   }
>>   EOF
>>     if ! compile_prog "" "" ; then
>> -    CFLAGS+="-march=i486"
>> +    CFLAGS+="-march=i686"
>>     fi
>>   fi
>>
>> Please let me know of anything I can do to help track this down.
>>
>> Thanks
>>
>> Chris
>>
>>> I'm not inclined to bisect to find out which patch provided the fix, but
>>> this mail should at least close the mail thread down tidily.
>>>
>>> Chris
>>
>
>



* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 12:07                       ` Avi Kivity
@ 2012-07-26 23:22                         ` Chris Clayton
  2012-07-27 10:46                           ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-26 23:22 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, kvm, Jan Kiszka

On 07/26/12 13:07, Avi Kivity wrote:
> On 07/26/2012 02:58 PM, Chris Clayton wrote:
>
>>> It looks like general memory corruption.  Is this repeatable?  What's
>>> the guest uptime when it happens (i.e. is it immediate?)
>>
>> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
>> early as XP was starting up - well before the desktop would have
>> appeared. The other two crashed as XP was closing down, having been
>> running for a few minutes (but not doing much).
>>
>> The error messages seen through dmesg are:
>>
>> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
>> libc-2.16.so[b6b06000+1b4000]
>> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
>> libc-2.16.so[b6ab9000+1b4000]
>> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
>> libc-2.16.so[b6b96000+1b4000]
>> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
>> libc-2.16.so[b6b54000+1b4000]
>> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
>> libc-2.16.so[b6b1e000+1b4000]
>>
>> The other 5 were OK, although I only did a bit of web browsing for a few
>> minutes with IE.
>
> Failures always in the same place (I'm guessing the variations are due to
> PIE -- please configure with --disable-pie for future tests).
>
> Please generate a core and look around, esp. in frame 3
> (type_table_lookup).  Also try to dissect type_table (you may need to
> install the glib debug symbols for this).
>
>
>
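
The "always in the same place" observation can be checked from the dmesg
lines quoted above by subtracting the libc mapping base from the faulting
ip. A minimal sketch, assuming the bracketed value is the start of the
libc-2.16.so mapping; fault_offset() is a hypothetical helper, not part of
qemu or glibc:

```c
#include <stdint.h>

/* Hypothetical helper: offset of the faulting instruction inside the
 * mapped library, given the ip and the mapping base from a dmesg line
 * such as "ip:b6c43d77 ... libc-2.16.so[b6b06000+1b4000]". */
uint32_t fault_offset(uint32_t ip, uint32_t map_base)
{
    return ip - map_base;
}
```

For all five crashes listed above the result is the same offset, 0x13dd77,
so only the load address of libc differs between runs.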
Mmm, I'm sailing out of my comfort zone here, but I've built a debug 
version of glib and trapped another crash. The backtrace is:

(gdb) bt
#0  0xb7822d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, 
key=0x8319b82, hash_return=0xb60ff178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, 
key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at 
qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at 
qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, 
typename=typename@entry=0x8319b82 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=8 '\b')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a3ca60, 
run=run@entry=0xb6258000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77dabbe in clone () from /lib/libc.so.6

Inspecting the args passed into g_str_equal shows:

(gdb) print (gchar *) 0x8a0cd58
$12 = (gchar *) 0x8a0cd58 "apic-common"
(gdb) print (gchar *) 0x8319b82
$13 = (gchar *) 0x8319b82 "apic-common"

So it seems odd that glibc's implementation of strcmp should crash with 
two equal strings. As I say, however, I'm a bit out of my comfort zone 
here, so I may be missing something.

I wouldn't know how to go about dissecting type_table, which I assume is 
the hash_table arg passed into g_hash_table_lookup, so advice on how to 
do that and what I am looking for (NULL pointer?) would be helpful.


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-26 23:22                         ` Chris Clayton
@ 2012-07-27 10:46                           ` Chris Clayton
       [not found]                             ` <CAG7+5M2y8gJvDCNuWsSB3zH=r75H0Mn=JNV+4DBc5xYjM+BJWA@mail.gmail.com>
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-27 10:46 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Avi Kivity, Gleb Natapov, kvm, Jan Kiszka

On 07/27/12 00:22, Chris Clayton wrote:
> On 07/26/12 13:07, Avi Kivity wrote:
>> On 07/26/2012 02:58 PM, Chris Clayton wrote:
>>
>>>> It looks like general memory corruption.  Is this repeatable?  What's
>>>> the guest uptime when it happens (i.e. is it immediate?)
>>>
>>> I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
>>> early as XP was starting up - well before the desktop would have
>>> appeared. The other two crashed as XP was closing down, having been
>>> running for a few minutes (but not doing much).
>>>
>>> The error messages seen through dmesg are:
>>>
>>> qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
>>> libc-2.16.so[b6b06000+1b4000]
>>> qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
>>> libc-2.16.so[b6ab9000+1b4000]
>>> qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
>>> libc-2.16.so[b6b96000+1b4000]
>>> qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
>>> libc-2.16.so[b6b54000+1b4000]
>>> qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
>>> libc-2.16.so[b6b1e000+1b4000]
>>>
>>> The other 5 were OK, although I only did a bit of web browsing for a few
>>> minutes with IE.
>>
>> Failures always in the same place (I'm guessing the variations are due to
>> PIE -- please configure with --disable-pie for future tests).
>>
>> Please generate a core and look around, esp. in frame 3
>> (type_table_lookup).  Also try to dissect type_table (you may need to
>> install the glib debug symbols for this).
>>
>>
>>
<snip>
Here's another backtrace and source listing of the failing function, 
following build and installation of libc (2.16) with debugging turned 
on. I'm afraid it's beyond my current knowledge to know what this might 
be telling us.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 6515)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) generate-core-file
Saved corefile core.6509
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, 
key=0x8319b82, hash_return=0xb60ff178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, 
key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at 
qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at 
qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, 
typename=typename@entry=0x8319b82 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a3ca60, 
run=run@entry=0xb6271000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) print *(0x8a0cd58)
$1 = 1667854433
(gdb) print (char*) 0x8a0cd58
$2 = 0x8a0cd58 "apic-common"
(gdb) list __strcmp_sse4_2
201             PUSH    (REM)
202     #endif
203     #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
204             PUSH    (%edi)
205     #endif
206             mov     STR1(%esp), %edx
207             mov     STR2(%esp), %eax
208     #if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
209             movl    CNT(%esp), REM
210             test    REM, REM
(gdb) list
211             je      L(eq)
212     #endif
213             mov     %dx, %cx
214             and     $0xfff, %cx
215             cmp     $0xff0, %cx
216             ja      L(first4bytes)
217             movdqu  (%edx), %xmm2
218             mov     %eax, %ecx
219             and     $0xfff, %ecx
220             cmp     $0xff0, %ecx
(gdb) list
221             ja      L(first4bytes)
222     #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
223     # define TOLOWER(reg1, reg2) \
224             movdqa  reg1, %xmm3; \
225             movdqa  UCHIGH_reg, %xmm4; \
226             movdqa  reg2, %xmm5; \
227             movdqa  UCHIGH_reg, %xmm6; \
228             pcmpgtb UCLOW_reg, %xmm3; \
229             pcmpgtb reg1, %xmm4; \
230             pcmpgtb UCLOW_reg, %xmm5; \
(gdb)
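
For anyone reading the listing: lines 213-216 mask the first string pointer
to its low 12 bits and branch to L(first4bytes) when that value is above
0xff0, presumably so that the 16-byte movdqu at line 217 never reads past
the end of a 4 KiB page. A minimal C sketch of that test follows; the
function name is hypothetical and this is not glibc code:

```c
#include <stdint.h>

/* Hypothetical helper mirroring the and/cmp/ja sequence at lines 213-216
 * of the listing: returns 1 when the low 12 bits of the address exceed
 * 0xff0 (the case the assembly sends to L(first4bytes)), otherwise 0. */
int takes_first4bytes_path(uint32_t addr)
{
    return (addr & 0xfffu) > 0xff0u;
}
```

With the pointer from the backtrace, v1=0x8a0cd58, the low 12 bits are
0xd58, so the check is false and execution falls through to the movdqu at
line 217, which is where the fault is reported.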

I'll stop sending in backtraces etc. for now, in the hope that someone will 
advise me on how I might better direct my efforts.

Thanks for your help so far.


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
       [not found]                             ` <CAG7+5M2y8gJvDCNuWsSB3zH=r75H0Mn=JNV+4DBc5xYjM+BJWA@mail.gmail.com>
@ 2012-07-27 19:04                               ` Chris Clayton
  2012-07-29 12:42                                 ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-27 19:04 UTC (permalink / raw)
  To: Eric Northup, kvm, Avi Kivity, Gleb Natapov, Jan Kiszka

On 07/27/12 19:08, Eric Northup wrote:
> Could you include the output of "info registers" at the point where it
> crashed?
>

Here you go:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, 
key=0x8319b82, hash_return=0xb6a78178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, 
key=key@entry=0x8319b82) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at 
qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at 
qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0, 
typename=typename@entry=0x8319b82 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370, 
run=run@entry=0xb6274000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax            0x8319b82        137468802
ecx            0xd58    3416
edx            0x8a0cd58        144756056
ebx            0xb7f7f2c4       -1208487228
esp            0xb6a780ec       0xb6a780ec
ebp            0xb6a78118       0xb6a78118
esi            0x8a313e0        144905184
edi            0xc513   50451
eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
eflags         0x10283  [ CF SF IF RF ]
cs             0x73     115
ss             0x7b     123
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x33     51




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-27 19:04                               ` Chris Clayton
@ 2012-07-29 12:42                                 ` Avi Kivity
  2012-07-29 14:03                                   ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-29 12:42 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/27/2012 10:04 PM, Chris Clayton wrote:
> On 07/27/12 19:08, Eric Northup wrote:
>> Could you include the output of "info registers" at the point where it
>> crashed?
>>
> 
> Here you go:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb6a78b40 (LWP 13249)]
> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> 217             movdqu  (%edx), %xmm2
> (gdb) bt
> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
> key=0x8319b82, hash_return=0xb6a78178)
>     at ghash.c:422
> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
> key=key@entry=0x8319b82) at ghash.c:1074
> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
> qom/object.c:94
> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
> qom/object.c:149
> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
> typename=typename@entry=0x8319b82 "apic-common")
>     at qom/object.c:416
> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>     typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
> run=run@entry=0xb6274000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
> #13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
> (gdb) info registers
> eax            0x8319b82        137468802
> ecx            0xd58    3416
> edx            0x8a0cd58        144756056
> ebx            0xb7f7f2c4       -1208487228
> esp            0xb6a780ec       0xb6a780ec
> ebp            0xb6a78118       0xb6a78118
> esi            0x8a313e0        144905184
> edi            0xc513   50451
> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
> eflags         0x10283  [ CF SF IF RF ]
> cs             0x73     115
> ss             0x7b     123
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x33     51
> 

ds shouldn't be zero for a 32-bit process.

But that should have crashed *much* earlier, ds is accessed all the time.

Please add the following snippet to the beginning of kvm_arch_post_run():

{
    unsigned short ds;
    asm("mov %%ds, %0" : "=rm"(ds));
    assert(ds != 0);
}

If the assert triggers, then kvm corrupted the segment registers.  If
not, corruption happens somewhere above.
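
If aborting is inconvenient, a logging variant of the same check could be
called at several points (for example at the start and end of
kvm_arch_post_run()) to narrow down where ds becomes zero. This is only a
sketch: read_ds(), ds_is_null() and report_ds() are hypothetical names, the
inline asm is GCC/Clang syntax, and the check is only meaningful in a 32-bit
process.

```c
#include <stdio.h>

/* Read the current DS selector with the same instruction as the snippet
 * above. x86 only; returns 0 on other architectures. */
unsigned short read_ds(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned short ds;
    __asm__ volatile("mov %%ds, %0" : "=rm"(ds));
    return ds;
#else
    return 0;
#endif
}

/* Hypothetical predicate: a null selector is what the register dump shows. */
int ds_is_null(unsigned short ds)
{
    return ds == 0;
}

/* Hypothetical helper: print the selector instead of asserting, tagged
 * with a caller-supplied label so several call sites can be told apart. */
void report_ds(const char *where)
{
    unsigned short ds = read_ds();
    fprintf(stderr, "%s: ds=0x%x%s\n", where, (unsigned)ds,
            ds_is_null(ds) ? " (null selector)" : "");
}
```

In the register dump ss is 0x7b; on 32-bit Linux ds would normally hold the
same user data selector, so any line tagged "(null selector)" marks a point
at which the corruption has already happened.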

-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 12:42                                 ` Avi Kivity
@ 2012-07-29 14:03                                   ` Chris Clayton
  2012-07-29 14:18                                     ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-29 14:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/12 13:42, Avi Kivity wrote:
> On 07/27/2012 10:04 PM, Chris Clayton wrote:
>> On 07/27/12 19:08, Eric Northup wrote:
>>> Could you include the output of "info registers" at the point where it
>>> crashed?
>>>
>>
>> Here you go:
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb6a78b40 (LWP 13249)]
>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> 217             movdqu  (%edx), %xmm2
>> (gdb) bt
>> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>> key=0x8319b82, hash_return=0xb6a78178)
>>      at ghash.c:422
>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>> key=key@entry=0x8319b82) at ghash.c:1074
>> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
>> qom/object.c:94
>> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
>> qom/object.c:149
>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
>> typename=typename@entry=0x8319b82 "apic-common")
>>      at qom/object.c:416
>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>>      typename=typename@entry=0x8319b82 "apic-common") at qom/object.c:478
>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
>> run=run@entry=0xb6274000)
>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
>> #13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
>> (gdb) info registers
>> eax            0x8319b82        137468802
>> ecx            0xd58    3416
>> edx            0x8a0cd58        144756056
>> ebx            0xb7f7f2c4       -1208487228
>> esp            0xb6a780ec       0xb6a780ec
>> ebp            0xb6a78118       0xb6a78118
>> esi            0x8a313e0        144905184
>> edi            0xc513   50451
>> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
>> eflags         0x10283  [ CF SF IF RF ]
>> cs             0x73     115
>> ss             0x7b     123
>> ds             0x0      0
>> es             0x0      0
>> fs             0x0      0
>> gs             0x33     51
>>
>
> ds shouldn't be zero for a 32-bit process.
>
> But that should have crashed *much* earlier, ds is accessed all the time.
>
> Please add the following snippet to the beginning of kvm_arch_post_run():
>
> {
>      unsigned short ds;
>      asm("mov %%ds, %0" : "=rm"(ds));
>      assert(ds != 0);
> }
>
> if the assert triggers, then kvm corrupted the segment registers.  If
> not, corruption happens somewhere above.
>
Thanks, Avi.

The assert didn't trigger - I got:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 2134)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217             movdqu  (%edx), %xmm2
(gdb) info registers
eax            0x8319ba2        137468834
ecx            0xd58    3416
edx            0x8a0cd58        144756056
ebx            0xb7f7f2c4       -1208487228
esp            0xb60ff0ec       0xb60ff0ec
ebp            0xb60ff118       0xb60ff118
esi            0x8a44818        144984088
edi            0xc513   50451
eip            0xb7820f77       0xb7820f77 <__strcmp_sse4_2+23>
eflags         0x10283  [ CF SF IF RF ]
cs             0x73     115
ss             0x7b     123
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x33     51
(gdb) list
212     #endif
213             mov     %dx, %cx
214             and     $0xfff, %cx
215             cmp     $0xff0, %cx
216             ja      L(first4bytes)
217             movdqu  (%edx), %xmm2
218             mov     %eax, %ecx
219             and     $0xfff, %ecx
220             cmp     $0xff0, %ecx
221             ja      L(first4bytes)
(gdb) bt
#0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
#2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800, 
key=0x8319ba2, hash_return=0xb60ff178)
     at ghash.c:422
#3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800, 
key=key@entry=0x8319ba2) at ghash.c:1074
#4  0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at 
qom/object.c:94
#5  type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at 
qom/object.c:149
#6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818, 
typename=typename@entry=0x8319ba2 "apic-common")
     at qom/object.c:416
#7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
     typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478
#8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9  0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60, 
run=run@entry=0xb626d000)
     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
#10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at 
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132

I think you are saying that the problem isn't in kvm, so where would you 
recommend I continue investigations? I'm not seeing a crash with any 
other applications.

Thanks again.


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 14:03                                   ` Chris Clayton
@ 2012-07-29 14:18                                     ` Avi Kivity
  2012-07-29 14:48                                       ` Avi Kivity
  2012-07-29 15:47                                       ` Avi Kivity
  0 siblings, 2 replies; 42+ messages in thread
From: Avi Kivity @ 2012-07-29 14:18 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/2012 05:03 PM, Chris Clayton wrote:
> On 07/29/12 13:42, Avi Kivity wrote:
>> On 07/27/2012 10:04 PM, Chris Clayton wrote:
>>> On 07/27/12 19:08, Eric Northup wrote:
>>>> Could you include the output of "info registers" at the point where it
>>>> crashed?
>>>>
>>>
>>> Here you go:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0xb6a78b40 (LWP 13249)]
>>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>> 217             movdqu  (%edx), %xmm2
>>> (gdb) bt
>>> #0  __strcmp_sse4_2 () at
>>> ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at
>>> ghash.c:1704
>>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>>> key=0x8319b82, hash_return=0xb6a78178)
>>>      at ghash.c:422
>>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>>> key=key@entry=0x8319b82) at ghash.c:1074
>>> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
>>> qom/object.c:94
>>> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
>>> qom/object.c:149
>>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
>>> typename=typename@entry=0x8319b82 "apic-common")
>>>      at qom/object.c:416
>>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>>>      typename=typename@entry=0x8319b82 "apic-common") at
>>> qom/object.c:478
>>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>>> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
>>> run=run@entry=0xb6274000)
>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
>>> #13 0xb77e45ee in clone () at
>>> ../sysdeps/unix/sysv/linux/i386/clone.S:132
>>> (gdb) info registers
>>> eax            0x8319b82        137468802
>>> ecx            0xd58    3416
>>> edx            0x8a0cd58        144756056
>>> ebx            0xb7f7f2c4       -1208487228
>>> esp            0xb6a780ec       0xb6a780ec
>>> ebp            0xb6a78118       0xb6a78118
>>> esi            0x8a313e0        144905184
>>> edi            0xc513   50451
>>> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
>>> eflags         0x10283  [ CF SF IF RF ]
>>> cs             0x73     115
>>> ss             0x7b     123
>>> ds             0x0      0
>>> es             0x0      0
>>> fs             0x0      0
>>> gs             0x33     51
>>>
>>
>> ds shouldn't be zero for a 32-bit process.
>>
>> But that should have crashed *much* earlier, ds is accessed all the time.
>>
>> Please add the following snippet to the beginning of kvm_arch_post_run():
>>
>> {
>>      unsigned short ds;
>>      asm("mov %%ds, %0" : "=rm"(ds));
>>      assert(ds != 0);
>> }
>>
>> if the assert triggers, then kvm corrupted the segment registers.  If
>> not, corruption happens somewhere above.
>>
> Thanks, Avi.
> 
> The assert didn't trigger - I got:
> 
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb60ffb40 (LWP 2134)]
> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> 217             movdqu  (%edx), %xmm2
> (gdb) info registers
> eax            0x8319ba2        137468834
> ecx            0xd58    3416
> edx            0x8a0cd58        144756056
> ebx            0xb7f7f2c4       -1208487228
> esp            0xb60ff0ec       0xb60ff0ec
> ebp            0xb60ff118       0xb60ff118
> esi            0x8a44818        144984088
> edi            0xc513   50451
> eip            0xb7820f77       0xb7820f77 <__strcmp_sse4_2+23>
> eflags         0x10283  [ CF SF IF RF ]
> cs             0x73     115
> ss             0x7b     123
> ds             0x0      0
> es             0x0      0
> fs             0x0      0
> gs             0x33     51
> (gdb) list
> 212     #endif
> 213             mov     %dx, %cx
> 214             and     $0xfff, %cx
> 215             cmp     $0xff0, %cx
> 216             ja      L(first4bytes)
> 217             movdqu  (%edx), %xmm2
> 218             mov     %eax, %ecx
> 219             and     $0xfff, %ecx
> 220             cmp     $0xff0, %ecx
> 221             ja      L(first4bytes)
> (gdb) bt
> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
> key=0x8319ba2, hash_return=0xb60ff178)
>     at ghash.c:422
> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
> key=key@entry=0x8319ba2) at ghash.c:1074
> #4  0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
> qom/object.c:94
> #5  type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at
> qom/object.c:149
> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818,
> typename=typename@entry=0x8319ba2 "apic-common")
>     at qom/object.c:416
> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
>     typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478
> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
> #9  0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60,
> run=run@entry=0xb626d000)
>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
> #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
> #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
> 
> I think you are saying that the problem isn't in kvm, so where would you
> recommend I continue investigations. I'm not seeing a crash with any
> other applications.

What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.

You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
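
A minimal sketch of such a replacement, assuming a plain byte-at-a-time
loop is enough to keep SSE instructions out of the comparison (that depends
on the compiler not vectorising it). trivial_str_equal() is a hypothetical
name and this is not glib's implementation:

```c
#include <stdbool.h>

/* Hypothetical stand-in for g_str_equal(): compares two NUL-terminated
 * strings one byte at a time and returns true when they are identical.
 * The void-pointer parameters match the shape of a hash-table key
 * comparison callback. */
bool trivial_str_equal(const void *v1, const void *v2)
{
    const unsigned char *a = v1;
    const unsigned char *b = v2;

    /* Advance while both strings agree and neither has ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both stopped on the same byte (the terminator). */
    return *a == *b;
}
```

It would be passed in place of g_str_equal wherever the type table's hash
table is created. If the crash then disappears or moves elsewhere, that
points at the fault/restore path around movdqu rather than at the strings
themselves.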


-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 14:18                                     ` Avi Kivity
@ 2012-07-29 14:48                                       ` Avi Kivity
  2012-07-29 15:21                                         ` Chris Clayton
  2012-07-29 15:47                                       ` Avi Kivity
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-29 14:48 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/2012 05:18 PM, Avi Kivity wrote:
>> 
>> I think you are saying that the problem isn't in kvm, so where would you
>> recommend I continue investigations. I'm not seeing a crash with any
>> other applications.
> 
> What might have happened is that the movdqu instruction faulted (as it's
> an fpu instruction), and on the way back from the fault, ds and es
> didn't get restored correctly.
> 
> You can test this by writing a trivial version of g_str_equal()
> somewhere in the qemu source code and rebuilding it.

You're running a 32-bit kernel, yes?  Please confirm.


-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 14:48                                       ` Avi Kivity
@ 2012-07-29 15:21                                         ` Chris Clayton
  0 siblings, 0 replies; 42+ messages in thread
From: Chris Clayton @ 2012-07-29 15:21 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/12 15:48, Avi Kivity wrote:
> On 07/29/2012 05:18 PM, Avi Kivity wrote:
>>>
>>> I think you are saying that the problem isn't in kvm, so where would you
>>> recommend I continue investigations. I'm not seeing a crash with any
>>> other applications.
>>
>> What might have happened is that the movdqu instruction faulted (as it's
>> an fpu instruction), and on the way back from the fault, ds and es
>> didn't get restored correctly.
>>
>> You can test this by writing a trivial version of g_str_equal()
>> somewhere in the qemu source code and rebuilding it.
>
> You're running a 32-bit kernel, yes?  Please confirm.
>
>
Yes, I am running a 32-bit kernel and userland.


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 14:18                                     ` Avi Kivity
  2012-07-29 14:48                                       ` Avi Kivity
@ 2012-07-29 15:47                                       ` Avi Kivity
  2012-07-29 16:34                                         ` Avi Kivity
  1 sibling, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-29 15:47 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/2012 05:18 PM, Avi Kivity wrote:
> On 07/29/2012 05:03 PM, Chris Clayton wrote:
>> On 07/29/12 13:42, Avi Kivity wrote:
>>> On 07/27/2012 10:04 PM, Chris Clayton wrote:
>>>> On 07/27/12 19:08, Eric Northup wrote:
>>>>> Could you include the output of "info registers" at the point where it
>>>>> crashed?
>>>>>
>>>>
>>>> Here you go:
>>>>
>>>> Program received signal SIGSEGV, Segmentation fault.
>>>> [Switching to Thread 0xb6a78b40 (LWP 13249)]
>>>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>>> 217             movdqu  (%edx), %xmm2
>>>> (gdb) bt
>>>> #0  __strcmp_sse4_2 () at
>>>> ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>>>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at
>>>> ghash.c:1704
>>>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>>>> key=0x8319b82, hash_return=0xb6a78178)
>>>>      at ghash.c:422
>>>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>>>> key=key@entry=0x8319b82) at ghash.c:1074
>>>> #4  0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
>>>> qom/object.c:94
>>>> #5  type_get_by_name (name=name@entry=0x8319b82 "apic-common") at
>>>> qom/object.c:149
>>>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a313e0,
>>>> typename=typename@entry=0x8319b82 "apic-common")
>>>>      at qom/object.c:416
>>>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a313e0,
>>>>      typename=typename@entry=0x8319b82 "apic-common") at
>>>> qom/object.c:478
>>>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
>>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>>>> #9  0x081cb86c in kvm_arch_post_run (env=env@entry=0x8a29370,
>>>> run=run@entry=0xb6274000)
>>>>      at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
>>>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a29370) at
>>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>>>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
>>>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>>>> #12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
>>>> #13 0xb77e45ee in clone () at
>>>> ../sysdeps/unix/sysv/linux/i386/clone.S:132
>>>> (gdb) info registers
>>>> eax            0x8319b82        137468802
>>>> ecx            0xd58    3416
>>>> edx            0x8a0cd58        144756056
>>>> ebx            0xb7f7f2c4       -1208487228
>>>> esp            0xb6a780ec       0xb6a780ec
>>>> ebp            0xb6a78118       0xb6a78118
>>>> esi            0x8a313e0        144905184
>>>> edi            0xc513   50451
>>>> eip            0xb7824f77       0xb7824f77 <__strcmp_sse4_2+23>
>>>> eflags         0x10283  [ CF SF IF RF ]
>>>> cs             0x73     115
>>>> ss             0x7b     123
>>>> ds             0x0      0
>>>> es             0x0      0
>>>> fs             0x0      0
>>>> gs             0x33     51
>>>>
>>>
>>> ds shouldn't be zero for a 32-bit process.
>>>
>>> But that should have crashed *much* earlier, ds is accessed all the time.
>>>
>>> Please add the following snippet to the beginning of kvm_arch_post_run():
>>>
>>> {
>>>      unsigned short ds;
>>>      asm("mov %%ds, %0" : "=rm"(ds));
>>>      assert(ds != 0);
>>> }
>>>
>>> if the assert triggers, then kvm corrupted the segment registers.  If
>>> not, corruption happens somewhere above.
>>>
>> Thanks, Avi.
>> 
>> The assert didn't trigger - I got:
>> 
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0xb60ffb40 (LWP 2134)]
>> __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> 217             movdqu  (%edx), %xmm2
>> (gdb) info registers
>> eax            0x8319ba2        137468834
>> ecx            0xd58    3416
>> edx            0x8a0cd58        144756056
>> ebx            0xb7f7f2c4       -1208487228
>> esp            0xb60ff0ec       0xb60ff0ec
>> ebp            0xb60ff118       0xb60ff118
>> esi            0x8a44818        144984088
>> edi            0xc513   50451
>> eip            0xb7820f77       0xb7820f77 <__strcmp_sse4_2+23>
>> eflags         0x10283  [ CF SF IF RF ]
>> cs             0x73     115
>> ss             0x7b     123
>> ds             0x0      0
>> es             0x0      0
>> fs             0x0      0
>> gs             0x33     51
>> (gdb) list
>> 212     #endif
>> 213             mov     %dx, %cx
>> 214             and     $0xfff, %cx
>> 215             cmp     $0xff0, %cx
>> 216             ja      L(first4bytes)
>> 217             movdqu  (%edx), %xmm2
>> 218             mov     %eax, %ecx
>> 219             and     $0xfff, %ecx
>> 220             cmp     $0xff0, %ecx
>> 221             ja      L(first4bytes)
>> (gdb) bt
>> #0  __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
>> #1  0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
>> #2  0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
>> key=0x8319ba2, hash_return=0xb60ff178)
>>     at ghash.c:422
>> #3  0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
>> key=key@entry=0x8319ba2) at ghash.c:1074
>> #4  0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
>> qom/object.c:94
>> #5  type_get_by_name (name=name@entry=0x8319ba2 "apic-common") at
>> qom/object.c:149
>> #6  0x0815cf93 in object_dynamic_cast (obj=obj@entry=0x8a44818,
>> typename=typename@entry=0x8319ba2 "apic-common")
>>     at qom/object.c:416
>> #7  0x0815cf2d in object_dynamic_cast_assert (obj=obj@entry=0x8a44818,
>>     typename=typename@entry=0x8319ba2 "apic-common") at qom/object.c:478
>> #8  0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
>> #9  0x081cb874 in kvm_arch_post_run (env=env@entry=0x8a3ca60,
>> run=run@entry=0xb626d000)
>>     at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
>> #10 0x081c686f in kvm_cpu_exec (env=env@entry=0x8a3ca60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
>> #11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
>> /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
>> #12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
>> #13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
>> 
>> I think you are saying that the problem isn't in kvm, so where would you
>> recommend I continue investigations. I'm not seeing a crash with any
>> other applications.
> 
> What might have happened is that the movdqu instruction faulted (as it's
> an fpu instruction), and on the way back from the fault, ds and es
> didn't get restored correctly.
> 
> You can test this by writing a trivial version of g_str_equal()
> somewhere in the qemu source code and rebuilding it.
> 
> 

from entry_32.S:

.macro RESTORE_REGS pop=0
	RESTORE_INT_REGS
1:	popl_cfi %ds
	/*CFI_RESTORE ds;*/
2:	popl_cfi %es
	/*CFI_RESTORE es;*/
3:	popl_cfi %fs
	/*CFI_RESTORE fs;*/
	POP_GS \pop
.pushsection .fixup, "ax"
4:	movl $0, (%esp)
	jmp 1b
5:	movl $0, (%esp)
	jmp 2b
6:	movl $0, (%esp)
	jmp 3b
.popsection

this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.

This could happen if kvm is failing to restore GDT correctly.
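
To make the control flow of the fixup explicit, here is a small
user-space model in C. It is only a sketch: selector_loads() stands in
for the check the CPU performs when a selector is popped into a data
segment register, and the set of loadable selectors is made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU's check on a segment register load.
 * The null selector (0) can always be loaded into a data segment
 * register; here we pretend 0x7b is the only other loadable value. */
static bool selector_loads(uint16_t sel)
{
    return sel == 0 || sel == 0x7b;
}

/* Model of "1: popl %ds" together with its fixup
 * "4: movl $0, (%esp); jmp 1b". */
static uint16_t pop_segment(uint16_t *stack_slot)
{
    if (!selector_loads(*stack_slot)) {
        *stack_slot = 0;    /* fixup: overwrite the saved value with 0 */
    }
    return *stack_slot;     /* retry: the pop now succeeds */
}
```

In this model a saved selector that cannot be loaded ends up as 0 in the
register, which is the state seen in the register dump.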

-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 15:47                                       ` Avi Kivity
@ 2012-07-29 16:34                                         ` Avi Kivity
  2012-07-29 17:50                                           ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-29 16:34 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/2012 06:47 PM, Avi Kivity wrote:
>> What might have happened is that the movdqu instruction faulted (as it's
>> an fpu instruction), and on the way back from the fault, ds and es
>> didn't get restored correctly.
>> 
>> You can test this by writing a trivial version of g_str_equal()
>> somewhere in the qemu source code and rebuilding it.
>> 
>> 
> 
> from entry_32.S:
> 
> .macro RESTORE_REGS pop=0
> 	RESTORE_INT_REGS
> 1:	popl_cfi %ds
> 	/*CFI_RESTORE ds;*/
> 2:	popl_cfi %es
> 	/*CFI_RESTORE es;*/
> 3:	popl_cfi %fs
> 	/*CFI_RESTORE fs;*/
> 	POP_GS \pop
> .pushsection .fixup, "ax"
> 4:	movl $0, (%esp)
> 	jmp 1b
> 5:	movl $0, (%esp)
> 	jmp 2b
> 6:	movl $0, (%esp)
> 	jmp 3b
> .popsection
> 
> this piece of code tries to restore %ds, and if it fails, zeros it,
> which is consistent with the core dump.
> 
> This could happen if kvm is failing to restore GDT correctly.
> 

Possible culprit: b2da15ac26a0c00.


-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 16:34                                         ` Avi Kivity
@ 2012-07-29 17:50                                           ` Chris Clayton
  2012-07-29 17:54                                             ` Gleb Natapov
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-29 17:50 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Eric Northup, kvm, Gleb Natapov, Jan Kiszka

On 07/29/12 17:34, Avi Kivity wrote:
> On 07/29/2012 06:47 PM, Avi Kivity wrote:
>>> What might have happened is that the movdqu instruction faulted (as it's
>>> an fpu instruction), and on the way back from the fault, ds and es
>>> didn't get restored correctly.
>>>
>>> You can test this by writing a trivial version of g_str_equal()
>>> somewhere in the qemu source code and rebuilding it.
>>>
>>>
>>
>> from entry_32.S:
>>
>> .macro RESTORE_REGS pop=0
>> 	RESTORE_INT_REGS
>> 1:	popl_cfi %ds
>> 	/*CFI_RESTORE ds;*/
>> 2:	popl_cfi %es
>> 	/*CFI_RESTORE es;*/
>> 3:	popl_cfi %fs
>> 	/*CFI_RESTORE fs;*/
>> 	POP_GS \pop
>> .pushsection .fixup, "ax"
>> 4:	movl $0, (%esp)
>> 	jmp 1b
>> 5:	movl $0, (%esp)
>> 	jmp 2b
>> 6:	movl $0, (%esp)
>> 	jmp 3b
>> .popsection
>>
>> this piece of code tries to restore %ds, and if it fails, zeros it,
>> which is consistent with the core dump.
>>
>> This could happen if kvm is failing to restore GDT correctly.
>>
>
> Possible culprit: b2da15ac26a0c00.
>
>
That commit isn't in qemu-kvm-1.1.1.

I'm testing a build with g_str_equal implemented in kvm.c and so far I 
haven't had a crash in 6 invocations. That hasn't been possible with 
vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be sure.

Thanks for your help, Avi.



* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 17:50                                           ` Chris Clayton
@ 2012-07-29 17:54                                             ` Gleb Natapov
  2012-07-29 19:10                                               ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Gleb Natapov @ 2012-07-29 17:54 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Avi Kivity, Eric Northup, kvm, Jan Kiszka

On Sun, Jul 29, 2012 at 06:50:09PM +0100, Chris Clayton wrote:
> On 07/29/12 17:34, Avi Kivity wrote:
> >On 07/29/2012 06:47 PM, Avi Kivity wrote:
> >>>What might have happened is that the movdqu instruction faulted (as it's
> >>>an fpu instruction), and on the way back from the fault, ds and es
> >>>didn't get restored correctly.
> >>>
> >>>You can test this by writing a trivial version of g_str_equal()
> >>>somewhere in the qemu source code and rebuilding it.
> >>>
> >>>
> >>
> >>from entry_32.S:
> >>
> >>.macro RESTORE_REGS pop=0
> >>	RESTORE_INT_REGS
> >>1:	popl_cfi %ds
> >>	/*CFI_RESTORE ds;*/
> >>2:	popl_cfi %es
> >>	/*CFI_RESTORE es;*/
> >>3:	popl_cfi %fs
> >>	/*CFI_RESTORE fs;*/
> >>	POP_GS \pop
> >>.pushsection .fixup, "ax"
> >>4:	movl $0, (%esp)
> >>	jmp 1b
> >>5:	movl $0, (%esp)
> >>	jmp 2b
> >>6:	movl $0, (%esp)
> >>	jmp 3b
> >>.popsection
> >>
> >>this piece of code tries to restore %ds, and if it fails, zeros it,
> >>which is consistent with the core dump.
> >>
> >>This could happen if kvm is failing to restore GDT correctly.
> >>
> >
> >Possible culprit: b2da15ac26a0c00.
> >
> >
> That commit isn't in qemu-kvm-1.1.1.
> 
It is in kernel.

> I'm testing a build with g_str_equal implemented in kvm.c and so far
> I haven't had a crash in 6 invocations. That hasn't been possible
> with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
> sure.
> 
> Thanks for your help, Avi.

--
			Gleb.


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 17:54                                             ` Gleb Natapov
@ 2012-07-29 19:10                                               ` Chris Clayton
  2012-07-30 14:00                                                 ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-29 19:10 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: Avi Kivity, Eric Northup, kvm, Jan Kiszka

On 07/29/12 18:54, Gleb Natapov wrote:
> On Sun, Jul 29, 2012 at 06:50:09PM +0100, Chris Clayton wrote:
>> On 07/29/12 17:34, Avi Kivity wrote:
>>> On 07/29/2012 06:47 PM, Avi Kivity wrote:
>>>>> What might have happened is that the movdqu instruction faulted (as it's
>>>>> an fpu instruction), and on the way back from the fault, ds and es
>>>>> didn't get restored correctly.
>>>>>
>>>>> You can test this by writing a trivial version of g_str_equal()
>>>>> somewhere in the qemu source code and rebuilding it.
>>>>>
>>>>>
>>>>
>>>> from entry_32.S:
>>>>
>>>> .macro RESTORE_REGS pop=0
>>>> 	RESTORE_INT_REGS
>>>> 1:	popl_cfi %ds
>>>> 	/*CFI_RESTORE ds;*/
>>>> 2:	popl_cfi %es
>>>> 	/*CFI_RESTORE es;*/
>>>> 3:	popl_cfi %fs
>>>> 	/*CFI_RESTORE fs;*/
>>>> 	POP_GS \pop
>>>> .pushsection .fixup, "ax"
>>>> 4:	movl $0, (%esp)
>>>> 	jmp 1b
>>>> 5:	movl $0, (%esp)
>>>> 	jmp 2b
>>>> 6:	movl $0, (%esp)
>>>> 	jmp 3b
>>>> .popsection
>>>>
>>>> this piece of code tries to restore %ds, and if it fails, zeros it,
>>>> which is consistent with the core dump.
>>>>
>>>> This could happen if kvm is failing to restore GDT correctly.
>>>>
>>>
>>> Possible culprit: b2da15ac26a0c00.
>>>
>>>
>> That commit isn't in qemu-kvm-1.1.1.
>>
> It is in kernel.
>

Sorry, so it is.

With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15 
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem 
to be the problem.

>> I'm testing a build with g_str_equal implemented in kvm.c and so far
>> I haven't had a crash in 6 invocations. That hasn't been possible
>> with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
>> sure.
>>

Similarly, with my "local" implementation of g_str_equal, I've had 15 
clean invocations on vanilla kernel 3.5.0.

I'm more than happy to test patches to fix this regression, but it will 
be tomorrow before I will be able to do so.

>> Thanks for your help, Avi.
>
> --
> 			Gleb.
>



* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-29 19:10                                               ` Chris Clayton
@ 2012-07-30 14:00                                                 ` Chris Clayton
  2012-07-30 14:03                                                   ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-30 14:00 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Gleb Natapov, Avi Kivity, Eric Northup, kvm, Jan Kiszka

On 07/29/12 20:10, Chris Clayton wrote:
>>>> Possible culprit: b2da15ac26a0c00.
>>>>
>>>>
>>> That commit isn't in qemu-kvm-1.1.1.
>>>
>> It is in kernel.
>>
>
> Sorry, so it is.
>
> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> to be the problem.

Just to be sure, I've run some more tests today. No crashes occurred in 
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00 
reverted.

Thanks.


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-30 14:00                                                 ` Chris Clayton
@ 2012-07-30 14:03                                                   ` Avi Kivity
  2012-07-30 14:07                                                     ` Chris Clayton
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-30 14:03 UTC (permalink / raw)
  To: Chris Clayton; +Cc: Gleb Natapov, Eric Northup, kvm, Jan Kiszka

On 07/30/2012 05:00 PM, Chris Clayton wrote:
> On 07/29/12 20:10, Chris Clayton wrote:
>>>>> Possible culprit: b2da15ac26a0c00.
>>>>>
>>>>>
>>>> That commit isn't in qemu-kvm-1.1.1.
>>>>
>>> It is in kernel.
>>>
>>
>> Sorry, so it is.
>>
>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>> to be the problem.
> 
> Just to be sure, I've run some more tests today. No crashes occurred in
> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> reverted.

Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.

What are your preemption settings?


-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-30 14:03                                                   ` Avi Kivity
@ 2012-07-30 14:07                                                     ` Chris Clayton
  2012-07-30 16:39                                                       ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Chris Clayton @ 2012-07-30 14:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Gleb Natapov, Eric Northup, kvm, Jan Kiszka

On 07/30/12 15:03, Avi Kivity wrote:
> On 07/30/2012 05:00 PM, Chris Clayton wrote:
>> On 07/29/12 20:10, Chris Clayton wrote:
>>>>>> Possible culprit: b2da15ac26a0c00.
>>>>>>
>>>>>>
>>>>> That commit isn't in qemu-kvm-1.1.1.
>>>>>
>>>> It is in kernel.
>>>>
>>>
>>> Sorry, so it is.
>>>
>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>> to be the problem.
>>
>> Just to be sure, I've run some more tests today. No crashes occurred in
>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>> reverted.
>
> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
> code looks correct.
>
> What are your preemption settings?
>
>
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y



* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-30 14:07                                                     ` Chris Clayton
@ 2012-07-30 16:39                                                       ` Avi Kivity
  2012-07-30 23:36                                                         ` Marcelo Tosatti
  2012-08-01 13:11                                                         ` Avi Kivity
  0 siblings, 2 replies; 42+ messages in thread
From: Avi Kivity @ 2012-07-30 16:39 UTC (permalink / raw)
  To: Chris Clayton
  Cc: Gleb Natapov, Eric Northup, kvm, Jan Kiszka, Marcelo Tosatti

On 07/30/2012 05:07 PM, Chris Clayton wrote:
>>
>>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>>> to be the problem.
>>>
>>> Just to be sure, I've run some more tests today. No crashes occurred in
>>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>>> reverted.
>>
>> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
>> code looks correct.
>>
>> What are your preemption settings?
>>
>>
> [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_PREEMPT_NOTIFIERS=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_COUNT=y

Here's what I think is happening:

  vcpu_load
  ...
  vmx_save_host_state
  vmx_vcpu_run
  (ds.rpl, es.rpl cleared by hardware)

  interrupt
    push ds, es  # pushes bad ds, es
    schedule
      vmx_vcpu_put
        vmx_load_host_state
          reload ds, es
    pop ds, es  # of other thread's stack
    iret
  # other thread runs
  interrupt
    schedule  # back in vcpu thread
    interrupt return: pop ds, es  # <-- problem
    iret

   ...
   vcpu_put

   # bad ds, es, but !vmx->host_state.loaded

Marcelo, did I miss something here?

Unfortunately, my reproducer has ceased to reproduce.  But the fix is
easy if the analysis above is right.
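
As an aside, the selector values in the register dump can be decoded
with a few lines of C. A selector consists of a 13-bit descriptor index,
a table-indicator bit and a 2-bit RPL. This is just an illustrative
helper with made-up names, and it assumes that the bits the hardware
clears in the scenario above are the RPL bits of the selector.

```c
#include <stdint.h>

struct selector_fields {
    unsigned index;  /* descriptor table index, bits 15..3 */
    unsigned ti;     /* table indicator, bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, bits 1..0 */
};

/* Split a 16-bit segment selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1;
    f.rpl = sel & 3;
    return f;
}

/* Return the same selector with its RPL bits forced to 0. */
static uint16_t clear_rpl(uint16_t sel)
{
    return sel & (uint16_t)~3u;
}
```

With the ss value from the dump, decode_selector(0x7b) gives index 15,
TI 0 and RPL 3, and clear_rpl(0x7b) is 0x78.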

-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-30 16:39                                                       ` Avi Kivity
@ 2012-07-30 23:36                                                         ` Marcelo Tosatti
  2012-07-31  9:11                                                           ` Avi Kivity
  2012-08-01 13:11                                                         ` Avi Kivity
  1 sibling, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2012-07-30 23:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Clayton, Gleb Natapov, Eric Northup, kvm, Jan Kiszka

On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
> On 07/30/2012 05:07 PM, Chris Clayton wrote:
> >>
> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> >>>> to be the problem.
> >>>
> >>> Just to be sure, I've run some more tests today. No crashes occurred in
> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> >>> reverted.
> >>
> >> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
> >> code looks correct.
> >>
> >> What are your preemption settings?
> >>
> >>
> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> > CONFIG_TREE_PREEMPT_RCU=y
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_PREEMPT_NOTIFIERS=y
> > # CONFIG_PREEMPT_NONE is not set
> > # CONFIG_PREEMPT_VOLUNTARY is not set
> > CONFIG_PREEMPT=y
> > CONFIG_PREEMPT_COUNT=y
> 
> Here's what I think is happening:
> 
>   vcpu_load
>   ...
>   vmx_save_host_state
>   vmx_vcpu_run
>   (ds.rpl, es.rpl cleared by hardware)
> 
>   interrupt
>     push ds, es  # pushes bad ds, es
>     schedule
>       vmx_vcpu_put
>         vmx_load_host_state
>           reload ds, es
>     pop ds, es  # of other thread's stack
>     iret
>   # other thread runs
>   interrupt
>     schedule  # back in vcpu thread
>     interrupt return: pop ds, es  # <-- problem
>     iret
> 
>    ...
>    vcpu_put
> 
>    # bad ds, es, but !vmx->host_state.loaded
> 
> Marcelo, did I miss something here?

Don't think so.

> 
> Unfortunately, my reproducer has ceased to reproduce.  But the fix is
> easy if the analysis above is right.
> 
> -- 
> error compiling committee.c: too many arguments to function
> 


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-30 23:36                                                         ` Marcelo Tosatti
@ 2012-07-31  9:11                                                           ` Avi Kivity
  2012-07-31 16:29                                                             ` Marcelo Tosatti
  0 siblings, 1 reply; 42+ messages in thread
From: Avi Kivity @ 2012-07-31  9:11 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Chris Clayton, Gleb Natapov, Eric Northup, kvm, Jan Kiszka

On 07/31/2012 02:36 AM, Marcelo Tosatti wrote:
> On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
>> On 07/30/2012 05:07 PM, Chris Clayton wrote:
>> >>
>> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>> >>>> to be the problem.
>> >>>
>> >>> Just to be sure, I've run some more tests today. No crashes occurred in
>> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>> >>> reverted.
>> >>
>> >> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
>> >> code looks correct.
>> >>
>> >> What are your preemption settings?
>> >>
>> >>
>> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
>> > CONFIG_TREE_PREEMPT_RCU=y
>> > CONFIG_PREEMPT_RCU=y
>> > CONFIG_PREEMPT_NOTIFIERS=y
>> > # CONFIG_PREEMPT_NONE is not set
>> > # CONFIG_PREEMPT_VOLUNTARY is not set
>> > CONFIG_PREEMPT=y
>> > CONFIG_PREEMPT_COUNT=y
>> 
>> Here's what I think is happening:
>> 
>>   vcpu_load
>>   ...
>>   vmx_save_host_state
>>   vmx_vcpu_run
>>   (ds.rpl, es.rpl cleared by hardware)
>> 
>>   interrupt
>>     push ds, es  # pushes bad ds, es
>>     schedule
>>       vmx_vcpu_put
>>         vmx_load_host_state
>>           reload ds, es
>>     pop ds, es  # of other thread's stack
>>     iret
>>   # other thread runs
>>   interrupt
>>     schedule  # back in vcpu thread
>>     interrupt return: pop ds, es  # <-- problem
>>     iret
>> 
>>    ...
>>    vcpu_put
>> 
>>    # bad ds, es, but !vmx->host_state.loaded
>> 
>> Marcelo, did I miss something here?
> 
> Don't think so.

So the same problem should happen with %fs and %gs, no?

x86_64 is safe, since entry_64.S never saves/restores segment registers.

-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-31  9:11                                                           ` Avi Kivity
@ 2012-07-31 16:29                                                             ` Marcelo Tosatti
  2012-07-31 16:46                                                               ` Avi Kivity
  0 siblings, 1 reply; 42+ messages in thread
From: Marcelo Tosatti @ 2012-07-31 16:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Chris Clayton, Gleb Natapov, Eric Northup, kvm, Jan Kiszka

On Tue, Jul 31, 2012 at 12:11:13PM +0300, Avi Kivity wrote:
> On 07/31/2012 02:36 AM, Marcelo Tosatti wrote:
> > On Mon, Jul 30, 2012 at 07:39:31PM +0300, Avi Kivity wrote:
> >> On 07/30/2012 05:07 PM, Chris Clayton wrote:
> >> >>
> >> >>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
> >> >>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
> >> >>>> to be the problem.
> >> >>>
> >> >>> Just to be sure, I've run some more tests today. No crashes occurred in
> >> >>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
> >> >>> reverted.
> >> >>
> >> >> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
> >> >> code looks correct.
> >> >>
> >> >> What are your preemption settings?
> >> >>
> >> >>
> >> > [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
> >> > CONFIG_TREE_PREEMPT_RCU=y
> >> > CONFIG_PREEMPT_RCU=y
> >> > CONFIG_PREEMPT_NOTIFIERS=y
> >> > # CONFIG_PREEMPT_NONE is not set
> >> > # CONFIG_PREEMPT_VOLUNTARY is not set
> >> > CONFIG_PREEMPT=y
> >> > CONFIG_PREEMPT_COUNT=y
> >> 
> >> Here's what I think is happening:
> >> 
> >>   vcpu_load
> >>   ...
> >>   vmx_save_host_state
> >>   vmx_vcpu_run
> >>   (ds.rpl, es.rpl cleared by hardware)
> >> 
> >>   interrupt
> >>     push ds, es  # pushes bad ds, es
> >>     schedule
> >>       vmx_vcpu_put
> >>         vmx_load_host_state
> >>           reload ds, es
> >>     pop ds, es  # of other thread's stack
> >>     iret
> >>   # other thread runs
> >>   interrupt
> >>     schedule  # back in vcpu thread
> >>     interrupt return: pop ds, es  # <-- problem
> >>     iret
> >> 
> >>    ...
> >>    vcpu_put
> >> 
> >>    # bad ds, es, but !vmx->host_state.loaded
> >> 
> >> Marcelo, did I miss something here?
> > 
> > Don't think so.
> 
> So the same problem should happen with %fs and %gs, no?

AFAICS: 

depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.

> x86_64 is safe, since entry_64.S never saves/restores segment registers.

Is the comment 

        /*
         * The sysexit path does not restore ds/es, so we must set them to
         * a reasonable value ourselves.
         */

Correct?

syscall_exit -> syscall_exit_work -> resume_userspace ->
restore_all -> RESTORE_REGS


* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-31 16:29                                                             ` Marcelo Tosatti
@ 2012-07-31 16:46                                                               ` Avi Kivity
  0 siblings, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2012-07-31 16:46 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Chris Clayton, Gleb Natapov, Eric Northup, kvm, Jan Kiszka

On 07/31/2012 07:29 PM, Marcelo Tosatti wrote:
>> 
>> So the same problem should happen with %fs and %gs, no?
> 
> AFAICS: 
> 
> depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.

The fs/gs handling was already in there; I wonder how it wasn't broken
before.  Something's fishy here.

> 
>> x86_64 is safe, since entry_64.S never saves/restores segment registers.
> 
> Is the comment 
> 
>         /*
>          * The sysexit path does not restore ds/es, so we must set them to
>          * a reasonable value ourselves.
>          */
> 
> Correct?
> 
> syscall_exit -> syscall_exit_work -> resume_userspace ->
> restore_all -> RESTORE_REGS
> 

That's the non-sysexit path (could have arrived here by sysenter).  Look
at sysenter_exit.

-- 
error compiling committee.c: too many arguments to function




* Re: qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
  2012-07-30 16:39                                                       ` Avi Kivity
  2012-07-30 23:36                                                         ` Marcelo Tosatti
@ 2012-08-01 13:11                                                         ` Avi Kivity
  1 sibling, 0 replies; 42+ messages in thread
From: Avi Kivity @ 2012-08-01 13:11 UTC (permalink / raw)
  To: Chris Clayton
  Cc: Gleb Natapov, Eric Northup, kvm, Jan Kiszka, Marcelo Tosatti

On 07/30/2012 07:39 PM, Avi Kivity wrote:
> On 07/30/2012 05:07 PM, Chris Clayton wrote:
>>>
>>>>> With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
>>>>> clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
>>>>> to be the problem.
>>>>
>>>> Just to be sure, I've run some more tests today. No crashes occurred in
>>>> 20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
>>>> reverted.
>>>
>>> Ok.  I'm trying to reproduce it here on a nested-virt setup, since the
>>> code looks correct.
>>>
>>> What are your preemption settings?
>>>
>>>
>> [chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
>> CONFIG_TREE_PREEMPT_RCU=y
>> CONFIG_PREEMPT_RCU=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>> # CONFIG_PREEMPT_NONE is not set
>> # CONFIG_PREEMPT_VOLUNTARY is not set
>> CONFIG_PREEMPT=y
>> CONFIG_PREEMPT_COUNT=y
> 
> Here's what I think is happening
> 
>   vcpu_load
>   ...
>   vmx_save_host_state
>   vmx_vcpu_run
>   (ds.cpl, es.cpl cleared by hardware)
> 
>   interrupt
>     push ds, es  # pushes bad ds, es
>     schedule
>       vmx_vcpu_put
>         vmx_load_host_state
>           reload ds, es
>     pop ds, es  # of other thread's stack
>     iret
>   # other thread runs
>   interrupt
>     schedule  # back in vcpu thread
>     interrupt return: pop ds, es  # <-- problem

In fact, those are fine.

>     iret

But IRET-to-outer-privilege-level clears segment registers with the
wrong RPL.  Think how secure OSes would be if they used the hardware
fully.  Credit to Gleb for pinpointing this.

> 
>    ...
>    vcpu_put
> 
>    # bad ds, es, but !vmx->host_state.loaded
> 


-- 
error compiling committee.c: too many arguments to function


end of thread, other threads:[~2012-08-01 13:11 UTC | newest]

Thread overview: 42+ messages
2012-07-09 10:57 qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6 Chris Clayton
2012-07-11  7:09 ` Chris Clayton
2012-07-11  7:12   ` Gleb Natapov
2012-07-11  7:18     ` Chris Clayton
2012-07-11  7:22       ` Gleb Natapov
2012-07-15 19:52         ` Chris Clayton
2012-07-19 12:14           ` Chris Clayton
2012-07-19 12:17             ` Avi Kivity
2012-07-19 18:23               ` Chris Clayton
2012-07-26  9:52                 ` Chris Clayton
2012-07-26 10:01                   ` Avi Kivity
2012-07-26 10:29                     ` Jan Kiszka
2012-07-26 10:45                       ` Avi Kivity
2012-07-26 10:49                         ` Jan Kiszka
2012-07-26 11:04                           ` Jan Kiszka
2012-07-26 11:58                     ` Chris Clayton
2012-07-26 12:07                       ` Avi Kivity
2012-07-26 23:22                         ` Chris Clayton
2012-07-27 10:46                           ` Chris Clayton
     [not found]                             ` <CAG7+5M2y8gJvDCNuWsSB3zH=r75H0Mn=JNV+4DBc5xYjM+BJWA@mail.gmail.com>
2012-07-27 19:04                               ` Chris Clayton
2012-07-29 12:42                                 ` Avi Kivity
2012-07-29 14:03                                   ` Chris Clayton
2012-07-29 14:18                                     ` Avi Kivity
2012-07-29 14:48                                       ` Avi Kivity
2012-07-29 15:21                                         ` Chris Clayton
2012-07-29 15:47                                       ` Avi Kivity
2012-07-29 16:34                                         ` Avi Kivity
2012-07-29 17:50                                           ` Chris Clayton
2012-07-29 17:54                                             ` Gleb Natapov
2012-07-29 19:10                                               ` Chris Clayton
2012-07-30 14:00                                                 ` Chris Clayton
2012-07-30 14:03                                                   ` Avi Kivity
2012-07-30 14:07                                                     ` Chris Clayton
2012-07-30 16:39                                                       ` Avi Kivity
2012-07-30 23:36                                                         ` Marcelo Tosatti
2012-07-31  9:11                                                           ` Avi Kivity
2012-07-31 16:29                                                             ` Marcelo Tosatti
2012-07-31 16:46                                                               ` Avi Kivity
2012-08-01 13:11                                                         ` Avi Kivity
2012-07-26 12:09                       ` Jan Kiszka
2012-07-26 11:10                   ` Xiao Guangrong
2012-07-26 13:49                     ` Chris Clayton
