* Fwd: Persistent rt_sigreturn segfaults on KVM VMs after upgrade to 5.15
From: Bagas Sanjaya @ 2023-05-18 13:57 UTC
  To: Linux Kernel Mailing List, Linux Regressions, Linux KVM
  Cc: Paolo Bonzini, Sean Christopherson

Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> I'm experiencing sporadic but persistent segmentation faults on the KVM VMs I manage. These faults began appearing after upgrading from Linux kernel 4.x to 5.15.59. I further upgraded to 5.15.91 and transitioned the userspace from Debian 10 (buster) to Debian 11 (bullseye), yet the issues persist. Notably, the libc version also changed in the process, as seen in the following error logs:
> 
> 
> post.sh[21952]: bad frame in rt_sigreturn frame:000072db65961bb8 ip:6c25f82a9a5d sp:72db65962168 orax:ffffffffffffffff in libc-2.28.so[6c25f8294000+147000]
> 
> cron[7626]: bad frame in rt_sigreturn frame:000073ddebeb6ff8 ip:72ad9f44d594 sp:73ddebeb75a8 orax:ffffffffffffffff in libc-2.28.so[72ad9f3a9000+147000]
> 
> cron[64687]: bad frame in rt_sigreturn frame:000073265764b038 ip:67c7b5a0f14a sp:73265764b5f0 orax:ffffffffffffffff in libc-2.31.so[67c7b596f000+159000]
> 
> worker.py[54568]: bad frame in rt_sigreturn frame:000078eef6591cf8 ip:6c9f9b2a604e sp:78eef6592298 orax:ffffffffffffffff in libpthread-2.31.so[6c9f9b29a000+10000]
> 
> 
> The segmentation faults occur 1-3 times daily across approximately 1000 VMs running on hundreds of Supermicro bare-metal servers with Intel CPUs. Currently, there's no reliable way for me to reproduce the issue. I initially considered this bug (https://www.spinics.net/lists/linux-tip-commits/msg61293.html) as a possible cause, but judging from its comments it likely isn't.
> 
> The best approximation to a reproducer I have is a Python script that starts several child processes and continuously sends them SIGUSR1. Even so, it takes a few hours to trigger the issue when running this script on several hundred VMs.
> 
> Switching to the 6.x kernel isn't immediately feasible as these are production systems with specific requirements. The transition is planned but will likely take several months.
> 
> I'm looking for suggestions on how to reproduce this problem more reliably. With a reliable reproducer, I could test different old and new kernels and perhaps narrow it down.
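The reporter's script is not included in the thread. The following is only a rough sketch of the kind of stress loop described in the report (several child processes, a steady stream of SIGUSR1), assuming a Linux guest and Python 3; the names `stress()` and `child_main()` and all parameter values are hypothetical. It is not the reporter's script and is not known to trigger the bug any faster.

```python
import multiprocessing as mp
import os
import signal
import time


def child_main(ready) -> None:
    """Hypothetical child: install a SIGUSR1 handler, then stay busy."""
    # With a Python-level handler installed, each delivery makes the kernel
    # build a signal frame; the C-level handler then returns via rt_sigreturn.
    signal.signal(signal.SIGUSR1, lambda signum, frame: None)
    # Make sure SIGTERM from the parent really ends the child.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    ready.set()
    x = 0.0
    while True:  # arbitrary CPU work between signals
        x = (x + 1.5) * 0.999


def stress(n_children: int = 4, duration_s: float = 10.0) -> dict:
    """Send SIGUSR1 to n_children busy children for duration_s seconds."""
    ctx = mp.get_context("fork")  # assumption: Linux, fork start method
    readies = [ctx.Event() for _ in range(n_children)]
    children = [ctx.Process(target=child_main, args=(ev,), daemon=True)
                for ev in readies]
    for p in children:
        p.start()
    for ev in readies:  # don't signal a child before its handler exists
        ev.wait(timeout=10)

    sent = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        for p in children:
            if p.exitcode is None:  # child still running
                try:
                    os.kill(p.pid, signal.SIGUSR1)
                    sent += 1
                except ProcessLookupError:
                    pass

    for p in children:
        if p.is_alive():
            p.terminate()
        p.join()
    # A negative exitcode -N means the child was killed by signal N.
    crashed = [p.pid for p in children if p.exitcode == -signal.SIGSEGV]
    return {"signals_sent": sent, "segfaulted_pids": crashed}
```

One could run, say, `stress(n_children=8, duration_s=3600)` on each VM and watch `dmesg` for new "bad frame in rt_sigreturn" lines; a child that the kernel kills with SIGSEGV would show up in `segfaulted_pids`.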

See Bugzilla for the full thread.
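When comparing many of the log lines quoted above, it can help to reduce each one to "library + offset". The bracketed `[start+length]` pair appears to describe the memory mapping that contains `ip`, so `ip - start` gives the faulting address relative to that mapping; for the first line that is 0x6c25f82a9a5d - 0x6c25f8294000 = 0x15a5d. A minimal parser sketch follows; the regular expression is derived only from the quoted lines, and `BadFrame` and `parse_bad_frame()` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Derived from the quoted log lines. Assumptions: hex fields are lowercase,
# the colon after "[pid]" may or may not be present, and a syslog/dmesg
# prefix may precede the message (hence re.search, not re.match).
BAD_FRAME_RE = re.compile(
    r"(?P<comm>[^\s\[]+)\[(?P<pid>\d+)\]:? bad frame in rt_sigreturn "
    r"frame:(?P<frame>[0-9a-f]+) ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) "
    r"orax:(?P<orax>[0-9a-f]+) in (?P<lib>[^\s\[]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


@dataclass
class BadFrame:
    comm: str
    pid: int
    frame: int
    ip: int
    sp: int
    orax: int
    lib: str
    base: int  # start of the printed mapping that contains ip
    size: int  # length of that mapping

    @property
    def offset(self) -> int:
        """ip relative to the start of the printed mapping."""
        return self.ip - self.base


def parse_bad_frame(line: str) -> Optional[BadFrame]:
    """Parse one log line; return None if it is not a bad-frame message."""
    m = BAD_FRAME_RE.search(line)
    if m is None:
        return None

    def hx(name: str) -> int:
        return int(m.group(name), 16)

    bf = BadFrame(
        comm=m.group("comm"), pid=int(m.group("pid")),
        frame=hx("frame"), ip=hx("ip"), sp=hx("sp"), orax=hx("orax"),
        lib=m.group("lib"), base=hx("base"), size=hx("size"),
    )
    # Sanity check: ip should fall inside the mapping that was printed.
    if not 0 <= bf.offset < bf.size:
        raise ValueError(f"ip outside printed mapping: {line!r}")
    return bf


if __name__ == "__main__":
    line = ("post.sh[21952]: bad frame in rt_sigreturn frame:000072db65961bb8 "
            "ip:6c25f82a9a5d sp:72db65962168 orax:ffffffffffffffff "
            "in libc-2.28.so[6c25f8294000+147000]")
    bf = parse_bad_frame(line)
    print(f"{bf.lib}+{bf.offset:#x}")  # prints libc-2.28.so+0x15a5d
```

If the printed mapping is the library's first one (file offset 0), the offset could be passed to a symbolizer such as `addr2line -f -e <path to the library> 0x15a5d` with debug info installed; otherwise the mapping's file offset (as listed in `/proc/<pid>/maps` for any process using the same library) typically has to be added first.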

Anyway, I'm adding it to regzbot:

#regzbot introduced: v4.19..v5.15 https://bugzilla.kernel.org/show_bug.cgi?id=217457
#regzbot title: bad frame in rt_sigreturn (libc-related?) regression after 5.15 upgrade

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217457

-- 
An old man doll... just what I always wanted! - Clara


* Re: Fwd: Persistent rt_sigreturn segfaults on KVM VMs after upgrade to 5.15
From: Bagas Sanjaya @ 2023-05-18 14:00 UTC
  To: Linux Kernel Mailing List, Linux Regressions, Linux KVM
  Cc: Paolo Bonzini, Sean Christopherson, Theodor Milkov

On 5/18/23 20:57, Bagas Sanjaya wrote:
> Anyway, I'm adding it to regzbot:
> 
> #regzbot introduced: v4.19..v5.15 https://bugzilla.kernel.org/show_bug.cgi?id=217457
> #regzbot title: bad frame in rt_sigreturn (libc-related?) regression after 5.15 upgrade
> 

Oops, I forgot to add the reporter:

#regzbot from: Theodor Milkov <tm@del.bg>

Sorry for the inconvenience.




* Re: Fwd: Persistent rt_sigreturn segfaults on KVM VMs after upgrade to 5.15
From: Sean Christopherson @ 2023-05-18 15:01 UTC
  To: Bagas Sanjaya
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux KVM,
	Paolo Bonzini, Theodor Milkov

On Thu, May 18, 2023, Bagas Sanjaya wrote:
> On 5/18/23 20:57, Bagas Sanjaya wrote:
> > Hi,
> > 
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> > 
> >> I'm experiencing sporadic but persistent segmentation faults on the KVM
> >> VMs I manage. These faults began appearing after upgrading from Linux
> >> Kernel 4.x to 5.15.59. I further upgraded to 5.15.91 and transitioned the
> >> userspace from Debian 10 (buster) to Debian 11 (bullseye), yet the issues
> >> persist. Notably, the libc has also changed in the process as seen in the
> >> following error logs:

Was the host or guest kernel upgraded?  If the guest kernel was upgraded, it's
unlikely, though still possible, that this is a KVM bug.
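With roughly 1000 VMs on hundreds of hosts, the host-versus-guest question can be answered per machine with a small script. A minimal sketch, assuming x86 Linux: it records the running kernel release and whether the CPU "hypervisor" flag appears in `/proc/cpuinfo`. That flag is only a heuristic, since a hypervisor can hide it; `cpu_flags()` and `kernel_role()` are hypothetical names.

```python
import platform
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the flags on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def kernel_role(cpuinfo_path: str = "/proc/cpuinfo") -> dict:
    """Report the running kernel release and whether it looks like a guest."""
    flags = cpu_flags(Path(cpuinfo_path).read_text())
    return {
        "kernel_release": platform.release(),  # same string as `uname -r`
        # Heuristic: x86 guests normally see the 'hypervisor' CPU flag
        # (CPUID leaf 1, ECX bit 31); a hypervisor may choose to hide it.
        "looks_like_guest": "hypervisor" in flags,
    }
```

Collecting `kernel_role()` from both the hosts and the guests, before and after the upgrade, would show which of the two kernels actually changed.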



* Re: Fwd: Persistent rt_sigreturn segfaults on KVM VMs after upgrade to 5.15
From: Bagas Sanjaya @ 2023-05-19  8:19 UTC
  To: Sean Christopherson
  Cc: Linux Kernel Mailing List, Linux Regressions, Linux KVM,
	Paolo Bonzini, Theodor Milkov

On 5/18/23 22:01, Sean Christopherson wrote:
> On Thu, May 18, 2023, Bagas Sanjaya wrote:
>> On 5/18/23 20:57, Bagas Sanjaya wrote:
>>> Hi,
>>>
>>> I notice a regression report on Bugzilla [1]. Quoting from it:
>>>
>>>> I'm experiencing sporadic but persistent segmentation faults on the KVM
>>>> VMs I manage. These faults began appearing after upgrading from Linux
>>>> Kernel 4.x to 5.15.59. I further upgraded to 5.15.91 and transitioned the
>>>> userspace from Debian 10 (buster) to Debian 11 (bullseye), yet the issues
>>>> persist. Notably, the libc has also changed in the process as seen in the
>>>> following error logs:
> 
> Was the host or guest kernel upgraded?  If the guest kernel was upgraded, it's
> unlikely, though still possible, that this is a KVM bug.
> 

The reporter [1] said that this regression occurs on the guest kernel.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217457#c3



