* qemu-upstream triggering OOM killer
@ 2017-02-09 14:57 Jan Beulich
  2017-02-09 22:24 ` Stefano Stabellini
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-09 14:57 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel

Stefano,

the recent qemuu update results in the produced binary triggering the
OOM killer on the first system I tried the updated code on. Is there
anything known in this area? Are there any hints as to finding out
what is going wrong?

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: qemu-upstream triggering OOM killer
  2017-02-09 14:57 qemu-upstream triggering OOM killer Jan Beulich
@ 2017-02-09 22:24 ` Stefano Stabellini
  2017-02-10  9:54   ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2017-02-09 22:24 UTC (permalink / raw)
  To: Jan Beulich; +Cc: anthony.perard, xen-devel, Stefano Stabellini

CC'ing Anthony

On Thu, 9 Feb 2017, Jan Beulich wrote:
> Stefano,
> 
> the recent qemuu update results in the produced binary triggering the
> OOM killer on the first system I tried the updated code on. Is there
> anything known in this area? Are there any hints as to finding out
> what is going wrong?

Hi Jan,

Do you mean QEMU upstream (from qemu.org) or qemu-xen/staging (that
hasn't changed much in the last couple of months)? Do you know if it's
something Xen specific?

In terms of new Xen specific changes in QEMU upstream,
3a6c9172ac5951e6dac2b3f6 has the potential for changing memory
allocations, but shouldn't cause an OOM. If it's easy to reproduce, I
would bisect it.
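
For readers unfamiliar with bisecting, the idea can be sketched as a binary search over an ordered commit list. This is only an illustration of the principle, not how `git bisect` is implemented; the names `first_bad` and `is_bad` are hypothetical, and the sketch assumes the last commit is known bad and that every commit after the first bad one is bad too.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[-1] is bad and that badness, once introduced,
    persists in all later commits.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is at mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[lo]
```

In practice `is_bad` would build that revision and try to boot the guest; each step halves the remaining range, so even a few thousand commits need only about a dozen test boots.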


* Re: qemu-upstream triggering OOM killer
  2017-02-09 22:24 ` Stefano Stabellini
@ 2017-02-10  9:54   ` Jan Beulich
  2017-02-14 14:56     ` Anthony PERARD
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-10  9:54 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: anthony.perard, xen-devel

>>> On 09.02.17 at 23:24, <sstabellini@kernel.org> wrote:
> On Thu, 9 Feb 2017, Jan Beulich wrote:
>> the recent qemuu update results in the produced binary triggering the
>> OOM killer on the first system I tried the updated code on. Is there
>> anything known in this area? Are there any hints as to finding out
>> what is going wrong?
> 
> Do you mean QEMU upstream (from qemu.org) or qemu-xen/staging (that
> hasn't changed much in the last couple of months)?

The latter. The diff to my last snapshot (from early January) is 6.6MB
though - I wouldn't call this "hasn't changed much". Looks like Anthony
did update to 2.8.0 in early January (a day or two after I had last
snapshotted it).

> Do you know if it's something Xen specific?

Not so far. It appears to happen when grub clears the screen
before displaying its graphical menu, so I'd rather suspect an issue
with a graphics related change (the one you pointed out isn't).

Jan



* Re: qemu-upstream triggering OOM killer
  2017-02-10  9:54   ` Jan Beulich
@ 2017-02-14 14:56     ` Anthony PERARD
  2017-02-15  9:07       ` Jan Beulich
  2017-02-16 15:23       ` Jan Beulich
  0 siblings, 2 replies; 10+ messages in thread
From: Anthony PERARD @ 2017-02-14 14:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Stefano Stabellini

On Fri, Feb 10, 2017 at 02:54:23AM -0700, Jan Beulich wrote:
> >>> On 09.02.17 at 23:24, <sstabellini@kernel.org> wrote:
> > On Thu, 9 Feb 2017, Jan Beulich wrote:
> >> the recent qemuu update results in the produced binary triggering the
> >> OOM killer on the first system I tried the updated code on. Is there
> >> anything known in this area? Are there any hints as to finding out
> >> what is going wrong?
> > 
> > Do you mean QEMU upstream (from qemu.org) or qemu-xen/staging (that
> > hasn't changed much in the last couple of months)?
> 
> The latter. The diff to my last snapshot (from early January) is 6.6MB
> though - I wouldn't call this "hasn't changed much". Looks like Anthony
> did update to 2.8.0 in early January (a day or two after I had last
> snapshotted it).

Yes, I did the update.

> > Do you know if it's something Xen specific?
> 
> Not so far. It appears to happen when grub clears the screen
> before displaying its graphical menu, so I'd rather suspect an issue
> with a graphics related change (the one you pointed out isn't).

I tried to reproduce this, by limiting the amount of memory available to
qemu using cgroups, but about 44MB of memory is enough to boot a guest
(tried Ubuntu and Debian).

How much memory did qemu try to use?
What guest did you try to boot?
What is the xl configuration?
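
One way to set up such a memory cap is sketched below. The message does not say how the cgroup was configured, so this is an assumption: it uses the cgroup v2 file names `memory.max` and `cgroup.procs` (on cgroup v1 the limit file is `memory.limit_in_bytes`), and the helper name `cap_memory` is hypothetical.

```python
from pathlib import Path


def cap_memory(cgroup_dir: Path, pid: int, limit_bytes: int) -> None:
    """Move `pid` into `cgroup_dir` and cap its memory at `limit_bytes`.

    Assumes a cgroup v2 hierarchy with the memory controller enabled
    for this directory; writing there normally requires root.
    """
    cgroup_dir.mkdir(parents=True, exist_ok=True)
    # Set the hard limit first, then move the process into the group.
    (cgroup_dir / "memory.max").write_text(f"{limit_bytes}\n")
    (cgroup_dir / "cgroup.procs").write_text(f"{pid}\n")
```

A call such as `cap_memory(Path("/sys/fs/cgroup/qemu-test"), qemu_pid, 44 * 1024 * 1024)` would correspond to the 44MB figure above; the directory name and `qemu_pid` are placeholders.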

-- 
Anthony PERARD


* Re: qemu-upstream triggering OOM killer
  2017-02-14 14:56     ` Anthony PERARD
@ 2017-02-15  9:07       ` Jan Beulich
  2017-02-16 15:23       ` Jan Beulich
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Beulich @ 2017-02-15  9:07 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Stefano Stabellini

>>> On 14.02.17 at 15:56, <anthony.perard@citrix.com> wrote:
> I tried to reproduce this, by limiting the amount of memory available to
> qemu using cgroups, but about 44MB of memory is enough to boot a guest
> (tried Ubuntu and Debian).
> 
> How much memory did qemu try to use?

According to

kernel: Out of memory: Kill process 9018 (qemu-system-i38) score 291 or sacrifice child
kernel: Killed process 9018 (qemu-system-i38) total-vm:4525776kB, anon-rss:3732084kB, file-rss:4kB, shmem-rss:0kB
kernel: oom_reaper: reaped process 9018 (qemu-system-i38), now anon-rss:3720556kB, file-rss:0kB, shmem-rss:0kB

well over 4GB.
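
To read the figures off such a log line without doing the arithmetic by hand, a small parser along these lines could be used. It is a sketch that assumes the "Killed process" line format shown above (which can differ between kernel versions); `parse_killed` is a hypothetical helper name.

```python
import re

# Matches the "Killed process" line format quoted above.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]*)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_killed(line: str) -> dict:
    """Extract pid, command name and memory sizes (in GiB) from one line."""
    m = KILLED_RE.search(line)
    if m is None:
        raise ValueError("not a 'Killed process' line")
    kib_per_gib = 1024 * 1024  # the kernel reports kB meaning KiB
    return {
        "pid": int(m["pid"]),
        "comm": m["comm"],
        "total_vm_gib": int(m["total_vm"]) / kib_per_gib,
        "anon_rss_gib": int(m["anon_rss"]) / kib_per_gib,
    }
```

For the line above this gives a total-vm of roughly 4.3 GiB and an anon-rss of roughly 3.6 GiB, against a guest configured with memory=2048.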

> What guest did you try to boot?

SLES11 SP4, but I've also just now tried Win7, which demonstrates
the same behavior. I'd like to note that the host is an older distro,
so I can't exclude there being something that qemu now simply
expects to be newer (I find upstream qemu to be notoriously bad
in ensuring backwards compatibility, but usually this surfaces as
build problems).

> What is the xl configuration?

name="sles11-hvm64"
description="None"
uuid="5b537ac6-aa31-9c3e-9c7c-f0360a19acd5"
memory=2048
maxmem=3072
vcpus=8
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
keymap="en-us"

builder="hvm"
boot="c"
disk=[ 'file:/var/lib/xen/images/sles11-hvm64/disk0.raw,hda,w', ]
vif=[ 'mac=00:16:3e:33:39:71,bridge=br0,model=rtl8139,type=vif', ]

stdvga=0
vnc=1
vncunused=1
viridian=0
acpi=1
pae=1

serial="pty"

Jan



* Re: qemu-upstream triggering OOM killer
  2017-02-14 14:56     ` Anthony PERARD
  2017-02-15  9:07       ` Jan Beulich
@ 2017-02-16 15:23       ` Jan Beulich
  2017-02-16 16:28         ` Jan Beulich
  1 sibling, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-16 15:23 UTC (permalink / raw)
  To: Anthony PERARD; +Cc: xen-devel, Stefano Stabellini

>>> On 14.02.17 at 15:56, <anthony.perard@citrix.com> wrote:
> On Fri, Feb 10, 2017 at 02:54:23AM -0700, Jan Beulich wrote:
>> Not so far. It appears to happen when grub clears the screen
>> before displaying its graphical menu, so I'd rather suspect an issue
>> with a graphics related change (the one you pointed out isn't).
> 
> I tried to reproduce this, by limiting the amount of memory available to
> qemu using cgroups, but about 44MB of memory is enough to boot a guest
> (tried Ubuntu and Debian).

Okay, not a qemuu regression after all, but a libxc one. It just so
happens that qemut tries to allocate a much larger amount, which
triggers mmap() failure earlier and hence doesn't manage to trigger
the oom killer. Patch (almost) on its way.

Jan



* Re: qemu-upstream triggering OOM killer
  2017-02-16 15:23       ` Jan Beulich
@ 2017-02-16 16:28         ` Jan Beulich
  2017-02-16 18:38           ` Stefano Stabellini
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-16 16:28 UTC (permalink / raw)
  To: Anthony PERARD, Stefano Stabellini; +Cc: xen-devel

>>> On 16.02.17 at 16:23, <JBeulich@suse.com> wrote:
>>>> On 14.02.17 at 15:56, <anthony.perard@citrix.com> wrote:
>> On Fri, Feb 10, 2017 at 02:54:23AM -0700, Jan Beulich wrote:
>>> Not so far. It appears to happen when grub clears the screen
>>> before displaying its graphical menu, so I'd rather suspect an issue
>>> with a graphics related change (the one you pointed out isn't).
>> 
>> I tried to reproduce this, by limiting the amount of memory available to
>> qemu using cgroups, but about 44MB of memory is enough to boot a guest
>> (tried Ubuntu and Debian).
> 
> Okay, not a qemuu regression after all, but a libxc one. It just so
> happens that qemut tries to allocate a much larger amount, which
> triggers mmap() failure earlier and hence doesn't manage to trigger
> the oom killer. Patch (almost) on its way.

Patch sent, allowing that guest to get further (and Windows to
properly boot). However, now the guest is stuck right at the point
where X wants to switch to its designated video mode, with qemu
(for somewhere between half a minute and a minute) consuming
one full CPU's bandwidth. Once qemu's CPU consumption went
down, no further progress is being made though.

Again I'd be thankful for hints on how to debug such a situation.

Jan



* Re: qemu-upstream triggering OOM killer
  2017-02-16 16:28         ` Jan Beulich
@ 2017-02-16 18:38           ` Stefano Stabellini
  2017-02-17  7:08             ` Jan Beulich
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2017-02-16 18:38 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Anthony PERARD, xen-devel, Stefano Stabellini

On Thu, 16 Feb 2017, Jan Beulich wrote:
> >>> On 16.02.17 at 16:23, <JBeulich@suse.com> wrote:
> >>>> On 14.02.17 at 15:56, <anthony.perard@citrix.com> wrote:
> >> On Fri, Feb 10, 2017 at 02:54:23AM -0700, Jan Beulich wrote:
> >>> Not so far. It appears to happen when grub clears the screen
> >>> before displaying its graphical menu, so I'd rather suspect an issue
> >>> with a graphics related change (the one you pointed out isn't).
> >> 
> >> I tried to reproduce this, by limiting the amount of memory available to
> >> qemu using cgroups, but about 44MB of memory is enough to boot a guest
> >> (tried Ubuntu and Debian).
> > 
> > Okay, not a qemuu regression after all, but a libxc one. It just so
> > happens that qemut tries to allocate a much larger amount, which
> > triggers mmap() failure earlier and hence doesn't manage to trigger
> > the oom killer. Patch (almost) on its way.
> 
> Patch sent, allowing that guest to get further (and Windows to
> properly boot). However, now the guest is stuck right at the point
> where X wants to switch to its designated video mode, with qemu
> (for somewhere between half a minute and a minute) consuming
> one full CPU's bandwidth. Once qemu's CPU consumption went
> down, no further progress is being made though.
> 
> Again I'd be thankful for hints on how to debug such a situation.

I would bisect it. It's probably due to a change in the cirrus vga code
or common vga code. It might be worth testing with stdvga=1 to narrow it
down.
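
As a convenience for that experiment, a test variant of the guest config with `stdvga=1` could be produced with a helper like the one below. This is a sketch only: it treats the xl config as plain `key=value` lines, which holds for the config posted earlier in the thread but not for every xl config, and `set_option` is a hypothetical name.

```python
import re


def set_option(cfg_text: str, key: str, value: str) -> str:
    """Return cfg_text with `key=value` set.

    Replaces the first existing assignment of `key`, or appends a new
    line if the key is not present. Only handles one-line assignments.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    line = f"{key}={value}"
    if pattern.search(cfg_text):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _m: line, cfg_text, count=1)
    return cfg_text.rstrip("\n") + "\n" + line + "\n"
```

Writing the result to a separate file and booting the guest from it keeps the original config untouched while comparing the Cirrus and standard VGA paths.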


* Re: qemu-upstream triggering OOM killer
  2017-02-16 18:38           ` Stefano Stabellini
@ 2017-02-17  7:08             ` Jan Beulich
  2017-02-17 18:39               ` Stefano Stabellini
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Beulich @ 2017-02-17  7:08 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: Anthony PERARD, xen-devel

>>> On 16.02.17 at 19:38, <sstabellini@kernel.org> wrote:
> On Thu, 16 Feb 2017, Jan Beulich wrote:
>> >>> On 16.02.17 at 16:23, <JBeulich@suse.com> wrote:
>> >>>> On 14.02.17 at 15:56, <anthony.perard@citrix.com> wrote:
>> >> On Fri, Feb 10, 2017 at 02:54:23AM -0700, Jan Beulich wrote:
>> >>> Not so far. It appears to happen when grub clears the screen
>> >>> before displaying its graphical menu, so I'd rather suspect an issue
>> >>> with a graphics related change (the one you pointed out isn't).
>> >> 
>> >> I tried to reproduce this, by limiting the amount of memory available to
>> >> qemu using cgroups, but about 44MB of memory is enough to boot a guest
>> >> (tried Ubuntu and Debian).
>> > 
>> > Okay, not a qemuu regression after all, but a libxc one. It just so
>> > happens that qemut tries to allocate a much larger amount, which
>> > triggers mmap() failure earlier and hence doesn't manage to trigger
>> > the oom killer. Patch (almost) on its way.
>> 
>> Patch sent, allowing that guest to get further (and Windows to
>> properly boot). However, now the guest is stuck right at the point
>> where X wants to switch to its designated video mode, with qemu
>> (for somewhere between half a minute and a minute) consuming
>> one full CPU's bandwidth. Once qemu's CPU consumption went
>> down, no further progress is being made though.
>> 
>> Again I'd be thankful for hints on how to debug such a situation.
> 
> I would bisect it. It's probably due to a change in the cirrus vga code
> or common vga code. It might be worth testing with stdvga=1 to narrow it
> down.

No need to bisect - I finally remembered the behavior matching a
regression I had spotted back in December with a security backport
to one of our older trees. Commit 913a87885f ("display: cirrus:
ignore source pitch value as needed in blit_is_unsafe") needs
backporting.

Considering that this has been around for a while, it raises another
question: Are regression fixes being actively looked for by the two
of you, or are we depending on people running into issues for
necessary fixes to be pulled in?

Jan



* Re: qemu-upstream triggering OOM killer
  2017-02-17  7:08             ` Jan Beulich
@ 2017-02-17 18:39               ` Stefano Stabellini
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Stabellini @ 2017-02-17 18:39 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Anthony PERARD, xen-devel, Stefano Stabellini

On Fri, 17 Feb 2017, Jan Beulich wrote:
> >>> On 16.02.17 at 19:38, <sstabellini@kernel.org> wrote:
> > On Thu, 16 Feb 2017, Jan Beulich wrote:
> >> >>> On 16.02.17 at 16:23, <JBeulich@suse.com> wrote:
> >> >>>> On 14.02.17 at 15:56, <anthony.perard@citrix.com> wrote:
> >> >> On Fri, Feb 10, 2017 at 02:54:23AM -0700, Jan Beulich wrote:
> >> >>> Not so far. It appears to happen when grub clears the screen
> >> >>> before displaying its graphical menu, so I'd rather suspect an issue
> >> >>> with a graphics related change (the one you pointed out isn't).
> >> >> 
> >> >> I tried to reproduce this, by limiting the amount of memory available to
> >> >> qemu using cgroups, but about 44MB of memory is enough to boot a guest
> >> >> (tried Ubuntu and Debian).
> >> > 
> >> > Okay, not a qemuu regression after all, but a libxc one. It just so
> >> > happens that qemut tries to allocate a much larger amount, which
> >> > triggers mmap() failure earlier and hence doesn't manage to trigger
> >> > the oom killer. Patch (almost) on its way.
> >> 
> >> Patch sent, allowing that guest to get further (and Windows to
> >> properly boot). However, now the guest is stuck right at the point
> >> where X wants to switch to its designated video mode, with qemu
> >> (for somewhere between half a minute and a minute) consuming
> >> one full CPU's bandwidth. Once qemu's CPU consumption went
> >> down, no further progress is being made though.
> >> 
> >> Again I'd be thankful for hints on how to debug such a situation.
> > 
> > I would bisect it. It's probably due to a change in the cirrus vga code
> > or common vga code. It might be worth testing with stdvga=1 to narrow it
> > down.
> 
> No need to bisect - I finally remembered the behavior matching a
> regression I had spotted back in December with a security backport
> to one of our older trees. Commit 913a87885f ("display: cirrus:
> ignore source pitch value as needed in blit_is_unsafe") needs
> backporting.

Done


> Considering that this has been around for a while, it raises another
> question: Are regression fixes being actively looked for by the two
> of you, or are we depending on people running into issues for
> necessary fixes to be pulled in?

Anthony often looks at osstest results. I try to make sure either me or
somebody else is looking at outstanding bugs and regressions. In this
case for example, Anthony offered to help. I backported another fix to a
bug reported by Boris just yesterday. But for this to happen, we need to
know there is a regression in the first place. With the wide range of
guests and QEMU options available, it is not surprising that bugs slip
through. For example, I never test with Windows guests; I don't even
have a license anymore.


end of thread, other threads:[~2017-02-17 18:40 UTC | newest]

Thread overview: 10+ messages
2017-02-09 14:57 qemu-upstream triggering OOM killer Jan Beulich
2017-02-09 22:24 ` Stefano Stabellini
2017-02-10  9:54   ` Jan Beulich
2017-02-14 14:56     ` Anthony PERARD
2017-02-15  9:07       ` Jan Beulich
2017-02-16 15:23       ` Jan Beulich
2017-02-16 16:28         ` Jan Beulich
2017-02-16 18:38           ` Stefano Stabellini
2017-02-17  7:08             ` Jan Beulich
2017-02-17 18:39               ` Stefano Stabellini
