From: Sander Eikelenboom <linux@eikelenboom.it>
To: Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Juergen Gross <jgross@suse.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>
Subject: Re: Linux 5.13-rc6 regression to 5.12.x: kernel OOM and panic during kernel boot in low memory Xen VM's (256MB assigned memory).
Date: Fri, 18 Jun 2021 03:06:16 +0200
Message-ID: <e7f9c4f8-1669-75ce-b052-1030350a159e@eikelenboom.it>
In-Reply-To: <7338064f-10b6-545d-bc6c-843d04aafe28@eikelenboom.it>

On 17/06/2021 21:39, Sander Eikelenboom wrote:
> On 17/06/2021 20:02, Sander Eikelenboom wrote:
>> On 17/06/2021 17:37, Rasmus Villemoes wrote:
>>> On 17/06/2021 17.01, Linus Torvalds wrote:
>>>> On Thu, Jun 17, 2021 at 2:26 AM Sander Eikelenboom <linux@eikelenboom.it> wrote:
>>>>>
>>>>> I just tried to upgrade and test the Linux kernel, going from the 5.12 kernel series to 5.13-rc6 on my home server with Xen, but ran into some trouble.
>>>>>
>>>>> Some VMs boot fine (those with more than 256MB memory assigned), but the smaller (memory-wise) PVH ones crash during kernel boot due to OOM.
>>>>> Booting VMs with the 5.12(.9) kernel still works fine, also when dom0 is running 5.13-rc6 (but it has more memory assigned, so that is not unexpected).
>>>>
>>>> Adding Rasmus to the cc, because this looks kind of like the async
>>>> rootfs population thing that caused some other oom issues too.
>>>
>>> Yes, that looks like the same issue.
>>>
>>>> Rasmus? Original report here:
>>>>
>>>>       https://lore.kernel.org/lkml/ee8bf04c-6e55-1d9b-7bdb-25e6108e8e1e@eikelenboom.it/
>>>>
>>>> I do find it odd that we'd be running out of memory so early..
>>>
>>> Indeed. It would be nice to know if these also reproduce with
>>> initramfs_async=0 on the command line.
>>>
>>> But what is even more curious is that in the other report
>>> (https://lore.kernel.org/lkml/20210607144419.GA23706@xsang-OptiPlex-9020/),
>>> it seemed to trigger with _more_ memory - though I may be misreading
>>> what Oliver was telling me:
>>>
>>>> please be noted that we use 'vmalloc=512M' for both parent and this commit.
>>>> since it's ok on parent but oom on this commit, we want to send this report
>>>> to show the potential problem of the commit on some cases.
>>>>
>>>> we also tested by changing to use 'vmalloc=128M', it will succeed.
>>>
>>> Those tests were done in a VM with 16G memory, and then he also wrote
>>>
>>>> we also tried to follow exactly above steps to test on
>>>> some local machine (8G memory), but cannot reproduce.
>>>
>>> Are there some special rules for what memory pools PID1 versus the
>>> kworker threads can dip into?
>>>
>>>
>>> Side note: I also had a ppc64 report with different symptoms (the
>>> initramfs was corrupted), but that turned out to also reproduce with
>>> e7cb072eb98 reverted, so that is likely unrelated. But just FTR that
>>> thread is here:
>>> https://lore.kernel.org/lkml/CA+QYu4qxf2CYe2gC6EYnOHXPKS-+cEXL=MnUvqRFaN7W1i6ahQ@mail.gmail.com/
>>>
>>> Rasmus
>>>
>>
>> I chose to finish the bisection attempt first; not so surprisingly, it ends up with:
>> e7cb072eb988e46295512617c39d004f9e1c26f8 is the first bad commit
>>
>> So at least that link is confirmed.
>>
>> I also tried booting with "initramfs_async=0", and the guest now boots with the 5.13-rc6-ish kernel, which fails without it.
>>
>> --
>> Sander
>>
> 
> CC'ed Juergen.
> 
> Juergen, do you know how the direct kernel boot works and if that could interfere
> with this commit ?
> 
> After reading the last part of the commit message e7cb072eb98 namely:
> 
>       Should one of the initcalls done after rootfs_initcall time (i.e., device_
>       and late_ initcalls) need something from the initramfs (say, a kernel
>       module or a firmware blob), it will simply wait for the initramfs
>       unpacking to be done before proceeding, which should in theory make this
>       completely safe.
>       
>       But if some driver pokes around in the filesystem directly and not via one
>       of the official kernel interfaces (i.e.  request_firmware*(),
>       call_usermodehelper*) that theory may not hold - also, I certainly might
>       have missed a spot when sprinkling wait_for_initramfs().  So there is an
>       escape hatch in the form of an initramfs_async= command line parameter.
> 
> It dawned on me that I'm using the "direct kernel boot" functionality, which lets you boot a guest
> where the kernel and initramfs get copied in from dom0. That works great, but perhaps it
> pokes around in the way the last part of the commit message warns about?
> 
> (I think the feature was called "direct kernel boot"; what I mean is using, for example, the:
>       kernel      = '/boot/vmlinuz-5.13.0-rc6-20210617-doflr-mac80211debug+'
>       ramdisk     = '/boot/initrd.img-5.13.0-rc6-20210617-doflr-mac80211debug+'
>       cmdline     = 'root=UUID=2f757320-caca-4215-868d-73a4aacf12aa ro nomodeset xen_blkfront.max_ring_page_order=1 console=hvc0 earlyprintk=xen initramfs_async=0'
> 
> options in the xen guest config file to boot the (in this case PVH) guest.
> )
> 
> --
> Sander
> 

OK, I did some experimenting, and it seems that with 256M assigned the VM was already almost at the edge of OOM with the 5.12 kernel in the configuration I am using.
With v5.12 it boots when I assign 240M, but not with 230M. With 5.13 the tipping point seems to lie between 265M and 270M, so my config was already quite close to the edge.
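
To illustrate how such a tipping point can be narrowed down systematically, here is a minimal Python sketch of a bisection over the assigned memory size. The function name bracket_boot_threshold and the boots() predicate are hypothetical; in a real test boots() would have to start the guest with the given amount of memory and report whether it came up, which is not shown here. The sketch also assumes that booting is monotone in memory (if N MB boots, so does anything larger).

```python
from typing import Callable, Tuple


def bracket_boot_threshold(
    boots: Callable[[int], bool],
    fails_mb: int,
    boots_mb: int,
    resolution_mb: int = 1,
) -> Tuple[int, int]:
    """Narrow the (largest failing, smallest booting) memory sizes in MB.

    fails_mb must be a size known to fail and boots_mb a size known to boot.
    The search stops once the two bounds are at most resolution_mb apart.
    """
    if fails_mb >= boots_mb:
        raise ValueError("fails_mb must be smaller than boots_mb")
    lo, hi = fails_mb, boots_mb
    while hi - lo > resolution_mb:
        mid = (lo + hi) // 2
        if boots(mid):
            hi = mid  # mid boots, so the threshold is at or below mid
        else:
            lo = mid  # mid fails, so the threshold is above mid
    return lo, hi


if __name__ == "__main__":
    # Stand-in predicate with a made-up threshold of 237 MB, only for
    # demonstration; the real threshold is whatever the guest needs.
    print(bracket_boot_threshold(lambda mb: mb >= 237, 230, 240))
```

Starting from the 230M (fails) and 240M (boots) data points above, each probe halves the remaining interval, so a handful of guest boots is enough to get down to 1M resolution.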

The "direct kernel boot" feature I'm using just seems to be somewhat memory hungry, but switching to another compression algorithm for the kernel and initramfs already helped in my case.
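
As a rough way to compare compression algorithms before rebuilding anything, the following Python sketch reports how large a given payload becomes under the compressors in the Python standard library. This is only an illustration: the helper name compressed_sizes and the sample payload are made up, the real input would be the uncompressed initramfs image, and which algorithms the guest kernel can actually unpack depends on how that kernel was configured. It also rests on the assumption, not verified in this thread, that a smaller compressed image leaves more memory free while it is being unpacked.

```python
import bz2
import gzip
import lzma


def compressed_sizes(payload: bytes) -> dict:
    """Return the size in bytes of payload, raw and under each compressor."""
    return {
        "raw": len(payload),
        "gzip": len(gzip.compress(payload)),
        "bzip2": len(bz2.compress(payload)),
        "xz": len(lzma.compress(payload)),  # lzma defaults to the .xz format
    }


if __name__ == "__main__":
    # Placeholder data; replace with the bytes of an uncompressed image.
    sample = b"placeholder initramfs content\n" * 50000
    for name, size in compressed_sizes(sample).items():
        print(f"{name:6s} {size:10d} bytes")
```

The compressed size is only one part of the picture, since the unpacked contents also occupy memory, so any gain should be confirmed by actually booting the guest.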

So sorry for the noise; clearly user error.

--
Sander



