From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53D4CC433F5 for ; Thu, 21 Oct 2021 11:53:31 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1AD68606A5 for ; Thu, 21 Oct 2021 11:53:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 1AD68606A5 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=lists.xenproject.org Received: from list by lists.xenproject.org with outflank-mailman.214360.372872 (Exim 4.92) (envelope-from ) id 1mdWdC-0008F4-Aq; Thu, 21 Oct 2021 11:53:10 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 214360.372872; Thu, 21 Oct 2021 11:53:10 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1mdWdC-0008Ex-81; Thu, 21 Oct 2021 11:53:10 +0000 Received: by outflank-mailman (input) for mailman id 214360; Thu, 21 Oct 2021 11:53:09 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1mdWdB-0008Er-FE for xen-devel@lists.xenproject.org; Thu, 21 Oct 2021 11:53:09 +0000 Received: from smtp-out1.suse.de (unknown [195.135.220.28]) by us1-amaz-eas2.inumbo.com (Halon) with ESMTPS id 75017a48-3265-11ec-8376-12813bfff9fa; Thu, 21 Oct 2021 11:53:08 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out1.suse.de (Postfix) with ESMTPS id 7D782218B1; Thu, 21 Oct 2021 11:53:07 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 660F5133A6; Thu, 21 Oct 2021 11:53:07 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id vvawF6NUcWEkcgAAMHmgww (envelope-from ); Thu, 21 Oct 2021 11:53:07 +0000 X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 75017a48-3265-11ec-8376-12813bfff9fa DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1634817187; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type; bh=Tp6cFf9CV+0py7Wah9oxllfjuQvZL1C6jr/1fO4cO1k=; b=YFVPU9lYQr9doAPHmPikANNacIIazkgfoOvp/4cY+L3G2nJrswcBJAA3Dg/F8PFohfJxyY 5DfdVP6uo3AUwTSmEXadkv/ORuKvvJspmlfgjMj41Djghp9lizlWHRqkMXsUbRvtqcinAO 0l5eQPKVSSBKu7ybwLIWmIGhoB4F1Fo= To: =?UTF-8?Q?Marek_Marczykowski-G=c3=b3recki?= Cc: "xen-devel@lists.xenproject.org" From: Juergen Gross Subject: Tentative fix for "out of PoD memory" issue Message-ID: <912c7377-26f0-c14a-e3aa-f00a81ed5766@suse.com> Date: Thu, 21 Oct 2021 13:53:06 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.12.0 MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="9jicweCDwgCDyfDE1g1x9UE4MmkUUiXrs" This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --9jicweCDwgCDyfDE1g1x9UE4MmkUUiXrs Content-Type: multipart/mixed; boundary="aaXZVHuozSjg1ot6FjpsWQR04zmftmZVu"; protected-headers="v1" From: Juergen Gross To: =?UTF-8?Q?Marek_Marczykowski-G=c3=b3recki?= Cc: "xen-devel@lists.xenproject.org" Message-ID: <912c7377-26f0-c14a-e3aa-f00a81ed5766@suse.com> Subject: Tentative fix for "out of PoD memory" issue --aaXZVHuozSjg1ot6FjpsWQR04zmftmZVu Content-Type: multipart/mixed; boundary="------------AB1936863D48348999C53954" Content-Language: en-US This is a multi-part message in MIME format. --------------AB1936863D48348999C53954 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Marek, could you please test whether the attached patch is fixing your problem? BTW, I don't think this couldn't happen before kernel 5.15. I guess my modification to use a kernel thread instead of a workqueue just made the issue more probable. I couldn't reproduce the crash you are seeing, but the introduced wait was 4.2 seconds on my test system (a PVH guest with 2 GB of memory, maxmem 6 GB). Juergen --------------AB1936863D48348999C53954 Content-Type: text/x-patch; charset=UTF-8; name="0001-xen-balloon-add-late_initcall_sync-for-initial-ballo.patch" Content-Transfer-Encoding: quoted-printable Content-Disposition: attachment; filename*0="0001-xen-balloon-add-late_initcall_sync-for-initial-ballo.pa"; filename*1="tch" =46rom 3ee35f6f110e2258ec94f0d1397fac8c26b41761 Mon Sep 17 00:00:00 2001 From: Juergen Gross To: linux-kernel@vger.kernel.org Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Stefano Stabellini Cc: xen-devel@lists.xenproject.org Date: Thu, 21 Oct 2021 12:51:06 +0200 Subject: [PATCH] xen/balloon: add late_initcall_sync() for initial balloo= ning done MIME-Version: 1.0 Content-Type: text/plain; charset=3DUTF-8 Content-Transfer-Encoding: 8bit When running as PVH or HVM guest with actual memory < max memory the hypervisor is using "populate on demand" in order to allow the guest to balloon down from its maximum memory size. For this to work correctly the guest must not touch more memory pages than its target memory size as otherwise the PoD cache will be exhausted and the guest is crashed as a result of that. In extreme cases ballooning down might not be finished today before the init process is started, which can consume lots of memory. In order to avoid random boot crashes in such cases, add a late init call to wait for ballooning down having finished for PVH/HVM guests. Reported-by: Marek Marczykowski-G=C3=B3recki Signed-off-by: Juergen Gross --- drivers/xen/balloon.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/drivers/xen/balloon.c b/drivers/xen/balloon.c index 3a50f097ed3e..d19b851c3d3b 100644 --- a/drivers/xen/balloon.c +++ b/drivers/xen/balloon.c @@ -765,3 +765,23 @@ static int __init balloon_init(void) return 0; } subsys_initcall(balloon_init); + +static int __init balloon_wait_finish(void) +{ + if (!xen_domain()) + return -ENODEV; + + /* PV guests don't need to wait. */ + if (xen_pv_domain() || !current_credit()) + return 0; + + pr_info("Waiting for initial ballooning down having finished.\n"); + + while (current_credit()) + schedule_timeout_interruptible(HZ / 10); + + pr_info("Initial ballooning down finished.\n"); + + return 0; +} +late_initcall_sync(balloon_wait_finish); --=20 2.26.2 --------------AB1936863D48348999C53954 Content-Type: application/pgp-keys; name="OpenPGP_0xB0DE9DD628BF132F.asc" Content-Transfer-Encoding: quoted-printable Content-Description: OpenPGP public key Content-Disposition: attachment; filename="OpenPGP_0xB0DE9DD628BF132F.asc" -----BEGIN PGP PUBLIC KEY BLOCK----- xsBNBFOMcBYBCACgGjqjoGvbEouQZw/ToiBg9W98AlM2QHV+iNHsEs7kxWhKMjrioyspZKOBy= cWx w3ie3j9uvg9EOB3aN4xiTv4qbnGiTr3oJhkB1gsb6ToJQZ8uxGq2kaV2KL9650I1SJvedYm8O= f8Z d621lSmoKOwlNClALZNew72NjJLEzTalU1OdT7/i1TXkH09XSSI8mEQ/ouNcMvIJNwQpd369y= 9bf IhWUiVXEK7MlRgUG6MvIj6Y3Am/BBLUVbDa4+gmzDC9ezlZkTZG2t14zWPvxXP3FAp2pkW0xq= G7/ 377qptDmrk42GlSKN4z76ELnLxussxc7I2hx18NUcbP8+uty4bMxABEBAAHNHEp1ZXJnZW4gR= 3Jv c3MgPGpnQHBmdXBmLm5ldD7CwHkEEwECACMFAlOMcBYCGwMHCwkIBwMCAQYVCAIJCgsEFgIDA= QIe AQIXgAAKCRCw3p3WKL8TL0KdB/93FcIZ3GCNwFU0u3EjNbNjmXBKDY4FUGNQH2lvWAUy+dnyT= hpw dtF/jQ6j9RwE8VP0+NXcYpGJDWlNb9/JmYqLiX2Q3TyevpB0CA3dbBQp0OW0fgCetToGIQrg0= MbD 1C/sEOv8Mr4NAfbauXjZlvTj30H2jO0u+6WGM6nHwbh2l5O8ZiHkH32iaSTfN7Eu5RnNVUJbv= oPH Z8SlM4KWm8rG+lIkGurqqu5gu8q8ZMKdsdGC4bBxdQKDKHEFExLJK/nRPFmAuGlId1E3fe10v= 5QL +qHI3EIPtyfE7i9Hz6rVwi7lWKgh7pe0ZvatAudZ+JNIlBKptb64FaiIOAWDCx1SzR9KdWVyZ= 2Vu IEdyb3NzIDxqZ3Jvc3NAc3VzZS5jb20+wsB5BBMBAgAjBQJTjHCvAhsDBwsJCAcDAgEGFQgCC= QoL BBYCAwECHgECF4AACgkQsN6d1ii/Ey/HmQf/RtI7kv5A2PS4RF7HoZhPVPogNVbC4YA6lW7Dr= Wf0 teC0RR3MzXfy6pJ+7KLgkqMlrAbN/8Dvjoz78X+5vhH/rDLa9BuZQlhFmvcGtCF8eR0T1v0nC= /nu AFVGy+67q2DH8As3KPu0344TBDpAvr2uYM4tSqxK4DURx5INz4ZZ0WNFHcqsfvlGJALDeE0Lh= ITT d9jLzdDad1pQSToCnLl6SBJZjDOX9QQcyUigZFtCXFst4dlsvddrxyqT1f17+2cFSdu7+ynLm= XBK 7abQ3rwJY8SbRO2iRulogc5vr/RLMMlscDAiDkaFQWLoqHHOdfO9rURssHNN8WkMnQfvUewRz= 80h SnVlcmdlbiBHcm9zcyA8amdyb3NzQG5vdmVsbC5jb20+wsB5BBMBAgAjBQJTjHDXAhsDBwsJC= AcD AgEGFQgCCQoLBBYCAwECHgECF4AACgkQsN6d1ii/Ey8PUQf/ehmgCI9jB9hlgexLvgOtf7PJn= FOX gMLdBQgBlVPO3/D9R8LtF9DBAFPNhlrsfIG/SqICoRCqUcJ96Pn3P7UUinFG/I0ECGF4EvTE1= jnD kfJZr6jrbjgyoZHiw/4BNwSTL9rWASyLgqlA8u1mf+c2yUwcGhgkRAd1gOwungxcwzwqgljf0= N51 N5JfVRHRtyfwq/ge+YEkDGcTU6Y0sPOuj4Dyfm8fJzdfHNQsWq3PnczLVELStJNdapwPOoE+l= otu fe3AM2vAEYJ9rTz3Cki4JFUsgLkHFqGZarrPGi1eyQcXeluldO3m91NK/1xMI3/+8jbO0tsn1= tqS EUGIJi7ox80eSnVlcmdlbiBHcm9zcyA8amdyb3NzQHN1c2UuZGU+wsB5BBMBAgAjBQJTjHDrA= hsD BwsJCAcDAgEGFQgCCQoLBBYCAwECHgECF4AACgkQsN6d1ii/Ey+LhQf9GL45eU5vOowA2u5N3= g3O ZUEBmDHVVbqMtzwlmNC4k9Kx39r5s2vcFl4tXqW7g9/ViXYuiDXb0RfUpZiIUW89siKrkzmQ5= dM7 wRqzgJpJwK8Bn2MIxAKArekWpiCKvBOB/Cc+3EXE78XdlxLyOi/NrmSGRIov0karw2RzMNOu5= D+j LRZQd1Sv27AR+IP3I8U4aqnhLpwhK7MEy9oCILlgZ1QZe49kpcumcZKORmzBTNh30FVKK1Evm= V2x AKDoaEOgQB4iFQLhJCdP1I5aSgM5IVFdn7v5YgEYuJYx37IoN1EblHI//x/e2AaIHpzK5h88N= Eaw QsaNRpNSrcfbFmAg987ATQRTjHAWAQgAyzH6AOODMBjgfWE9VeCgsrwH3exNAU32gLq2xvjpW= nHI s98ndPUDpnoxWQugJ6MpMncr0xSwFmHEgnSEjK/PAjppgmyc57BwKII3sV4on+gDVFJR6Y8ZR= wgn BC5mVM6JjQ5xDk8WRXljExRfUX9pNhdE5eBOZJrDRoLUmmjDtKzWaDhIg/+1Hzz93X4fCQkNV= bVF LELU9bMaLPBG/x5q4iYZ2k2ex6d47YE1ZFdMm6YBYMOljGkZKwYde5ldM9mo45mmwe0icXKLk= pEd IXKTZeKDO+Hdv1aqFuAcccTg9RXDQjmwhC3yEmrmcfl0+rPghO0Iv3OOImwTEe4co3c1mwARA= QAB wsBfBBgBAgAJBQJTjHAWAhsMAAoJELDendYovxMvQ/gH/1ha96vm4P/L+bQpJwrZ/dneZcmEw= Tbe 8YFsw2V/Buv6Z4Mysln3nQK5ZadD534CF7TDVft7fC4tU4PONxF5D+/tvgkPfDAfF77zy2AH1= vJz Q1fOU8lYFpZXTXIHb+559UqvIB8AdgR3SAJGHHt4RKA0F7f5ipYBBrC6cyXJyyoprT10EMvU8= VGi wXvTyJz3fjoYsdFzpWPlJEBRMedCot60g5dmbdrZ5DWClAr0yau47zpWj3enf1tLWaqcsuylW= svi uGjKGw7KHQd3bxALOknAp4dN3QwBYCKuZ7AddY9yjynVaD5X7nF9nO5BjR/i1DG86lem3iBDX= zXs ZDn8R38=3D =3D2wuH -----END PGP PUBLIC KEY BLOCK----- --------------AB1936863D48348999C53954-- --aaXZVHuozSjg1ot6FjpsWQR04zmftmZVu-- --9jicweCDwgCDyfDE1g1x9UE4MmkUUiXrs Content-Type: application/pgp-signature; name="OpenPGP_signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="OpenPGP_signature" -----BEGIN PGP SIGNATURE----- wsB5BAABCAAjFiEEhRJncuj2BJSl0Jf3sN6d1ii/Ey8FAmFxVKIFAwAAAAAACgkQsN6d1ii/Ey9+ mwf/fWDnXvMNmIeBqCHACyT8Y6EMt39qxjRdNQ+yFjJUFellmSpPpDe1PO2MqvhK41RSHA1Xjt/c dhXB+8qZq85ngsP6ctlai5Bh6TQygvXwybD30lkvxPZkOr/vvVLZC9xmPz/e0ApwqW2MyzftV8xp fBuMo6j5IEJd1i05/F3qB4X9Ubm4cMFRA5vjrFIMu6+W1ltbUFZpi+UoJwx5ChC/Tmt4OnDZ88Xi F7NFT+mQlD7dEFxI9J3S8+dYrLXbwhFJx1ay1G4In9QTMvbdouokX+/e/+oB+I+08MaljV41Mo/b Nx+2sOw7JmfpBb8GYPwJGyOyf1Ldh4jyOucFa5J/MA== =Y3LY -----END PGP SIGNATURE----- --9jicweCDwgCDyfDE1g1x9UE4MmkUUiXrs--