From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28EE7C35247 for ; Wed, 5 Feb 2020 00:16:08 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C68DC2082E for ; Wed, 5 Feb 2020 00:16:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="RFmrWc0U" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C68DC2082E Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5611C6B0007; Tue, 4 Feb 2020 19:16:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4EA626B000C; Tue, 4 Feb 2020 19:16:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3B2056B000D; Tue, 4 Feb 2020 19:16:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0130.hostedemail.com [216.40.44.130]) by kanga.kvack.org (Postfix) with ESMTP id 1C2876B0007 for ; Tue, 4 Feb 2020 19:16:07 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 921EC181AEF00 for ; Wed, 5 Feb 2020 00:16:06 +0000 (UTC) X-FDA: 76454155932.10.pie87_6f72df4d9d503 X-HE-Tag: pie87_6f72df4d9d503 X-Filterd-Recvd-Size: 15294 
Received: from mail-vs1-f45.google.com (mail-vs1-f45.google.com [209.85.217.45]) by imf33.hostedemail.com (Postfix) with ESMTP for ; Wed, 5 Feb 2020 00:16:05 +0000 (UTC) Received: by mail-vs1-f45.google.com with SMTP id g15so240294vsf.1 for ; Tue, 04 Feb 2020 16:16:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=cGdCfLjRtDPXydcBWCp03hXWh5cpE7/XZ7Wz5DbtW14=; b=RFmrWc0U/dIAJtt5R/8dM7UgmOH85IdeAtBXWbC/zNcqG8LUeI4Npil5deg0kFOjbq RVtHNlAgzrp2RS0BIywygFp6FM3oM4kXfn0apGLkxQDeVr4n/Ep2ZA5FFKJuGr7Mg2ep VA3EPK/SDYhhc1thLIQeEvxRSHkFCdskCtI1j3GbSvJ7VPxRwERKA5xFZ9+eDrBqffwK byEwRZNtdWK/AD/clK4eUqw+hUN4vZZNnakb3nHrWd9ZMT3VrnySNX7Cm54PreEBmXAE VF/iPdA5CyfejvYwDG5SiF36r6KiZRcOx39cXhwL5+I/HmPb/lmwRjQf/cazHbnrHFRV Gtdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=cGdCfLjRtDPXydcBWCp03hXWh5cpE7/XZ7Wz5DbtW14=; b=o4U+RDNlOEpYHdUY0WHZ1xpl3k69Row6K4Q+iAlD9sQIx0EOxIK7xm6btAsyM2mvDf xmJM/qqzQ54fpKo+K4QqUGbNncpMrjBraruKcK0GofqfhQrz13Tuw1Y1whw6/CtDZPyM g7FGqIpxVPUzEZBBg+ZC1dVV3AjkEGvWDdqRZt7H/0WkspA1aieVzTJoZGE60tOMFasP 5dW91NEALPjMCCVECgCY95WmtBiz7lf0ooxd1lEoxDFtl74zb554oxAyBOFyKtjMUzm2 PrItpo5RTjhBMgjuPl4FEcfEgwnkWdVS9rR2AiwkGnyzdtzCjar8zdH/FkreV2ZM7Nq8 TeXA== X-Gm-Message-State: APjAAAV/w2u8acmCuapcTqFuLLCRhn9i870rslmqq7LlAuGkE/zfDEj4 OuQ1JTDSs1p1r4cEGSATRynVEhpahQAdZVA0TZNy4eR9 X-Google-Smtp-Source: APXvYqzqd/nogo6x9tGOdPka5NWogfh1P/DbNOd+nF3DsA8pEZ/NaHAeyHMuOTz0Sa7OAdTcDaYQXpq5t8VuRkR4hQQ= X-Received: by 2002:a05:6102:72b:: with SMTP id u11mr20251693vsg.69.1580861765002; Tue, 04 Feb 2020 16:16:05 -0800 (PST) MIME-Version: 1.0 References: <91270a68-ff48-88b0-219c-69801f0c252f@redhat.com> <75d4594f-0864-5172-a0f8-f97affedb366@redhat.com> <286AC319A985734F985F78AFA26841F73E3F8A02@shsmsx102.ccr.corp.intel.com> 
 <20200203080520-mutt-send-email-mst@kernel.org>
 <5ac131de8e3b7fc1fafd05a61feb5f6889aeb917.camel@linux.intel.com>
 <20200203120225-mutt-send-email-mst@kernel.org>
 <74cc25a6-cefb-c580-8e59-5b76fb680bf4@redhat.com>
In-Reply-To:
From: Tyler Sanderson
Date: Tue, 4 Feb 2020 16:15:53 -0800
Message-ID:
Subject: Re: Balloon pressuring page cache
To: David Hildenbrand
Cc: "Michael S. Tsirkin", Alexander Duyck, "Wang, Wei W",
 "virtualization@lists.linux-foundation.org", David Rientjes,
 "linux-mm@kvack.org", Michal Hocko, namit@vmware.com
Content-Type: multipart/alternative; boundary="000000000000f193dd059dc90f7b"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

--000000000000f193dd059dc90f7b
Content-Type: text/plain; charset="UTF-8"

On Tue, Feb 4, 2020 at 3:58 PM Tyler Sanderson wrote:

>
>
> On Tue, Feb 4, 2020 at 11:17 AM David Hildenbrand wrote:
>
>> On 04.02.20 19:52, Tyler Sanderson wrote:
>> >
>> >
>> > On Tue, Feb 4, 2020 at 12:29 AM David Hildenbrand wrote:
>> >
>> >     On 03.02.20 21:32, Tyler Sanderson wrote:
>> >     > There were apparently good reasons for moving away from OOM notifier
>> >     > callback:
>> >     > https://lkml.org/lkml/2018/7/12/314
>> >     > https://lkml.org/lkml/2018/8/2/322
>> >     >
>> >     > In particular the OOM notifier is worse than the shrinker because:
>> >
>> >     The issue is that DEFLATE_ON_OOM is under-specified.
>> >
>> >     >
>> >     >  1. It is last-resort, which means the system has already gone through
>> >     >     heroics to prevent OOM. Those heroic reclaim efforts are expensive
>> >     >     and impact application performance.
>> >
>> >     That's *exactly* what "deflate on OOM" suggests.
>> >
>> >
>> > It seems there are some use cases where "deflate on OOM" is desired and
>> > others where "deflate on pressure" is desired.
>> > This suggests adding a new feature bit "DEFLATE_ON_PRESSURE" that
>> > registers the shrinker, and reverting DEFLATE_ON_OOM to use the OOM
>> > notifier callback.
>> >
>> > This lets users configure the balloon for their use case.
>>
>> You want the old behavior back, so why should we introduce a new one? Or
>> am I missing something? (you did want us to revert to old handling, no?)
>>
> Reverting actually doesn't help me because this has been the behavior
> since Linux 4.19 which is already widely in use. So my device
> implementation needs to handle the shrinker behavior anyways. I started
> this conversation to ask what the intended device implementation was.
>
I should clarify: reverting _would_ improve guest performance under my
implementation. So I guess I'm in favor.
But I think we should consider reasonable alternative implementations. I
think this suggests adding a new feature bit to allow device
implementations to choose.

> I think there are reasonable device implementations that would prefer the
> shrinker behavior (it turns out that mine doesn't).
> For example, an implementation that slowly inflates the balloon for the
> purpose of memory overcommit. It might leave the balloon inflated and
> expect any memory pressure (including page cache usage) to deflate the
> balloon as a way to dynamically right-size the balloon.
>
> Two reasons I didn't go with the above implementation:
> 1. I need to support guests before Linux 4.19 which don't have the
> shrinker behavior.
> 2. Memory in the balloon does not appear as "available" in /proc/meminfo
> even though it is freeable. This is confusing to users, but isn't a deal
> breaker.
>
> If we added a DEFLATE_ON_PRESSURE feature bit that indicated shrinker API
> support then that would resolve reason #1 (ideally we would backport the
> bit to 4.19).
>
> In any case, the shrinker behavior when pressuring page cache is more of
> an inefficiency than a bug. It's not clear to me that it necessitates
> reverting. If there were/are reasons to be on the shrinker interface then I
> think those carry similar weight as the problem itself.
>
>
>>
>> I consider virtio-balloon to this very day a big hack. And I don't see
>> it getting better with new config knobs. Having that said, the
>> technologies that are candidates to replace it (free page reporting,
>> taming the guest page cache, etc.) are still not ready - so we'll have
>> to stick with it for now :( .
>>
>> >
>> > I'm actually not sure how you would safely do memory overcommit without
>> > DEFLATE_ON_OOM. So I think it unlocks a huge use case.
>>
>> Using better suited technologies that are not ready yet (well, some form
>> of free page reporting is available under IBM z already but in a
>> proprietary form) ;) Anyhow, I remember that DEFLATE_ON_OOM only makes
>> it less likely to crash your guest, but not that you are safe to squeeze
>> the last bit out of your guest VM.
>>
> Can you elaborate on the danger of DEFLATE_ON_OOM? I haven't seen any
> problems in testing but I'd really like to know about the dangers.
> Is there a difference in safety between the OOM notifier callback and the
> shrinker API?
>
>
>>
>> --
>> Thanks,
>>
>> David / dhildenb
>>
>>

--000000000000f193dd059dc90f7b
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--000000000000f193dd059dc90f7b--