From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 79C43C35247 for ; Tue, 4 Feb 2020 16:50:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 39FC02082E for ; Tue, 4 Feb 2020 16:50:32 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="I/Vi18sF" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 39FC02082E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id CA9436B0003; Tue, 4 Feb 2020 11:50:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C32946B0006; Tue, 4 Feb 2020 11:50:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AD3586B0007; Tue, 4 Feb 2020 11:50:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0027.hostedemail.com [216.40.44.27]) by kanga.kvack.org (Postfix) with ESMTP id 8FED56B0003 for ; Tue, 4 Feb 2020 11:50:31 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 2429E8248047 for ; Tue, 4 Feb 2020 16:50:31 +0000 (UTC) X-FDA: 76453033062.13.bike80_4619cbdb575f X-HE-Tag: bike80_4619cbdb575f X-Filterd-Recvd-Size: 8302 Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [205.139.110.120]) by imf48.hostedemail.com (Postfix) with ESMTP for ; Tue, 4 Feb 2020 16:50:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1580835029; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=6QgcRX9LPObQmCwRNSfjEvLam2lZppkv7pwR5130Cpc=; b=I/Vi18sFXoXVQ500k6vUiIt9SqMUJ0EAliQ9lthxunQloE7Uz4AR84OSVlrGlpl9qxxfr3 hDHxVry3Tzlokdj+r5Njf1kvpsgKA7IG0WULm5RwIG0NFL4DzyH5yD+XkhzlUD4hus02Qe 8DQROSxlQnW/PLJqtJPVbyOEcfF8QQ4= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-434-7LmMMpWHPNaAB7QeYVnEiA-1; Tue, 04 Feb 2020 11:50:25 -0500 Received: by mail-qv1-f71.google.com with SMTP id c1so12021816qvw.17 for ; Tue, 04 Feb 2020 08:50:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=33tLdh2GltAiz5pndS3UXxNtn0Ze5hBZUDNBHu9k9MY=; b=QE9Y0/plPS7Ue3ByN2qdGZSCVw47pkIAY28Ex1fLBF4w9rw0FHfmSSnypuOHQh2ps3 eJpVU88gorYkUOAkgalAGv4KPADjrY70VRzxKzyAXhh5kbcIzJ5PJGQ1So/RYchGAoxl 8D2i4Gs4e5T4gbYeUJxCTeDuu8dio49kToph28fJQIvgPkrTw0M+RCoqeXqLhyCFHEcO 3XMT4Lz2bRlxdKajG0+meXnXg5WV5HFYEsAbEYGYKoB3hnKlayMGGfMi6g+FZg2YRvaM vG3Ebz3MomxYkXicqujw1GWHq/Ydsb94+S0WhHW3kkFg6J6igU5+BofjYATDw1kPxuVe Smbg== X-Gm-Message-State: APjAAAXkTycwDTsScc01NUjA2g9CmdK0S5aolUoQxNuBe0IOES+/qvb4 uZ5gUteT9d1maENBhfSz0vwdoM37LgKpJagS7bKbcKXUizX2np8t9o7cB7WcFZNEB2GuOIT+yg9 3Nn4ZbeCWaiQ= X-Received: by 2002:ac8:5149:: with SMTP id h9mr29597948qtn.123.1580835025110; Tue, 04 Feb 2020 08:50:25 -0800 (PST) X-Google-Smtp-Source: APXvYqzkZGN4l+MS+r8NdRjTaKfTv3b91i9pRUKwIyWvPi+sOl829vka10ykcKudAzcuY3xKv1kS4g== X-Received: by 2002:ac8:5149:: with SMTP id h9mr29597929qtn.123.1580835024810; Tue, 04 Feb 2020 08:50:24 -0800 (PST) Received: from redhat.com (bzq-79-176-41-183.red.bezeqint.net. [79.176.41.183]) by smtp.gmail.com with ESMTPSA id p126sm11126768qke.108.2020.02.04.08.50.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 04 Feb 2020 08:50:23 -0800 (PST) Date: Tue, 4 Feb 2020 11:50:19 -0500 From: "Michael S. Tsirkin" To: David Hildenbrand Cc: Nadav Amit , Alexander Duyck , Tyler Sanderson , "Wang, Wei W" , "virtualization@lists.linux-foundation.org" , David Rientjes , "linux-mm@kvack.org" , Michal Hocko Subject: Re: Balloon pressuring page cache Message-ID: <20200204114336-mutt-send-email-mst@kernel.org> References: <75d4594f-0864-5172-a0f8-f97affedb366@redhat.com> <286AC319A985734F985F78AFA26841F73E3F8A02@shsmsx102.ccr.corp.intel.com> <20200203080520-mutt-send-email-mst@kernel.org> <5ac131de8e3b7fc1fafd05a61feb5f6889aeb917.camel@linux.intel.com> <539B606A-A0CA-4AA4-B99A-759C874A1EF8@vmware.com> <20200204033657-mutt-send-email-mst@kernel.org> MIME-Version: 1.0 In-Reply-To: X-MC-Unique: 7LmMMpWHPNaAB7QeYVnEiA-1 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Content-Disposition: inline X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, Feb 04, 2020 at 03:30:19PM +0100, David Hildenbrand wrote: > On 04.02.20 09:40, Michael S. Tsirkin wrote: > > On Tue, Feb 04, 2020 at 09:35:21AM +0100, David Hildenbrand wrote: > >>>>> I would say reverting probably makes sense. I'm not sure there is m= uch > >>>>> value to having a shrinker running deflation when you are actively = trying > >>>>> to increase the balloon. It would make more sense to wait until you= are > >>>>> actually about to start hitting oom. > >>>> > >>>> I think the shrinker makes sense for free page hinting feature > >>>> (everything on free_page_list). > >>>> > >>>> So instead of only reverting, I think we should split it up and alwa= ys > >>>> register the shrinker for VIRTIO_BALLOON_F_FREE_PAGE_HINT and the OO= M > >>>> notifier (as before) for VIRTIO_BALLOON_F_MUST_TELL_HOST. > >> > >> s/VIRTIO_BALLOON_F_MUST_TELL_HOST/VIRTIO_BALLOON_F_DEFLATE_ON_OOM/ > >> > >> :) > >=20 > > Well VIRTIO_BALLOON_F_MUST_TELL_HOST is also broken by shrinker > > with VIRTIO_BALLOON_F_FREE_PAGE_HINT as that code adds buffers > > but does not wait for them to be used even with VIRTIO_BALLOON_F_MUST_T= ELL_HOST. > > We never noticed because QEMU does not advertize > > VIRTIO_BALLOON_F_MUST_TELL_HOST. >=20 > So, I am trying to understand how the code is intended to work, but I > am afraid I am missing something (or to rephrase: I think I found a BUG := ) and > there is lack of proper documentation about this feature). >=20 > a) We allocate pages and add them to the list as long as we are told to d= o so. > We send these pages to the host one by one. > b) We free all pages once we get a STOP signal. Until then, we keep pages= allocated. > c) When called via the shrinker, we want to free pages from the list, eve= n > though the hypervisor did not notify us to do so. >=20 >=20 > Issue 1: When we unload the balloon driver in the guest in an unlucky eve= nt, > we won't free the pages. We are missing something like (if I am not wrong= ): >=20 > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_ball= oon.c > index b1d2068fa2bd..e2b0925e1e83 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -929,6 +929,10 @@ static void remove_common(struct virtio_balloon *vb) > leak_balloon(vb, vb->num_pages); > update_balloon_size(vb); > =20 > + /* There might be free pages that are being reported: release the= m. */ > + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)= ) > + return_free_pages_to_mm(vb, ULONG_MAX); > + > /* Now we reset the device so we can clean up the queues. */ > vb->vdev->config->reset(vb->vdev); Indeed. >=20 > Issue 2: When called via the shrinker, (but also to fix Issue 1), it coul= d be > that we do have VIRTIO_BALLOON_F_MUST_TELL_HOST. I assume this means > (-ENOCLUE) that we have to wait until the hypervisor notifies us via the = STOP? Or > for which event do we have to wait? Because there is no way to *tell host= * here > that we want to reuse a page. The hypervisor will *tell us* when we can r= euse pages. > For the shrinker it is simple: Don't use the shrinker with > VIRTIO_BALLOON_F_MUST_TELL_HOST :) . But to fix Issue 1, we *would* have = to wait > until we get a STOP signal. That is not really possible because it might > take an infinite amount of time. >=20 > Michael, any clue on which event we have to wait with > VIRTIO_BALLOON_F_MUST_TELL_HOST? IMHO, I don't think > VIRTIO_BALLOON_F_MUST_TELL_HOST applies to VIRTIO_BALLOON_F_FREE_PAGE_HIN= T and > we'd better document that. It introduces complexity with no clear benefit= . I meant that we must wait for host to see the hint. Signalled via using the buffer. But maybe that's too far in the meaning from VIRTIO_BALLOON_F_MUST_TELL_HOST and we need a separate new flag for that. Then current code won't be broken (yay!) but we need to document another flag that's pretty similar. >=20 > --=20 > Thanks, >=20 > David / dhildenb