From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: "Wang, Wei W", Nadav Amit, Alexander Duyck, Tyler Sanderson,
 "virtualization@lists.linux-foundation.org", David Rientjes,
 "linux-mm@kvack.org", Michal Hocko
Subject: Re: Balloon pressuring page cache
Date: Wed, 5 Feb 2020 05:25:12 -0500
Message-ID: <20200205051404-mutt-send-email-mst@kernel.org>
References: <286AC319A985734F985F78AFA26841F73E41F306@shsmsx102.ccr.corp.intel.com>
 <2b388a78-79cd-f99a-6f87-6446dcb4b819@redhat.com>
 <286AC319A985734F985F78AFA26841F73E41F367@shsmsx102.ccr.corp.intel.com>
 <605bef3e-373f-f39a-4849-930326564b5b@redhat.com>
 <286AC319A985734F985F78AFA26841F73E41F3DE@shsmsx102.ccr.corp.intel.com>
 <286AC319A985734F985F78AFA26841F73E41F490@shsmsx102.ccr.corp.intel.com>
 <5b184893-014c-35a1-928b-37b8f4647116@redhat.com>
 <286AC319A985734F985F78AFA26841F73E41F562@shsmsx102.ccr.corp.intel.com>

On Wed, Feb 05, 2020 at 10:58:14AM +0100, David Hildenbrand wrote:
> On 05.02.20 10:49, Wang, Wei W wrote:
> > On Wednesday, February 5, 2020 5:37 PM, David Hildenbrand wrote:
> >>>
> >>> Not sure how TCG tracks the dirty bits.
> >>> But in whatever implementation, the hypervisor should have
> >>
> >> There is only a single bitmap for that purpose. (well, the one where KVM
> >> syncs to)
> >>
> >>> already dealt with the race between the current round and the previous
> >>> round dirty recording.
> >>> (the race isn't brought by this feature essentially)
> >>
> >> It is guaranteed to work reliably without this feature as you only clear what
> >> *has been migrated*,
> >
> > Not "clear what has been migrated" (that skips nothing..)
> > Anyway, it's a hint used for optimization.
>
> Yes, an optimization that might easily lead to data corruption when the
> two bitmaps are either not in place or don't play along in that specific
> way (and I suspect this is the case under TCG).

So I checked and TCG has two copies too.
Each block has bmap used for migration and also dirty_memory where
pages are marked dirty. See cpu_physical_memory_sync_dirty_bitmap.

So from QEMU's POV, there is a callback that tells the balloon when it's
safe to request hints. As that affects the bitmap, that must not happen
in parallel with dirty bitmap handling. Sounds like a reasonable limitation.

The hint can be useful outside migration, but in its current form
it then needs to be non-destructive. E.g. I can imagine userspace
calling MADV_SOFT_OFFLINE on the hinted memory.
Again, a flag that tells the guest it should wait until used could be
a reasonable extension.

If we stick to the shrinker it's actually implementable easily.
With an OOM notifier - I'm not so sure ...
And a big part of the problem is that after all this time the page hinting
interfaces are still undocumented. Quite sad really :(

> --
> Thanks,
>
> David / dhildenb
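
To make the two-copy argument above concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not QEMU code: the names bmap and dirty_memory are taken from the message, while ToyBlock, guest_write, sync_dirty_bitmap and apply_free_page_hint are hypothetical names invented for illustration. The lock stands in for the rule that hint handling must not run in parallel with dirty bitmap handling.

```python
import threading


class ToyBlock:
    """Toy model of one RAM block with two dirty-tracking copies.

    Hypothetical illustration only; this does not mirror QEMU's data layout.
    """

    def __init__(self, num_pages):
        # Pages written by the guest since the last sync.
        self.dirty_memory = set()
        # Pages that migration still has to send; initially all of them.
        self.bmap = set(range(num_pages))
        # Serialises sync and hint handling against each other.
        self._lock = threading.Lock()

    def guest_write(self, page):
        # A write is recorded in dirty_memory only, never directly in bmap.
        self.dirty_memory.add(page)

    def sync_dirty_bitmap(self):
        # Fold newly dirtied pages into the migration bitmap, then reset.
        with self._lock:
            self.bmap |= self.dirty_memory
            self.dirty_memory.clear()

    def apply_free_page_hint(self, pages):
        # The hint only drops pages from bmap; dirty_memory is untouched,
        # so a later write to a hinted page is picked up by the next sync.
        with self._lock:
            self.bmap -= set(pages)


block = ToyBlock(4)
block.apply_free_page_hint([2, 3])  # guest reports pages 2 and 3 as free
block.guest_write(3)                # guest reuses page 3 afterwards
block.sync_dirty_bitmap()
print(sorted(block.bmap))           # prints [0, 1, 3]
```

In this model page 2 is skipped, while page 3 comes back into bmap because its write landed in the separate dirty_memory copy. With only a single bitmap, a hint that arrives after the write would clear the bit and the page would be lost, which is the corruption scenario discussed in the quoted text.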