From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 5 Feb 2020 04:35:27 -0500
From: "Michael S. Tsirkin"
To: David Hildenbrand
Cc: "Wang, Wei W", Nadav Amit, Alexander Duyck, Tyler Sanderson,
 "virtualization@lists.linux-foundation.org", David Rientjes,
 "linux-mm@kvack.org", Michal Hocko
Subject: Re: Balloon pressuring page cache
Message-ID: <20200205042655-mutt-send-email-mst@kernel.org>
References: <20200204033657-mutt-send-email-mst@kernel.org>
 <286AC319A985734F985F78AFA26841F73E41F0F0@shsmsx102.ccr.corp.intel.com>
 <1eff30a0-a38a-cd45-2fc1-80cfd0bf5f04@redhat.com>
 <286AC319A985734F985F78AFA26841F73E41F306@shsmsx102.ccr.corp.intel.com>
 <2b388a78-79cd-f99a-6f87-6446dcb4b819@redhat.com>
 <286AC319A985734F985F78AFA26841F73E41F367@shsmsx102.ccr.corp.intel.com>
 <605bef3e-373f-f39a-4849-930326564b5b@redhat.com>
 <286AC319A985734F985F78AFA26841F73E41F3DE@shsmsx102.ccr.corp.intel.com>
List-ID: linux-mm@kvack.org, virtualization@lists.linux-foundation.org

On Wed, Feb 05, 2020 at 10:22:34AM +0100, David Hildenbrand wrote:
> >> 1. Guest allocates a page and sends it to the host.
> >> 2. Shrinker gets active and releases that page again.
> >> 3. Some user in the guest allocates and modifies that page. The dirty bit is
> >> set in the hypervisor.
> >
> > The bit will be set in KVM's bitmap, and will be synced to QEMU's bitmap when the next round starts.
> >
> >> 4. The host processes the request and clears the bit in the dirty bitmap.
> >
> > This clears the bit from the QEMU bitmap, and this page will not be sent in this round.
> >
> >> 5. The guest is stopped and the last set of dirty pages is migrated. The
> >> modified page is not being migrated (because not marked dirty).
> >
> > When QEMU starts the last round, it first syncs the bitmap from KVM, which includes the one set in step 3.
> > Then the modified page gets sent.
>
> So, if you run a TCG guest and use it with free page reporting, the race
> is possible?

I'd have to look at the implementation, but the basic idea is not
KVM-specific. The idea is that the hypervisor can detect that 3 happened
after 1, by creating a copy of the dirty bitmap when the request is sent
to the guest.

> So the correctness depends on two dirty bitmaps in the
> hypervisor and how they interact. wow this is fragile.
>
> Thanks for the info :)

It's pretty fragile, and the annoying part is that we do not actually
benefit from this at all, since it only triggers in the shrinker
corner case.

The original idea was that we can send any hint to the hypervisor and
reuse the page immediately, without waiting for the hint to be seen.
That seemed worth having, as a means to minimize the impact of hinting.

Then we dropped that and switched to OOM, and there not having to wait
also seemed like a worthwhile thing.
In the end we switched to the shrinker, where we can wait if we like,
and many guests never even hit the shrinker, so we have sacrificed
robustness for nothing.
If we go back to OOM then at least it's justified ..

> --
> Thanks,
>
> David / dhildenb
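
To make the five steps above concrete, here is a toy Python replay of the
sequence. It is only a sketch under stated assumptions, not QEMU or KVM code:
every name in it (TwoBitmapTracker, SingleBitmapTracker, kernel_log, send_set,
replay) is hypothetical. The two-bitmap class follows the description quoted
above, where writes are logged in one bitmap and hints clear bits only in the
other. The single-bitmap class makes no claim about how TCG tracks dirty
pages; it only shows what would go wrong if writes and hints shared one bitmap.

```python
# Toy model of the five steps quoted above. All names are hypothetical;
# this is not QEMU or KVM code.


class TwoBitmapTracker:
    """Guest writes land in `kernel_log`; pages are sent from `send_set`."""

    def __init__(self):
        self.kernel_log = set()  # stands in for KVM's dirty bitmap
        self.send_set = set()    # stands in for QEMU's migration bitmap

    def guest_write(self, page):
        # A write is only logged; it does not touch the send set directly.
        self.kernel_log.add(page)

    def begin_round(self):
        # Start of a migration round: fold logged writes into the send set.
        self.send_set |= self.kernel_log
        self.kernel_log.clear()

    def apply_free_hint(self, page):
        # A free-page hint clears the page in the send set only.
        self.send_set.discard(page)

    def finish_round(self):
        # Send everything still marked and start with an empty send set.
        sent, self.send_set = self.send_set, set()
        return sent


class SingleBitmapTracker(TwoBitmapTracker):
    """Assumed worst case: writes and hints share one bitmap."""

    def guest_write(self, page):
        self.send_set.add(page)


def replay(tracker, page=7):
    """Replay steps 1-5 and return (pages sent this round, pages sent last)."""
    tracker.guest_write(page)      # page was dirtied by earlier use
    tracker.begin_round()
    # Steps 1 and 2 (page reported to the host, then released by the
    # shrinker) change no tracker state in this model.
    tracker.guest_write(page)      # step 3: a guest user modifies the page
    tracker.apply_free_hint(page)  # step 4: host processes the stale hint
    this_round = tracker.finish_round()
    tracker.begin_round()          # step 5: guest stopped, final sync
    last_round = tracker.finish_round()
    return this_round, last_round
```

With the two-bitmap tracker the page is skipped in the current round but
reappears after the final sync, so replay() returns (set(), {7}). With the
single-bitmap tracker the stale hint erases the only record of the write and
replay() returns (set(), set()), which is the lost-page case from step 5.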