Subject: Re: DANGER WILL ROBINSON, DANGER
From: Paolo Bonzini
Date: Wed, 2 Oct 2019 15:46:30 +0200
To: Jerome Glisse, Mircea CIRJALIU - MELIU
Cc: Adalbert Lazăr, Matthew Wilcox, kvm@vger.kernel.org, linux-mm@kvack.org,
    virtualization@lists.linux-foundation.org, Radim Krčmář,
    Konrad Rzeszutek Wilk, Tamas K Lengyel, Mathieu Tarral, Samuel Laurén,
    Patrick Colp, Jan Kiszka, Stefan Hajnoczi, Weijiang Yang, Yu C,
    Mihai Donțu
In-Reply-To: <20191002192714.GA5020@redhat.com>

On 02/10/19 21:27, Jerome Glisse wrote:
> On Tue, Sep 10, 2019 at 07:49:51AM +0000, Mircea CIRJALIU - MELIU wrote:
>>> On 05/09/19 20:09, Jerome Glisse wrote:
>>>> Not sure I understand; you are saying that the solution I outlined
>>>> above does not work?
>>>> If so, then I think you are wrong: in the above solution the importing
>>>> process mmaps a device file, and the resulting vma is then populated
>>>> using insert_pfn() and kept constantly synchronized with the target
>>>> process through mirroring, which means that you never have to look at
>>>> the struct page ... you can mirror any kind of memory from the remote
>>>> process.
>>>
>>> If insert_pfn in turn calls MMU notifiers for the target VMA (which would be
>>> the KVM MMU notifier), then that would work. Though I guess it would be
>>> possible to call MMU notifier update callbacks around the call to insert_pfn.
>>
>> Can't do that.
>> First, insert_pfn() uses set_pte_at(), which won't trigger the MMU notifier on
>> the target VMA. It's also static, so I'll have to access it through
>> vmf_insert_pfn() or vmf_insert_mixed().
>
> Why would you need to trigger the mmu notifier on the target vma?

If the mapping of the source VMA changes, mirroring can update the target
VMA via insert_pfn. But what ensures that KVM's MMU notifier dismantles
its own existing page tables (so that they can be recreated with the new
mapping from the source VMA)?

Thanks,

Paolo

> You do not need that. The workflow is:
>
> userspace:
> 	ptr = mmap(/dev/kvm-mirroring-device, virtual_address_of_target)
>
> Then when the mirroring process accesses ptr, it triggers a page fault that
> ends up in vm_operations_struct->fault(), which is just doing:
>
> kernel-kvm-mirroring-function:
> kvm_mirror_page_fault(struct vm_fault *vmf) {
> 	struct kvm_mirror_struct *kvmms;
>
> 	kvmms = kvm_mirror_struct_from_file(vmf->vma->vm_file);
> 	...
> again:
> 	hmm_range_register(&range);
> 	hmm_range_snapshot(&range);
> 	take_lock(kvmms->update);
> 	if (hmm_range_valid(&range)) {
> 		vm_insert_pfn();
> 		drop_lock(kvmms->update);
> 		hmm_range_unregister(&range);
> 		return VM_FAULT_NOPAGE;
> 	}
> 	drop_lock(kvmms->update);
> 	hmm_range_unregister(&range);
> 	goto again;
> }
>
> The notifier callback:
> kvmms_notifier_start() {
> 	take_lock(kvmms->update);
> 	clear_pte(start, end);
> 	drop_lock(kvmms->update);
> }
>
>>
>> Our model (the importing process is encapsulated in another VM) forces us
>> to mirror certain pages from the anon VMA backing one VM's system RAM to
>> the other VM's anon VMA.
>
> The mirror does not have to be an anon vma; it can very well be a
> device vma, i.e. an mmap of a device file. I do not see any reason why
> the mirror needs to be an anon vma. Please explain why.
>
>>
>> Using the functions above means setting VM_PFNMAP|VM_MIXEDMAP on
>> the target anon VMA, but I guess this breaks the VMA. Is this recommended?
>
> The mirror vma should not be an anon vma.
>
>>
>> Then, mapping anon pages from one VMA to another without fixing the
>> refcount and the mapcount breaks the daemons that think they're working
>> on a pure anon VMA (kcompactd, khugepaged).
>
> Note that here the target vma, i.e. the mirroring one, is an mmap of a device
> file and thus is skipped by all of the above (kcompactd, khugepaged, ...); it
> is fully ignored by core mm.
>
> Thus you do not need to fix the refcount in any way. If any part of core
> mm tries to reclaim memory from the original vma, then you will get mmu
> notifier callbacks, and all you have to do is clear the page table of your
> device vma.
>
> I did exactly that as a tool in the past, and it works just fine with
> no change to core mm whatsoever.
>
> Cheers,
> Jérôme
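A minimal user-space model of the fault/notifier handshake described in the thread, written in C so it can be compiled and run outside the kernel. Everything in it is a hypothetical stand-in, not a kernel or HMM API: arrays play the source and mirror page tables, a spinlock plays kvmms->update, and a sequence counter plus an in-flight count substitute for the range validity that hmm_range_valid() reports. The fault path installs a PFN only if no invalidation started after its snapshot and otherwise retries. The notifier clears the mirror entries under the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define NPAGES 8
#define NO_PFN 0UL

struct mirror {
	atomic_bool update;            /* stands in for kvmms->update */
	unsigned long seq;             /* bumped at each invalidation start */
	unsigned int active;           /* invalidations currently in flight */
	atomic_ulong src_pte[NPAGES];  /* source process "page table" */
	unsigned long mir_pte[NPAGES]; /* mirror (device VMA) "page table" */
};

void take_lock(struct mirror *m)
{
	while (atomic_exchange_explicit(&m->update, true, memory_order_acquire))
		; /* spin until the lock is free */
}

void drop_lock(struct mirror *m)
{
	atomic_store_explicit(&m->update, false, memory_order_release);
}

void mirror_init(struct mirror *m)
{
	atomic_init(&m->update, false);
	m->seq = 0;
	m->active = 0;
	for (size_t i = 0; i < NPAGES; i++) {
		atomic_init(&m->src_pte[i], NO_PFN);
		m->mir_pte[i] = NO_PFN;
	}
}

/* Notifier "start": stale any snapshot and clear mirror entries [start, end). */
void mirror_invalidate_start(struct mirror *m, size_t start, size_t end)
{
	assert(start <= end && end <= NPAGES);
	take_lock(m);
	m->active++;
	m->seq++;
	for (size_t i = start; i < end; i++)
		m->mir_pte[i] = NO_PFN; /* models clear_pte(start, end) */
	drop_lock(m);
}

/* Notifier "end": the source page table is consistent again. */
void mirror_invalidate_end(struct mirror *m)
{
	take_lock(m);
	m->active--;
	drop_lock(m);
}

/* The source process changes one mapping, bracketed by notifier calls. */
void source_remap(struct mirror *m, size_t idx, unsigned long new_pfn)
{
	mirror_invalidate_start(m, idx, idx + 1);
	atomic_store(&m->src_pte[idx], new_pfn);
	mirror_invalidate_end(m);
}

/* Fault path: snapshot, then install only if no invalidation started since. */
unsigned long mirror_fault(struct mirror *m, size_t idx)
{
	assert(idx < NPAGES);
	for (;;) {
		take_lock(m); /* "register": note the current sequence */
		bool busy = m->active != 0;
		unsigned long seq = m->seq;
		drop_lock(m);
		if (busy)
			continue; /* wait out an in-flight invalidation */

		/* "snapshot": read the source mapping without the lock */
		unsigned long pfn = atomic_load(&m->src_pte[idx]);

		take_lock(m);
		if (m->seq == seq) {           /* "valid": nothing changed */
			m->mir_pte[idx] = pfn; /* models the PFN insert */
			drop_lock(m);
			return pfn;
		}
		drop_lock(m); /* raced with the notifier: retry */
	}
}
```

Taking the snapshot outside the lock keeps the critical section down to the short validate-and-install step; any invalidation that slips in between is caught by the sequence check. The model covers only the mirror VMA's own page table, not any secondary MMU (such as KVM's) built on top of it.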