From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1161686AbaJaCXe (ORCPT ); Thu, 30 Oct 2014 22:23:34 -0400
Received: from mail-vc0-f202.google.com ([209.85.220.202]:62016 "EHLO
	mail-vc0-f202.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1161558AbaJaCXb (ORCPT );
	Thu, 30 Oct 2014 22:23:31 -0400
Date: Thu, 30 Oct 2014 19:23:27 -0700
From: Peter Feiner
To: zhanghailiang
Cc: Andrea Arcangeli , qemu-devel@nongnu.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, Andres Lagar-Cavilla , Dave Hansen ,
	Paolo Bonzini , Rik van Riel , Mel Gorman , Andy Lutomirski ,
	Andrew Morton , Sasha Levin , Hugh Dickins ,
	"Dr. David Alan Gilbert" , Christopher Covington , Johannes Weiner ,
	Android Kernel Team , Robert Love , Dmitry Adamushko , Neil Brown ,
	Mike Hommey , Taras Glek , Jan Kara , KOSAKI Motohiro ,
	Michel Lespinasse , Minchan Kim , Keith Packard ,
	"Huangpeng (Peter)" , Isaku Yamahata , Anthony Liguori ,
	Stefan Hajnoczi , Wenchao Xia , Andrew Jones , Juan Quintela
Subject: Re: [PATCH 00/17] RFC: userfault v2
Message-ID: <20141031022327.GA13275@google.com>
References: <1412356087-16115-1-git-send-email-aarcange@redhat.com>
	<544E1143.1080905@huawei.com> <20141029174607.GK19606@redhat.com>
	<545221A4.9030606@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <545221A4.9030606@huawei.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Oct 30, 2014 at 07:31:48PM +0800, zhanghailiang wrote:
> On 2014/10/30 1:46, Andrea Arcangeli wrote:
> >On Mon, Oct 27, 2014 at 05:32:51PM +0800, zhanghailiang wrote:
> >>I want to confirm a question:
> >>Can we support distinguishing between writing and reading memory for userfault?
> >>That is, we can decide whether writing a page, reading a page or both trigger userfault.
> >Mail is going to be long enough already so I'll just assume tracking
> >dirty memory in userland (instead of doing it in kernel) is worthy
> >feature to have here. I'll open that can of worms :-)
> [...]
> Er, maybe i didn't describe clearly. What i really need for live memory snapshot
> is only wrprotect fault, like kvm's dirty tracing mechanism, *only tracing write action*.
>
> So, what i need for userfault is supporting only wrprotect fault. i don't
> want to get notification for non present reading faults, it will influence
> VM's performance and the efficiency of doing snapshot.

Given that you do care about performance, Zhanghailiang, I don't think that a
userfault handler is a good place to track dirty memory. Every dirtying write
will block on the userfault handler, which is an expensive proposition
compared to an in-kernel approach.

> Also, i think this feature will benefit for migration of ivshmem and vhost-scsi
> which have no dirty-page-tracing now.

I do agree wholeheartedly with you here. Manually tracking non-guest writes
adds to the complexity of device emulation code. A central, fault-driven means
of dirty-tracking writes from both the guest and the host would be a welcome
simplification for implementing pre-copy migration.

Indeed, that's exactly what I'm working on! I'm using the softdirty bit, which
was introduced recently for CRIU migration, to replace the use of KVM's dirty
logging and manual dirty tracking by the VMM during pre-copy migration. See
Documentation/vm/soft-dirty.txt and Documentation/vm/pagemap.txt in case you
aren't familiar. To make softdirty usable for live migration, I've added an
API to atomically test-and-clear the bit and write protect the page.
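
For anyone who hasn't used it, here is a minimal sketch of the existing
userspace softdirty interface as documented in soft-dirty.txt and pagemap.txt:
writing "4" to /proc/PID/clear_refs clears the bits, and bit 55 of each 64-bit
/proc/PID/pagemap entry reports them. This is not the new atomic
test-and-clear API mentioned above, whose interface isn't shown here. The
helper names (entry_is_soft_dirty, pagemap_offset, clear_soft_dirty,
page_is_soft_dirty) are made up for illustration, and the /proc paths only
behave this way on kernels built with CONFIG_MEM_SOFT_DIRTY.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty flag (see pagemap.txt). */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: decode the soft-dirty flag from one pagemap entry. */
int entry_is_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/* Hypothetical helper: byte offset in the pagemap file of vaddr's entry.
 * The file holds one 64-bit entry per virtual page. */
uint64_t pagemap_offset(uintptr_t vaddr, long page_size)
{
	assert(page_size > 0);
	return (uint64_t)(vaddr / (uintptr_t)page_size) * sizeof(uint64_t);
}

/* Clear the soft-dirty bits of every page in the calling process by
 * writing "4" to clear_refs. Returns 0 on success, -1 on failure. */
int clear_soft_dirty(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = write(fd, "4", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}

/* Returns 1 if the page containing addr is soft-dirty, 0 if it is clean,
 * -1 if the pagemap entry could not be read. */
int page_is_soft_dirty(const void *addr)
{
	uint64_t entry;
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = open("/proc/self/pagemap", O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pread(fd, &entry, sizeof(entry),
		  (off_t)pagemap_offset((uintptr_t)addr, page_size));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;
	return entry_is_soft_dirty(entry);
}
```

A pre-copy pass built on this would call clear_soft_dirty(), let the workload
run, then scan with page_is_soft_dirty(). Note that the clear and the scan are
separate steps, so a write landing between them can be missed or counted
twice; that gap is what an atomic test-and-clear is meant to close.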