Date: Tue, 5 Jan 2021 19:02:30 -0500
From: Andrea Arcangeli
To: Nadav Amit
Cc: linux-mm, lkml, Yu Zhao, Andy Lutomirski, Peter Xu, Pavel Emelyanov,
 Mike Kravetz, Mike Rapoport, Minchan Kim, Will Deacon, Peter Zijlstra
Subject: Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write
 on soft-dirty cleanup
References: <20201225092529.3228466-1-namit@vmware.com>
 <20201225092529.3228466-3-namit@vmware.com>
 <15758743-B8E3-48C4-A13B-DFFEBF8AF435@vmware.com>

On Tue, Jan 05, 2021 at 09:22:51PM +0000, Nadav Amit wrote:
> It is also about performance due to unwarranted TLB flushes.

If there will be a problem, switching to the wait_flush_pending() model
suggested by Peter may not even require changes to the common code in
memory.c, since I'm thinking it may not even need to take a failure
path if we plug it in at the same place as the TLB flush.

So instead of the flush we could always block there until we read zero
in the atomic, then smp_rmb() and we're ready to start the copy. So
either we flush with an IPI because we didn't read zero, or we block
until we read zero; the difference is going to be hidden from
do_wp_page. All do_wp_page cares about is that by the time the
abstract call returns, there's no stale TLB left for such a pte.
Whether that is achieved by blocking and waiting or by flushing the
TLB doesn't matter too much.
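To be concrete, the blocking side could look more or less like this.
This is only a sketch: wait_tlb_flush_pending() is a made-up name, and
the exact barrier pairing against inc_tlb_flush_pending()/
dec_tlb_flush_pending() in mm_types.h would need auditing:

/*
 * Sketch only, not an existing kernel API: instead of flushing in
 * do_wp_page, block until every deferred flush accounted by
 * inc_tlb_flush_pending()/dec_tlb_flush_pending() has completed.
 */
static inline void wait_tlb_flush_pending(struct mm_struct *mm)
{
	while (atomic_read(&mm->tlb_flush_pending))
		cpu_relax();
	/*
	 * Pairs with the release in dec_tlb_flush_pending(): once we
	 * read zero, the flushes done before the decrement are
	 * guaranteed visible before we start the wp copy.
	 */
	smp_rmb();
}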
As for how badly the IPI will do: with the improved arm64 TLB flushing
code in production, we keep track of how many simultaneous mm contexts
there are, specifically to skip the SMP-unscalable TLBI broadcast on
arm64, like we already avoid IPIs to lazy TLBs on x86 (see
tlb_is_not_lazy in native_flush_tlb_others). In other words the IPI
will materialize only if there's more than one thread running while
clear_refs runs. Lazy TLBs won't get IPIs on either x86 upstream or
arm64 enterprise. This won't help multithreaded processes that compute
on all CPUs at all times, but even multiple vcpu threads aren't always
guaranteed to be running at all times.

My main concern would be an IPI flood that slows down clear_refs and
UFFDIO_WRITEPROTECT themselves, but an incremental optimization (not
required for correctness) is to have UFFDIO_WRITEPROTECT and
clear_refs switch to lazy TLB mode before they call
inc_tlb_flush_pending() and unlazy only after dec_tlb_flush_pending().
That way it's possible to at least guarantee the IPIs won't slow them
down.

> In addition, as I stated before, having some clean interfaces that tell
> whether a TLB flush is needed or not would be helpful and simpler to follow.
> For instance, we can have is_pte_prot_demotion(oldprot, newprot) to figure
> out whether a TLB flush is needed in change_pte_range() and avoid
> unnecessary flushes when unprotecting pages with either mprotect() or
> userfaultfd.

When you mentioned this earlier I was wondering what happens then with
flush_tlb_fix_spurious_fault(). The fact it's safe doesn't guarantee
it's a performance win if a stream of spurious faults results. So it'd
need to be checked, especially as in the mprotect case the flush can
be deferred and coalesced into a single IPI at the end, so there's not
much to gain from it there anyway. If you can guarantee there won't be
a flood of spurious wrprotect faults, then it'll be a nice
optimization.

Thanks,
Andrea
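PS: to make the proposed interface concrete, this is roughly how I'd
read it. is_pte_prot_demotion() doesn't exist anywhere yet, the
permission bits below are x86-specific, and inverted bits like
_PAGE_NX are ignored for brevity, so treat this purely as a sketch of
the idea:

/*
 * Sketch of the proposed helper (not an existing kernel API): return
 * true only when newprot grants strictly less access than oldprot,
 * i.e. when a stale TLB entry would be more permissive than the new
 * pte and an immediate flush is mandatory. Promotions (like
 * wrprotect -> writable) can instead rely on the spurious wrprotect
 * fault path.
 */
static inline bool is_pte_prot_demotion(pgprot_t oldprot, pgprot_t newprot)
{
	unsigned long lost = pgprot_val(oldprot) & ~pgprot_val(newprot);

	/* x86 bits for illustration; _PAGE_NX is inverted and skipped */
	return lost & (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER);
}

Then change_pte_range() could skip queueing the flush whenever the
helper returns false, which is exactly the mprotect()/userfaultfd
unprotect case above.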