From: Wanpeng Li
Date: Mon, 19 Apr 2021 16:49:20 +0800
Subject: Re: [PATCH v2 09/10] KVM: Don't take mmu_lock for range invalidation unless necessary
To: Sean Christopherson
Cc: Marc Zyngier, Huacai Chen, Aleksandar Markovic, Paul Mackerras, Paolo Bonzini, James Morse, Julien Thierry, Suzuki K Poulose, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, LAK, kvmarm@lists.cs.columbia.edu, linux-mips@vger.kernel.org, kvm, kvm-ppc@vger.kernel.org, LKML, Ben Gardon
References: <20210402005658.3024832-1-seanjc@google.com> <20210402005658.3024832-10-seanjc@google.com>
In-Reply-To: <20210402005658.3024832-10-seanjc@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 2 Apr 2021 at 08:59, Sean Christopherson wrote:
>
> Avoid taking mmu_lock for unrelated .invalidate_range_{start,end}()
> notifications.
> Because mmu_notifier_count must be modified while holding mmu_lock for
> write, and must always be paired across start->end to stay balanced,
> lock elision must happen in both or none. To meet that requirement,
> add a rwsem to prevent memslot updates across range_start() and
> range_end().
>
> Use a rwsem instead of a rwlock since most notifiers _allow_ blocking,
> and the lock will be held across the entire start() ... end() sequence.
> If anything in the sequence sleeps, including the caller or a different
> notifier, holding the spinlock would be disastrous.
>
> For notifiers that _disallow_ blocking, e.g. OOM reaping, simply go
> down the slow path of unconditionally acquiring mmu_lock. The sane
> alternative would be to try to acquire the lock and force the notifier
> to retry on failure. But since OOM is currently the _only_ scenario
> where blocking is disallowed, attempting to optimize a guest that has
> been marked for death is pointless.
>
> Unconditionally define and use mmu_notifier_slots_lock in the memslots
> code, purely to avoid more #ifdefs. The overhead of acquiring the lock
> is negligible when the lock is uncontested, which will always be the
> case when the MMU notifiers are not used.
>
> Note, technically flag-only memslot updates could be allowed in
> parallel, but stalling a memslot update for a relatively short amount
> of time is not a scalability issue, and this is all more than complex
> enough.
>
> Based heavily on code from Ben Gardon.
>
> Suggested-by: Ben Gardon
> Signed-off-by: Sean Christopherson

I saw this splatting:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.12.0-rc3+ #6 Tainted: G           OE
 ------------------------------------------------------
 qemu-system-x86/3069 is trying to acquire lock:
 ffffffff9c775ca0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at:
__mmu_notifier_invalidate_range_end+0x5/0x190

 but task is already holding lock:
 ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at:
kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}:
        down_read+0x48/0x250
        kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]
        __mmu_notifier_invalidate_range_start+0xe8/0x260
        wp_page_copy+0x82b/0xa30
        do_wp_page+0xde/0x420
        __handle_mm_fault+0x935/0x1230
        handle_mm_fault+0x179/0x420
        do_user_addr_fault+0x1b3/0x690
        exc_page_fault+0x82/0x2b0
        asm_exc_page_fault+0x1e/0x30

 -> #0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
        __lock_acquire+0x110f/0x1980
        lock_acquire+0x1bc/0x400
        __mmu_notifier_invalidate_range_end+0x47/0x190
        wp_page_copy+0x796/0xa30
        do_wp_page+0xde/0x420
        __handle_mm_fault+0x935/0x1230
        handle_mm_fault+0x179/0x420
        do_user_addr_fault+0x1b3/0x690
        exc_page_fault+0x82/0x2b0
        asm_exc_page_fault+0x1e/0x30

 other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&kvm->mmu_notifier_slots_lock);
                               lock(mmu_notifier_invalidate_range_start);
                               lock(&kvm->mmu_notifier_slots_lock);
  lock(mmu_notifier_invalidate_range_start);

 *** DEADLOCK ***

 2 locks held by qemu-system-x86/3069:
  #0: ffff9e4269f8a9e0 (&mm->mmap_lock#2){++++}-{3:3}, at:
do_user_addr_fault+0x10e/0x690
  #1: ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at:
kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]

 stack backtrace:
 CPU: 0 PID: 3069 Comm: qemu-system-x86 Tainted: G           OE     5.12.0-rc3+ #6
 Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
 Call Trace:
  dump_stack+0x87/0xb7
  print_circular_bug.isra.39+0x1b4/0x210
  check_noncircular+0x103/0x150
  __lock_acquire+0x110f/0x1980
  ? __lock_acquire+0x110f/0x1980
  lock_acquire+0x1bc/0x400
  ? __mmu_notifier_invalidate_range_end+0x5/0x190
  ? find_held_lock+0x40/0xb0
  __mmu_notifier_invalidate_range_end+0x47/0x190
  ? __mmu_notifier_invalidate_range_end+0x5/0x190
  wp_page_copy+0x796/0xa30
  do_wp_page+0xde/0x420
  __handle_mm_fault+0x935/0x1230
  handle_mm_fault+0x179/0x420
  do_user_addr_fault+0x1b3/0x690
  ? rcu_read_lock_sched_held+0x4f/0x80
  exc_page_fault+0x82/0x2b0
  ? asm_exc_page_fault+0x8/0x30
  asm_exc_page_fault+0x1e/0x30
 RIP: 0033:0x55f5bef2560f