Subject: Re: [PATCH v3] mm/mmu_notifier: prevent unpaired invalidate_start and invalidate_end
From: Ralph Campbell
To: Jason Gunthorpe
CC: Michal Hocko, Jérôme Glisse, Christoph Hellwig
Date: Tue, 11 Feb 2020 13:28:42 -0800
In-Reply-To: <20200211205252.GA10003@ziepe.ca>
References: <20200211205252.GA10003@ziepe.ca>
On 2/11/20 12:52 PM, Jason Gunthorpe wrote:
> Many users of the mmu_notifier invalidate_range callbacks maintain
> locking/counters/etc on a paired basis and have long expected that
> invalidate_range_start/end() are always paired.
>
> For instance kvm_mmu_notifier_invalidate_range_end() undoes
> kvm->mmu_notifier_count which was incremented during start().
>
> The recent change to add non-blocking notifiers breaks this assumption
> when multiple notifiers are present in the list. When EAGAIN is returned
> from an invalidate_range_start() then no invalidate_range_ends() are
> called, even if the subscription's start had previously been called.
>
> Unfortunately, due to the RCU list traversal we can't reliably generate a
> subset of the linked list representing the notifiers already called to
> generate an invalidate_range_end() pairing.
>
> One case works correctly, if only one subscription requires
> invalidate_range_end() and it is the last entry in the hlist. In this
> case, when invalidate_range_start() returns -EAGAIN there will be nothing
> to unwind.
>
> Keep the notifier hlist sorted so that notifiers that require
> invalidate_range_end() are always last, and if two are added then disable
> non-blocking invalidation for the mm.
>
> A warning is printed for this case, if in future we determine this never
> happens then we can simply fail during registration when there are
> unsupported combinations of notifiers.
>
> Fixes: 93065ac753e4 ("mm, oom: distinguish blockable mode for mmu notifiers")
> Cc: Michal Hocko
> Cc: "Jérôme Glisse"
> Cc: Christoph Hellwig
> Signed-off-by: Jason Gunthorpe
> ---
>  mm/mmu_notifier.c | 53 ++++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 50 insertions(+), 3 deletions(-)
>
> v1: https://lore.kernel.org/linux-mm/20190724152858.GB28493@ziepe.ca/
> v2: https://lore.kernel.org/linux-mm/20190807191627.GA3008@ziepe.ca/
>  * Abandon attempting to fix it by calling invalidate_range_end() during an
>    EAGAIN start
>  * Just trivially ban multiple subscriptions
> v3:
>  * Be more sophisticated, ban only multiple subscriptions if the result is
>    a failure. Allows multiple subscriptions without invalidate_range_end
>  * Include a printk when this condition is hit (Michal)
>
> At this point the rework Christoph requested during the first posting
> is completed and there are now only 3 drivers using
> invalidate_range_end():
>
>  drivers/misc/mic/scif/scif_dma.c:   .invalidate_range_end = scif_mmu_notifier_invalidate_range_end};
>  drivers/misc/sgi-gru/grutlbpurge.c: .invalidate_range_end = gru_invalidate_range_end,
>  virt/kvm/kvm_main.c:                .invalidate_range_end = kvm_mmu_notifier_invalidate_range_end,
>
> While I think it is unlikely that any of these drivers will be used in
> combination with each other, display a printk in hopes to check.
>
> Someday I expect to just fail the registration on this condition.
>
> I think this also addresses Michal's concern about a 'big hammer' as
> it probably won't ever trigger now.
>
> Regards,
> Jason
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index ef3973a5d34a94..f3aba7a970f576 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -37,7 +37,8 @@ struct lockdep_map __mmu_notifier_invalidate_range_start_map = {
>   struct mmu_notifier_subscriptions {
>   	/* all mmu notifiers registered in this mm are queued in this list */
>   	struct hlist_head list;
> -	bool has_itree;
> +	u8 has_itree;
> +	u8 no_blocking;
>   	/* to serialize the list modifications and hlist_unhashed */
>   	spinlock_t lock;
>   	unsigned long invalidate_seq;
> @@ -475,6 +476,10 @@ static int mn_hlist_invalidate_range_start(
>   	int ret = 0;
>   	int id;
>
> +	if (unlikely(subscriptions->no_blocking &&
> +		     !mmu_notifier_range_blockable(range)))
> +		return -EAGAIN;
> +
>   	id = srcu_read_lock(&srcu);
>   	hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist) {
>   		const struct mmu_notifier_ops *ops = subscription->ops;
> @@ -590,6 +595,48 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
>   	srcu_read_unlock(&srcu, id);
>   }
>
> +/*
> + * Add a hlist subscription to the list. The list is kept sorted by the
> + * existence of ops->invalidate_range_end. If there is more than one
> + * invalidate_range_end in the list then this process can no longer support
> + * non-blocking invalidation.
> + *
> + * non-blocking invalidation is problematic as a requirement to block results in
> + * the invalidation being aborted, however due to the use of RCU we have no
> + * reliable way to ensure that every sueessful invalidate_range_start() results

s/sueessful/successful

> + * in a call to invalidate_range_end().
> + *
> + * Thus to support blocking only the last subscription in the list can have
> + * invalidate_range_end() set.
> + */
> +static void
> +mn_hist_add_subscription(struct mmu_notifier_subscriptions *subscriptions,
> +			 struct mmu_notifier *subscription)

We have mn_hlist_xxx in a number of places in mmu_notifier.c.
Seems like this should be named mn_hlist_add_subscription().

> +{
> +	struct mmu_notifier *last = NULL;
> +	struct mmu_notifier *itr;
> +
> +	hlist_for_each_entry(itr, &subscriptions->list, hlist)
> +		last = itr;
> +
> +	if (last && last->ops->invalidate_range_end &&
> +	    subscription->ops->invalidate_range_end) {
> +		subscriptions->no_blocking = true;
> +		pr_warn_once(
> +			"%s (%d) created two mmu_notifier's with invalidate_range_end(): %ps and %ps, non-blocking notifiers disabled\n",

line length?
> +			current->comm, current->pid,
> +			last->ops->invalidate_range_end,
> +			subscription->ops->invalidate_range_end);
> +	}
> +	if (!last || !last->ops->invalidate_range_end)
> +		subscriptions->no_blocking = false;
> +
> +	if (last && subscription->ops->invalidate_range_end)
> +		hlist_add_behind_rcu(&subscription->hlist, &last->hlist);
> +	else
> +		hlist_add_head_rcu(&subscription->hlist, &subscriptions->list);
> +}
> +
>   /*
>    * Same as mmu_notifier_register but here the caller must hold the mmap_sem in
>    * write mode. A NULL mn signals the notifier is being registered for itree
> @@ -660,8 +707,8 @@ int __mmu_notifier_register(struct mmu_notifier *subscription,
>   	subscription->users = 1;
>
>   	spin_lock(&mm->notifier_subscriptions->lock);
> -	hlist_add_head_rcu(&subscription->hlist,
> -			   &mm->notifier_subscriptions->list);
> +	mn_hist_add_subscription(mm->notifier_subscriptions,
> +				 subscription);
>   	spin_unlock(&mm->notifier_subscriptions->lock);
>   	} else
>   		mm->notifier_subscriptions->has_itree = true;
>

Other than some nits, looks good to me so you can add:

Reviewed-by: Ralph Campbell
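
[For background on the pairing assumption the changelog describes, here is a
rough sketch of a notifier user in the style of KVM's counter. The names
(demo_mirror, demo_ops) are illustrative only and are not taken from this
patch or from the three drivers listed above; if end() is skipped after a
partially-run start(), the counter below stays elevated forever, which is
exactly the breakage the patch is guarding against.]

#include <linux/kernel.h>
#include <linux/mmu_notifier.h>
#include <linux/spinlock.h>

struct demo_mirror {
	struct mmu_notifier notifier;
	spinlock_t lock;
	unsigned long invalidate_count;	/* > 0 while any invalidation is in flight */
};

static int demo_invalidate_range_start(struct mmu_notifier *subscription,
				       const struct mmu_notifier_range *range)
{
	struct demo_mirror *mirror =
		container_of(subscription, struct demo_mirror, notifier);

	spin_lock(&mirror->lock);
	mirror->invalidate_count++;	/* pause mirroring until end() drops this */
	spin_unlock(&mirror->lock);
	return 0;
}

static void demo_invalidate_range_end(struct mmu_notifier *subscription,
				      const struct mmu_notifier_range *range)
{
	struct demo_mirror *mirror =
		container_of(subscription, struct demo_mirror, notifier);

	spin_lock(&mirror->lock);
	mirror->invalidate_count--;	/* unbalanced if start() ran but end() never does */
	spin_unlock(&mirror->lock);
}

static const struct mmu_notifier_ops demo_ops = {
	.invalidate_range_start	= demo_invalidate_range_start,
	.invalidate_range_end	= demo_invalidate_range_end,
};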