From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-18.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 409E2C433DB for ; Thu, 11 Mar 2021 07:23:20 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D08D864FAA for ; Thu, 11 Mar 2021 07:23:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231697AbhCKHWr (ORCPT ); Thu, 11 Mar 2021 02:22:47 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52386 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231631AbhCKHWS (ORCPT ); Thu, 11 Mar 2021 02:22:18 -0500 Received: from mail-pj1-x1034.google.com (mail-pj1-x1034.google.com [IPv6:2607:f8b0:4864:20::1034]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5DD51C061574 for ; Wed, 10 Mar 2021 23:22:18 -0800 (PST) Received: by mail-pj1-x1034.google.com with SMTP id kk2-20020a17090b4a02b02900c777aa746fso8897407pjb.3 for ; Wed, 10 Mar 2021 23:22:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=MNNXBjyFBBg+ZOfSSdEcuVG7CkN6sHcDa7incDvsMj4=; b=inVEDgwGBQio+eYnQwGOqgiOqLZoTLz4zOJCGFyjXY495PngbnvjoG5nvIXKkBZu5U ZM3EEFoO2Wm1/SKDjIpEO3+fu0jqkTrmR7uScfS3em12PrLAxgEo9JpO5ygJhByGGBK1 inv7JU2UK+XHhkLzLGiecN7nB4oGkKd+ym7U5I3FMk2jeT3tc3+YXqazPfUfFs6Qduzu IO/2bo+7+Yhf45fGyKH37deQFwIJfLfWDPrcVvsVpfAcwSQx1FgCVzE3VAp90QTkMqWj /AUF2A3CLVqKhMFy6tvOthjtWj7zM5ufKpzlmBErsztKzb7DEHWkFNFFm89r2qxEb8P4 r+sQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=MNNXBjyFBBg+ZOfSSdEcuVG7CkN6sHcDa7incDvsMj4=; b=CRMyeNa91XvyR9JD6RQP3WlCtIgubVAN9iK3EyflPx9TTupZwA76sphHfF5auq3v6d 3vAMx7efA1tNg7ELGMcucjmCYgcUBIpaLXb9zhqs7WiGYsEyC6RCvnWycqPGwRWAq3Qf +gKZnC6E9NSnBreKDrmPJJVw0B61Mu3JMJrBx+9geIWGCnn4rdr/dqKPWTlZdG/tUu37 UcWIje0CP7ncAbcDh2gIlPxRUnj+6k4KPdXNnS2/pYDgEIvPthzzfGkIPQKQnqc/92eR 8tq8TfmhUG4KeyroxH/MWdSb7bTliHrNON5yyv2SZupHtTOW/jyErjqwP+Vh4RaOqCrS jh4w== X-Gm-Message-State: AOAM533bdStre01gOkJ5q3WRQsI7b8CVxEWojpoq1Mf63Y3M6o5JFF/H ElKeYRlti4bT4zPrnpmj+W9JZftMoJmXOg== X-Google-Smtp-Source: ABdhPJyUmk5L9/MKNyiCY8rn2invnJXnWqFXJPq+frVezy84HP9fOnE+eF3+tuNv8H+q3Ei+qXw8cw== X-Received: by 2002:a17:902:8f8c:b029:e3:7e6c:36fa with SMTP id z12-20020a1709028f8cb02900e37e6c36famr6934282plo.77.1615447337704; Wed, 10 Mar 2021 23:22:17 -0800 (PST) Received: from google.com ([2620:15c:f:10:c4e3:28f1:33df:26ba]) by smtp.gmail.com with ESMTPSA id y8sm1269919pjb.11.2021.03.10.23.22.16 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 10 Mar 2021 23:22:16 -0800 (PST) Date: Wed, 10 Mar 2021 23:22:09 -0800 From: Sean Christopherson To: Jason Gunthorpe Cc: Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, David Rientjes , Ben Gardon , Michal Hocko , =?iso-8859-1?B?Suly9G1l?= Glisse , Andrea Arcangeli , Johannes Weiner , Dimitri Sivanich Subject: Re: [PATCH] mm/oom_kill: Ensure MMU notifier range_end() is paired with range_start() Message-ID: References: <20210310213117.1444147-1-seanjc@google.com> <20210311002807.GQ444867@ziepe.ca> <20210311015013.GS444867@ziepe.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210311015013.GS444867@ziepe.ca> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 10, 2021, Jason Gunthorpe wrote: > On Wed, Mar 10, 2021 at 05:20:01PM -0800, Sean Christopherson wrote: > > > > Which I believe is fatal to kvm? These notifiers certainly do not only > > > happen at process exit. > > > > My point about the process dying is that the existing bug that causes > > mmu_notifier_count to become imbalanced is benign only because the process is > > being killed, and thus KVM will stop running its vCPUs. > > Are you saying we only call non-blocking invalidate during a process > exit event?? Yes? __oom_reap_task_mm() is the only user of _nonblock(), if that's what you're asking. > > > So, both of the remaining _end users become corrupted with this patch! > > > > I don't follow. mn_hlist_invalidate_range_start() iterates over all > > notifiers, even if a notifier earlier in the chain failed. How will > > KVM become imbalanced? > > Er, ok, that got left in a weird way. There is another "bug" where end > is not supposed to be called if the start failed. Ha, the best kind of feature. :-) > > The existing _end users never fail their _start. If KVM started failing its > > start, then yes, it could get corrupted. > > Well, maybe that is the way out of this now. If we don't permit a > start to fail if there is an end then we have no problem to unwind it > as we can continue to call everything. This can't be backported too > far though, the itree notifier conversions are what made the WARN_ON > safe today. > > Something very approximately like this is closer to my preference: Makes sense. I don't have a strong preference, I'll give this a spin tomorrow. > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > index 61ee40ed804ee5..6d5cd20f81dadc 100644 > --- a/mm/mmu_notifier.c > +++ b/mm/mmu_notifier.c > @@ -501,10 +501,25 @@ static int mn_hlist_invalidate_range_start( > ""); > WARN_ON(mmu_notifier_range_blockable(range) || > _ret != -EAGAIN); > + /* > + * We call all the notifiers on any EAGAIN, > + * there is no way for a notifier to know if > + * its start method failed, thus a start that > + * does EAGAIN can't also do end. > + */ > + WARN_ON(ops->invalidate_range_end); > ret = _ret; > } > } > } > + > + if (ret) { > + /* Must be non-blocking to get here*/ > + hlist_for_each_entry_rcu (subscription, &subscriptions->list, > + hlist, srcu_read_lock_held(&srcu)) > + subscription->ops->invalidate_range_end(subscription, > + range); > + } > srcu_read_unlock(&srcu, id); > > return ret;