Date: Fri, 6 Dec 2019 08:53:27 +0100
From: Boris Brezillon
To: Rob Herring
Cc: Steven Price, Tomeu Vizoso, Alyssa Rosenzweig, stable, dri-devel
Subject: Re: [PATCH 2/8] drm/panfrost: Fix a race in panfrost_ioctl_madvise()
Message-ID: <20191206085327.66a8c479@collabora.com>
References: <20191129135908.2439529-1-boris.brezillon@collabora.com>
 <20191129135908.2439529-3-boris.brezillon@collabora.com>
 <20191129153310.2f9c80e1@collabora.com>
Organization: Collabora

On Thu, 5 Dec 2019 17:08:02 -0600
Rob Herring wrote:

> On Fri, Nov 29, 2019 at 8:33 AM Boris Brezillon
> wrote:
> >
> > On Fri, 29 Nov 2019 14:24:48 +0000
> > Steven Price wrote:
> >
> > > On 29/11/2019 13:59, Boris Brezillon wrote:
> > > > If 2 threads change the MADVISE property of the same BO in parallel we
> > > > might end up with an shmem->madv value that's inconsistent with the
> > > > presence of the BO in the shrinker list.
> > > >
> > > I'm a bit worried from the point of view of user space sanity that you
> > > observed this - but clearly the kernel should be robust!
> >
> > It's not something I observed, just found the race by inspecting the
> > code, and I thought it was worth fixing it.
>
> I'm not so sure there's a race.

I'm pretty sure there's one:

T0                              T1

lock(pages)
madv = 1
unlock(pages)

                                lock(pages)
                                madv = 0
                                unlock(pages)

                                lock(shrinker)
                                remove_from_list(bo)
                                unlock(shrinker)

lock(shrinker)
add_to_list(bo)
unlock(shrinker)

You end up with madv = 0 and the BO is added to the list.

> If there is, we still check madv value
> when purging, so it would be harmless even if the state is
> inconsistent.

Indeed. Note that you could also have this other situation where the BO
is marked purgeable but not present in the list. In that case it will
never be purged, but it's kinda user space's fault anyway. I agree, none
of these problems are critical, and I'm fine leaving it unfixed as long
as it's documented somewhere that the race exists and is harmless.

>
> > > > The easiest solution to fix that is to protect the
> > > > drm_gem_shmem_madvise() call with the shrinker lock.
> > > >
> > > > Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support")
> > > > Cc:
> > > > Signed-off-by: Boris Brezillon
> > > >
> > > Reviewed-by: Steven Price
> >
> > Thanks.
> >
> > >
> > > > ---
> > > >  drivers/gpu/drm/panfrost/panfrost_drv.c | 9 ++++-----
> > > >  1 file changed, 4 insertions(+), 5 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > index f21bc8a7ee3a..efc0a24d1f4c 100644
> > > > --- a/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
> > > > @@ -347,20 +347,19 @@ static int panfrost_ioctl_madvise(struct drm_device *dev, void *data,
> > > >  		return -ENOENT;
> > > >  	}
> > > >
> > > > +	mutex_lock(&pfdev->shrinker_lock);
> > > >  	args->retained = drm_gem_shmem_madvise(gem_obj, args->madv);
>
> This means we now hold the shrinker_lock while we take the pages_lock.
> Is lockdep happy with this change? I suspect not given all the fun I
> had getting lockdep happy.

I have tested with lockdep enabled and it's all good from lockdep PoV
because the locks are taken in the same order in the madvise() and
shrinker_scan() path (first the shrinker lock, then the pages lock).

Note that patch 7 introduces a deadlock in the shrinker path, but this
is unrelated to this shrinker lock being taken earlier in madvise
(drm_gem_put_pages() is called while the pages lock is already held).
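
For anyone who wants to poke at the T0/T1 interleaving outside the
kernel, here is a rough user-space sketch of it. This is a toy model
and not the driver code: struct toy_bo, the toy_*() helpers and the two
pthread mutexes are made-up stand-ins for the BO, the pages lock and
the shrinker lock, and I'm assuming madv = 1 means "purgeable, should
be in the list" and madv = 0 means "in use, should not be". The
interleaving is replayed on a single thread so the outcome is
deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <pthread.h>

/* Toy stand-in for a BO; not the real panfrost/shmem structures. */
struct toy_bo {
	int madv;      /* assumption: 1 = purgeable, 0 = in use */
	bool in_list;  /* true if the BO is on the modelled shrinker list */
};

static pthread_mutex_t pages_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t shrinker_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unfixed path, step 1: madv is updated under the pages lock only. */
static void toy_set_madv(struct toy_bo *bo, int madv)
{
	pthread_mutex_lock(&pages_lock);
	bo->madv = madv;
	pthread_mutex_unlock(&pages_lock);
}

/* Unfixed path, step 2: list membership follows the caller's argument. */
static void toy_update_list(struct toy_bo *bo, int madv)
{
	pthread_mutex_lock(&shrinker_lock);
	bo->in_list = (madv == 1);
	pthread_mutex_unlock(&shrinker_lock);
}

/*
 * Fixed path: both updates happen under the shrinker lock, and the
 * pages lock nests inside it (shrinker first, then pages).
 */
static void toy_madvise_fixed(struct toy_bo *bo, int madv)
{
	pthread_mutex_lock(&shrinker_lock);
	pthread_mutex_lock(&pages_lock);
	bo->madv = madv;
	pthread_mutex_unlock(&pages_lock);
	bo->in_list = (madv == 1);
	pthread_mutex_unlock(&shrinker_lock);
}

/* The invariant the race breaks: in the list iff purgeable. */
static bool toy_consistent(const struct toy_bo *bo)
{
	return bo->in_list == (bo->madv == 1);
}

/* Replay the T0/T1 interleaving from the diagram, step by step. */
static struct toy_bo toy_replay_race(void)
{
	struct toy_bo bo = { .madv = 0, .in_list = false };

	toy_set_madv(&bo, 1);     /* T0: madv = 1 */
	toy_set_madv(&bo, 0);     /* T1: madv = 0 */
	toy_update_list(&bo, 0);  /* T1: remove_from_list(bo) */
	toy_update_list(&bo, 1);  /* T0: add_to_list(bo) */
	return bo;
}
```

Calling toy_replay_race() leaves the toy BO with madv = 0 while it is
still on the list, which is the inconsistent state described above.
With toy_madvise_fixed() the two updates cannot be split by another
caller, so toy_consistent() holds after every call, whichever thread
goes last.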