From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFA98C433FE for ; Tue, 22 Mar 2022 12:32:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234877AbiCVMde (ORCPT ); Tue, 22 Mar 2022 08:33:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39324 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233214AbiCVMdd (ORCPT ); Tue, 22 Mar 2022 08:33:33 -0400 Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 783226D867; Tue, 22 Mar 2022 05:32:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1647952325; x=1679488325; h=date:from:to:cc:subject:in-reply-to:message-id: references:mime-version; bh=whvh601WI9v3GtquiQCtN3tCwEdYSA9tQKGmcV2TCX4=; b=jCs+fqCpJ57jrbOsdyJCHXDRvZRjcp+wEVD44wt6svV+GvmnX8OOp292 CFGoKLT7ynDQqsWHqN68qMaT7wtHnfixOzMQECsM6Km9wpRwT7bXpDZZG lXSKRvY1cFI9whicqPkAvxoqUFHVFwkDFVDW9p2fppl+s6/No/ms/jrWV e+x/yTRdBed5pJneG6anCm9cO08ACvO7E5/7U9yhpVgktUkEM5p1bGujR wFRIxkbxn2OXEL7OWi+yjtDD8840dGycJItIcxXlbnt73rDYjJa/8V9e+ BHomS+yczuDQiItd7cTg9G56o2xZc3dUPJ8vcygLr6V3ukIXgDprzWIuJ Q==; X-IronPort-AV: E=McAfee;i="6200,9189,10293"; a="245279558" X-IronPort-AV: E=Sophos;i="5.90,201,1643702400"; d="scan'208";a="245279558" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga101.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Mar 2022 05:32:05 -0700 X-IronPort-AV: E=Sophos;i="5.90,201,1643702400"; d="scan'208";a="560365733" Received: from dcolomor-mobl1.ger.corp.intel.com ([10.252.55.151]) by orsmga008-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Mar 2022 05:31:59 -0700 Date: Tue, 22 Mar 2022 14:31:44 +0200 (EET) From: =?ISO-8859-15?Q?Ilpo_J=E4rvinen?= To: "Martinez, Ricardo" cc: Netdev , linux-wireless@vger.kernel.org, kuba@kernel.org, davem@davemloft.net, johannes@sipsolutions.net, ryazanov.s.a@gmail.com, loic.poulain@linaro.org, m.chetan.kumar@intel.com, chandrashekar.devegowda@intel.com, linuxwwan@intel.com, chiranjeevi.rapolu@linux.intel.com, haijun.liu@mediatek.com, amir.hanania@intel.com, Andy Shevchenko , dinesh.sharma@intel.com, eliot.lee@intel.com, moises.veleta@intel.com, pierre-louis.bossart@intel.com, muralidharan.sethuraman@intel.com, Soumya.Prakash.Mishra@intel.com, sreehari.kancharla@intel.com, madhusmita.sahu@intel.com Subject: Re: [PATCH net-next v5 12/13] net: wwan: t7xx: Device deep sleep lock/unlock In-Reply-To: Message-ID: References: <20220223223326.28021-1-ricardo.martinez@linux.intel.com> <20220223223326.28021-13-ricardo.martinez@linux.intel.com> <1aca9e1f-8b6b-d3e2-d3ff-1bf37abe63f5@linux.intel.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="8323329-2107175543-1647952324=:1722" Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. --8323329-2107175543-1647952324=:1722 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT On Fri, 18 Mar 2022, Martinez, Ricardo wrote: > > On 3/10/2022 2:21 AM, Ilpo Järvinen wrote: > > On Wed, 23 Feb 2022, Ricardo Martinez wrote: > > > > > From: Haijun Liu > > > > > > Introduce the mechanism to lock/unlock the device 'deep sleep' mode. > > > When the PCIe link state is L1.2 or L2, the host side still can keep > > > the device is in D0 state from the host side point of view. At the same > > > time, if the device's 'deep sleep' mode is unlocked, the device will > > > go to 'deep sleep' while it is still in D0 state on the host side. > > > > > > Signed-off-by: Haijun Liu > > > Signed-off-by: Chandrashekar Devegowda > > > Co-developed-by: Ricardo Martinez > > > Signed-off-by: Ricardo Martinez > > > --- > ... > > > +int t7xx_pci_sleep_disable_complete(struct t7xx_pci_dev *t7xx_dev) > > > +{ > > > + struct device *dev = &t7xx_dev->pdev->dev; > > > + int ret; > > > + > > > + ret = wait_for_completion_timeout(&t7xx_dev->sleep_lock_acquire, > > > + > > > msecs_to_jiffies(PM_SLEEP_DIS_TIMEOUT_MS)); > > > + if (!ret) > > > + dev_err_ratelimited(dev, "Resource wait complete timed > > > out\n"); > > > + > > > + return ret; > > > +} > > > + > > > +/** > > > + * t7xx_pci_disable_sleep() - Disable deep sleep capability. > > > + * @t7xx_dev: MTK device. > > > + * > > > + * Lock the deep sleep capability, note that the device can still go into > > > deep sleep > > > + * state while device is in D0 state, from the host's point-of-view. > > > + * > > > + * If device is in deep sleep state, wake up the device and disable deep > > > sleep capability. > > > + */ > > > +void t7xx_pci_disable_sleep(struct t7xx_pci_dev *t7xx_dev) > > > +{ > > > + unsigned long flags; > > > + > > > + if (atomic_read(&t7xx_dev->md_pm_state) < MTK_PM_RESUMED) { > > > + atomic_inc(&t7xx_dev->sleep_disable_count); > > > + complete_all(&t7xx_dev->sleep_lock_acquire); > > > + return; > > > + } > > > + > > > + spin_lock_irqsave(&t7xx_dev->md_pm_lock, flags); > > > + if (atomic_inc_return(&t7xx_dev->sleep_disable_count) == 1) { > > > + u32 deep_sleep_enabled; > > > + > > > + reinit_completion(&t7xx_dev->sleep_lock_acquire); > > You might want to check that there's a mechanism that prevents this > > racing with wait_for_completion_timeout() in > > t7xx_pci_sleep_disable_complete(). > > > > I couldn't prove it myself but there are probably aspect in the PM side of > > things I wasn't able to take fully into account (that is, which call > > paths are not possible to occur). > Those functions are called in the following order: > 1.- t7xx_pci_disable_sleep() > 2.- t7xx_pci_sleep_disable_complete() > 3.- t7xx_pci_enable_sleep() That sequence gets called from 5 places: - t7xx_cldma_send_skb - t7xx_dpmaif_rxq_work - t7xx_dpmaif_bat_release_work - t7xx_dpmaif_tx_done - t7xx_dpmaif_tx_hw_push_thread + t7xx_do_tx_hw_push Which of those can run parallel to each other, I'm not sure of. But they can, the race is likely there between those "instances" of the sequence, one instance doing reinit_completion() and the other wait_for_completion_timeout(). > That sequence and md_pm_lock protect against a race condition between > wait_for_completion_timeout() and  reinit_completion(). wait_for_completion_timeout() is not protected by md_pm_lock. There is a path with return in t7xx_pci_disable_sleep() before taking md_pm_lock. > On the other hand, there could be a race condition between > t7xx_pci_disable_sleep() and t7xx_pci_enable_sleep() which may cause sleep to > get enabled while one thread expects it to be disabled. ...And once sleep gets enabled, this can get true, no? if (atomic_read(&t7xx_dev->md_pm_state) < MTK_PM_RESUMED) { ...after which there's nothing which protects wait_for_completion_timeout() from racing with another instance of the sequence which has not yet executed reinit_completion()? I think you found the very race which I was worried about. :-) > The fix would be to protect sleep_disable_count with md_pm_lock, then > sleep_disable_count doesn't need to be declared as atomic. > The next version will include cleanup in this area. Ok. I'll take a look once you post the next version. -- i. --8323329-2107175543-1647952324=:1722--