From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932802AbcLHWqE (ORCPT ); Thu, 8 Dec 2016 17:46:04 -0500 Received: from mout.gmx.net ([212.227.15.15]:60829 "EHLO mout.gmx.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752295AbcLHWqD (ORCPT ); Thu, 8 Dec 2016 17:46:03 -0500 Subject: Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock To: Pavel Machek References: <1481141138-19466-1-git-send-email-LinoSanfilippo@gmx.de> <1481141138-19466-2-git-send-email-LinoSanfilippo@gmx.de> <20161207231534.GB5889@electric-eye.fr.zoreil.com> <051e3043-8b58-0591-36e3-99e2267f67f4@gmx.de> <20161208215409.GA12472@amd> <9b55b51c-bbbf-7f80-fb67-9df88a288708@gmx.de> <20161208221830.GD12472@amd> Cc: Francois Romieu , bh74.an@samsung.com, ks.giri@samsung.com, vipul.pandya@samsung.com, peppe.cavallaro@st.com, alexandre.torgue@st.com, davem@davemloft.net, linux-kernel@vger.kernel.org, netdev@vger.kernel.org From: Lino Sanfilippo Message-ID: <32369623-8375-e179-4edd-1abd23a378e5@gmx.de> Date: Thu, 8 Dec 2016 23:45:05 +0100 User-Agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Thunderbird/45.5.1 MIME-Version: 1.0 In-Reply-To: <20161208221830.GD12472@amd> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit X-Provags-ID: V03:K0:Rb7R4iYoGQi3ICTIGMa1QoIWA6waHMpiBw68RzFUjRDr3O8CFB9 m16MjRejueJuw96WIKNegRIAnsUZv6DbKctl/Mvo3gJAbtiP/ynnOb+Zy7jxswETSyQ6mMC SfdlN8+2AjTeXl7aOvw8yuHTU7PDsz6RxWxpqpPUN/IezT92Ce/8KvsRnfgLqinDGLrZRsu 1VV/NXyKW/JxCqXH7zctw== X-UI-Out-Filterresults: notjunk:1;V01:K0:bXZii8Papsk=:V/DDcosCkdxVEx/1O13SXN jFmJB6/qxUxU3SXIBK5yQZJc5CSU2W0pC14ciSAtDZCyjIcoi5dUXRc+i81W/5tqGGkU9Zlj3 zj7hgt0CbNQxP24F/zTq2ssqbeqF8sUP9XcC52Vp3URfQ4Gy2I5FCMaUGm2y+jj38HPOIsx9a HLvRMUAID6tzK4ZwdpfDGRcxmtw1qVx+rRlhUcSr/1GhZ0czCzU25pNA29hx+VwX21JC4GNZ6 /gGcVixZ3/bOCwxgSoHcEKirsicaUAQ9wHtO+eu17/4+hvZz1PoB4xhppNwarBiJqTQ4kMnqR OLyndVthky/+OTEXmmpy7FOWduRtOEv0/kU/r5PkRNWjUmXPsCcw6sZP7vLHoezaJoQcUbMT3 IamkKDtRyZGE4Jwsccux4AzrcttnBei198vz4dDj4fpO7rK/Kabh2v1UYYJWV3E2XGhYH1esE 8zTqI442vErdWQm0iOzcBavH/SB5rmIXmEky0rMbakDA7TlPDojdnJEQSM5FJkeKjqizzj1jB E9FNSBww1JYwmZtTgmoTBSBJVBYyDLBcEUXJu/ASblqpxBPJQP+pwBAULRnWzIbQGcJJ4L4NZ IKblc311HAC3onaGlqCxnoWLiV8wB81MrCVgP1TsXngBa52atZNRVqLLXD7ema1ijrVCKpfSj lD+O/AbHnU5uCrWNNsdeCJMdBZeDw1cz7n7tNwZ0O6g0r+BJ2s9fVoR3kIagjcxNNV2E0kOfp uo9slIgEtBDxxNDrSiliNvGraUQ/1648g3Yz7KQHfERkhZnPjqZbHt32wyDh2UQsFTd9Agjya LKFKsDg Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08.12.2016 23:18, Pavel Machek wrote: > On Thu 2016-12-08 23:12:10, Lino Sanfilippo wrote: >> Hi, >> >> On 08.12.2016 22:54, Pavel Machek wrote: >> > On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote: >> >> Hi, >> >> >> >> On 08.12.2016 00:15, Francois Romieu wrote: >> >> > Lino Sanfilippo : >> >> >> The driver uses a private lock for synchronization between the xmit >> >> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag >> >> >> is not set, the xmit function is also called with the xmit_lock held. >> >> >> >> >> >> On the other hand the xmit completion handler first takes the private lock >> >> >> and (in case that the tx queue has been stopped) the xmit_lock, leading >> >> >> to a reverse locking order and the potential danger of a deadlock. >> >> > >> >> > netif_tx_stop_queue is used by: >> >> > 1. xmit function before releasing lock and returning. >> >> > 2. sxgbe_restart_tx_queue() >> >> > <- sxgbe_tx_interrupt >> >> > <- sxgbe_reset_all_tx_queues() >> >> > <- sxgbe_tx_timeout() >> >> > >> >> > Given xmit won't be called again until tx queue is enabled, it's not clear >> >> > how a deadlock could happen due to #1. >> >> > >> >> >> >> >> >> After spending more thoughts on this I tend to agree with you. Yes, we have the >> >> different locking order for the xmit_lock and the private lock in two concurrent >> >> threads. And one of the first things one learns about locking is that this is a >> >> good way to create a deadlock sooner or later. But in our case the deadlock >> >> can only occur if the xmit function and the tx completion handler perceive different >> >> states for the tx queue, or to be more specific: >> >> the completion handler sees the tx queue in state "stopped" while the xmit handler >> >> sees it in state "running" at the same time. Only then both functions would try to >> >> take both locks, which could lead to a deadlock. >> >> >> >> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused >> >> by that locking scheme (in a way I have not figured out yet) or if it is a different issue. >> > >> > Pavel has some problems, but that's on different hardware.. and it is >> > possible that it is deadlock (or something else) somewhere else. >> > >> >> Right, it is different hardware. But the locking situation in xmit function and tx completion handler >> is very similar in both drivers. So if a deadlock is not possible in sxgbe it should >> also not be possible in stmmac (at least not due to the different locking order). >> So maybe there is no real issue that we could fix with removing the private lock and we should >> keep it as it is. > > Well.. the locking is pretty confused there. Having private lock that > mirrors lock from network layer is confusing and ugly... that should > be reason to fix it. > Pavel > Ok. Then I will resend the patches for both drivers with a different (less dramatic) commit message in which the change is not longer described as a fix for deadlock but rather as a code cleanup/improvement, ok? Regards, Lino