On Wed 2016-12-07 23:34:19, Lino Sanfilippo wrote: > On 07.12.2016 22:43, Lino Sanfilippo wrote: > > Hi Pavel, > > > > On 07.12.2016 22:37, Pavel Machek wrote: > >> On Wed 2016-12-07 21:05:38, Lino Sanfilippo wrote: > >>> The driver uses a private lock for synchronization between the xmit > >>> function and the xmit completion handler, but since the NETIF_F_LLTX flag > >>> is not set, the xmit function is also called with the xmit_lock held. > >>> > >>> On the other hand the xmit completion handler first takes the private lock > >>> and (in case that the tx queue has been stopped) the xmit_lock, leading to > >>> a reverse locking order and the potential danger of a deadlock. > >>> > >>> Fix this by removing the private lock completely and synchronizing the xmit > >>> function and completion handler solely by means of the xmit_lock. By doing > >>> this remove also the now unnecessary double check for a stopped tx queue. > >>> > >> > >> FYI, here's modified version. I believe _bh versions are needed, and > >> I'm testing that version now. (Oh and I also ported it to net-next). > >> > >> It survived 30 minutes of testing so far... > >> > > > > First off, thanks for testing. > > Hmm. I dont understand why _bh would be needed. We call that function from > > BH context only (napi poll and timer). > > Any idea? > > > > Could this once again be caused by irq coalescing? When the tx queue has been stopped > the cleanup handler has to wakeup the queue within a certain time span, otherwise the > watchdog will complain (as it happened in your test). Could you retest this with > irq coalescing disabled? I actually had TX coalescing disabled, with -#define STMMAC_TX_FRAMES 64 +#define STMMAC_TX_FRAMES 0 Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html