Date: Sat, 7 Jan 2017 17:19:10 +0000
From: Al Viro
To: Greg Kurz
Cc: Tuomas Tynkkynen, linux-fsdevel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [V9fs-developer] 9pfs hangs since 4.7
Message-ID: <20170107171910.GJ1555@ZenIV.linux.org.uk>
In-Reply-To: <20170107161045.742893b1@bahia.lan>

On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote:
> > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken
> > up.  In the meanwhile, another request arrives into the slot freed by
> > that virtqueue_push() and we are out of pdus.
> >
> 
> Indeed. Even if this doesn't seem to be the problem here, I guess this
> should be fixed.

FWIW, there's something that looks like an off-by-one in
v9fs_device_realize_common():

    /* initialize pdu allocator */
    QLIST_INIT(&s->free_list);
    QLIST_INIT(&s->active_list);
    for (i = 0; i < (MAX_REQ - 1); i++) {
        QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next);
        s->pdus[i].s = s;
        s->pdus[i].idx = i;
    }

It has been there since the original merge of 9p support into qemu - the
code has moved around a bit, but it has never inserted s->pdus[MAX_REQ - 1]
into the free list.  So your scenario with a failing pdu_alloc() is still
possible (see the sketch at the end of this mail).

In that log the total number of pending requests reached 128 for the first
time right when the requests stopped being handled, and even though it
dropped below that shortly afterwards, the extra requests put into the
queue were never processed at all...

I'm not familiar enough with qemu guts to tell whether that's a plausible
scenario, though... shouldn't subsequent queue insertions (after enough
slots have been released) simply trigger virtio_queue_notify_vq() again?
It *is* a bug (if a burst fills a previously empty queue all at once, there
won't be any slots becoming freed), but that's obviously not the case
here - slots were getting freed, after all.
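
For reference, a minimal sketch of what the corrected initialization could
look like - simply running the loop over all MAX_REQ slots so the last pdu
also lands on the free list (untested, just to illustrate the off-by-one):

    /* initialize pdu allocator - sketch of the corrected loop */
    QLIST_INIT(&s->free_list);
    QLIST_INIT(&s->active_list);
    for (i = 0; i < MAX_REQ; i++) {   /* include s->pdus[MAX_REQ - 1] */
        QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next);
        s->pdus[i].s = s;
        s->pdus[i].idx = i;
    }

With that, pdu_alloc() would only fail once all MAX_REQ requests are
genuinely in flight, rather than one short of it.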