Date: Sat, 7 Jan 2017 17:19:10 +0000
From: Al Viro
To: Greg Kurz
Cc: Tuomas Tynkkynen, linux-fsdevel@vger.kernel.org,
	v9fs-developer@lists.sourceforge.net, linux-kernel@vger.kernel.org
Subject: Re: [V9fs-developer] 9pfs hangs since 4.7
Message-ID: <20170107171910.GJ1555@ZenIV.linux.org.uk>
In-Reply-To: <20170107161045.742893b1@bahia.lan>

On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote:
> > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken
> > up.  In the meanwhile, another request arrives into the slot freed by
> > that virtqueue_push() and we are out of pdus.
> >
> 
> Indeed. Even if this doesn't seem to be the problem here, I guess this
> should be fixed.

FWIW, there's something that looks like an off-by-one in
v9fs_device_realize_common():

    /* initialize pdu allocator */
    QLIST_INIT(&s->free_list);
    QLIST_INIT(&s->active_list);
    for (i = 0; i < (MAX_REQ - 1); i++) {
        QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next);
        s->pdus[i].s = s;
        s->pdus[i].idx = i;
    }

It has been there since the original merge of 9p support into qemu - the
code has moved around a bit, but it has never inserted s->pdus[MAX_REQ - 1]
into the free list.  So your scenario with a failing pdu_alloc() is still
possible (see the sketch at the end of this mail).

In that log the total number of pending requests reached 128 for the first
time right when the requests stopped being handled, and even though it
dropped below that shortly afterwards, the extra requests put into the
queue were never processed at all...

I'm not familiar enough with qemu guts to tell whether that's a plausible
scenario, though... shouldn't subsequent queue insertions (after enough
slots have been released) simply trigger virtio_queue_notify_vq() again?
It *is* a bug (if a burst fills a previously empty queue all at once, there
won't be any slots becoming freed), but that's obviously not the case
here - slots were getting freed, after all.
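
For reference, a minimal sketch of what the corrected initialization could
look like - simply running the loop over all MAX_REQ slots so the last pdu
also lands on the free list (untested, just to illustrate the off-by-one):

    /* initialize pdu allocator - sketch of the corrected loop */
    QLIST_INIT(&s->free_list);
    QLIST_INIT(&s->active_list);
    for (i = 0; i < MAX_REQ; i++) {   /* include s->pdus[MAX_REQ - 1] */
        QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next);
        s->pdus[i].s = s;
        s->pdus[i].idx = i;
    }

With that, pdu_alloc() would only fail once all MAX_REQ requests are
genuinely in flight, rather than one short of it.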