Date: Mon, 9 Jan 2017 20:39:31 +0200
From: Tuomas Tynkkynen
To: Al Viro
CC: Greg Kurz
Subject: Re: [V9fs-developer] 9pfs hangs since 4.7
Message-ID: <20170109203931.2315e1cd@duuni>
In-Reply-To: <20170107171910.GJ1555@ZenIV.linux.org.uk>
References: <20161124215023.02deb03c@duuni>
 <20170102102035.7d1cf903@duuni>
 <20170102162309.GZ1555@ZenIV.linux.org.uk>
 <20170104013355.4a8923b6@duuni>
 <20170104014753.GE1555@ZenIV.linux.org.uk>
 <20170104220447.74f2265d@duuni>
 <20170104230101.GG1555@ZenIV.linux.org.uk>
 <20170106145235.51630baf@bahia.lan>
 <20170107062647.GB12074@ZenIV.linux.org.uk>
 <20170107161045.742893b1@bahia.lan>
 <20170107171910.GJ1555@ZenIV.linux.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 7 Jan 2017 17:19:10 +0000
Al Viro wrote:

> On Sat, Jan 07, 2017 at 04:10:45PM +0100, Greg Kurz wrote:
> > > virtqueue_push(), but pdu freeing is delayed until v9fs_flush() gets woken
> > > up. In the meanwhile, another request arrives into the slot of freed by
> > > that virtqueue_push() and we are out of pdus.
> > >
> >
> > Indeed. Even if this doesn't seem to be the problem here, I guess this should
> > be fixed.
>
> FWIW, there's something that looks like an off-by-one in
> v9fs_device_realize_common():
>     /* initialize pdu allocator */
>     QLIST_INIT(&s->free_list);
>     QLIST_INIT(&s->active_list);
>     for (i = 0; i < (MAX_REQ - 1); i++) {
>         QLIST_INSERT_HEAD(&s->free_list, &s->pdus[i], next);
>         s->pdus[i].s = s;
>         s->pdus[i].idx = i;
>     }
>
> Had been there since the original merge of 9p support into qemu - that code
> had moved around a bit, but it had never inserted s->pdus[MAX_REQ - 1] into
> free list. So your scenario with failing pdu_alloc() is still possible.
> In that log the total amount of pending requests has reached 128 for the
> first time right when the requests had stopped being handled and even
> though it had dropped below that shortly after, extra requests being put
> into queue had not been processed at all...
>

Yes, this does seem to be related to this off-by-one, or otherwise related
to MAX_REQ!

- Bumping MAX_REQ up to 1024 makes the hang go away (on 4.7).
- Dropping it to 64 makes the same hang happen on kernels where it worked
  before (I tried 4.4.x).
- Doing s/(MAX_REQ - 1)/MAX_REQ/ makes the hang go away.

I tested QEMU 2.8.0 as well and the behaviour is the same there.

Here are the logs for 4.4 hanging with MAX_REQ == 64 and the loop
condition left unchanged:

https://gist.githubusercontent.com/dezgeg/b5f0b7f8a0f3d8b6acb1566d7edcb2f0/raw/00241426890ea28d844986243c3b706881432fb4/9p-44.dmesg.log
https://gist.githubusercontent.com/dezgeg/2bffe6c0271c4c9c382ac6363ce1864b/raw/92329eaef38305f82090de5dde3c944561afa372/9p-44.qemu.log