From: Cornelia Huck
To: Vegard Nossum
Cc: Eric Van Hensbergen, "Michael S. Tsirkin", "Aneesh Kumar K.V",
    v9fs-developer@lists.sourceforge.net, LKML
Date: Tue, 2 Aug 2016 11:03:21 +0200
Subject: Re: Hang in 9p/virtio
In-Reply-To: <579D1F3A.7020806@oracle.com>
References: <579D1F3A.7020806@oracle.com>
Message-Id: <20160802110321.1fc7369f.cornelia.huck@de.ibm.com>
Organization: IBM Deutschland Research & Development GmbH
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, 30 Jul
2016 23:42:18 +0200 Vegard Nossum wrote:

> Hi,
>
> With fault injection triggering an allocation failure for the
> alloc_indirect() call in virtqueue_add() I'm seeing a hang in
> p9_virtio_zc_request() -- it seems to be waiting here indefinitely
> (i.e. at least 120 seconds):
>
>     err = wait_event_interruptible(*req->wq,
>                                    req->status >= REQ_STATUS_RCVD);
>
> Maybe somebody who is already familiar with the code would have a look?
>
> Stack trace for the memory allocation failure:
>
> CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
>   ffffffff84354a78 ffff88010594f2e8 ffffffff81d72f91 ffffffff84354a60
>   1ffff10020b29e62 ffff88010594f398 ffffffff81e07df7 00007faad2003fff
>   0000000000000064 ffffffffffffffff 0000000041b58ab3 ffffffff840a481c
> Call Trace:
>   [...]
>   [] __kmalloc+0x66/0x2e0
>   [] alloc_indirect.isra.8+0x24/0xa0
>   [] virtqueue_add_sgs+0x41f/0xc90
>   [] p9_virtio_zc_request+0x531/0xdb0
>   [] p9_client_zc_rpc.constprop.14+0x23f/0xe80
>   [] p9_client_read+0x4bc/0x8d0
>   [] v9fs_file_read_iter+0xd3/0x190
>   [] do_iter_readv_writev+0x212/0x490
>   [] do_readv_writev+0x359/0x660
>   [] vfs_readv+0x67/0xa0
>   [] do_readv+0xd8/0x270
>
> Stack trace for the stuck call:
>
> NMI backtrace for cpu 2
> CPU: 2 PID: 3877 Comm: trinity-c2 Not tainted 4.7.0+ #70
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: ffff8801174f5b00 task.stack: ffff880105948000
> RIP: 0010:[] []
> __default_send_IPI_dest_field+0xe0/0x130
> Call Trace:
>   [...]
>   [] prepare_to_wait_event+0x19e/0x410
>   [] p9_virtio_zc_request+0xa40/0xdb0
>   [] p9_client_zc_rpc.constprop.14+0x23f/0xe80
>   [] p9_client_read+0x4bc/0x8d0
>   [] v9fs_file_read_iter+0xd3/0x190
>   [] do_iter_readv_writev+0x212/0x490
>   [] do_readv_writev+0x359/0x660
>   [] vfs_readv+0x67/0xa0
>   [] do_readv+0xd8/0x270

What happens is that the code falls back to direct virtio addressing
(after indirect addressing failed) - and this should work.

I'm more inclined to suspect a qemu bug than a kernel bug, as your qemu
version is quite old and there have been fixes in the virtio buffer
handling and virtio-9p in the meantime. (I'm suspecting "virtio-9p: fix
any_layout".)

Could you retry with a more recent qemu (at least version 2.4)?
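
To illustrate the fallback mentioned above, here is a minimal user-space
sketch. It is not the kernel code: toy_ring, toy_alloc_indirect() and
toy_add() are hypothetical names, the fail_alloc flag merely stands in
for fault injection, and the 16-byte entry size is an assumption of the
toy model. It only shows the general shape of the decision: when the
indirect table cannot be allocated, the request is placed directly in
the ring and takes one slot per scatter-gather entry instead of a single
slot.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a virtqueue: it only tracks
 * how many descriptor slots are free in the main ring. */
struct toy_ring {
	unsigned int num_free;
};

/* Hypothetical stand-in for an indirect-table allocation.
 * fail_alloc mimics an injected allocation failure. */
static void *toy_alloc_indirect(unsigned int total_sg, bool fail_alloc)
{
	if (fail_alloc)
		return NULL;
	return malloc(total_sg * 16u);	/* assumed 16 bytes per entry */
}

/* Returns the number of ring slots consumed, or -ENOSPC if the ring
 * cannot hold the request. */
static int toy_add(struct toy_ring *ring, unsigned int total_sg,
		   bool fail_alloc)
{
	unsigned int descs_used;
	void *indirect;

	assert(total_sg > 0);

	indirect = toy_alloc_indirect(total_sg, fail_alloc);
	if (indirect)
		descs_used = 1;		/* one slot points at the table */
	else
		descs_used = total_sg;	/* fallback: one slot per entry */

	if (ring->num_free < descs_used) {
		free(indirect);
		return -ENOSPC;
	}

	ring->num_free -= descs_used;
	free(indirect);	/* toy model: the table is not kept around */
	return (int)descs_used;
}
```

In this model a request with four entries uses one slot when the
allocation succeeds and four slots when it fails, so the direct path
needs more free ring space but still carries the same buffers. Whether
the device then handles that layout correctly is the part I suspect in
the old qemu.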