References: <20150201031917.GA18622@wfg-t540p.sh.intel.com> <20150202073334.GB9399@lst.de>
Date: Tue, 3 Feb 2015 13:25:20 -0500
Subject: Re: [nfs] WARNING: CPU: 1 PID: 1392 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
From: Trond Myklebust
To: Josh Boyer
Cc: Christoph Hellwig, Fengguang Wu, LKML, LKP, Linux NFS Mailing List
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 3, 2015 at 1:06 PM, Josh Boyer wrote:
> On Tue, Feb 3, 2015 at 1:02 PM, Trond Myklebust wrote:
>> On Tue, Feb 3, 2015 at 12:40 PM, Josh Boyer wrote:
>>> On Mon, Feb 2, 2015 at 8:43 AM, Trond Myklebust wrote:
>>>> On Mon, Feb 2, 2015 at 2:33 AM, Christoph Hellwig wrote:
>>>>>
>>>>> On Sat, Jan 31, 2015 at 07:19:17PM -0800, Fengguang Wu wrote:
>>>>> > Hi Christoph,
>>>>> >
>>>>> > FYI, this patch discloses a 100% reproducible boot warning.
>>>>> >
>>>>> > git://git.infradead.org/users/hch/pnfs.git flexfiles+pnfsd
>>>>> > commit 34c311faa8dcd323907c6075ab24b4d9e3c6dcb0 ("nfs: force version 4.1")
>>>>>
>>>>> The branch is just a test branch for some new pnfs patches. But the fact
>>>>> that forcing the protocol version to 4.1 makes your boot fail still seems
>>>>> like an interesting observation.
>>>>>
>>>>> >
>>>>> > +------------------------------------------------------------------+------------+------------+
>>>>> > |                                                                  | 457be31a00 | 34c311faa8 |
>>>>> > +------------------------------------------------------------------+------------+------------+
>>>>> > | boot_successes                                                   | 20         | 10         |
>>>>> > | early-boot-hang                                                  | 1          |            |
>>>>> > | boot_failures                                                    | 0          | 12         |
>>>>> > | Kernel_panic-not_syncing:Out_of_memory_and_no_killable_processes | 0          | 2          |
>>>>> > | backtrace:vfs_write                                              | 0          | 2          |
>>>>> > | backtrace:SyS_write                                              | 0          | 2          |
>>>>> > | backtrace:populate_rootfs                                        | 0          | 2          |
>>>>> > | backtrace:kernel_init_freeable                                   | 0          | 2          |
>>>>> > | WARNING:at_kernel/sched/core.c:#__might_sleep()                  | 0          | 10         |
>>>>> > | backtrace:nfs41_callback_svc                                     | 0          | 10         |
>>>>> > +------------------------------------------------------------------+------------+------------+
>>>>> >
>>>>> >
>>>>> > [ 12.520894] Key type id_resolver registered
>>>>> > [ 12.522364] Key type id_legacy registered
>>>>> > [ 12.530530] ------------[ cut here ]------------
>>>>> > [ 12.532061] WARNING: CPU: 1 PID: 1392 at kernel/sched/core.c:7300 __might_sleep+0xbd/0xd0()
>>>>> > [ 12.534114] do not call blocking ops when !TASK_RUNNING; state=1 set at [] prepare_to_wait+0x2f/0x90
>>>>> > [ 12.536264] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver sg sr_mod cdrom ata_generic pata_acpi parport_pc floppy parport cirrus syscopyarea snd_pcm sysfillrect sysimgblt snd_timer ttm snd drm_kms_helper ata_piix soundcore libata drm pcspkr i2c_piix4
>>>>> > [ 12.542569] CPU: 1 PID: 1392 Comm: nfsv4.1-svc Not tainted 3.19.0-rc5-wl-ga224126 #1
>>>>> > [ 12.544509] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>>>> > [ 12.546252] ffffffff81b7a3c0 ffff88007f627bd8 ffffffff818a735f 0000000000003658
>>>>> > [ 12.548252] ffff88007f627c28 ffff88007f627c18 ffffffff810725ea ffff88007f627bf8
>>>>> > [ 12.550261] ffffffff81b90e59 00000000000004d9 0000000000000000 0000000000000001
>>>>> > [ 12.552359] Call Trace:
>>>>> > [ 12.553868] [] dump_stack+0x4c/0x65
>>>>> > [ 12.555666] [] warn_slowpath_common+0x8a/0xc0
>>>>> > [ 12.557512] [] warn_slowpath_fmt+0x46/0x50
>>>>> > [ 12.559338] [] ? try_to_wake_up+0x1f4/0x380
>>>>> > [ 12.561154] [] ? prepare_to_wait+0x2f/0x90
>>>>> > [ 12.562993] [] ? prepare_to_wait+0x2f/0x90
>>>>> > [ 12.564793] [] __might_sleep+0xbd/0xd0
>>>>> > [ 12.566553] [] kmem_cache_alloc_trace+0x1d7/0x250
>>>>> > [ 12.568383] [] ? groups_alloc+0x3e/0x130
>>>>> > [ 12.570159] [] groups_alloc+0x3e/0x130
>>>>> > [ 12.571878] [] svcauth_unix_accept+0x16e/0x290
>>>>> > [ 12.573677] [] svc_authenticate+0xe1/0xf0
>>>>> > [ 12.575405] [] svc_process_common+0x224/0x680
>>>>> > [ 12.577184] [] bc_svc_process+0x1c4/0x260
>>>>> > [ 12.578904] [] nfs41_callback_svc+0x104/0x1b0 [nfsv4]
>>>>> > [ 12.580752] [] ? wait_woken+0xc0/0xc0
>>>>> > [ 12.582441] [] ? nfs4_callback_svc+0x60/0x60 [nfsv4]
>>>>> > [ 12.584268] [] kthread+0xef/0x110
>>>>> > [ 12.585859] [] ? kthread_create_on_node+0x180/0x180
>>>>> > [ 12.587572] [] ret_from_fork+0x7c/0xb0
>>>>> > [ 12.589175] [] ? kthread_create_on_node+0x180/0x180
>>>>> > [ 12.590895] ---[ end trace 7b39108134f7677c ]---
>>>>> > RESULT_ROOT=/result/vm-vp-2G/boot/1/debian-x86_64-2015-01-13.cgz/x86_64-rhel/a224126be542547c3d3040d2b4c145c0c024cc04/0
>>>>>
>>>>
>>>>
>>>> That warning should hopefully be fixed by the following commit by Jeff:
>>>> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=6ffa30d3f734d4f6b478081dfc09592021028f90
>>>>
>>>> I've already pulled it into my linux-next branch.
>>>
>>> If that's marked for stable, why wouldn't it go to Linus to get into
>>> the final 3.19 release?
>>
>> Even stable patches need soak time. This is something that came up as
>> a result of a new sleep test that was added to 3.19-rc, so we've lived
>> with the problem for a while.
>> It is a real bug, so it does need to be
>> solved; however, we can afford to give ourselves an extra week to make
>> sure that the timeout we're now introducing is not a problem.
>
> OK, that's totally fair.
>
> To help, I've added the patch to Fedora's rawhide builds. We got a
> bug report with this exact issue, so hopefully we'll get a bit more
> testing coverage that way too.
>

Thanks! I very much appreciate that.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
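For readers puzzling over the warning itself: the trace shows nfs41_callback_svc() reaching kmem_cache_alloc_trace() (via bc_svc_process(), svc_authenticate() and groups_alloc()) while the task is still in state=1, which prepare_to_wait() set. An allocation that may sleep is not allowed in that state, so __might_sleep() complains. The sketch below is a hypothetical userspace model of that check, not kernel code and not the contents of Jeff's commit. Every model_* name is invented for illustration, malloc() merely stands in for a sleeping kernel allocation, and the "fixed" ordering (return to the running state before doing the blocking work) is an assumed remedy shown only to illustrate what the check wants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical userspace model of the check behind the warning.
 * None of these names are kernel APIs; they only mimic the shape of
 * prepare_to_wait(), finish_wait() and __might_sleep().
 * MODEL_INTERRUPTIBLE is 1 to mirror "state=1" in the log.
 */
enum model_state { MODEL_RUNNING = 0, MODEL_INTERRUPTIBLE = 1 };

struct model_task {
	enum model_state state;
	const char *state_set_at;	/* last function that changed state */
	int warnings;			/* complaints from model_might_sleep() */
};

static void model_prepare_to_wait(struct model_task *t)
{
	t->state = MODEL_INTERRUPTIBLE;
	t->state_set_at = "prepare_to_wait";
}

static void model_finish_wait(struct model_task *t)
{
	t->state = MODEL_RUNNING;
	t->state_set_at = "finish_wait";
}

/* Count and report a blocking op attempted while not RUNNING. */
static void model_might_sleep(struct model_task *t)
{
	if (t->state != MODEL_RUNNING) {
		t->warnings++;
		fprintf(stderr,
			"do not call blocking ops when !TASK_RUNNING; state=%d set at %s\n",
			(int)t->state, t->state_set_at);
	}
}

/* Stand-in for callback processing that allocates (cf. groups_alloc()). */
static void model_process_callback(struct model_task *t)
{
	void *buf;

	model_might_sleep(t);
	buf = malloc(64);
	free(buf);
}

/*
 * One pass of a hypothetical callback-service loop. With fix_order false,
 * the request is processed while still in the wait state (the pattern the
 * trace reports); with fix_order true, the task returns to RUNNING first.
 * Returns the number of warnings raised during the pass.
 */
int model_callback_iteration(bool fix_order, bool have_request)
{
	struct model_task t = { MODEL_RUNNING, "init", 0 };

	model_prepare_to_wait(&t);
	if (have_request) {
		if (fix_order)
			model_finish_wait(&t);
		model_process_callback(&t);
		if (!fix_order)
			model_finish_wait(&t);
	} else {
		/* A real loop would sleep here before finishing the wait. */
		model_finish_wait(&t);
	}
	assert(t.state == MODEL_RUNNING);
	return t.warnings;
}
```

Called from a small test harness, model_callback_iteration(false, true) returns 1 and prints a line shaped like the warning in the log, while model_callback_iteration(true, true) returns 0. The model deliberately leaves out the wakeup and sleep details of a real wait queue; it only tracks which state the task is in when the "blocking" call happens.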