From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: "Paweł Sikora" <pluto@agmk.net>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Maciej Rutecki <maciej.rutecki@gmail.com>,
	Florian Mickler <florian@mickler.org>,
	Trond Myklebust <trond.myklebust@fys.uio.no>,
	Arkadiusz Miskiewicz <arekm@maven.pl>,
	Jens Axboe <axboe@kernel.dk>
Subject: Re: [Bug #20472] [2.6.34 -> 2.6.35] INFO: task rpcbind:14163 blocked for more than 120 seconds.
Date: Wed, 20 Oct 2010 23:35:23 +0200
Message-ID: <201010202335.24118.rjw@sisk.pl>
In-Reply-To: <201010200200.45570.pluto@agmk.net>

On Wednesday, October 20, 2010, Paweł Sikora wrote:
> On Tuesday 19 October 2010 22:30:22 Rafael J. Wysocki wrote:
> > On Tuesday, October 19, 2010, Paweł Sikora wrote:
> > > On Monday 18 October 2010 23:36:51 Rafael J. Wysocki wrote:
> > > > On Monday, October 18, 2010, Paweł Sikora wrote:
> > > > > On Sunday 17 October 2010 22:21:48 Rafael J. Wysocki wrote:
> > > > > > This message has been generated automatically as a part of a summary report
> > > > > > of recent regressions.
> > > > > > 
> > > > > > The following bug entry is on the current list of known regressions
> > > > > > from 2.6.35.  Please verify if it still should be listed and let the tracking team
> > > > > > know (either way).
> > > > > 
> > > > > hi,
> > > > > 
> > > > > hot news from the front, recent tests on the pld-linux.org vendor kernel
> > > > > show that the random task blocking in 2.6.35.7 during medium/heavy load (~4..25)
> > > > > is related to the grsecurity patch. we'll forward problem report to the
> > > > > grsec maintainer asap...
> > > > 
> > > > So, to make things clear, this is not a mainline kernel issue, is it?
> > > 
> > > i thought that was a grsecurity issue but now i have a testcase that
> > > causes task blocking on my machine with the vanilla 2.6.35.7 kernel:
> > > 
> > > steps to reproduce:
> > > 
> > > on console 1:
> > > - run 'make -j32' in the kernel tree and wait several seconds for make forks
> > >   and medium system load (~15).
> > > 
> > > on console 2:
> > > - run 'sync' from root account to bump load more.
> > > 
> > > on console 3:
> > > - observe 'tail -f /var/log/kernel' for blocking issues.
> > > 
> > > [  360.517917] INFO: task sync:6712 blocked for more than 120 seconds.
> > > [  360.517920] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [  360.517922] sync          D 00000000ffff4c7c     0  6712   2770 0x00000000
> > > [  360.517926]  ffff88021ac71d18 0000000000000086 ffff880200000000 0000000000004000
> > > [  360.517930]  0000000000013700 0000000000013700 ffff88021ac71fd8 ffff88021ac71fd8
> > > [  360.517933]  ffff880205593020 0000000000013700 ffff88021ac71fd8 0000000000004000
> > > [  360.517936] Call Trace:
> > > [  360.517944]  [<ffffffff813b8dbd>] schedule_timeout+0x20d/0x2f0
> > > [  360.517949]  [<ffffffff8103baba>] ? enqueue_entity+0xea/0x170
> > > [  360.517951]  [<ffffffff8103bc09>] ? enqueue_task_fair+0x49/0x50
> > > [  360.517954]  [<ffffffff813b8945>] wait_for_common+0xc5/0x150
> > > [  360.517957]  [<ffffffff81040430>] ? default_wake_function+0x0/0x10
> > > [  360.517959]  [<ffffffff813b8a78>] wait_for_completion+0x18/0x20
> > > [  360.517963]  [<ffffffff8113b343>] sync_inodes_sb+0x83/0x170
> > > [  360.517967]  [<ffffffff8113f890>] ? sync_one_sb+0x0/0x20
> > > [  360.517969]  [<ffffffff8113f880>] __sync_filesystem+0x80/0x90
> > > [  360.517972]  [<ffffffff8113f8ab>] sync_one_sb+0x1b/0x20
> > > [  360.517975]  [<ffffffff8111d2c7>] iterate_supers+0x77/0xc0
> > > [  360.517978]  [<ffffffff8113f7bb>] sync_filesystems+0x1b/0x20
> > > [  360.517980]  [<ffffffff8113f92c>] sys_sync+0x1c/0x40
> > > [  360.517984]  [<ffffffff81002d6b>] system_call_fastpath+0x16/0x1b
> > > 
> > > the 'make' is running on ext4 filesystem with underlying mdadm (raid10)
> > > created from 2 caviar raid edition disks. the cpu is intel quad core
> > > Q9300 with 8GB of ddr2 ram.
> > 
> > So, how is this related to the original report?
> 
> it's easier to reproduce on a single test machine (no need to play with parallel make and nfs).
> in fact this isn't a 2.6.35.5->.7 regression but 2.6.34->2.6.35. git-bisect found the
> first commit that introduces the 'sync' task blocking problem here:
> 
> commit 7c8a3554c683f512dbcee26faedb42e4c05f12fa
> Author: Jens Axboe <jens.axboe@oracle.com>
> Date:   Tue May 18 14:29:29 2010 +0200
> 
>     writeback: ensure that WB_SYNC_NONE writeback with sb pinned is sync
>     
>     Even if the writeout itself isn't a data integrity operation, we need
>     to ensure that the caller doesn't drop the sb umount sem before we
>     have actually done the writeback.
>     
>     This is a fixup for commit e913fc82.
>     
>     Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
> 
> 
> i'll try to revert this patch from 2.6.35.7 and test make+nfs in a few days...

OK, thanks.  Jens added to the CC list.

Rafael
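
For anyone repeating the test above, the hung-task warnings can be picked out of the kernel log mechanically instead of watching 'tail -f' by eye. What follows is only a minimal sketch in Python, assuming the message format shown in the quoted trace; the names HUNG_TASK_RE and find_hung_tasks are hypothetical and are not part of any kernel tool.

```python
import re

# Matches the hung-task warning format shown in the quoted log, e.g.
# "[  360.517917] INFO: task sync:6712 blocked for more than 120 seconds."
# (Hypothetical helper; the pattern is an assumption based on that one line.)
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Yield (timestamp, command, pid, seconds) for each hung-task warning."""
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            yield (
                float(m.group("ts")),
                m.group("comm"),
                int(m.group("pid")),
                int(m.group("secs")),
            )


# Two lines taken from the trace quoted in this message.
sample = [
    "[  360.517917] INFO: task sync:6712 blocked for more than 120 seconds.",
    "[  360.517936] Call Trace:",
]
hits = list(find_hung_tasks(sample))
print(hits)
```

Because the pattern is applied with search(), it should also tolerate a syslog prefix in front of the bracketed timestamp, although that has only been checked against the line format quoted here.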
