Message-ID: <4BBCC42E.2000409@sandeen.net>
Date: Wed, 07 Apr 2010 12:43:10 -0500
From: Eric Sandeen
To: John Berthels
CC: Dave Chinner, Nick Gregory, xfs@oss.sgi.com, linux-kernel@vger.kernel.org, Rob Sanderson
Subject: Re: PROBLEM + POSS FIX: kernel stack overflow, xfs, many disks, heavy write load, 8k stack, x86-64
References: <4BBC6719.7080304@humyo.com> <20100407140523.GJ11036@dastard> <4BBCAB57.3000106@humyo.com>
In-Reply-To: <4BBCAB57.3000106@humyo.com>

John Berthels wrote:
> Dave Chinner wrote:
>> I'm not seeing stacks deeper than about 5.6k on XFS under heavy write
>> loads. That's nowhere near blowing an 8k stack, so there must be
>> something special about what you are doing. Can you post the stack
>> traces that are being generated for the deepest stack generated -
>> /sys/kernel/debug/tracing/stack_trace should contain it.
>>
> Appended below. That doesn't seem to reach 8192 but the box it's from
> has logged:
>
> [74649.579386] apache2 used greatest stack depth: 7024 bytes left

but that's -left- (out of 8k, or is that from a THREAD_ORDER=2 box?)
I guess it must be out of 16k...

>        Depth    Size   Location    (47 entries)
>        -----    ----   --------
>   0)     7568      16   mempool_alloc_slab+0x16/0x20
>   1)     7552     144   mempool_alloc+0x65/0x140
>   2)     7408      96   get_request+0x124/0x370
>   3)     7312     144   get_request_wait+0x29/0x1b0
>   4)     7168      96   __make_request+0x9b/0x490
>   5)     7072     208   generic_make_request+0x3df/0x4d0
>   6)     6864      80   submit_bio+0x7c/0x100
>   7)     6784      96   _xfs_buf_ioapply+0x128/0x2c0 [xfs]
>   8)     6688      48   xfs_buf_iorequest+0x75/0xd0 [xfs]
>   9)     6640      32   _xfs_buf_read+0x36/0x70 [xfs]
>  10)     6608      48   xfs_buf_read+0xda/0x110 [xfs]
>  11)     6560      80   xfs_trans_read_buf+0x2a7/0x410 [xfs]
>  12)     6480      80   xfs_btree_read_buf_block+0x5d/0xb0 [xfs]
>  13)     6400      80   xfs_btree_lookup_get_block+0x84/0xf0 [xfs]
>  14)     6320     176   xfs_btree_lookup+0xd7/0x490 [xfs]
>  15)     6144      16   xfs_alloc_lookup_eq+0x19/0x20 [xfs]
>  16)     6128      96   xfs_alloc_fixup_trees+0xee/0x350 [xfs]
>  17)     6032     144   xfs_alloc_ag_vextent_near+0x916/0xb30 [xfs]
>  18)     5888      32   xfs_alloc_ag_vextent+0xe5/0x140 [xfs]
>  19)     5856      96   xfs_alloc_vextent+0x49f/0x630 [xfs]
>  20)     5760     160   xfs_bmbt_alloc_block+0xbe/0x1d0 [xfs]
>  21)     5600     208   xfs_btree_split+0xb3/0x6a0 [xfs]
>  22)     5392      96   xfs_btree_make_block_unfull+0x151/0x190 [xfs]
>  23)     5296     224   xfs_btree_insrec+0x39c/0x5b0 [xfs]
>  24)     5072     128   xfs_btree_insert+0x86/0x180 [xfs]
>  25)     4944     352   xfs_bmap_add_extent_delay_real+0x41e/0x1660 [xfs]
>  26)     4592     208   xfs_bmap_add_extent+0x41c/0x450 [xfs]
>  27)     4384     448   xfs_bmapi+0x982/0x1200 [xfs]

This one, I'm afraid, has always been big.
>  28)     3936     256   xfs_iomap_write_allocate+0x248/0x3c0 [xfs]
>  29)     3680     208   xfs_iomap+0x3d8/0x410 [xfs]
>  30)     3472      32   xfs_map_blocks+0x2c/0x30 [xfs]
>  31)     3440     256   xfs_page_state_convert+0x443/0x730 [xfs]
>  32)     3184      64   xfs_vm_writepage+0xab/0x160 [xfs]
>  33)     3120     384   shrink_page_list+0x65e/0x840
>  34)     2736     528   shrink_zone+0x63f/0xe10

That's a nice one (actually the two together are over 900 bytes, ouch).

>  35)     2208     112   do_try_to_free_pages+0xc2/0x3c0
>  36)     2096     128   try_to_free_pages+0x77/0x80
>  37)     1968     240   __alloc_pages_nodemask+0x3e4/0x710
>  38)     1728      48   alloc_pages_current+0x8c/0xe0
>  39)     1680      16   __get_free_pages+0xe/0x50
>  40)     1664      48   __pollwait+0xca/0x110
>  41)     1616      32   unix_poll+0x28/0xc0
>  42)     1584      16   sock_poll+0x1d/0x20
>  43)     1568     912   do_select+0x3d6/0x700

912, ouch!

int do_select(int n, fd_set_bits *fds, struct timespec *end_time)
{
        ktime_t expire, *to = NULL;
        struct poll_wqueues table;

(gdb) p sizeof(struct poll_wqueues)
$1 = 624

I guess that's been there forever, though.  (A stand-alone sketch of the
on-stack poll_wqueues cost is appended after the quoted trailer below.)

>  44)      656     416   core_sys_select+0x18c/0x2c0

416 hurts too.

The xfs callchain is deep, no doubt, but the combination of the select
path and the shrink calls is almost 2k in just a few calls, and that
doesn't help much.

-Eric

>  45)      240     112   sys_select+0x4f/0x110
>  46)      128     128   system_call_fastpath+0x16/0x1b
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
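
To make the do_select() point concrete, here is a minimal user-space
sketch. The fake_poll_wqueues / fake_poll_table_entry types below are
stand-ins for the real structures in <linux/poll.h> (field names and
sizes are assumptions, so the printed number will not match the kernel's
624 exactly); the point is only the mechanism: the structure is declared
as an automatic variable, so its full size is carved out of the stack
frame, which is most of the 912 bytes the tracer reports for do_select().

/* User-space sketch, not kernel code: a large struct declared inside
 * the function body lands entirely on the stack, just like the real
 * `struct poll_wqueues table;` in do_select(). */
#include <stdio.h>

/* Stand-in for struct poll_table_entry: a few pointers plus an
 * embedded wait-queue entry.  Sizes are assumptions, not the real ABI. */
struct fake_poll_table_entry {
        void          *filp;
        unsigned long  key;
        unsigned char  wait[40];        /* rough stand-in for wait_queue_t */
        void          *wait_address;
};

/* Stand-in for struct poll_wqueues with an inline array of entries. */
struct fake_poll_wqueues {
        void *pt;
        void *table;                    /* overflow chain, allocated lazily */
        void *polling_task;
        int   triggered;
        int   error;
        int   inline_index;
        struct fake_poll_table_entry inline_entries[9];
};

int main(void)
{
        /* Declared on the stack, as do_select() declares the real
         * struct poll_wqueues -- every byte of it is frame size. */
        struct fake_poll_wqueues table;

        printf("on-stack poll_wqueues stand-in: %zu bytes\n", sizeof(table));
        (void)table;
        return 0;
}

As far as I recall, the kernel already bounds this cost by keeping only a
small inline table in the struct and chaining any overflow entries through
separately allocated pages; it is that inline portion which still has to
live on the stack.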
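
Since the thread turns on reading /sys/kernel/debug/tracing/stack_trace,
here is a small helper for doing that from C rather than by eyeball,
sketched under a few assumptions about the local setup: CONFIG_STACK_TRACER=y,
debugfs mounted at /sys/kernel/debug, the tracer switched on
(kernel.stack_tracer_enabled=1), and the program run as root. It parses the
same Depth/Size/Location layout quoted above and re-sorts the frames by the
Size column so the worst offenders (do_select, shrink_zone, xfs_bmapi, ...)
stand out.

#include <stdio.h>
#include <stdlib.h>

struct frame {
        unsigned long size;
        char location[128];
};

/* Sort frames by their Size column, largest first. */
static int by_size_desc(const void *a, const void *b)
{
        const struct frame *fa = a, *fb = b;

        return (fb->size > fa->size) - (fb->size < fa->size);
}

int main(void)
{
        FILE *f = fopen("/sys/kernel/debug/tracing/stack_trace", "r");
        struct frame frames[128];
        char line[256];
        int n = 0;

        if (!f) {
                perror("stack_trace");
                return 1;
        }

        /* Data lines look like:
         *   43)     1568     912   do_select+0x3d6/0x700
         * Header and separator lines fail the sscanf() and are skipped.
         * %127s keeps just the symbol+offset; a trailing [module] tag,
         * if any, is dropped. */
        while (n < 128 && fgets(line, sizeof(line), f)) {
                unsigned long depth, size;
                char loc[128];

                if (sscanf(line, " %*d) %lu %lu %127s", &depth, &size, loc) == 3) {
                        frames[n].size = size;
                        snprintf(frames[n].location, sizeof(frames[n].location), "%s", loc);
                        n++;
                }
        }
        fclose(f);

        qsort(frames, n, sizeof(frames[0]), by_size_desc);

        /* Print the ten largest frames of the worst-case stack seen so far. */
        for (int i = 0; i < n && i < 10; i++)
                printf("%6lu  %s\n", frames[i].size, frames[i].location);

        return 0;
}

Run while the write load is active, it shows which frames dominate the
worst-case stack; the trace quoted in the message came straight from the
same file.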