Subject: Re: dd hangs when reading large partitions
From: Marc Gonzalez
To: linux-mm, linux-block
Cc: Jianchao Wang, Christoph Hellwig, Jens Axboe, fsdevel, SCSI, Joao Pinto,
 Jeffrey Hugo, Evan Green, Matthias Kaehlcke, Douglas Anderson, Stephen Boyd,
 Tomas Winkler, Adrian Hunter, Alim Akhtar, Avri Altman, Bart Van Assche,
 Martin Petersen, Bjorn Andersson, Ming Lei, Omar Sandoval, Roman Gushchin,
 Andrew Morton, Michal Hocko
Date: Thu, 7 Feb 2019 17:56:29 +0100
Message-ID: <27165898-88c3-ab42-c6c9-dd52bf0a41c8@free.fr>
References: <398a6e83-d482-6e72-5806-6d5bbe8bfdd9@oracle.com>
 <20190119095601.GA7440@infradead.org>
 <07b2df5d-e1fe-9523-7c11-f3058a966f8a@free.fr>
 <985b340c-623f-6df2-66bd-d9f4003189ea@free.fr>
 <5132e41b-cb1a-5b81-4a72-37d0f9ea4bb9@oracle.com>
 <7bd8b010-bf0c-ad64-f927-2d2187a18d0b@free.fr>
 <0cfe1ed2-41e1-66a4-8d98-ebc0d9645d21@free.fr>
List-ID: linux-fsdevel@vger.kernel.org

On 07/02/2019 11:44, Marc Gonzalez wrote:

> + linux-mm
>
> Summarizing the issue for linux-mm readers:
>
> If I read data from a storage device larger than my system's RAM, the system
> freezes once dd has read more data than available RAM.
>
> # dd if=/dev/sde of=/dev/null bs=1M & while true; do echo m > /proc/sysrq-trigger; echo; echo; sleep 1; done
> https://pastebin.ubuntu.com/p/HXzdqDZH4W/
>
> A few seconds before the system hangs, Mem-Info shows:
>
> [   90.986784] Node 0 active_anon:7060kB inactive_anon:13644kB active_file:0kB inactive_file:3797500kB [...]
>
> => 3797500kB is basically all of RAM.
>
> I tried to locate where "inactive_file" was being increased from, and saw two signatures:
>
> [  255.606019] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | add_to_page_cache_lru | mpage_readpages | blkdev_readpages | read_pages | __do_page_cache_readahead | ondemand_readahead | page_cache_sync_readahead
>
> [  255.637238] __mod_node_page_state | __pagevec_lru_add_fn | pagevec_lru_move_fn | __lru_cache_add | lru_cache_add | lru_cache_add_active_or_unevictable | __handle_mm_fault | handle_mm_fault | do_page_fault | do_translation_fault | do_mem_abort | el1_da
>
> Are these expected?
>
> NB: the system does not hang if I specify 'iflag=direct' to dd.
>
> According to the RCU watchdog:
>
> [  108.466240] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
> [  108.466420] rcu: 1-...0: (130 ticks this GP) idle=79e/1/0x4000000000000000 softirq=2393/2523 fqs=2626
> [  108.471436] rcu: (detected by 4, t=5252 jiffies, g=133, q=85)
> [  108.480605] Task dump for CPU 1:
> [  108.486483] kworker/1:1H    R  running task        0   680      2 0x0000002a
> [  108.489977] Workqueue: kblockd blk_mq_run_work_fn
> [  108.496908] Call trace:
> [  108.501513]  __switch_to+0x174/0x1e0
> [  108.503757]  blk_mq_run_work_fn+0x28/0x40
> [  108.507589]  process_one_work+0x208/0x480
> [  108.511486]  worker_thread+0x48/0x460
> [  108.515480]  kthread+0x124/0x130
> [  108.519123]  ret_from_fork+0x10/0x1c
>
> Can anyone shed some light on what's going on?

Saw a slightly different report from another test run:
https://pastebin.ubuntu.com/p/jCywbKgRCq/

[  340.689764] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  340.689992] rcu: 1-...0: (8548 ticks this GP) idle=c6e/1/0x4000000000000000 softirq=82/82 fqs=6
[  340.694977] rcu: (detected by 5, t=5430 jiffies, g=-719, q=16)
[  340.703803] Task dump for CPU 1:
[  340.709507] dd              R  running task        0   675    673 0x00000002
[  340.713018] Call trace:
[  340.720059]  __switch_to+0x174/0x1e0
[  340.722192]  0xffffffc0f6dc9600
[  352.689742] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 33s!
[  352.689910] Showing busy workqueues and worker pools:
[  352.696743] workqueue mm_percpu_wq: flags=0x8
[  352.701753]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  352.706099]     pending: vmstat_update
[  384.693730] BUG: workqueue lockup - pool cpus=1 node=0 flags=0x0 nice=0 stuck for 65s!
[  384.693815] Showing busy workqueues and worker pools:
[  384.700577] workqueue events: flags=0x0
[  384.705699]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
[  384.709351]     pending: vmstat_shepherd
[  384.715587] workqueue mm_percpu_wq: flags=0x8
[  384.719495]   pwq 2: cpus=1 node=0 flags=0x0 nice=0 active=1/256
[  384.723754]     pending: vmstat_update
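[Editor's note] The inactive_file growth reported above via sysrq Mem-Info can also be watched from userspace by polling /proc/meminfo while the copy runs. A minimal sketch, not from this thread; `parse_meminfo` is a made-up helper name, and the field names are the standard /proc/meminfo ones (values are reported in kB):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: kB_value} dict."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # First token after the colon is the numeric value (in kB
            # for most fields; a bare count for HugePages_* fields).
            fields[name] = int(parts[0])
    return fields

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        mi = parse_meminfo(f.read())
    print("Inactive(file): %d kB  MemAvailable: %d kB"
          % (mi.get("Inactive(file)", 0), mi.get("MemAvailable", 0)))
```

Run in a loop next to the dd, this should show Inactive(file) climbing toward the total-RAM figure (3797500 kB here) until the hang.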
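[Editor's note] Besides the O_DIRECT route noted above ('iflag=direct'), a buffered reader can bound its page-cache footprint by dropping the pages it has already consumed with posix_fadvise(POSIX_FADV_DONTNEED); this is the mechanism behind dd's 'nocache' flag. A rough sketch, not from this thread; `read_with_dropbehind` is a hypothetical name and the 1 MiB chunk size is arbitrary:

```python
import os

def read_with_dropbehind(path, chunk=1 << 20):
    """Read a file sequentially, asking the kernel to drop the page
    cache behind us so inactive_file does not grow without bound."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            total += len(buf)
            # Discard the range we just consumed from the page cache.
            os.posix_fadvise(fd, total - len(buf), len(buf),
                             os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Unlike O_DIRECT this keeps the normal buffered/readahead path, so it would only work around the symptom; it does not explain why reclaim fails to make progress in the reports above.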