From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jim Meyering
Subject: Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
Date: Tue, 31 Jul 2012 16:30:02 +0200
Message-ID: <877gtkxatx.fsf@rho.meyering.net>
References:
Mime-Version: 1.0
Content-Type: text/plain
Cc: Hugh Dickins
To: linux-fsdevel@vger.kernel.org
Return-path:
Received: from smtpfb2-g21.free.fr ([212.27.42.10]:41696 "EHLO
	smtpfb2-g21.free.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754281Ab2GaOax (ORCPT );
	Tue, 31 Jul 2012 10:30:53 -0400
Received: from smtp5-g21.free.fr (smtp5-g21.free.fr [212.27.42.5])
	by smtpfb2-g21.free.fr (Postfix) with ESMTP id 00A58D1B166
	for ; Tue, 31 Jul 2012 16:30:49 +0200 (CEST)
Received: from mx.meyering.net (unknown [88.168.87.75])
	by smtp5-g21.free.fr (Postfix) with ESMTP id D0FD1D48131
	for ; Tue, 31 Jul 2012 16:30:03 +0200 (CEST)
In-Reply-To:
Sender: linux-fsdevel-owner@vger.kernel.org
List-ID:

Hugh Dickins wrote:
> On Thu, 12 Jul 2012, Jeff Liu wrote:
> > On 07/12/2012 07:01 AM, Dave Chinner wrote:
> > > On Wed, Jul 11, 2012 at 11:55:34AM -0700, Hugh Dickins wrote:
> > >>
> > >> But your vote would count for a lot more if you know of some app which
> > >> would really benefit from this functionality in tmpfs: I've heard
> > >> of none.
...
[Jeff mentioned "cp"]

grep is another tool that would benefit.
I often put very large files (often sparse, too) on tmpfs file systems
and would like "grep -r PAT /tmp" to work well in spite of those files.

Please consider restoring SEEK_HOLE/SEEK_DATA support for tmpfs.

The lack of cross-FS SEEK_HOLE/SEEK_DATA support is a bit of a thorn
in our sides. FIEMAP is not a viable option, and SEEK_HOLE support
works only if you happen to be using btrfs, xfs, ocfs2 or 3.5.0-rcN
tmpfs. That is not something we can rely on for a feature whose lack
can convert grep -r into a memory-hogging, apparently-hung job or an
OOM-killer target.

What would you like to happen when you run (deliberately or
inadvertently) grep on a large sparse file? I want it to search only
the non-HOLE sections of that file, especially when examining a hole
involves accumulating a "line" that may be so long that it exhausts
virtual memory.

We're not quite there, but for now we can at least avoid the VM-abusing
behavior with the --binary-file=without-match option, which says to
treat "binary" (sparse) files as if they contain no match. Sometimes.

With working SEEK_HOLE support, grep does the right thing here:

    (${AWK-awk} 'BEGIN{ for (i=0;i<1000;i++) printf "%080d\n", 0 }' < /dev/null
    echo x | dd bs=1024k seek=8000000 ) >8T-or-so
    $ env time --format=%e grep x 8T-or-so
    0.00

But without SEEK_HOLE support, and with a lot of memory, grep takes
a long time to allocate all of that space before it finally chokes
or is killed. Here, it takes 46 seconds before running out of memory:

    $ env time grep --binary-file=without-match x 8T-or-so
    grep: memory exhausted
    3.15user 25.48system 0:46.46elapsed 61%CPU\
      (0avgtext+0avgdata 12583712maxresident)k
    0inputs+8outputs (0major+2733623minor)pagefaults 0swaps
    [Exit 2]

Until very recently, grep was trying to guess whether an input has
a hole using st_blocks and st_size, but with file systems now using
compression, that method is too subject to false positives. Ideally
we would use SEEK_HOLE/SEEK_DATA, but until that is useful on more
Linux file systems, I suspect we'll have to choose our method based
on the file system type (at the cost of a statvfs call for each
st_dev), possibly in combination with the Linux kernel version.

Here's some background/discussion on the topic, including the
original report about the st_blocks-based heuristic not working:

    http://thread.gmane.org/gmane.comp.gnu.grep.bugs/4604/focus=4610

In case you want to see the SEEK_HOLE-using code, grep's
file_is_binary function is here:

    http://git.savannah.gnu.org/cgit/grep.git/tree/src/main.c#n439
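
To make "search only the non-HOLE sections" concrete, here is a minimal
sketch of walking a file's data extents with lseek. It is not grep's
code (see file_is_binary at the URL above for that); the names
extent_fn, for_each_data_extent, add_length and data_bytes are
hypothetical, chosen only for this illustration. It assumes a libc
that defines SEEK_DATA and SEEK_HOLE. On a kernel or file system
without real support, lseek may fail with EINVAL or report the whole
file as a single data extent, so a caller would still need a fallback.

```c
#define _GNU_SOURCE          /* glibc: expose SEEK_DATA and SEEK_HOLE */
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t on 32-bit hosts */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical visitor: called once per data extent [start, end).
   A nonzero return value stops the walk. */
typedef int (*extent_fn)(int fd, off_t start, off_t end, void *ctx);

/* Call VISIT for each data extent of FD, in increasing offset order.
   Return 0 after the last extent, VISIT's nonzero value if it stops
   the walk, or -1 with errno set if lseek fails (EINVAL means that
   SEEK_DATA/SEEK_HOLE are not supported here).
   Note: this moves FD's file offset, so visitors should use pread. */
int for_each_data_extent(int fd, extent_fn visit, void *ctx)
{
    off_t pos = 0;

    assert(visit != NULL);
    for (;;) {
        /* Skip any hole at POS; ENXIO means no data remains. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0)
            return errno == ENXIO ? 0 : -1;

        /* Find the end of this extent; EOF counts as a hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;

        int rc = visit(fd, data, hole, ctx);
        if (rc != 0)
            return rc;
        pos = hole;
    }
}

/* Example visitor: add each extent's length to the off_t at CTX. */
static int add_length(int fd, off_t start, off_t end, void *ctx)
{
    (void) fd;
    *(off_t *) ctx += end - start;
    return 0;
}

/* Return the number of bytes in PATH that lie in data extents,
   or -1 on error. */
off_t data_bytes(const char *path)
{
    off_t total = 0;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = for_each_data_extent(fd, add_length, &total);
    close(fd);
    return rc == 0 ? total : -1;
}
```

A searching tool would replace add_length with a visitor that reads
and scans [start, end) via pread, so that a multi-terabyte hole is
never read and never accumulated into one giant "line". For the
8T-or-so file above, data_bytes should report a tiny number on a file
system with hole support, and the full file size on one that reports
the whole file as data.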