From: Paul Eggert
To: Hugh Dickins
Cc: Jim Meyering, Zheng Liu, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/3] tmpfs: revert SEEK_DATA and SEEK_HOLE
Date: Tue, 14 Aug 2012 10:03:23 -0700
Message-ID: <502A84DB.5090607@cs.ucla.edu>
References: <877gtkxatx.fsf@rho.meyering.net>

On 08/07/2012 07:08 PM, Hugh Dickins wrote:
> wouldn't the developer's common case (object files amidst source
> in the tree) usually be handled by that check on the first 32k?

Yes, but grep should also handle the less-common case where the first
32K is text and there's a large hole later.  The particular case I'm
worried about is a denial-of-service attack, so it's irrelevant that
such files are uncommon in practice.

> shouldn't grep instead just be
> checking for over-long lines instead of running out of memory?

GNU programs should not have arbitrary limits.  If we put an arbitrary
limit, say 100,000 bytes, on line length, grep would fail on some
valid inputs.

This is not to say that grep couldn't function better on files with
lots of nulls -- it can, and that's on our list of things to do -- but
SEEK_HOLE is a big and obvious win in this area.

We also need SEEK_HOLE and SEEK_DATA for GNU 'tar', for the same
reason (denial-of-service attacks, mostly).
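
For concreteness, here is a minimal sketch (not the actual GNU grep or
tar code) of the kind of scan loop these interfaces make possible.  It
assumes a kernel and filesystem where lseek supports SEEK_DATA and
SEEK_HOLE (Linux 3.1+); where they are unsupported, lseek fails with
EINVAL and a real program would fall back to a plain sequential read.
The scan_region function is a hypothetical stand-in for the real
per-region work.

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for the real work done on each data region,
   e.g. grep's matcher or tar's archiving of a file chunk.  */
static void
scan_region (int fd, off_t start, off_t end)
{
  printf ("data: %lld..%lld\n", (long long) start, (long long) end);
}

int
main (int argc, char **argv)
{
  if (argc != 2)
    {
      fprintf (stderr, "usage: %s FILE\n", argv[0]);
      return 2;
    }
  int fd = open (argv[1], O_RDONLY);
  if (fd < 0)
    {
      perror ("open");
      return 2;
    }

  for (off_t offset = 0; ; )
    {
      /* Find the start of the next region containing data.
         ENXIO means there is no data at or after OFFSET: we're done.
         EINVAL means this kernel/filesystem lacks SEEK_DATA; a real
         program would fall back to reading sequentially.  */
      off_t data = lseek (fd, offset, SEEK_DATA);
      if (data < 0)
        {
          if (errno == ENXIO)
            break;
          perror ("lseek (SEEK_DATA)");
          return 2;
        }

      /* Find where that data region ends; end-of-file counts as a
         hole, so this cannot overshoot the file.  */
      off_t hole = lseek (fd, data, SEEK_HOLE);
      if (hole < 0)
        {
          perror ("lseek (SEEK_HOLE)");
          return 2;
        }

      scan_region (fd, data, hole);
      offset = hole;  /* Skip the hole instead of reading its zeros.  */
    }

  close (fd);
  return 0;
}

On a file that is a short text prefix followed by one huge hole -- the
denial-of-service case above -- this loop touches only the data
regions, so the cost is proportional to the data actually present, not
to the apparent file size.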