git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Elijah Newren <newren@gmail.com>
Cc: "brian m. carlson" <sandals@crustytoothpaste.net>,
	Jason Gore <Jason.Gore@microsoft.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: Git clean enumerates ignored directories (since 2.27)
Date: Thu, 6 May 2021 23:43:00 -0400
Message-ID: <YJS3RIEtittWdFSF@coredump.intra.peff.net>
In-Reply-To: <YJSzQm2p5bCAd8Fv@coredump.intra.peff.net>

On Thu, May 06, 2021 at 11:25:54PM -0400, Jeff King wrote:

> > Anyone have any bright ideas about how to tweak this test?  See [3]
> > for the current incarnation of the code, which was basically taken
> > from Brian's sample testcase.
> 
> My guess is that that version of "rm" is trying to feed the entire
> pathname directly to unlink() and rmdir(), and it exceeds PATH_MAX.
> 
> Even with GNU tools, for instance, I get:
> 
>   $ rmdir $(find avoid-traversing-deep-hierarchy -type d | tail -1)
>   rmdir: failed to remove 'avoid-traversing-deep-hierarchy/directory400/
>     [...and so on...]/directory1': File name too long
> 
> because it feeds the whole path to a single rmdir() call. Whereas stracing
> GNU "rm -rf", it uses unlinkat() and openat() to delete each level
> individually (probably to avoid this exact problem).
> 
> Is the actual path length important, or just the depth? If the latter,
> then calling it "d400/d399/.../d2/d1" would likely help, as that's less
> than 2000 bytes.

Reading your commit messages a little more carefully, I'm still not
quite sure of the answer to that question.
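
To put rough numbers on the two naming schemes, here is a quick sketch.
The path_len helper is made up for illustration (it is not part of the
test suite), and it counts only the relative path below the top-level
test directory:

```shell
# Hypothetical helper: build "<prefix>400/<prefix>399/.../<prefix>1"
# and print its length in bytes.
path_len () {
	prefix=$1
	path=
	i=400
	while test "$i" -ge 1
	do
		# Append the next component, adding "/" only between components.
		path="${path:+$path/}$prefix$i"
		i=$((i - 1))
	done
	echo "${#path}"
}

path_len directory	# prints 5091
path_len d		# prints 1891
```

So the "directory" spelling is over the usual Linux PATH_MAX of 4096 all
by itself, while the "d" spelling leaves plenty of room.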

But if you really do need the long length, a workaround is to avoid
dealing with the full path all at once. E.g., make two strings, one with
"directory400/.../directory200", and one with "directory199/.../directory1".

And then you can probably:

  (cd "$one" && rm -rf directory199) &&
  rm -rf directory400

to do it in two parts, with each "rm" seeing only a half-length path.
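
Building the two strings could look something like the sketch below.
The mkpath helper and the variable name "two" are my own invention for
illustration; only "$one" is actually needed by the removal commands:

```shell
# Hypothetical helper: print "directory<hi>/.../directory<lo>",
# counting down from hi to lo.
mkpath () {
	hi=$1
	lo=$2
	p=
	while test "$hi" -ge "$lo"
	do
		p="${p:+$p/}directory$hi"
		hi=$((hi - 1))
	done
	printf '%s\n' "$p"
}

one=$(mkpath 400 200)	# directory400/.../directory200
two=$(mkpath 199 1)	# directory199/.../directory1

# Each half is well under 4096 bytes.
echo "${#one} ${#two}"	# prints 2612 2478
```

With those in place, neither "cd" nor "rm" is ever handed a path longer
than about 2600 bytes.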

I notice you also run O(n) "mkdir" and "mv" calls to create the
directory. A "mkdir -p" would be much more efficient, though it might
run afoul of similar path-length problems (especially on non-GNU
systems).

It might be worth turning to perl here:

  perl -e '
    for (reverse 1..400) {
      my $d = "directory$_";
      mkdir($d) and chdir($d) or die "mkdir($d): $!";
    }
    open(my $fh, ">", "some-file") or die "open(some-file): $!";
  '

and you could probably do something similar to remove it. Sadly, I don't
think using File::Path makes building it easier, because it hits the
same path limit (it builds up the string internally). However, its
remove_tree() does work (and File::Path is in the set of core modules
that we can count on always being available):

  perl -MFile::Path=remove_tree -e 'remove_tree("directory400")'

-Peff

