linux-fsdevel.vger.kernel.org archive mirror
From: Jim Meyering <jim@meyering.net>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Andreas Dilger" <adilger@dilger.ca>,
	"Niklas Hambüchen" <niklas@nh2.me>,
	"Linux FS-devel Mailing List" <linux-fsdevel@vger.kernel.org>,
	"Paul Eggert" <eggert@cs.ucla.edu>,
	"Pádraig Brady" <P@draigbrady.com>
Subject: Re: O(n^2) deletion performance
Date: Wed, 3 Jan 2018 20:16:58 -0800
Message-ID: <CA+8g5KFGDG6=R3vY4jvWuSZjap4sCxrSnK1sfTsPnjjJzbYEQw@mail.gmail.com>
In-Reply-To: <20180102062245.GJ2532@thunk.org>

On Mon, Jan 1, 2018 at 10:22 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Mon, Jan 01, 2018 at 08:27:48PM -0800, Jim Meyering wrote:
>> Our goal (with fts and coreutils) has been to make it harder for an
>> accident or maliciousness (with a few million entries in a directory)
>> to hinder file system traversals. Of course, it's not just rm: any
>> FS-traversal tool is affected: cp, chmod, chgrp, du, find, tar, etc.
>> Sure, quotas can help, but even self-inflicted accidents happen on
>> single-user systems with no quotas.
>>
>> Idly wondered if the default inode limits could save ext4 users? Perhaps not.
>> In this 850GB file system, I see it has 48M inodes (caveat, I may have
>> changed the default when I created it -- don't recall):
>
> Well, it's a bit of a blunt hammer, but you *can* set a mount option
> "mount -t ext4 -o max_dir_size_kb=512" which will not allow the
> directory to grow larger than 512k (or pick your favorite limit).

Thanks, but no thanks :-)

Still wondering how this happened... deliberate optimization for
something else, probably. I also wish I'd written a relative (not
absolute) test for it in 2008, so I would have noticed sooner.
In 2008, when I wrote this coreutils extN performance test:

  https://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/rm/ext3-perf.sh

there was no O(N^2) or even "just" O(N^1.5) component when using the
then-just-improved rm. Many of us plotted the curves.
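
For illustration only, a relative test of the kind mentioned above could
compare deletion times at two directory sizes and estimate the growth
exponent, rather than checking against a fixed time limit. This is a
minimal sketch, not the actual ext3-perf.sh: the function names, the
1.3 threshold, and the use of Python are all assumptions, and small
runs served from the page cache may not reproduce the on-disk behavior
discussed in this thread.

```python
import math
import os
import tempfile
import time
from typing import Optional


def time_delete(n_files: int, parent: Optional[str] = None) -> float:
    """Create n_files empty files in a fresh directory, then return the
    number of seconds taken to unlink all of them."""
    with tempfile.TemporaryDirectory(dir=parent) as d:
        for i in range(n_files):
            # Empty files suffice: the cost of interest is the directory update.
            with open(os.path.join(d, f"f{i}"), "w"):
                pass
        start = time.perf_counter()
        with os.scandir(d) as it:
            paths = [entry.path for entry in it]  # directory (readdir) order
        for p in paths:
            os.unlink(p)
        return time.perf_counter() - start


def growth_exponent(n1: int, t1: float, n2: int, t2: float) -> float:
    """Estimate k in t ~ n**k from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def looks_superlinear(n_small: int, n_large: int,
                      max_exponent: float = 1.3) -> bool:
    """Hypothetical pass/fail rule: flag growth clearly worse than linear."""
    t_small = time_delete(n_small)
    t_large = time_delete(n_large)
    return growth_exponent(n_small, t_small, n_large, t_large) > max_exponent
```

Because the check depends only on the ratio of two timings taken on the
same machine, it would not need retuning as hardware gets faster, which
is the weakness of an absolute time limit.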

Any idea when ext4's unlink became more expensive?

> versus the patience needed to recover from
> accidentally dumping 16 million files into a directory --- I prefer
> the latter.  I can wait a few minutes....

I've just run a test on the spinning-disk file system mentioned above,
and it took 75 minutes to delete 12.8M entries. That's rather nasty.
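
To put those two figures in per-entry terms, the arithmetic can be
sketched as follows. Only the 75 minutes and 12.8M entries come from
the measurement above; the function name is hypothetical, and the
result is an average over the whole run, so it says nothing about how
the cost per unlink changed as the directory shrank.

```python
def deletion_rate(entries: int, minutes: float):
    """Return (entries per second, microseconds per entry), averaged over the run."""
    seconds = minutes * 60
    return entries / seconds, seconds / entries * 1e6


rate, per_entry_us = deletion_rate(12_800_000, 75)
print(f"{rate:.0f} entries/s, {per_entry_us:.0f} us per entry")
# prints "2844 entries/s, 352 us per entry"
```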

On the bright side, Kevin Vigor was kind enough to run tests showing
that on some large, fast NVMe devices, everything looks linear:
https://docs.google.com/spreadsheets/d/1bPi8MTvSP4xzzuARPOd5fxFujhBoU2Dxandr-Vh1T9c/edit#gid=0
