All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Mason <chris.mason@oracle.com>
To: "Ted Ts'o" <tytso@mit.edu>, Lukas Czerner <lczerner@redhat.com>,
	Phillip Susi <phillsusi@gmail.com>,
	Andreas Dilger <adilger@whamcloud.com>,
	Jacek Luczak <difrost.kernel@gmail.com>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: getdents - ext4 vs btrfs performance
Date: Wed, 14 Mar 2012 15:17:39 -0400	[thread overview]
Message-ID: <20120314191739.GA19217@shiny> (raw)
In-Reply-To: <20120314125002.GH15379@thunk.org>

On Wed, Mar 14, 2012 at 08:50:02AM -0400, Ted Ts'o wrote:
> On Wed, Mar 14, 2012 at 09:12:02AM +0100, Lukas Czerner wrote:
> > I kind of like the idea about having the separate btree with inode
> > numbers for the directory reading, just because it does not affect
> > allocation policy nor the write performance which is a good thing. Also
> > it has been done before in btrfs and it works very well for them. The
> > only downside (aside from format change) is the extra step when removing
> > the directory entry, but the positives outperform the negatives.
> 
> Well, there will be extra (journaled!) block reads and writes involved
> when adding or removing directory entries.  So the performance on
> workloads that do a large number of directory adds and removed will
> have to be impacted.  By how much is not obvious, but it will be
> something which is measurable.

Most of the difference in btrfs rm times comes from our double indexing.
We store the directory name 3 times, once for each dir index and once
in the inode backref.  But the inode backref doesn't generate seeks the
same way the extra directory index does, so it is probably only 10% of
the cost.

We also delete file data crcs during rm, so if you're benchmarking
things to see how much it will hurt, benchmark btrfs on zero length
files.

In terms of other costs, most users of readdir will eventually wander
off and stat or open the files in the dir.  With your current indexes, the stat
reads the same directory blocks that readdir used.

With the double index, the stat reads the name index instead of the
readdir index.  For the common case of normal directories, today's ext4
will have the directory blocks in cache when stat is called.  Btrfs has
to do extra reads to pull in the name index.

Josef worked around this by prefilling a dentry during readdir with
details on where to find the inode.

Either way, I'd definitely recommend sitting down with the xfs directory
indexes and look at how they manage locality without the double
indexing.  It's faster than btrfs and without the penalty.

It's also worth mentioning that on an SSD, there is no benefit at all to
the extra index.  So when the industry finally floods the markets with
hybrid PCM shingled drives, this might all be moot.

-chris

  parent reply	other threads:[~2012-03-14 19:17 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-02-29 13:52 getdents - ext4 vs btrfs performance Jacek Luczak
2012-02-29 13:55 ` Jacek Luczak
2012-02-29 13:55   ` Jacek Luczak
2012-02-29 14:07   ` Jacek Luczak
2012-02-29 14:07     ` Jacek Luczak
2012-02-29 14:07     ` Jacek Luczak
2012-02-29 14:21     ` Jacek Luczak
2012-02-29 14:21       ` Jacek Luczak
2012-02-29 14:21       ` Jacek Luczak
2012-02-29 14:42     ` Chris Mason
2012-02-29 14:55       ` Jacek Luczak
2012-03-01 13:35         ` Jacek Luczak
2012-03-01 13:50           ` Hillf Danton
2012-03-01 14:03             ` Jacek Luczak
2012-03-01 14:18               ` Chris Mason
2012-03-01 14:43                 ` Jacek Luczak
2012-03-01 14:43                   ` Jacek Luczak
2012-03-01 14:51                   ` Chris Mason
2012-03-01 14:51                     ` Chris Mason
2012-03-01 14:51                     ` Chris Mason
2012-03-01 14:57                     ` Jacek Luczak
2012-03-01 14:57                       ` Jacek Luczak
2012-03-01 14:57                       ` Jacek Luczak
2012-03-01 18:42                   ` Ted Ts'o
2012-03-02  9:51                     ` Jacek Luczak
2012-03-01  4:44 ` Theodore Tso
2012-03-01  4:44   ` Theodore Tso
2012-03-01  4:44   ` Theodore Tso
2012-03-01 14:38   ` Chris Mason
2012-03-01 14:38     ` Chris Mason
2012-03-02 10:05     ` Jacek Luczak
2012-03-02 10:05       ` Jacek Luczak
2012-03-02 10:05       ` Jacek Luczak
2012-03-02 14:00       ` Chris Mason
2012-03-02 14:16         ` Jacek Luczak
2012-03-02 14:16           ` Jacek Luczak
2012-03-02 14:16           ` Jacek Luczak
2012-03-02 14:26           ` Chris Mason
2012-03-02 14:26             ` Chris Mason
2012-03-02 19:32             ` Ted Ts'o
2012-03-02 19:50               ` Chris Mason
2012-03-05 13:10               ` Jan Kara
2012-03-03 22:41             ` Jacek Luczak
2012-03-03 22:41               ` Jacek Luczak
2012-03-04 10:25               ` Jacek Luczak
2012-03-04 10:25                 ` Jacek Luczak
2012-03-05 11:32                 ` Jacek Luczak
2012-03-05 11:32                   ` Jacek Luczak
2012-03-05 11:32                   ` Jacek Luczak
2012-03-06  0:37                   ` Chris Mason
2012-03-06  0:37                     ` Chris Mason
2012-03-08 17:02   ` Phillip Susi
2012-03-09 11:29 ` Lukas Czerner
2012-03-09 14:34   ` Chris Mason
2012-03-10  0:09   ` Andreas Dilger
2012-03-10  4:48     ` Ted Ts'o
2012-03-11 10:30       ` Andreas Dilger
2012-03-11 16:13         ` Ted Ts'o
2012-03-15 10:42           ` Jacek Luczak
2012-03-15 10:42             ` Jacek Luczak
2012-03-15 10:42             ` Jacek Luczak
2012-03-18 20:56             ` Ted Ts'o
2012-03-13 19:05       ` Phillip Susi
2012-03-13 19:53         ` Ted Ts'o
2012-03-13 20:22           ` Phillip Susi
2012-03-13 21:33             ` Ted Ts'o
2012-03-14  2:48               ` Yongqiang Yang
2012-03-14  2:51                 ` Ted Ts'o
2012-03-14 14:17                   ` Zach Brown
2012-03-14 16:48                     ` Ted Ts'o
2012-03-14 17:37                       ` Zach Brown
2012-03-14  8:12               ` Lukas Czerner
2012-03-14  9:29                 ` Yongqiang Yang
2012-03-14  9:29                   ` Yongqiang Yang
2012-03-14  9:29                   ` Yongqiang Yang
2012-03-14  9:38                   ` Lukas Czerner
2012-03-14 12:50                 ` Ted Ts'o
2012-03-14 14:34                   ` Lukas Czerner
2012-03-14 17:02                     ` Ted Ts'o
2012-03-14 19:17                   ` Chris Mason [this message]
2012-03-14 14:28               ` Phillip Susi
2012-03-14 16:54                 ` Ted Ts'o
2012-03-10  3:52 ` Ted Ts'o
2012-03-15  7:59   ` Jacek Luczak
2012-03-15  7:59     ` Jacek Luczak
2012-03-15  7:59     ` Jacek Luczak
  -- strict thread matches above, loose matches on Subject: below --
2012-02-29 13:31 Jacek Luczak
2012-02-29 13:51 ` Chris Mason
2012-02-29 14:00   ` Lukas Czerner
2012-02-29 14:05   ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120314191739.GA19217@shiny \
    --to=chris.mason@oracle.com \
    --cc=adilger@whamcloud.com \
    --cc=difrost.kernel@gmail.com \
    --cc=lczerner@redhat.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=phillsusi@gmail.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.