From: Daniel Phillips <daniel@phunq.net>
To: David Lang <david@lang.hm>
Cc: Pavel Machek <pavel@ucw.cz>, Howard Chu <hyc@symas.com>,
	Mike Galbraith <umgwanakikbuti@gmail.com>,
	Dave Chinner <david@fromorbit.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	tux3@tux3.org, "Theodore Ts'o" <tytso@mit.edu>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Subject: Re: xfs: does mkfs.xfs require fancy switches to get decent performance? (was Tux3 Report: How fast can we fsync?)
Date: Mon, 11 May 2015 21:36:02 -0700
Message-ID: <55518332.10009@phunq.net>
In-Reply-To: <alpine.DEB.2.02.1505111707310.6441@nftneq.ynat.uz>

Hi David,

On 05/11/2015 05:12 PM, David Lang wrote:
> On Mon, 11 May 2015, Daniel Phillips wrote:
> 
>> On 05/11/2015 03:12 PM, Pavel Machek wrote:
>>>>> It is a fact of life that when you change one aspect of an intimately interconnected system,
>>>>> something else will change as well. You have naive/nonexistent free space management now; when you
>>>>> design something workable there it is going to impact everything else you've already done. It's an
>>>>> easy bet that the impact will be negative, the only question is to what degree.
>>>>
>>>> You might lose that bet. For example, suppose we do strictly linear allocation
>>>> each delta, and just leave nice big gaps between the deltas for future
>>>> expansion. Clearly, we run at similar or identical speed to the current naive
>>>> strategy until we must start filling in the gaps, and at that point our layout
>>>> is not any worse than XFS, which started bad and stayed that way.
>>>
>>> Umm, are you sure? If "some areas of disk are faster than others" is
>>> still true on today's hard drives, the gaps will decrease the
>>> performance (as you'll "use up" the fast areas more quickly).
>>
>> That's why I hedged my claim with "similar or identical". The
>> difference in media speed seems to be a relatively small effect
>> compared to extra seeks. It seems that XFS puts big spaces between
>> new directories, and suffers a lot of extra seeks because of it.
>> I propose to batch new directories together initially, then change
>> the allocation goal to a new, relatively empty area if a big batch
>> of files lands on a directory in a crowded region. The "big" gaps
>> would be on the order of delta size, so not really very big.
> 
> This is an interesting idea, but what happens if the files don't arrive as a big batch, but rather
> trickle in over time (think a logserver that is putting files into a bunch of directories at a
> fairly modest rate per directory)?

If files are trickling in then we can afford to spend a lot more time
finding nice places to tuck them in. Log server files are an especially
irksome problem for a redirect-on-write filesystem because the final
block tends to be rewritten many times and we must move it to a new
location each time, so every extent ends up as one block. Oh well. If
we just make sure to have some free space at the end of the file that
only that file can use (until everywhere else is full) then the long
term result will be slightly ravelled blocks that nonetheless tend to
be on the same track or flash block as their logically contiguous
neighbours. There will be just zero or one empty data blocks mixed
into the file tail as we commit the tail block over and over with the
same allocation goal. Sometimes there will be a block or two of
metadata as well, which will eventually bake themselves into the
middle of contiguous data and stop moving around.
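
As a rough illustration of the paragraph above, here is a toy model, not
Tux3 code. It assumes an unbounded disk, one tail commit per file per
delta, and an allocator that takes the first free block at or after the
goal; the names simulate, spacing and commits_per_block are invented for
the sketch. The old copy of a tail block is released only after the new
copy is committed, which is what forces the tail to move each time.

```python
def simulate(n_files, n_blocks, commits_per_block, spacing):
    """Return the physical block addresses of each log file.

    spacing is the number of blocks between the starting goals of
    neighbouring files; spacing=1 packs the files together, a large
    value gives each file tail space that nobody else touches.
    """
    used = set()
    files = [{"base": i * spacing, "phys": [], "tail": None}
             for i in range(n_files)]
    for _ in range(n_blocks):
        for _ in range(commits_per_block):          # one delta per pass
            for f in files:
                # Peek at the last finished block to set the goal.
                goal = f["phys"][-1] + 1 if f["phys"] else f["base"]
                new = goal
                while new in used:                  # first free block >= goal
                    new += 1
                used.add(new)
                if f["tail"] is not None:
                    used.discard(f["tail"])         # release the old copy
                f["tail"] = new
        for f in files:                             # tail block is now full
            f["phys"].append(f["tail"])
            f["tail"] = None
    return [f["phys"] for f in files]


print(simulate(2, 4, 3, spacing=64))   # each file stays in its own window
print(simulate(2, 2, 3, spacing=1))    # files interleave block by block
```

With private tail space each file ends up contiguous or with single-block
holes, depending on where the last commit of each block happens to land.
Packed together, the two files interleave almost immediately.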

Putting this together, we have:

  * At delta flush, break out all the log-type files
  * Dedicate some block groups to append-type files
  * Leave lots of space between files in those block groups
  * Peek at the last block of the file to set the allocation goal

Something like that. What we don't want is to throw those files into
the middle of a lot of rewrite-all files, messing up both kinds of file.
We don't care much about keeping these files near the parent directory,
because one big seek per log file in a grep is acceptable. We just need
to avoid thousands of big seeks within the file, and not dribble single
blocks all over the disk.
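
The four steps listed above could be sketched roughly as follows. This is
only an illustration under stated assumptions: the DirtyFile record, the
is_append_type heuristic, and the GROUP_BLOCKS and FILE_SPACING constants
are all hypothetical and do not come from Tux3.

```python
from dataclasses import dataclass

GROUP_BLOCKS = 1 << 15   # assumed block group size, in blocks
FILE_SPACING = 1 << 10   # assumed gap left between append-type files


@dataclass
class DirtyFile:
    name: str
    old_size: int      # size in blocks at the previous delta
    first_dirty: int   # lowest logical block dirtied in this delta
    last_phys: int     # physical address of the current last block


def is_append_type(f):
    # Hypothetical heuristic: an existing file that was dirtied only at
    # or past its old tail block looks like a log.
    return f.old_size > 0 and f.first_dirty >= f.old_size - 1


def allocation_goals(dirty, append_group, general_group, next_slot=0):
    """Pick an allocation goal for each dirty file at delta flush."""
    goals = {}
    for f in dirty:
        if not is_append_type(f):
            # Rewrite-all files share the general goal and pack linearly.
            goals[f.name] = general_group * GROUP_BLOCKS
        elif f.last_phys // GROUP_BLOCKS == append_group:
            # Already in the append area: peek at the last block.
            goals[f.name] = f.last_phys + 1
        else:
            # Newly recognized log: give it a well-separated slot.
            goals[f.name] = append_group * GROUP_BLOCKS + next_slot * FILE_SPACING
            next_slot += 1
    return goals


dirty = [
    DirtyFile("a.c", old_size=10, first_dirty=0, last_phys=50),
    DirtyFile("syslog", old_size=8, first_dirty=7, last_phys=3 * GROUP_BLOCKS + 5),
    DirtyFile("new.log", old_size=4, first_dirty=3, last_phys=120),
]
print(allocation_goals(dirty, append_group=3, general_group=0, next_slot=2))
```

Only the goal is chosen here; finding an actual free block near the goal
is left to whatever allocator sits underneath.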

It would also be nice to merge together extents somehow as the final
block is rewritten. One idea is to retain the final block dirty until
the next delta, and write it again into a contiguous position, so the
final block is always flushed twice. We already have the opportunistic
merge logic, but the redirty behavior and making sure it only happens
to log files would be a bit fiddly.
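
A small model of the flush-twice idea, again only a sketch. It assumes a
single log file, a first-free-at-or-after-goal allocator, and that a
finished block can simply be committed once more in the following delta;
commit, extents and write_log are names made up for the illustration.

```python
def commit(free, goal, old):
    """Redirect-on-write commit: take the first free block at or after
    goal, then release the block holding the previous copy."""
    new = min(b for b in free if b >= goal)
    free.remove(new)
    if old is not None:
        free.add(old)
    return new


def extents(phys):
    """Collapse physical block addresses into [start, length] extents."""
    out = []
    for addr in phys:
        if out and out[-1][0] + out[-1][1] == addr:
            out[-1][1] += 1
        else:
            out.append([addr, 1])
    return out


def write_log(commit_counts, redirty, base=0, reserve=64):
    """commit_counts[i] is how many deltas rewrite logical block i."""
    free = set(range(base, base + reserve))
    phys = []
    for count in commit_counts:
        goal = phys[-1] + 1 if phys else base
        addr = None
        for _ in range(count):
            addr = commit(free, goal, addr)
        if redirty and addr != goal:
            # The finished block stayed dirty into the next delta, so it
            # is flushed a second time, into the slot its earlier copy
            # vacated right after the previous block.
            addr = commit(free, goal, addr)
        phys.append(addr)
    return extents(phys)


print(write_log([2, 2, 2], redirty=False))   # one extent per block
print(write_log([2, 2, 2], redirty=True))    # a single merged extent
```

The cost shows up as one extra block write per finished tail block, which
is the extra IO the second flush implies.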

We will also play the incremental defragmentation card at some point,
but first we should try hard to control fragmentation in the first
place. Tux3 is well suited to online defragmentation because the delta
commit model makes it easy to move things around efficiently and safely,
but it does generate extra IO, so as a basic mechanism it is not ideal.
When we get to piling on features, that will be high on the list,
because it is relatively easy, and having that fallback gives a certain
sense of security.

> And when you then decide that you have to move the directory/file info, doesn't that create a
> potentially large amount of unexpected IO that could end up interfering with what the user is trying
> to do?

Right, we don't like that and don't plan to rely on it. What we hope
for is behavior that, when you slowly stir the pot, tends to improve the
layout just as often as it degrades it. It may indeed become harder to
find ideal places to put things as time goes by, but we also gain more
information to base decisions on.

Regards,

Daniel

