* cleaner optimization and online defragmentation: status update
@ 2013-06-18 18:30 Andreas Rohner
From: Andreas Rohner @ 2013-06-18 18:30 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

I have written a simple defragmentation tool and some extensions to the
cleaner. The implementations are not tested enough yet, but I thought I
could use some feedback on whether I am headed in the right direction. I
would be very happy about any suggestions for improvement, because I am
quite new to kernel development.

* Cleaner:

Links to Commits: [1] [2]

I have implemented two new policies for the cleaner: "Greedy", which
selects the segments with the most free blocks, and "Cost/Benefit",
which is inspired by this paper [3]. f2fs apparently uses the same
algorithms.

Unfortunately, both of them require that the file system keep track of
the free blocks per segment. This is not trivial, mainly because of
snapshots. I chose to simply ignore snapshots altogether, which keeps
the tracking simple. The problem is, of course, that the selection
policy sometimes goes wrong, because the actual number of free blocks
is less than reported. But it should still perform better than the
"Timestamp" policy.

If a block is part of a snapshot, gets deleted, is cleaned and then the
snapshot is deleted, this block is actually free, but is not counted as
such.

Steps:
1. Block A is part of a snapshot
2. A gets deleted
3. A is cleaned (it is considered live because it is in the snapshot)
4. The snapshot is deleted
5. A is counted in su_nblocks, but it will never be decremented

If there is a whole segment full of these uncounted blocks, it will
never be cleaned. To prevent this kind of starvation I simply reset all
the counters to zero after a snapshot gets deleted. This makes the
deletion of snapshots quite expensive and temporarily degrades the
"Cost/Benefit" policy to "Timestamp". This solution is quite ugly and I
would prefer something better, but I found no other way to prevent this
problem.

There is one additional potential problem, though. To track the used
blocks per segment I used the su_nblocks attribute of struct
nilfs_segment_usage. This value is currently never used, except once in
the cleaner. But since the cleaner only cleans segments that are neither
active nor dirty, I assume it will always be a full segment with
nilfs_get_blocks_per_segment(nilfs) blocks. I haven't tested that enough
yet, but it seems to work. Alternatively, I could just add another
attribute, su_live_nblocks, to struct nilfs_segment_usage.
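
To illustrate that alternative, here is a minimal kernel-side sketch
(the su_live_nblocks field is hypothetical and not part of the current
on-disk format; the real patch may look quite different):

/* Hypothetical sketch: decrement the live-block counter of a segment
 * when one of its blocks dies. Assumes a new little-endian
 * su_live_nblocks field in struct nilfs_segment_usage. */
static void nilfs_sufile_dec_live_blocks(struct nilfs_segment_usage *su)
{
    u32 live = le32_to_cpu(su->su_live_nblocks);

    if (live > 0)
        su->su_live_nblocks = cpu_to_le32(live - 1);
}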

* Defragmentation:

Links to Commits: [4] [5]

It's just a simple proof-of-concept tool, invoked as "nilfs-defrag
<filename>". It takes the file to defragment as its only argument.

It tries to find the mount point and get a pointer to struct nilfs, to
find out the block size and the number of blocks per segment. It uses
the FIEMAP ioctl to get the extent information, and if the number of
extents per segment exceeds a certain value, it tries to defragment
those extents.
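
For illustration, the extent counting boils down to roughly the
following (a hedged sketch: error handling and the loop needed for
files with more extents than the buffer holds are omitted, and the
buffer size is an arbitrary choice of mine):

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

/* Count the mapped extents of an open file via FIEMAP (sketch only). */
static unsigned int count_extents(int fd, __u64 file_size)
{
    unsigned int max = 512; /* arbitrary bound for this sketch */
    unsigned int n = 0;
    struct fiemap *fm;

    fm = calloc(1, sizeof(*fm) + max * sizeof(struct fiemap_extent));
    if (!fm)
        return 0;
    fm->fm_start = 0;
    fm->fm_length = file_size;
    fm->fm_flags = FIEMAP_FLAG_SYNC; /* flush dirty pages first */
    fm->fm_extent_count = max;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) == 0)
        n = fm->fm_mapped_extents;
    free(fm);
    return n;
}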

I added a simple new ioctl, NILFS_IOCTL_MARK_EXTENT_DIRTY, which just
reads in the corresponding blocks and marks them dirty. The dirty blocks
are automatically written out to a new segment and will hopefully be
less fragmented. It seems to work quite nicely, but it needs more
testing. I am not quite sure if I got the locking right in the kernel
code.
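
Roughly, the userspace side looks like the following sketch (the
argument struct here is made up purely for illustration; the actual
layout is whatever my patch defines):

#include <sys/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument block, NOT the layout from the patch; it only
 * shows the intended call pattern. */
struct nilfs_extent_arg {
    __u64 start;    /* first logical block of the fragmented extent */
    __u64 nblocks;  /* number of blocks to read in and mark dirty */
};

static int mark_extent_dirty(int fd, __u64 start, __u64 nblocks)
{
    struct nilfs_extent_arg arg = { .start = start, .nblocks = nblocks };

    /* The kernel reads the blocks into the page cache and marks them
     * dirty; the segment constructor later writes them out together,
     * ideally into one new segment. */
    return ioctl(fd, NILFS_IOCTL_MARK_EXTENT_DIRTY, &arg);
}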

Sorry for the long post

Best Regards,
Andreas Rohner

[1]
https://github.com/zeitgeist87/linux/commit/bc763ac47c04893d3fece4f2db59f46187415cc4
[2]
https://github.com/zeitgeist87/nilfs-utils/commit/ec8281964b3b57b1b79452d9cb03887e04a089b3
[3] http://dl.acm.org/citation.cfm?id=121137
[4]
https://github.com/zeitgeist87/linux/commit/9ce900df854b1cbc968d35fd7ed892d9bf3b52d8
[5]
https://github.com/zeitgeist87/nilfs-utils/commit/d32c43e26ad5059b79c0ecc3ff167a78b0f6c814

* Re: cleaner optimization and online defragmentation: status update
From: Vyacheslav Dubeyko @ 2013-06-19  7:17 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Andreas,

On Tue, 2013-06-18 at 20:30 +0200, Andreas Rohner wrote:
> Hi,
> 
> I have written a simple defragmentation tool and some extensions to the
> cleaner. 

Thank you for your efforts. But I feel the need to have some questions
answered before diving into your code. First of all, I need a common
understanding of your approaches.

> The implementations are not tested enough yet, but I thought I
> could use some feedback on whether I am headed in the right direction.

What tools do you plan to use for testing?

As far as I know, xfstests and fsstress are frequently used for
testing. Benchmarking tools are also very useful (for example,
iozone). But I think GC testing needs a special aging tool.

> I would be
> very happy about any suggestions for improvement, because I am quite new
> to kernel development.

Please use the scripts/checkpatch.pl script to check your code (both
kernel-space and user-space). As I can see, you break some coding style
rules in your code.

> 
> * Cleaner:
> 
> Links to Commits: [1] [2]
> 
> I have implemented two new policies for the cleaner: "Greedy", which
> selects the segments with the most free blocks, and "Cost/Benefit",
> which is inspired by this paper [3]. f2fs apparently uses the same
> algorithms.
> 

If you suggest an implementation of new GC policies, then we need
evidence of their efficiency. Do you have any benchmarking results? How
can you prove the efficiency of your approach?

As I understand it, F2FS has a slightly different allocation policy.
Why do you think that the "Greedy" and "Cost/Benefit" policies are
useful for the case of NILFS2?

I am slightly confused trying to understand the essence of the "Greedy"
and "Cost/Benefit" policies. Could you briefly describe how they work?
Moreover, you mentioned that the "Greedy" policy selects the segments
with the most free blocks. As far as I know, F2FS has a concept of
invalid blocks, so it usually makes sense to clean first the segments
that have the largest number of invalid blocks. But NILFS2 has no
concept of invalid blocks. So, what do you mean when you talk about free
blocks in the "Greedy" policy?

By the way, I think that the GC shouldn't be "greedy". :-) What do you
think?

> Unfortunately, both of them require that the file system keep track of
> the free blocks per segment. This is not trivial, mainly because of
> snapshots. I chose to simply ignore snapshots altogether, which keeps
> the tracking simple. The problem is, of course, that the selection
> policy sometimes goes wrong, because the actual number of free blocks
> is less than reported. But it should still perform better than the
> "Timestamp" policy.
> 
> If a block is part of a snapshot, gets deleted, is cleaned and then the
> snapshot is deleted, this block is actually free, but is not counted as
> such.
> 
> Steps:
> 1. Block A is part of a snapshot
> 2. A gets deleted
> 3. A is cleaned (it is considered live because it is in the snapshot)
> 4. The snapshot is deleted
> 5. A is counted in su_nblocks, but it will never be decremented
> 
> If there is a whole segment full of these uncounted blocks, it will
> never be cleaned. To prevent this kind of starvation I simply reset all
> the counters to zero after a snapshot gets deleted. This makes the
> deletion of snapshots quite expensive and temporarily degrades the
> "Cost/Benefit" policy to "Timestamp". This solution is quite ugly and I
> would prefer something better, but I found no other way to prevent this
> problem.
> 
> There is one additional potential problem, though. To track the used
> blocks per segment I used the su_nblocks attribute of struct
> nilfs_segment_usage. This value is currently never used, except once in
> the cleaner. But since the cleaner only cleans segments that are
> neither active nor dirty, I assume it will always be a full segment
> with nilfs_get_blocks_per_segment(nilfs) blocks. I haven't tested that
> enough yet, but it seems to work. Alternatively, I could just add
> another attribute, su_live_nblocks, to struct nilfs_segment_usage.
> 
> * Defragmentation:
> 
> Links to Commits: [4] [5]
> 
> It's just a simple proof-of-concept tool, invoked as "nilfs-defrag
> <filename>". It takes the file to defragment as its only argument.
> 

Does NILFS2 really need a defragmentation tool? Could you describe the
reasons for such a necessity and concrete use cases? And could you
provide benchmarking results to prove the efficiency of this approach
and to show an enhancement of NILFS2's performance or efficiency?

Thanks,
Vyacheslav Dubeyko.

> It tries to find the mount point and get a pointer to struct nilfs, to
> find out the block size and the number of blocks per segment. It uses
> the FIEMAP ioctl to get the extent information, and if the number of
> extents per segment exceeds a certain value, it tries to defragment
> those extents.
> 
> I added a simple new ioctl, NILFS_IOCTL_MARK_EXTENT_DIRTY, which just
> reads in the corresponding blocks and marks them dirty. The dirty
> blocks are automatically written out to a new segment and will
> hopefully be less fragmented. It seems to work quite nicely, but it
> needs more testing. I am not quite sure if I got the locking right in
> the kernel code.
> 
> Sorry for the long post
> 
> Best Regards,
> Andreas Rohner
> 
> [1]
> https://github.com/zeitgeist87/linux/commit/bc763ac47c04893d3fece4f2db59f46187415cc4
> [2]
> https://github.com/zeitgeist87/nilfs-utils/commit/ec8281964b3b57b1b79452d9cb03887e04a089b3
> [3] http://dl.acm.org/citation.cfm?id=121137
> [4]
> https://github.com/zeitgeist87/linux/commit/9ce900df854b1cbc968d35fd7ed892d9bf3b52d8
> [5]
> https://github.com/zeitgeist87/nilfs-utils/commit/d32c43e26ad5059b79c0ecc3ff167a78b0f6c814

* Re: cleaner optimization and online defragmentation: status update
From: Andreas Rohner @ 2013-06-19  9:19 UTC (permalink / raw)
  To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

On 2013-06-19 09:17, Vyacheslav Dubeyko wrote:
> What tools do you plan to use for testing?
> 
> As far as I know, xfstests and fsstress are frequently used for
> testing. Benchmarking tools are also very useful (for example,
> iozone). But I think GC testing needs a special aging tool.

I have written my own little aging tool that replays NFS traces. I also
plan on doing performance benchmarks, because some performance
degradation is to be expected due to the block counting. But I am quite
busy for the next week or so and won't be able to start benchmarking
until then.

> Please use the scripts/checkpatch.pl script to check your code (both
> kernel-space and user-space). As I can see, you break some coding style
> rules in your code.

Thanks for the tip. I guess I should have read "The newbie's guide to
hacking the Linux kernel" first. Sorry for that. I didn't intend to
submit a patch right now.


> If you suggest an implementation of new GC policies, then we need
> evidence of their efficiency. Do you have any benchmarking results? How
> can you prove the efficiency of your approach?

First of all, the current timestamp approach is clearly inefficient,
because it does not take the number of live blocks in a segment into
consideration. Then there is the paper I cited, which suggests the
Cost/Benefit algorithm. I don't have any results yet, but I plan on
doing thorough benchmarks soon.

> As I understand it, F2FS has a slightly different allocation policy.
> Why do you think that the "Greedy" and "Cost/Benefit" policies are
> useful for the case of NILFS2?

I didn't get the idea from F2FS, but from the cited paper; I just noted
that F2FS, as far as I can tell, also uses these algorithms. They apply
to NILFS2 because it is a log-structured file system with segments and a
cleaner.

> I am slightly confused trying to understand the essence of the
> "Greedy" and "Cost/Benefit" policies. Could you briefly describe how
> they work? Moreover, you mentioned that the "Greedy" policy selects the
> segments with the most free blocks. As far as I know, F2FS has a
> concept of invalid blocks, so it usually makes sense to clean first the
> segments that have the largest number of invalid blocks. But NILFS2 has
> no concept of invalid blocks. So, what do you mean when you talk about
> free blocks in the "Greedy" policy?

When the cleaner cleans a segment, it has to decide which blocks to
keep and which blocks to discard. The more blocks the cleaner can
discard, the more efficient it is. To do this, the cleaner has to know
in advance how many live blocks there are in a segment. In the context
of NILFS2 this means that blocks that have been deleted or overwritten
in a file and are not part of some snapshot can be discarded.
Checkpoints are not relevant to this distinction.

Of course, the fs has to keep track of the number of live blocks per
segment. I implemented that using the su_nblocks attribute of struct
nilfs_segment_usage.

With the tracking in place, the cleaner can select the most efficient
segment to clean. "Greedy" just greedily selects the segment with the
highest number of discardable/free/invalid blocks, whatever you want to
call them ;).

In the paper "The design and implementation of a log-structured file
system" the authors noted, that the greedy approach is not the most
efficient, because the free space in old segments is more valuable than
the free space in young segments. It is very likely, that the blocks in
young segments are less stable and will die off anyway. So cleaning them
would be a waste of time. They devised the "Cost/Benefit" Algorithm. It
is basically just a simple formula, that takes age and costs of cleaning
into consideration.
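
To make that concrete, here is a small sketch of the selection score
(the variable names are mine; u is the fraction of live blocks in a
segment and age is the time since its last modification):

/* Cost/Benefit score from the Sprite LFS paper:
 * benefit/cost = ((1 - u) * age) / (1 + u).
 * Reading the whole segment costs 1, writing back the live fraction
 * costs u, and the reclaimed space (1 - u) is weighted by age. */
static double cost_benefit_score(unsigned long live_blocks,
                                 unsigned long blocks_per_segment,
                                 unsigned long age)
{
    double u = (double)live_blocks / blocks_per_segment;

    return ((1.0 - u) * age) / (1.0 + u);
}

/* "Greedy" ignores the age entirely and just picks the segment with
 * the smallest u; in both cases the cleaner selects the segment with
 * the best score. */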

> Does NILFS2 really need a defragmentation tool? Could you describe the
> reasons for such a necessity and concrete use cases? And could you
> provide benchmarking results to prove the efficiency of this approach
> and to show an enhancement of NILFS2's performance or efficiency?

Both of these things are on the TODO list:
http://www.nilfs.org/en/current_status.html
* Smarter and more efficient Garbage Collector
* Online defrag

NILFS2 fragments very heavily, because everything is copy-on-write. The
cleaner does some reordering, that's true, but it is quite random. An
example would be a database file, or more concretely the sqlite db files
in the firefox directory. These files are updated and fsynced quite
often. I imagine that defragmenting them could reduce the startup time
of firefox quite considerably, but I have no benchmark results yet. The
effects of fragmentation are less severe on SSDs, but it still matters.

You can easily test it with the filefrag tool, which shows you the
extents of a file. On my test system I created a file with 100 extents
and defragmented it down to 9. The nilfs-defrag tool also checks whether
a file is already fully or partially defragmented and only defragments
the fragmented parts of the file. So it is better than a simple "cp
frag_file defrag_file" ;).

br,
Andreas Rohner


* Re: cleaner optimization and online defragmentation: status update
From: Vyacheslav Dubeyko @ 2013-06-22 15:31 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Andreas,

On Jun 19, 2013, at 1:19 PM, Andreas Rohner wrote:

> Hi Vyacheslav,
> 
> On 2013-06-19 09:17, Vyacheslav Dubeyko wrote:
>> What tools do you plan to use for testing?
>> 
>> As far as I know, xfstests and fsstress are frequently used for
>> testing. Benchmarking tools are also very useful (for example,
>> iozone). But I think GC testing needs a special aging tool.
> 
> I have written my own little aging tool that replays NFS traces. I
> also plan on doing performance benchmarks, because some performance
> degradation is to be expected due to the block counting. But I am quite
> busy for the next week or so and won't be able to start benchmarking
> until then.
> 

What benchmarking tool do you plan to use? I think you need to use a
well-known and widely used tool. Otherwise, it is impossible to check
your results independently and to trust them.

[skip]
> 
>> If you suggest an implementation of new GC policies, then we need
>> evidence of their efficiency. Do you have any benchmarking results?
>> How can you prove the efficiency of your approach?
> 
> First of all, the current timestamp approach is clearly inefficient,
> because it does not take the number of live blocks in a segment into
> consideration. Then there is the paper I cited, which suggests the
> Cost/Benefit algorithm. I don't have any results yet, but I plan on
> doing thorough benchmarks soon.
> 
>> As I understand it, F2FS has a slightly different allocation policy.
>> Why do you think that the "Greedy" and "Cost/Benefit" policies are
>> useful for the case of NILFS2?
> 
> I didn't get the idea from F2FS, but from the cited paper; I just noted
> that F2FS, as far as I can tell, also uses these algorithms. They apply
> to NILFS2 because it is a log-structured file system with segments and
> a cleaner.
> 
>> I am slightly confused trying to understand the essence of the
>> "Greedy" and "Cost/Benefit" policies. Could you briefly describe how
>> they work? Moreover, you mentioned that the "Greedy" policy selects
>> the segments with the most free blocks. As far as I know, F2FS has a
>> concept of invalid blocks, so it usually makes sense to clean first
>> the segments that have the largest number of invalid blocks. But
>> NILFS2 has no concept of invalid blocks. So, what do you mean when you
>> talk about free blocks in the "Greedy" policy?
> 
> When the cleaner cleans a segment, it has to decide which blocks to
> keep and which blocks to discard. The more blocks the cleaner can
> discard, the more efficient it is. To do this, the cleaner has to know
> in advance how many live blocks there are in a segment. In the context
> of NILFS2 this means that blocks that have been deleted or overwritten
> in a file and are not part of some snapshot can be discarded.
> Checkpoints are not relevant to this distinction.
> 
> Of course, the fs has to keep track of the number of live blocks per
> segment. I implemented that using the su_nblocks attribute of struct
> nilfs_segment_usage.
> 
> With the tracking in place, the cleaner can select the most efficient
> segment to clean. "Greedy" just greedily selects the segment with the
> highest number of discardable/free/invalid blocks, whatever you want to
> call them ;).
> 
> In the paper "The design and implementation of a log-structured file
> system" the authors noted, that the greedy approach is not the most
> efficient, because the free space in old segments is more valuable than
> the free space in young segments. It is very likely, that the blocks in
> young segments are less stable and will die off anyway. So cleaning them
> would be a waste of time. They devised the "Cost/Benefit" Algorithm. It
> is basically just a simple formula, that takes age and costs of cleaning
> into consideration.
> 

Yes, I think that "Greedy" and "Cost/Benefit" policies can be used as GC
policies for the case of NILFS2. But, currently, you don't suggest proper
basis for such policies realization. As I understand, "Greedy" policy should
select such segments that contain as lesser valid blocks as possible. Valid
blocks means blocks that will be moved by GC. The "Cost/Benefit" policy
should select segment which has required (calculated by formula range)
correlation between valid and invalid blocks in segment.

You are using the su_nblocks field. As I understand it, su_nblocks keeps
the number of blocks in all the partial segments that are located in a
full segment. So, from my point of view, this value has no relation to
the valid/invalid block correlation. It means that, from the su_nblocks
point of view, segments are practically identical, because the
allocation policy usually tries to fill a segment with partial segments
up to the segment's capacity. So, if you compare your implementation
with the "timestamp" GC policy, the "timestamp" policy is better.

Moreover, the current realization of the segment allocation algorithm is
efficient for the case of the "timestamp" GC policy. But its efficiency
degrades significantly for a proper realization of the "Greedy" and
"Cost/Benefit" GC policies.

>> Does NILFS2 really need a defragmentation tool? Could you describe
>> the reasons for such a necessity and concrete use cases? And could you
>> provide benchmarking results to prove the efficiency of this approach
>> and to show an enhancement of NILFS2's performance or efficiency?
> 
> Both of these things are on the TODO list:
> http://www.nilfs.org/en/current_status.html
> * Smarter and more efficient Garbage Collector
> * Online defrag
> 
> NILFS2 fragments very heavily, because everything is copy-on-write.
> The cleaner does some reordering, that's true, but it is quite random.
> An example would be a database file, or more concretely the sqlite db
> files in the firefox directory. These files are updated and fsynced
> quite often. I imagine that defragmenting them could reduce the startup
> time of firefox quite considerably, but I have no benchmark results
> yet. The effects of fragmentation are less severe on SSDs, but it still
> matters.
> 

First of all, I think now, and will keep thinking, that defragmentation
should be a part of GC activity. Usually, from my point of view, users
choose NILFS2 because they use flash storage (SSDs and so on), and the
GC is a consequence of the log-structured nature of NILFS2. But flash
aging needs to be considered anyway: the activity of the GC and other
auxiliary subsystems has to take NAND flash wear-leveling into account.
If the activity of auxiliary subsystems is significant, the NAND flash
will fail soon, without any clear reason from the user's viewpoint.

>> You can easily test it with the filefrag tool, which shows you the
>> extents of a file. On my test system I created a file with 100 extents
>> and defragmented it down to 9. The nilfs-defrag tool also checks
>> whether a file is already fully or partially defragmented and only
>> defragments the fragmented parts of the file. So it is better than a
>> simple "cp frag_file defrag_file" ;).
> 

I reckon that "cp" command will be better if copying is necessary operation.

From my point of view, a utility is the wrong way. Defragmentation
should be a part of the file system's activity. Anyway, we have a GC
that is moving blocks, so it is namely the GC activity that should be
adjusted so that it defragments while moving blocks. When you make a
utility that copies blocks, you increase GC activity, because the
segments processed by the defragmentation utility have to be cleaned. A
user should have a transparent defragmentation feature in the file
system, without needing to run any special utility. Moreover, if
fragmentation is a result of GC activity or of the nature of internal
file system operations, then such activity should be corrected by the
defragmentation logic.

So, I have the following remarks about the defragmentation utility:

(1) The defragmentation utility receives a path to a file as input, so a
user has to detect and choose the files that need defragmentation. But
if I have many files (100,000 - 1,000,000), this will be a really
time-consuming and complex operation. A better way could be a daemon
that scans the state of files in the background and processes the
defragmentation. But we already have a GC daemon. So, from the
architectural point of view, the GC is the proper place for
defragmentation activity.

(2) Some files are rarely accessed, and it doesn't make sense to
defragment such files. But how can a user detect the files that really
need defragmentation in a simple and fast way?

(3) In your approach, the defragmentation utility doubles the GC's work
of clearing segments. Effectively, the utility marks blocks as dirty,
and then the segctor thread copies these blocks into new segments. As a
result, the GC has to clear the source segments anyway. But you can't
predict the moments when the GC will work. As a result, nilfs_cleanerd
can fragment other files from the source segments while some big file is
being defragmented. Moreover, nilfs_cleanerd can fragment other parts of
the big file during the defragmentation activity.

(4) Your approach relies on peculiarities of the segctor's algorithms. I
mean that the segctor processes all the dirty blocks of one dirty file
before it begins processing another one. But if the segctor's algorithm
changes, the defragmentation utility will fragment files instead of
defragmenting them. Fragmentation will also take place if several
segctor threads work simultaneously (currently we have only one segctor
thread, but this situation can potentially change). Currently, the
segctor and nilfs_cleanerd can work simultaneously, and it is not a rare
case that users force nilfs_cleanerd to work without any sleeping
timeout. So, simultaneous operation of the segctor and nilfs_cleanerd
will end in fragmentation.

(5) Effectively, the real defragmentation is done by the segctor. It
means that the end of the defragmentation utility's run is not the end
of the defragmentation, because the real start of the defragmentation
will be a sync or umount operation. So, I can request defragmentation of
a file 100 - 500 GB in size and then try to umount immediately after the
defragmentation utility finishes. How long will I wait for such an
umount to end? I think a user will treat such a long umount as a file
system bug, because the defragmentation utility finished before the
umount started.

With the best regards,
Vyacheslav Dubeyko.

> br,
> Andreas Rohner
> 


* Re: cleaner optimization and online defragmentation: status update
From: Andreas Rohner @ 2013-06-22 18:18 UTC (permalink / raw)
  To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Vyacheslav,

First of all thanks for looking into it.

On 2013-06-22 17:31, Vyacheslav Dubeyko wrote:
> What benchmarking tool do you plan to use? I think you need to use a
> well-known and widely used tool. Otherwise, it is impossible to check
> your results independently and to trust them.

Of course. The ones you suggested are fine.

> Yes, I think that "Greedy" and "Cost/Benefit" policies can be used as GC
> policies for the case of NILFS2. But, currently, you don't suggest proper
> basis for such policies realization. As I understand, "Greedy" policy should
> select such segments that contain as lesser valid blocks as possible. Valid
> blocks means blocks that will be moved by GC. The "Cost/Benefit" policy
> should select segment which has required (calculated by formula range)
> correlation between valid and invalid blocks in segment.
> 
> You are using the su_nblocks field. As I understand it, su_nblocks
> keeps the number of blocks in all the partial segments that are located
> in a full segment. So, from my point of view, this value has no
> relation to the valid/invalid block correlation. It means that, from
> the su_nblocks point of view, segments are practically identical,
> because the allocation policy usually tries to fill a segment with
> partial segments up to the segment's capacity. So, if you compare your
> implementation with the "timestamp" GC policy, the "timestamp" policy
> is better.

I know; I changed that in my kernel patch. su_nblocks now contains the
number of valid blocks, and every time something gets deleted I
decrement su_nblocks. But this could be problematic, and I will probably
change that and add a new attribute like su_valid_nblocks or something.


> Moreover, the current realization of the segment allocation algorithm
> is efficient for the case of the "timestamp" GC policy. But its
> efficiency degrades significantly for a proper realization of the
> "Greedy" and "Cost/Benefit" GC policies.

I don't think so. The segment allocation algorithm is just a linear
search starting from 0. But sooner or later the segments will wrap
around, and the oldest segments end up at the end of the list.
"Timestamp" cannot help you here.

> First of all, I think now, and will keep thinking, that defragmentation
> should be a part of GC activity. Usually, from my point of view, users
> choose NILFS2 because they use flash storage (SSDs and so on), and the
> GC is a consequence of the log-structured nature of NILFS2. But flash
> aging needs to be considered anyway: the activity of the GC and other
> auxiliary subsystems has to take NAND flash wear-leveling into account.
> If the activity of auxiliary subsystems is significant, the NAND flash
> will fail soon, without any clear reason from the user's viewpoint.
>> You can easily test it with the filefrag tool, which shows you the
>> extents of a file. On my test system I created a file with 100 extents
>> and defragmented it down to 9. The nilfs-defrag tool also checks
>> whether a file is already fully or partially defragmented and only
>> defragments the fragmented parts of the file. So it is better than a
>> simple "cp frag_file defrag_file" ;).
>>
> 
> I reckon that "cp" command will be better if copying is necessary operation.
> 
> From my point of view, a utility is the wrong way. Defragmentation
> should be a part of the file system's activity. Anyway, we have a GC
> that is moving blocks, so it is namely the GC activity that should be
> adjusted so that it defragments while moving blocks. When you make a
> utility that copies blocks, you increase GC activity, because the
> segments processed by the defragmentation utility have to be cleaned. A
> user should have a transparent defragmentation feature in the file
> system, without needing to run any special utility. Moreover, if
> fragmentation is a result of GC activity or of the nature of internal
> file system operations, then such activity should be corrected by the
> defragmentation logic.

I agree. I thought about doing more defragmentation in the GC, but it is
expensive to get all the extent information needed, which could result
in extra overhead. Anyway, the defrag tool is just a proof-of-concept
implementation; I could easily implement it in the cleaner.

> 
> So, I have the following remarks about the defragmentation utility:
> 
> (1) The defragmentation utility receives a path to a file as input, so
> a user has to detect and choose the files that need defragmentation.
> But if I have many files (100,000 - 1,000,000), this will be a really
> time-consuming and complex operation. A better way could be a daemon
> that scans the state of files in the background and processes the
> defragmentation. But we already have a GC daemon. So, from the
> architectural point of view, the GC is the proper place for
> defragmentation activity.

Yes, it's just a first attempt. Since the tool is very efficient and
defragments only when necessary, you could write a shell script:

# Defragment every regular file on the system (simple approach;
# assumes no newlines in file names).
IFS=$'\n'
for f in $(find / -type f); do
    nilfs-defrag "$f"
done


> (2) Some files are rarely accessed, and it doesn't make sense to
> defragment such files. But how can a user detect the files that really
> need defragmentation in a simple and fast way?

Yes, but the user knows best which files he wants defragmented. How
should the GC know which files need to be defragmented?

> (3) In your approach, the defragmentation utility doubles the GC's
> work of clearing segments. Effectively, the utility marks blocks as
> dirty, and then the segctor thread copies these blocks into new
> segments. As a result, the GC has to clear the source segments anyway.
> But you can't predict the moments when the GC will work. As a result,
> nilfs_cleanerd can fragment other files from the source segments while
> some big file is being defragmented. Moreover, nilfs_cleanerd can
> fragment other parts of the big file during the defragmentation
> activity.

Maybe, but it is not as bad as you make it sound. Ultimately, the
segctor separates GC inodes from normal file operations, so it's not as
if the fragmentation could get much worse. But it's probably true that
it might not improve either.

I also check the THE_NILFS_GC_RUNNING bit in the kernel code, so the
defragmentation will fail if the cleaner is running.

> (4) Your approach relies on peculiarities of the segctor's algorithms.
> I mean that the segctor processes all the dirty blocks of one dirty
> file before it begins processing another one. But if the segctor's
> algorithm changes, the defragmentation utility will fragment files
> instead of defragmenting them. Fragmentation will also take place if
> several segctor threads work simultaneously (currently we have only one
> segctor thread, but this situation can potentially change). Currently,
> the segctor and nilfs_cleanerd can work simultaneously, and it is not a
> rare case that users force nilfs_cleanerd to work without any sleeping
> timeout. So, simultaneous operation of the segctor and nilfs_cleanerd
> will end in fragmentation.

Even if you had multiple segctors, I think it would be quite strange if
they shared dirty files. I think it's not a peculiarity but a reasonable
assumption that the file system doesn't try extra hard to fragment your
files.

> (5) Effectively, the real defragmentation is done by the segctor. It
> means that the end of the defragmentation utility's run is not the end
> of the defragmentation, because the real start of the defragmentation
> will be a sync or umount operation. So, I can request defragmentation
> of a file 100 - 500 GB in size and then try to umount immediately after
> the defragmentation utility finishes. How long will I wait for such an
> umount to end? I think a user will treat such a long umount as a file
> system bug, because the defragmentation utility finished before the
> umount started.

The tool is smart enough not to mark the whole file as dirty, but only
the parts that are fragmented. If you happen to have a machine with 500
GB of RAM, that could happen, but as soon as one segment's worth of
blocks has accumulated, the fs can start writing it out, and I think it
does. It's practically the same as if you copied a 500 GB file: as soon
as the cache is full, the fs starts writing it to disk. There is nothing
special about the defrag tool here.

Thanks for taking the time to look through my code :)

Best regards,
Andreas Rohner


* Re: cleaner optimization and online defragmentation: status update
From: Clemens Eisserer @ 2013-06-22 18:37 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

> First of all, I think now, and will keep thinking, that defragmentation
> should be a part of GC activity. Usually, from my point of view, users
> choose NILFS2 because they use flash storage (SSDs and so on), and the
> GC is a consequence of the log-structured nature of NILFS2. But flash
> aging needs to be considered anyway: the activity of the GC and other
> auxiliary subsystems has to take NAND flash wear-leveling into account.
> If the activity of auxiliary subsystems is significant, the NAND flash
> will fail soon, without any clear reason from the user's viewpoint.

I have no idea about the actual implementation or the code, so my
comment just represents the POV of an fs user.
I've tried NILFS2 and Btrfs (both CoW) on traditional mechanical
hard disks to get cheap, efficient snapshotting.
However, due to CoW, files with high random-write activity (like
Firefox's internal sqlite database) fragmented so heavily that those
filesystems were practically unusable on HDDs.

Btrfs actually offers both manual defragmentation and an autodefrag
mount option, which is useful even for SSDs when the average contiguous
segment size becomes as low as 4 KB.
While autodefrag would be great for NILFS2, a manual tool at least
wouldn't hurt ;)

Regards

* Re: cleaner optimization and online defragmentation: status update
From: Vyacheslav Dubeyko @ 2013-06-24 13:42 UTC (permalink / raw)
  To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sat, 2013-06-22 at 20:37 +0200, Clemens Eisserer wrote:

[snip]
> 
> Btrfs actually offers both manual defragmentation and an autodefrag
> mount option, which is useful even for SSDs when the average contiguous
> segment size becomes as low as 4 KB.
> While autodefrag would be great for NILFS2, a manual tool at least
> wouldn't hurt ;)
> 

Yes, a manual tool can also be useful. But the suggested implementation
is improper, from my viewpoint. And, first of all, we need an internal
file system defragmentation technique. I have such a vision. :)

With the best regards,
Vyacheslav Dubeyko.
