* cleaner optimization and online defragmentation: status update @ 2013-06-18 18:30 Andreas Rohner [not found] ` <51C0A731.9060509-oe7qfRrRQffzPE21tAIdciO7C/xPubJB@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Andreas Rohner @ 2013-06-18 18:30 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi, I have written a simple defragmentation tool and some extensions to the cleaner. The implementations are not tested enough yet, but I thought I could use some feedback on whether I am headed in the right direction. I would be very happy about any suggestions for improvement, because I am quite new to kernel development. * Cleaner: Links to Commits: [1] [2] I have implemented two new policies for the cleaner: "Greedy", which selects the segments with the most free blocks, and "Cost/Benefit", which is inspired by this paper [3]. f2fs apparently uses the same algorithms. Unfortunately both of them require that the file system keep track of the free blocks per segment. This is not trivial, mainly because of snapshots. I chose to simply ignore snapshots altogether. That way the tracking is simple. The problem, of course, is that the selection policy sometimes goes wrong, because the actual number of free blocks is less than reported. But it should still perform better than the "Timestamp" policy. If a block is part of a snapshot, gets deleted, is cleaned, and then the snapshot is deleted, this block is actually free, but it is not counted as such. Steps: 1. Block A is part of a snapshot 2. A gets deleted 3. A is cleaned (it is considered live because it is in the snapshot) 4. the snapshot is deleted 5. A is counted in su_nblocks, but it will never be decremented If there is a whole segment full of these uncounted blocks, they will never be cleaned. To prevent this kind of starvation, I just reset all the counters to zero after a snapshot gets deleted.
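A minimal sketch of the bookkeeping described above (Python for illustration only; the real counter is the per-segment usage field in the kernel, and all names here are made up):

```python
# Sketch of the stale-counter problem and the reset workaround described
# above. Names are illustrative; the real counter is su_nblocks in
# struct nilfs_segment_usage.

class Segment:
    def __init__(self, nblocks):
        self.live_blocks = nblocks   # tracked count of live blocks

def delete_block(seg, in_snapshot):
    # A deleted block is only decremented if no snapshot pins it.
    # If the block is part of a snapshot, the counter is left alone:
    # the cleaner must still treat the block as live (steps 3-5 above),
    # so the counter goes stale once the snapshot is deleted.
    if not in_snapshot:
        seg.live_blocks -= 1

def delete_snapshot(segments):
    # Workaround from the mail: after a snapshot is deleted we can no
    # longer tell which counters are stale, so reset them all to zero.
    # This temporarily degrades "Cost/Benefit" to "Timestamp" behaviour.
    for seg in segments:
        seg.live_blocks = 0

seg = Segment(nblocks=128)
delete_block(seg, in_snapshot=True)    # block pinned by snapshot
assert seg.live_blocks == 128          # counter is now stale (step 5)
delete_snapshot([seg])
assert seg.live_blocks == 0            # starvation avoided by the reset
```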
This makes the deletion of snapshots quite expensive and temporarily degrades the performance of the "Cost/Benefit" policy to that of "Timestamp". This solution is quite ugly and I would prefer something better, but I found no other way to prevent this problem. There is one additional potential problem though. To track the used blocks per segment I used the su_nblocks attribute of struct nilfs_segment_usage. This value is currently never used, except once in the cleaner. But since the cleaner only cleans segments that are not active or dirty, I assume that it will always be a full segment with nilfs_get_blocks_per_segment(nilfs) blocks. I haven't tested that enough yet, but it seems to work. Alternatively I could just add another attribute su_live_nblocks to struct nilfs_segment_usage. * Defragmentation: Links to Commits: [4] [5] It's just a simple proof-of-concept tool called nilfs-defrag [filename]. It takes the file to defragment as an argument. It tries to find the mount point and get a pointer to struct nilfs, to find out the block size and the number of blocks per segment. It uses the FIEMAP ioctl to get the extent information, and if the number of extents per segment exceeds a certain value, it tries to defragment those extents. I added a simple new NILFS_IOCTL_MARK_EXTENT_DIRTY, which just reads in the corresponding blocks and marks them dirty. The dirty blocks are automatically written out to a new segment and will hopefully be less fragmented. It seems to work quite nicely, but it needs more testing. I am not quite sure if I got the locking right in the kernel code.
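The selection logic of the tool can be sketched roughly as follows (Python for illustration; the extent tuples stand in for what the FIEMAP ioctl returns, and the threshold name is made up):

```python
# Decide which parts of a file to defragment, given FIEMAP-style extents.
# Each extent is (logical_offset, physical_offset, length) in blocks.
# EXTENTS_PER_SEG_THRESHOLD is a hypothetical tuning knob, not the
# actual value used by nilfs-defrag.

EXTENTS_PER_SEG_THRESHOLD = 4

def extents_per_segment(extents, blocks_per_segment):
    # Count how many of the file's extents land in each segment.
    counts = {}
    for logical, physical, length in extents:
        seg = physical // blocks_per_segment
        counts[seg] = counts.get(seg, 0) + 1
    return counts

def segments_to_defragment(extents, blocks_per_segment):
    counts = extents_per_segment(extents, blocks_per_segment)
    # Only segments whose extent count exceeds the threshold are
    # rewritten; already-contiguous parts of the file are left alone,
    # which is why this beats a plain "cp frag_file defrag_file".
    return sorted(s for s, n in counts.items()
                  if n > EXTENTS_PER_SEG_THRESHOLD)

# A file scattered into 6 small extents inside segment 0, but stored
# as one contiguous extent in segment 1:
extents = [(i, i * 3, 1) for i in range(6)] + [(6, 2048, 100)]
assert segments_to_defragment(extents, blocks_per_segment=2048) == [0]
```

The blocks of the selected segments would then be handed to the (new, proposed) NILFS_IOCTL_MARK_EXTENT_DIRTY ioctl, so the segment constructor rewrites them contiguously.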
Sorry for the long post Best Regards, Andreas Rohner [1] https://github.com/zeitgeist87/linux/commit/bc763ac47c04893d3fece4f2db59f46187415cc4 [2] https://github.com/zeitgeist87/nilfs-utils/commit/ec8281964b3b57b1b79452d9cb03887e04a089b3 [3] http://dl.acm.org/citation.cfm?id=121137 [4] https://github.com/zeitgeist87/linux/commit/9ce900df854b1cbc968d35fd7ed892d9bf3b52d8 [5] https://github.com/zeitgeist87/nilfs-utils/commit/d32c43e26ad5059b79c0ecc3ff167a78b0f6c814 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: cleaner optimization and online defragmentation: status update [not found] ` <51C0A731.9060509-oe7qfRrRQffzPE21tAIdciO7C/xPubJB@public.gmane.org> @ 2013-06-19 7:17 ` Vyacheslav Dubeyko 2013-06-19 9:19 ` Andreas Rohner 0 siblings, 1 reply; 7+ messages in thread From: Vyacheslav Dubeyko @ 2013-06-19 7:17 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Andreas, On Tue, 2013-06-18 at 20:30 +0200, Andreas Rohner wrote: > Hi, > > I have written a simple defragmentation tool and some extensions to the > cleaner. Thank you for your efforts. But I feel the need to get answers to some questions before diving into your code. First, I need a common understanding of your approaches. > The implementations are not tested enough yet, but I thought I > could use some feedback if I am headed into the right direction. What tools do you plan to use for testing? As far as I know, xfstests and fsstress are frequently used for testing. Benchmarking tools are also very useful (for example, iozone). But a special aging tool is needed for the case of GC testing, I think. > I am > very happy about any suggestions for improvement, because I am quite new > to kernel development. Please use the scripts/checkpatch.pl script to check your code (both kernel-space and user-space). As far as I can see, you break some coding style rules in your code. > > * Cleaner: > > Links to Commits: [1] [2] > > I have implemented two new policies for the cleaner. Namely "Greedy", > which selects the segments with the most free blocks, and > "Cost/Benefit", which is inspired by this paper [3]. f2fs uses > apparently the same algorithms. > If you suggest implementing new GC policies, then we need evidence of their efficiency. Do you have any benchmarking results? How can you prove the efficiency of your approach? As I understand it, F2FS has a slightly different allocation policy. Why do you think that the "Greedy" and "Cost/Benefit" policies are useful for NILFS2?
I am slightly confused trying to understand the essence of the "Greedy" and "Cost/Benefit" policies. Could you briefly describe how they work? Moreover, you mentioned that the "Greedy" policy selects segments with the most free blocks. As far as I know, F2FS has the concept of invalid blocks. So, usually, it makes sense to first clean the segments that have the largest number of invalid blocks. But NILFS2 has no concept of invalid blocks. So, what do you mean when you talk about free blocks in the "Greedy" policy? By the way, I think a GC shouldn't be "greedy". :-) What do you think? > Unfortunately both of them require, that the file system keeps track of > the free blocks per segment. This is not trivial, mainly because of > snapshots. I chose to simply ignore snapshots altogether. That way the > tracking is simple. The problem is of course, that the > selection policy goes wrong sometimes, because the actual number of free > blocks is less than reported. But it should still perform better, than > the "Timestamp" policy. > > If a block is part of a snapshot, gets deleted, is cleaned and then the > snapshot is deleted, this block is actually free, but is not counted as > such. > > Steps: > 1. Block A is part of a snapshot > 2. A gets deleted > 3. A is cleaned (it is considered live because it is in the snapshot) > 4. the snapshot is deleted > 5. A is counted in su_nblocks, but it will never be decremented > > If there is a whole segment full of these uncounted blocks they > will never be cleaned. To prevent this kind of starvation I just reset > all the counters to zero after a snapshot gets deleted. This makes the > deletion of snapshots quite expensive and temporarily degrades the > performance of "Cost/Benefit" policy to "Timestamp". This solution is > quite ugly and I would prefer something better, but I found no > other way to prevent this problem. > > There is one additional potential problem though.
To track the used > blocks per segment I used the su_nblocks attribute of struct > nilfs_segment_usage. This value is currently never used, except once in > the cleaner. But since the cleaner only cleans segments that are not > active or dirty, I assume that it always will be a full segment with > nilfs_get_blocks_per_segment(nilfs) blocks. I haven't tested that enough > yet, but it seems to work. Alternatively I could just add another > attribute su_live_nblocks to struct nilfs_segment_usage. > > * Defragmentation: > > Links to Commits: [4] [5] > > It's just a simple proof-of-concept tool called nilfs-defrag [filename]. > It takes the file to defragment as an argument. > Does NILFS2 really need a defragmentation tool? Could you describe the reasons for such a necessity and concrete use cases? And could you provide benchmarking results that prove the efficiency of this approach and show an enhancement of NILFS2 performance or efficiency? Thanks, Vyacheslav Dubeyko. > It tries to find the mount point and get a pointer to struct nilfs, to > find out the block size and the number of segments per block. It uses > the FIEMAP ioctl to get the extent information, and if the number of > extents per segment exceeds a certain value, it tries to defragment > those extents. > > I added a simple new NILFS_IOCTL_MARK_EXTENT_DIRTY, which just reads in > the corresponding blocks and marks them dirty. The dirty blocks are > automatically written out to a new segment and will be hopefully less > fragmented. It seems to work quite nicely, but it needs more testing. I > am not quite sure if I got the locking right in the kernel code.
> > Sorry for the long post > > Best Regards, > Andreas Rohner > > [1] > https://github.com/zeitgeist87/linux/commit/bc763ac47c04893d3fece4f2db59f46187415cc4 > [2] > https://github.com/zeitgeist87/nilfs-utils/commit/ec8281964b3b57b1b79452d9cb03887e04a089b3 > [3] http://dl.acm.org/citation.cfm?id=121137 > [4] > https://github.com/zeitgeist87/linux/commit/9ce900df854b1cbc968d35fd7ed892d9bf3b52d8 > [5] > https://github.com/zeitgeist87/nilfs-utils/commit/d32c43e26ad5059b79c0ecc3ff167a78b0f6c814
* Re: cleaner optimization and online defragmentation: status update 2013-06-19 7:17 ` Vyacheslav Dubeyko @ 2013-06-19 9:19 ` Andreas Rohner [not found] ` <51C177A4.3030204-oe7qfRrRQffzPE21tAIdciO7C/xPubJB@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Andreas Rohner @ 2013-06-19 9:19 UTC (permalink / raw) To: slava-yeENwD64cLxBDgjK7y7TUQ; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Vyacheslav, On 2013-06-19 09:17, Vyacheslav Dubeyko wrote: > What tools do you plan to use for testing? > > As I know, it is used frequently xfstests, fsstress for testing. > Benchmarking tools are very useful also (for example, iozone). But it > needs to use a special aging tool for the case of GC testing, I think. I have written my own little aging tool that uses nfs traces and replays them. I also plan on doing performance benchmarks, because some performance degradation is to be expected due to the block counting. But I am quite busy for the next week or so and won't be able to start benchmarking until then. > Please, use scripts/checkpatch.pl script for checking your code (as > kernel-space as user-space). As I can see, you break some code style > rules in your code. Thanks for the tip. I guess I should have read "The newbie's guide to hacking the Linux kernel" first. Sorry for that. I didn't intend to submit a patch right now. > If you suggest implementation of new GC policies then we need to have > evidence of efficiency. Do you have any benchmarking results? How can > you prove efficiency of your approach? First of all, the current timestamp approach is clearly inefficient, because it does not take the number of live blocks in the segment into consideration. Then there is the paper I cited, which suggests the Cost/Benefit algorithm. I don't have any results as of yet, but I plan on doing thorough benchmarks soon. > As I understand, F2FS has slightly different allocation policy.
Why do > you think that "Greedy" and "Cost/Benefit" policies are useful for the > case of NILFS2? I didn't get the idea from F2FS, but from the cited paper. I just noted that F2FS, as far as I can tell, also uses these algorithms. They apply to NILFS2 because it is a log-structured file system with segments and a cleaner. > I am slightly confused trying to understand essence of "Greedy" and > "Cost/Benefit" policies. Could you briefly describe how it works? > Moreover, you mentioned that "Greedy" policy selects segments with the > most free blocks. As I know, F2FS has concept of invalid blocks. So, > usually, it makes sense to clean firstly segments that have the most > number of invalid blocks. But NILFS2 hasn't concept of invalid blocks. > So, what do you mean when you are talking about free blocks in "Greedy" > policy? When the cleaner cleans a segment, it has to decide which blocks to keep and which blocks to discard. The more blocks the cleaner can discard, the more efficient it is. To do this the cleaner has to know in advance how many live blocks there are in a segment. In the context of NILFS2 this means that blocks that are deleted or overwritten in a file, and are not part of some snapshot, can be discarded. Checkpoints are not relevant in this distinction. Of course the fs has to keep track of the number of live blocks per segment. I implemented that using the su_nblocks attribute of struct nilfs_segment_usage. With the tracking in place, the cleaner can select the most efficient segment to clean. "Greedy" just greedily selects the segment with the highest number of discardable/free/invalid blocks, whatever you want to call them ;). In the paper "The design and implementation of a log-structured file system" the authors noted that the greedy approach is not the most efficient, because the free space in old segments is more valuable than the free space in young segments.
It is very likely that the blocks in young segments are less stable and will die off anyway. So cleaning them would be a waste of time. They devised the "Cost/Benefit" algorithm. It is basically just a simple formula that takes the age and the cost of cleaning into consideration. > Does NILFS2 really needs in defragmenting tool? Could you describe > reasons of such necessity and concrete use-cases? And could you provide > benchmarking results that to prove efficiency this approach and to show > enhancement of NILFS2 performance or efficiency? Both of these things are on the TODO-List: http://www.nilfs.org/en/current_status.html * Smarter and more efficient Garbage Collector * Online defrag NILFS2 fragments very heavily because everything is copy on write. The cleaner does some reordering, that's true, but it is quite random. An example would be a database file or, more concretely, the sqlite db files in the firefox directory. These files are updated and fsynced quite often. I imagine that this could very well reduce the startup time of firefox quite considerably, but I have no benchmark results as of yet. The effects of fragmentation are less severe for SSDs, but it still matters. You can easily test it with the tool filefrag, which shows you the extents of a file. On my test system I created a file with 100 extents and defragmented it down to 9. The nilfs-defrag tool also checks if a file is already defragmented or partially defragmented and only defragments the fragmented parts of a file. So it is better than a simple "cp frag_file defrag_file" ;). br, Andreas Rohner
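As a concrete illustration of the two policies discussed above (Python, illustration only): the Cost/Benefit formula from the cited Sprite LFS paper [3] is benefit/cost = (1 - u) * age / (1 + u), where u is the segment utilization. The example numbers below are made up.

```python
# Segment selection scores under the two proposed cleaner policies.
# u is the fraction of live blocks in a segment, age how long ago it
# was last written. Higher score = cleaned first.

def greedy_score(u, age):
    # Greedy: simply pick the segment with the most free (dead) blocks.
    return 1.0 - u

def cost_benefit_score(u, age):
    # Cost/Benefit from Rosenblum & Ousterhout: reading the segment
    # costs 1, writing back the live data costs u, and the reclaimed
    # space (1 - u) is weighted by age, because free space in old,
    # cold segments is more valuable than in young, hot ones.
    return (1.0 - u) * age / (1.0 + u)

# Two candidate segments: a young half-empty one and an old fuller one.
young = dict(u=0.5, age=10)
old = dict(u=0.7, age=1000)
assert greedy_score(**young) > greedy_score(**old)              # greedy: young
assert cost_benefit_score(**old) > cost_benefit_score(**young)  # C/B: old
```

So the two policies can pick different segments: Cost/Benefit prefers the old segment even though it holds more live data, which is exactly the effect the Sprite paper observed.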
* Re: cleaner optimization and online defragmentation: status update [not found] ` <51C177A4.3030204-oe7qfRrRQffzPE21tAIdciO7C/xPubJB@public.gmane.org> @ 2013-06-22 15:31 ` Vyacheslav Dubeyko [not found] ` <2F76977A-589B-47EB-8818-382477099600-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> 0 siblings, 1 reply; 7+ messages in thread From: Vyacheslav Dubeyko @ 2013-06-22 15:31 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Andreas, On Jun 19, 2013, at 1:19 PM, Andreas Rohner wrote: > Hi Vyacheslav, > > On 2013-06-19 09:17, Vyacheslav Dubeyko wrote: >> What tools do you plan to use for testing? >> >> As I know, it is used frequently xfstests, fsstress for testing. >> Benchmarking tools are very useful also (for example, iozone). But it >> needs to use a special aging tool for the case of GC testing, I think. > > I have written my own little aging tool, that uses nfs traces and > replays them. I also plan on doing performance benchmarks, because there > is some performance degradation to be expected because of the block > counting. But I am quite busy for the next week or so and won't be able > to start benchmarking until then. > What benchmarking tool do you plan to use? I think it needs to be a well-known and widely used tool. Otherwise, it is impossible to check your results independently and to trust them. [skip] > >> If you suggest implementation of new GC policies then we need to have >> evidence of efficiency. Do you have any benchmarking results? How can >> you prove efficiency of your approach? > > First of all the current timestamp approach is clearly inefficient, > because it does not take the number of live blocks in the segment into > consideration. Then there is the paper I cited, that suggests the > Cost/Benefit algorithm. I don't have any results as of yet, but I plan > on doing thorough benchmarks soon. > >> As I understand, F2FS has slightly different allocation policy.
Why do >> you think that "Greedy" and "Cost/Benefit" policies are useful for the >> case of NILFS2? > > I didn't get the idea from F2FS, but from the cited paper. I just noted, > that F2FS as far as I can tell also uses these algorithms. They apply to > NILFS2, because it is a log-structured file system with segments and a > cleaner. > >> I am slightly confused trying to understand essence of "Greedy" and >> "Cost/Benefit" policies. Could you briefly describe how it works? >> Moreover, you mentioned that "Greedy" policy selects segments with the >> most free blocks. As I know, F2FS has concept of invalid blocks. So, >> usually, it makes sense to clean firstly segments that have the most >> number of invalid blocks. But NILFS2 hasn't concept of invalid blocks. >> So, what do you mean when you are talking about free blocks in "Greedy" >> policy? > > When the cleaner cleans a segment, it has to decide on which blocks to > keep and which blocks to discard. The more blocks the cleaner can > discard the more efficient it is. To do this the cleaner has to know in > advance how many live blocks there are in a segment. In the context of > NILFS2 this means, that blocks that are deleted or overwritten in a file > and not part of some snapshot, can be discarded. Checkpoints are not > relevant in this distinction. > > Of course the fs has to keep track of the number of live blocks per > segment. I implemented that and used the su_nblocks attribute of struct > nilfs_segment_usage. > > With the tracking in place the cleaner can select the most efficient > segment to clean. "Greedy" just greedily selects the segment with the > most discardable/free/invalid number of blocks, whatever you want to > call them ;). > > In the paper "The design and implementation of a log-structured file > system" the authors noted, that the greedy approach is not the most > efficient, because the free space in old segments is more valuable than > the free space in young segments. 
It is very likely, that the blocks in > young segments are less stable and will die off anyway. So cleaning them > would be a waste of time. They devised the "Cost/Benefit" Algorithm. It > is basically just a simple formula, that takes age and costs of cleaning > into consideration. > Yes, I think that the "Greedy" and "Cost/Benefit" policies can be used as GC policies for NILFS2. But, currently, you don't suggest a proper basis for realizing such policies. As I understand it, the "Greedy" policy should select segments that contain as few valid blocks as possible. Valid blocks means blocks that will be moved by the GC. The "Cost/Benefit" policy should select the segment that has the required ratio (calculated by the formula) of valid to invalid blocks. You are using the su_nblocks field. As I understand it, su_nblocks keeps the number of blocks in all partial segments that are located in a full segment. So, from my point of view, this value has no relation to the valid/invalid block ratio. It means that, from the su_nblocks point of view, segments are practically identical, because, usually, the allocation policy tries to fill a segment with partial segments up to its capacity. So, comparing your implementation and the "timestamp" GC policy, the "timestamp" policy is better. Moreover, the current realization of the segment allocation algorithm is efficient for the "timestamp" GC policy. But the efficiency of this allocation algorithm degrades significantly for a proper realization of the "Greedy" and "Cost/Benefit" GC policies. >> Does NILFS2 really needs in defragmenting tool? Could you describe >> reasons of such necessity and concrete use-cases? And could you provide >> benchmarking results that to prove efficiency this approach and to show >> enhancement of NILFS2 performance or efficiency?
> > Both of these things are on the TODO-List: > http://www.nilfs.org/en/current_status.html > * Smarter and more efficient Garbage Collector > * Online defrag > > NILFS2 fragments very heavily, because everything is copy on write. The > cleaner does some reordering that's true, but it is quite random. An > example would be a database file or more concrete the sqlite db files in > the firefox directory. These files are updated and fsynced quite often. > I imagine, that this could very well reduce the startup time of firefox > quite considerable, but I have no benchmark results as of yet. > The effects of fragmentation are less severe for SSDs, but it still matters. > First of all, I think now, and will continue to think, that defragmentation should be part of the GC activity. Usually, from my point of view, users choose NILFS2 because they use flash storage (SSDs and so on). So, the GC is a consequence of the log-structured nature of NILFS2. But flash aging needs to be considered anyway, because the activity of the GC and other auxiliary subsystems should take NAND flash wear-leveling into account. If the activity of auxiliary subsystems is significant, then the NAND flash will fail early, without any clear reason from the user's viewpoint. > You can easily test it with the tool filefrag, which shows you the > extents of a file. On my test system I have created a file with 100 > extents and defragmented it down to 9. The nilfs-defrag tool also checks > if a file is already defragmented or partially defragmented and only > defragments the fragmented parts of a file. So it is better than a > simple "cp frag_file defrag_file" ;). > I reckon that the "cp" command would be better if copying is a necessary operation. From my point of view, a separate utility is the wrong way. Defragmentation should be part of the file system's activity. Anyway, we have a GC that moves blocks. So, it is the GC activity that should be adjusted to perform defragmentation while moving blocks.
When you make a utility that copies blocks, you increase GC activity, because the segments processed by the defragmenting utility need to be cleaned. Users should have a transparent defragmentation feature in the file system, without needing to use any special utility. Moreover, if fragmentation is a result of GC activity or of the nature of internal file system operations, then that activity should be corrected by the defragmentation logic. So, I have these remarks about the defragmentation utility: (1) The defragmentation utility takes a path to a file as input. This means that a user must detect and choose the files that need defragmentation. But if I have many files (100,000 - 1,000,000), then this becomes a really time-consuming and complex operation. A better way could be a daemon that scans the state of files in the background and performs the defragmentation. But we already have a GC daemon. So, from an architectural point of view, the GC is the proper place for defragmentation activity. (2) Some files may be rarely accessed, and it doesn't make sense to defragment such files. But how can a user detect the files that really need defragmentation in a simple and fast way? (3) In your approach, the defragmentation utility doubles the GC activity of clearing segments. In fact, the utility marks blocks as dirty, and then the segctor thread copies these blocks into new segments. So, as a result, the GC has to clear these source segments anyway. But you can't predict the moments in time when the GC will work. As a result, nilfs_cleanerd can fragment other files from the source segments while some big file is being defragmented. And, moreover, nilfs_cleanerd can fragment other parts of the big file during the defragmentation activity. (4) Your approach relies on peculiarities of the segctor's algorithms. I mean that the segctor processes all dirty blocks of one dirty file and only then begins processing another one.
But if the segctor's algorithm changes, then the defragmentation utility will fragment files instead of defragmenting them. Files will also be fragmented if several segctor threads work simultaneously (currently we have only one segctor thread, but this situation could potentially change). Currently, the segctor and nilfs_cleanerd can work simultaneously. It is not a rare case that users force nilfs_cleanerd to work without any sleep timeout. So, simultaneous work of the segctor and nilfs_cleanerd will end in fragmentation. (5) In fact, the real defragmentation is done by the segctor. This means that the end of the defragmentation utility's execution does not mean the end of the defragmentation, because the defragmentation really starts at a sync or umount operation. So, I can request the defragmentation of a file 100 - 500 GB in size and then try to umount immediately after the defragmentation utility finishes. How long will I wait for such an umount to end? I think a user will treat such a long umount as a file system bug, because the defragmentation utility finished executing before the umount started. With the best regards, Vyacheslav Dubeyko. > br, > Andreas Rohner
* Re: cleaner optimization and online defragmentation: status update [not found] ` <2F76977A-589B-47EB-8818-382477099600-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org> @ 2013-06-22 18:18 ` Andreas Rohner 2013-06-22 18:37 ` Clemens Eisserer 1 sibling, 0 replies; 7+ messages in thread From: Andreas Rohner @ 2013-06-22 18:18 UTC (permalink / raw) To: Vyacheslav Dubeyko; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Vyacheslav, First of all thanks for looking into it. On 2013-06-22 17:31, Vyacheslav Dubeyko wrote: > What benchmarking tool do you plan to use? I think that it needs to use > well-known and widely used tool. Otherwise, it is impossible to check > your results independently and to trust of your results. Of course. The ones you suggested are fine. > Yes, I think that "Greedy" and "Cost/Benefit" policies can be used as GC > policies for the case of NILFS2. But, currently, you don't suggest proper > basis for such policies realization. As I understand, "Greedy" policy should > select such segments that contain as lesser valid blocks as possible. Valid > blocks means blocks that will be moved by GC. The "Cost/Benefit" policy > should select segment which has required (calculated by formula range) > correlation between valid and invalid blocks in segment. > > You are using the su_nblocks field. As I understand, the su_nblocks keeps > number of blocks in all partial segments that are located in full segment. > So, from my point of view, this value hasn't relation with valid/invalid blocks > correlation. It means that from the su_nblocks point of view segments are > practically identical. Because, usually, allocation policy tries to fill segment > by partial segments till a segment capacity. So, if to compare your implementation > and "timestamp" GC policy then "timestamp" policy is better. I know, I changed that in my kernel patch. su_nblocks now contains the number of valid blocks. Every time something gets deleted I decrement su_nblocks. 
But this could be problematic and I will probably change that and add a new attribute like su_valid_nblocks or something. > Moreover, current realization of segment allocation algorithm is efficient for the > case of GC "timestamp" policy. But the efficiency of this allocation algorithm degrades > significantly for the case of proper realization of "Greedy" and "Cost/Benefit" GC policies. I don't think so. The segment allocation algorithm is just a linear search starting from 0. But sooner or later the segments will wrap around and the oldest segments are at the end of the list. "timestamp" cannot help you here. > First of all, I am thinking now and I will think that defragmentation should be > a part of GC activity. Usually, from my point of view, users choose NILFS2 > because of using flash storage (SSD and so on). So, GC is a consequence of > log-structured nature of NILFS2. But it needs to think about flash aging, anyway. > Because activity of GC or other auxiliary subsystems should take into account > NAND flash wear-leveling. If activity of auxiliary subsystems will be significant > then NAND flash will fail soon without any clear reasons from an user's > viewpoint. >> You can easily test it with the tool filefrag, which shows you the >> extents of a file. On my test system I have created a file with 100 >> extents and defragmented it down to 9. The nilfs-defrag tool also checks >> if a file is already defragmented or partially defragmented and only >> defragments the fragmented parts of a file. So it is better than a >> simple "cp frag_file defrag_file" ;). >> > > I reckon that "cp" command will be better if copying is necessary operation. > >>From my point of view, the utility is a bad way. Defragmentation should be a part > of file system activity. Anyway, we have GC that is moving blocks. So, namely > GC activity should be corrected with the purpose of defragmentation during > blocks moving. 
When you make utility that to copy blocks then you increase > GC activity because of necessity to clean segments are processed by > defragmenting utility. User should have transparent defragmenting feature > of file system without necessity to use any special utility. Moreover, if fragmentation > is a reason of GC activity or nature of internal file system operations then such > activity should be corrected by defragmentation logic. I agree. I thought about doing more defragmentation in the GC, but it is expensive to get all the extent information needed. This could result in extra overhead. Anyway, the defrag tool is just a proof-of-concept implementation. I could easily implement it in the cleaner. > > So, I have such remarks about defragmentation utility: > > (1) The defragmentation utility receives as input a path to the file. So, it means that > an user should detect and choose files that are needed in defragmentation. But if I have > many files (100,000 - 1,000,000) then it will be really time-consuming and complex > operation. A better way can be a daemon that scans state of files in background and > to process defragmenting. But we have GC daemon yet. So, from the architectural point > of view, namely GC is the proper place for defragmenting activity. Yes, it's just a first attempt. Since the tool is very efficient and defragments only when necessary, you could write a shell script:

IFS=$'\n'
for f in $(find /); do
	nilfs-defrag "$f"
done

> (2) Some files can be rarely accessed and it doesn't make sense to defragment such files. > But how an user can detect files that really needs in defragmenting by simple and fast > way? Yes, but the user knows best which files he wants defragmented. How should the GC know which files need to be defragmented? > (3) In your approach the defragmentation utility doubles GC activity in clearing of segments. > Factually, the utility marks blocks as dirty and then segctor thread copies these blocks in > new segments.
So, as a result, GC should clear these source segments, anyway. But you > can't predict moments of time when GC will work. As a result, nilfs_cleanerd can fragment > another files from source segments during defragmenting of some big file. And, moreover, > nilfs_cleanerd can fragment other parts of big file during defragmentation activity. Maybe, but it is not as bad as you make it sound. Ultimately the segctor separates GC inodes and normal file operations. It's not like the fragmentation could get much worse. But it's probably true, that it could not improve. I also check the THE_NILFS_GC_RUNNING bit in the kernel code, so that the defragmentation will fail if the cleaner is running. > (4) Your approach is using as basis peculiarities of segctor's algorithms. I mean that segctor > processes all dirty blocks of one dirty file and only after to begin processing of another one. > But if algorithm of segctor will change then the defragmentation utility will fragment files > instead of defragmenting. And fragmenting of files will take place in the case when several > segctor threads will work simultaneously (currently, we have only one segctor thread but this > situation can change potentially). Currently, segctor and nilfs_cleanerd can work > simultaneously. It is not rarely case when users to force to work nilfs_cleanerd without any > sleeping timeout. So, simultaneous working of segctor and nilfs_cleanerd will end in > fragmentation. Even if you had multiple segctors, I think it would be quite strange if they shared dirty files. I think its not a peculiarity, but a reasonable assumption, that the file system doesn't try extra hard to fragment your files. > (5) Factually, real defragmentation is done by segctor. It means that end of working of > defragmentation utility doesn't mean the end of defragmentation. Because real point of > start of defragmentation will be sync or umount operation. 
> So, I can request defragmentation
> of file with 100 - 500 GB in size and, then, I can try to make umount operation immediately after
> defragmentation utility execution ending. How long will I wait the end of such umount? I think that
> a user will treat such long umount as file system bug because defragmentation utility ended
> execution before start of umount.

The tool is smart enough to mark only the fragmented parts of the file as dirty, not the whole file. If you happen to have a machine with 500 GB of RAM, that could happen, but as soon as one segment's worth of blocks has accumulated, the file system can start writing it out, and I think it does. It is practically the same as if you copied a 500 GB file: as soon as the cache is full, the data is written to disk. There is nothing special about the defrag tool here.

Thanks for taking the time to look through my code :)

Best regards,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: cleaner optimization and online defragmentation: status update
From: Clemens Eisserer @ 2013-06-22 18:37 UTC (permalink / raw)
To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi,

> First of all, I am thinking now and I will think that defragmentation should be
> a part of GC activity. Usually, from my point of view, users choose NILFS2
> because of using flash storage (SSD and so on). So, GC is a consequence of
> log-structured nature of NILFS2. But it needs to think about flash aging, anyway.
> Because activity of GC or other auxiliary subsystems should take into account
> NAND flash wear-leveling. If activity of auxiliary subsystems will be significant
> then NAND flash will fail soon without any clear reasons from an user's
> viewpoint.

I have no idea about the actual implementation or the code, so my comment just represents the point of view of a file system user. I have tried NILFS2 and Btrfs (both copy-on-write) on traditional mechanical hard disks to get cheap, efficient snapshotting. However, due to copy-on-write, files with high random-write activity (like Firefox's internal SQLite database) fragmented so heavily that those filesystems were practically unusable on HDDs.

Btrfs actually offers both manual defragmentation and an autodefrag mount option, which is useful even for SSDs when the average continuous segment size becomes as low as 4 KB.
While autodefrag would be great for nilfs2, a manual tool at least wouldn't hurt ;)

Regards
* Re: cleaner optimization and online defragmentation: status update
From: Vyacheslav Dubeyko @ 2013-06-24 13:42 UTC (permalink / raw)
To: Clemens Eisserer; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Sat, 2013-06-22 at 20:37 +0200, Clemens Eisserer wrote:

[snip]

> Btrfs actually offers both manual defragmentation and an autodefrag
> mount option, which is useful even for SSDs when the average
> continuous segment size becomes as low as 4 KB.
> While autodefrag would be great for nilfs2, a manual tool at least
> wouldn't hurt ;)

Yes, a manual tool can also be useful. But the suggested implementation is improper, from my viewpoint. And, first of all, we need an internal file system defragmenting technique. I have such a vision. :)

With the best regards,
Vyacheslav Dubeyko.
Thread overview: 7+ messages (newest: 2013-06-24 13:42 UTC)

2013-06-18 18:30 cleaner optimization and online defragmentation: status update (Andreas Rohner)
2013-06-19  7:17 ` Vyacheslav Dubeyko
2013-06-19  9:19 ` Andreas Rohner
2013-06-22 15:31 ` Vyacheslav Dubeyko
2013-06-22 18:18 ` Andreas Rohner
2013-06-22 18:37 ` Clemens Eisserer
2013-06-24 13:42 ` Vyacheslav Dubeyko