* [PATCH 0/2] nilfs2: add support for FITRIM ioctl @ 2014-02-15 13:34 Andreas Rohner [not found] ` <cover.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-15 13:34 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner Hi, This patch adds support for the FITRIM ioctl, which allows user space tools like fstrim to issue TRIM/DISCARD requests to the underlying device. It takes a fstrim_range structure as a parameter and for every clean segment in the specified range the function blkdev_issue_discard is called. Best regards, Andreas Rohner --- Andreas Rohner (2): nilfs2: add nilfs_sufile_trim_fs to trim clean segs nilfs2: add FITRIM ioctl support for nilfs2 fs/nilfs2/ioctl.c | 46 +++++++++++++++++++++ fs/nilfs2/sufile.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/sufile.h | 1 + 3 files changed, 161 insertions(+) -- 1.8.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <cover.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>]
* [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <cover.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-15 13:34 ` Andreas Rohner [not found] ` <02d463460388a9bdc1413aa4d6fdfa5a402b9452.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-02-15 13:34 ` [PATCH 2/2] nilfs2: add FITRIM ioctl support for nilfs2 Andreas Rohner 1 sibling, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-15 13:34 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner This patch adds the nilfs_sufile_trim_fs function, which takes a fstrim_range structure and calls blkdev_issue_discard for every clean segment in the specified range. Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> --- fs/nilfs2/sufile.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/sufile.h | 1 + 2 files changed, 115 insertions(+) diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c index 3127e9f..022b17f 100644 --- a/fs/nilfs2/sufile.c +++ b/fs/nilfs2/sufile.c @@ -870,6 +870,120 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf, } /** + * nilfs_sufile_trim_fs() - trim ioctl handle function + * @sufile: inode of segment usage file + * @range: fstrim_range structure + * + * start: First Byte to trim + * len: number of Bytes to trim from start + * minlen: minimum extent length in Bytes + * + * Decription: nilfs_sufile_trim_fs goes through all segments containing bytes + * from start to start+len. For each clean segment blkdev_issue_discard + * function is invoked to trim it. + * + * Return Value: On success, 0 is returned or negative error code, otherwise. + */ +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range) +{ + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info; + struct buffer_head *su_bh; + struct nilfs_segment_usage *su; + void *kaddr; + size_t susz = NILFS_MDT(sufile)->mi_entry_size; + ssize_t n; + sector_t seg_start, seg_end, start = 0, nblocks = 0; + u64 segnum, end, minlen, trimmed = 0; + int i, ret = 0; + unsigned long segusages_per_block; + unsigned int sects_per_block; + + segusages_per_block = nilfs_sufile_segment_usages_per_block(sufile); + sects_per_block = (1 << nilfs->ns_blocksize_bits) / + bdev_logical_block_size(nilfs->ns_bdev); + segnum = nilfs_get_segnum_of_block(nilfs, range->start >> + nilfs->ns_blocksize_bits); + end = nilfs_get_segnum_of_block(nilfs, (range->start + range->len) + >> nilfs->ns_blocksize_bits); + minlen = range->minlen >> nilfs->ns_blocksize_bits; + + if (minlen > nilfs->ns_blocks_per_segment || + segnum >= nilfs->ns_nsegments || + range->len < nilfs->ns_blocksize) + return -EINVAL; + if (end > nilfs->ns_nsegments) + end = nilfs->ns_nsegments; + if (end == segnum) + goto out; + + down_read(&NILFS_MDT(sufile)->mi_sem); + + while (segnum < end) { + n = min_t(unsigned long, + segusages_per_block - + nilfs_sufile_get_offset(sufile, segnum), + end - segnum); + ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, + &su_bh); + if (ret < 0) { + if (ret != -ENOENT) + goto out_sem; + /* hole */ + segnum += n; + continue; + } + + kaddr = kmap(su_bh->b_page); + su = nilfs_sufile_block_get_segment_usage( + sufile, segnum, su_bh, kaddr); + for (i = 0; i < n; ++i, ++segnum, su = (void *)su + susz) { + if (!nilfs_segment_usage_clean(su)) + continue; + + nilfs_get_segment_range(nilfs, segnum, &seg_start, + &seg_end); + + if (!nblocks) { + start = seg_start; + nblocks = seg_end - seg_start + 1; + } else if (start + nblocks == seg_start) { + nblocks += seg_end - seg_start + 1; + } else { + ret = blkdev_issue_discard(nilfs->ns_bdev, + start * sects_per_block, + nblocks * sects_per_block, + GFP_NOFS, 0); + if (ret < 0) { + kunmap(kaddr); + put_bh(su_bh); + goto out_sem; + } + + trimmed += nblocks; + nblocks = 0; + } + } + kunmap(kaddr); + put_bh(su_bh); + } + + if (nblocks) { + ret = blkdev_issue_discard(nilfs->ns_bdev, + start * sects_per_block, + nblocks * sects_per_block, + GFP_NOFS, 0); + if (!ret) + trimmed += nblocks; + } + +out_sem: + up_read(&NILFS_MDT(sufile)->mi_sem); +out: + range->len = trimmed << nilfs->ns_blocksize_bits; + return ret; +} + +/** * nilfs_sufile_read - read or get sufile inode * @sb: super block instance * @susize: size of a segment usage entry diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h index e84bc5b..2434abd 100644 --- a/fs/nilfs2/sufile.h +++ b/fs/nilfs2/sufile.h @@ -65,6 +65,7 @@ void nilfs_sufile_do_set_error(struct inode *, __u64, struct buffer_head *, int nilfs_sufile_resize(struct inode *sufile, __u64 newnsegs); int nilfs_sufile_read(struct super_block *sb, size_t susize, struct nilfs_inode *raw_inode, struct inode **inodep); +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range); /** * nilfs_sufile_scrap - make a segment garbage -- 1.8.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <02d463460388a9bdc1413aa4d6fdfa5a402b9452.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <02d463460388a9bdc1413aa4d6fdfa5a402b9452.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 3:00 ` Ryusuke Konishi [not found] ` <20140217.120000.397306175.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 3:00 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Andreas, On Sat, 15 Feb 2014 14:34:25 +0100, Andreas Rohner wrote: > This patch adds the nilfs_sufile_trim_fs function, which takes a > fstrim_range structure and calls blkdev_issue_discard for every > clean segment in the specified range. > > Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> Thank you for doing this work. I'll add comments inline below. > --- > fs/nilfs2/sufile.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > fs/nilfs2/sufile.h | 1 + > 2 files changed, 115 insertions(+) > > diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c > index 3127e9f..022b17f 100644 > --- a/fs/nilfs2/sufile.c > +++ b/fs/nilfs2/sufile.c > @@ -870,6 +870,120 @@ ssize_t nilfs_sufile_get_suinfo(struct inode *sufile, __u64 segnum, void *buf, > } > > /** > + * nilfs_sufile_trim_fs() - trim ioctl handle function > + * @sufile: inode of segment usage file > + * @range: fstrim_range structure > + * > + * start: First Byte to trim > + * len: number of Bytes to trim from start > + * minlen: minimum extent length in Bytes > + * > + * Decription: nilfs_sufile_trim_fs goes through all segments containing bytes > + * from start to start+len. For each clean segment blkdev_issue_discard > + * function is invoked to trim it. > + * > + * Return Value: On success, 0 is returned or negative error code, otherwise. > + */ > +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range) > +{ > + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info; > + struct buffer_head *su_bh; > + struct nilfs_segment_usage *su; > + void *kaddr; > + size_t susz = NILFS_MDT(sufile)->mi_entry_size; > + ssize_t n; > + sector_t seg_start, seg_end, start = 0, nblocks = 0; > + u64 segnum, end, minlen, trimmed = 0; > + int i, ret = 0; > + unsigned long segusages_per_block; > + unsigned int sects_per_block; > + > + segusages_per_block = nilfs_sufile_segment_usages_per_block(sufile); > + sects_per_block = (1 << nilfs->ns_blocksize_bits) / > + bdev_logical_block_size(nilfs->ns_bdev); > + segnum = nilfs_get_segnum_of_block(nilfs, range->start >> > + nilfs->ns_blocksize_bits); > + end = nilfs_get_segnum_of_block(nilfs, (range->start + range->len) > + >> nilfs->ns_blocksize_bits); > + minlen = range->minlen >> nilfs->ns_blocksize_bits; > + > + if (minlen > nilfs->ns_blocks_per_segment || This check looks inappropriate. If we set a lower limit for range->minlin, I think it should be the file system size (block device size). For your information, the meaning of minlen is described as follows in the man page of fstrim command: Minimum contiguous free range to discard, in bytes. (This value is internally rounded up to a multiple of the filesystem block size). Free ranges smaller than this will be ignored. By increasing this value, the fstrim operation will complete more quickly for filesystems with badly fragmented freespace, although not all blocks will be discarded. Default value is zero, discard every free block. So, too small lower limit interferes purpose of the minlen argument. > + segnum >= nilfs->ns_nsegments || This is bad too, because userland programs usually don't know the segment structure of NILFS. When we specify the partition size to range->len, FITRIM can fail due to this check. The upper limit of (range->start + range->len) should be the block device size. > + range->len < nilfs->ns_blocksize) This looks OK. > + return -EINVAL; I think these EINVAL checks should be move to ioctl interface side because they should be performed in a perspective of general file system and shouldn't depend on the internal structure of NILFS so much. If we need to adjust the range in this function, we should do it silently. > + if (end > nilfs->ns_nsegments) > + end = nilfs->ns_nsegments; Yes, this adjustment is what we should do here, but 'end' segnum was rounded down to segment alighment before. So, it should be: if (end >= nilfs->ns_nsegments) end = nilfs->ns_nsegments - 1; > + if (end == segnum) > + goto out; and if (end < segnum) goto out; > + > + down_read(&NILFS_MDT(sufile)->mi_sem); > + > + while (segnum < end) { and while (segnum <= end) { > + n = min_t(unsigned long, > + segusages_per_block - > + nilfs_sufile_get_offset(sufile, segnum), > + end - segnum); Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate 'n'. > + ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, > + &su_bh); > + if (ret < 0) { > + if (ret != -ENOENT) > + goto out_sem; > + /* hole */ > + segnum += n; > + continue; Skipping segments within segment usage blocks which are hole, are OK since they are never written after the previous mkfs and mkfs.nilfs2 discards segments. However, note that this also separates the extent defined with (start, nblocks). So, (start, nblocks) should be updated and blkdev_issue_discard() should be called as needed. > + } > + > + kaddr = kmap(su_bh->b_page); Avoid using kmap()/kunmap() (if possible), these are expensive for architectures nowadays which shares virtual address space among processors. I think we can replace them with kmap_atomic() and kunmap_atomic() for this function by carefully releasing the highmem before calling blocking operations such as blkdev_issue_discard() and nilfs_sufile_get_segment_usage_block(). > + su = nilfs_sufile_block_get_segment_usage( > + sufile, segnum, su_bh, kaddr); > + for (i = 0; i < n; ++i, ++segnum, su = (void *)su + susz) { > + if (!nilfs_segment_usage_clean(su)) > + continue; > + > + nilfs_get_segment_range(nilfs, segnum, &seg_start, > + &seg_end); > + > + if (!nblocks) { > + start = seg_start; > + nblocks = seg_end - seg_start + 1; > + } else if (start + nblocks == seg_start) { > + nblocks += seg_end - seg_start + 1; > + } else { We should ignore the extent if the size is smaller than range->minlen. > + ret = blkdev_issue_discard(nilfs->ns_bdev, > + start * sects_per_block, > + nblocks * sects_per_block, > + GFP_NOFS, 0); > + if (ret < 0) { > + kunmap(kaddr); > + put_bh(su_bh); > + goto out_sem; > + } > + > + trimmed += nblocks; > + nblocks = 0; > + } > + } > + kunmap(kaddr); > + put_bh(su_bh); > + } > + > + if (nblocks) { > + ret = blkdev_issue_discard(nilfs->ns_bdev, > + start * sects_per_block, > + nblocks * sects_per_block, > + GFP_NOFS, 0); > + if (!ret) > + trimmed += nblocks; > + } > + > +out_sem: > + up_read(&NILFS_MDT(sufile)->mi_sem); > +out: > + range->len = trimmed << nilfs->ns_blocksize_bits; > + return ret; > +} > + > +/** > * nilfs_sufile_read - read or get sufile inode > * @sb: super block instance > * @susize: size of a segment usage entry > diff --git a/fs/nilfs2/sufile.h b/fs/nilfs2/sufile.h > index e84bc5b..2434abd 100644 > --- a/fs/nilfs2/sufile.h > +++ b/fs/nilfs2/sufile.h > @@ -65,6 +65,7 @@ void nilfs_sufile_do_set_error(struct inode *, __u64, struct buffer_head *, > int nilfs_sufile_resize(struct inode *sufile, __u64 newnsegs); > int nilfs_sufile_read(struct super_block *sb, size_t susize, > struct nilfs_inode *raw_inode, struct inode **inodep); > +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range); > > /** > * nilfs_sufile_scrap - make a segment garbage > -- > 1.8.5.4 > Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20140217.120000.397306175.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.120000.397306175.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-02-17 4:27 ` Ryusuke Konishi 2014-02-17 9:06 ` Andreas Rohner 2014-02-17 9:17 ` Andreas Rohner 2 siblings, 0 replies; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 4:27 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Mon, 17 Feb 2014 12:00:00 +0900 (JST), Ryusuke Konishi wrote: > Hi Andreas, > On Sat, 15 Feb 2014 14:34:25 +0100, Andreas Rohner wrote: >> + segnum >= nilfs->ns_nsegments || > > This is bad too, because userland programs usually don't know the > segment structure of NILFS. When we specify the partition size to > range->len, FITRIM can fail due to this check. > > The upper limit of (range->start + range->len) should be > the block device size. I mistook for this. segnum was the start segment number, so what we should do was: - Return -EINVAL in ioctl side if range->start is larger than the block device size. - Escape this function without an error setting range->len to zero if segnum >= nilfs->ns_nsegments. And, range->len should not be limited by the device size because it is set to ULLONG_MAX by default. We have to accept larger range->len values. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.120000.397306175.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 4:27 ` Ryusuke Konishi @ 2014-02-17 9:06 ` Andreas Rohner [not found] ` <5301D130.9050000-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 9:17 ` Andreas Rohner 2 siblings, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 9:06 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi Ryusuke, On 2014-02-17 04:00, Ryusuke Konishi wrote: >> +int nilfs_sufile_trim_fs(struct inode *sufile, struct fstrim_range *range) >> +{ >> + struct the_nilfs *nilfs = sufile->i_sb->s_fs_info; >> + struct buffer_head *su_bh; >> + struct nilfs_segment_usage *su; >> + void *kaddr; >> + size_t susz = NILFS_MDT(sufile)->mi_entry_size; >> + ssize_t n; >> + sector_t seg_start, seg_end, start = 0, nblocks = 0; >> + u64 segnum, end, minlen, trimmed = 0; >> + int i, ret = 0; >> + unsigned long segusages_per_block; >> + unsigned int sects_per_block; >> + >> + segusages_per_block = nilfs_sufile_segment_usages_per_block(sufile); >> + sects_per_block = (1 << nilfs->ns_blocksize_bits) / >> + bdev_logical_block_size(nilfs->ns_bdev); >> + segnum = nilfs_get_segnum_of_block(nilfs, range->start >> >> + nilfs->ns_blocksize_bits); >> + end = nilfs_get_segnum_of_block(nilfs, (range->start + range->len) >> + >> nilfs->ns_blocksize_bits); >> + minlen = range->minlen >> nilfs->ns_blocksize_bits; >> + >> + if (minlen > nilfs->ns_blocks_per_segment || > > This check looks inappropriate. If we set a lower limit for > range->minlin, I think it should be the file system size (block device > size). For your information, the meaning of minlen is described as > follows in the man page of fstrim command: > > Minimum contiguous free range to discard, in bytes. (This > value is internally rounded up to a multiple of the filesystem > block size). Free ranges smaller than this will be ignored. > By increasing this value, the fstrim operation will complete > more quickly for filesystems with badly fragmented freespace, > although not all blocks will be discarded. Default value is > zero, discard every free block. > > So, too small lower limit interferes purpose of the minlen argument. Agreed. >> + segnum >= nilfs->ns_nsegments || > > This is bad too, because userland programs usually don't know the > segment structure of NILFS. When we specify the partition size to > range->len, FITRIM can fail due to this check. > > The upper limit of (range->start + range->len) should be > the block device size. ext4 also checks the range structure like that. Besides couldn't it be possible, that the block device is bigger than the file system? >> + range->len < nilfs->ns_blocksize) > > This looks OK. > >> + return -EINVAL; > > I think these EINVAL checks should be move to ioctl interface side > because they should be performed in a perspective of general file > system and shouldn't depend on the internal structure of NILFS so > much. If we need to adjust the range in this function, we should do > it silently. I just did it the same way as it is done in ext4. >> + if (end > nilfs->ns_nsegments) >> + end = nilfs->ns_nsegments; > > Yes, this adjustment is what we should do here, but 'end' segnum was > rounded down to segment alighment before. So, it should be: > > if (end >= nilfs->ns_nsegments) > end = nilfs->ns_nsegments - 1; > >> + if (end == segnum) >> + goto out; > > and > > if (end < segnum) > goto out; > >> + >> + down_read(&NILFS_MDT(sufile)->mi_sem); >> + >> + while (segnum < end) { > > and > > while (segnum <= end) { > >> + n = min_t(unsigned long, >> + segusages_per_block - >> + nilfs_sufile_get_offset(sufile, segnum), >> + end - segnum); > > Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate > 'n'. Yes we can do this, if you want to use nilfs_sufile_segment_usages_in_block(). I would just like to state, that my code is also correct here. > >> + ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, >> + &su_bh); >> + if (ret < 0) { >> + if (ret != -ENOENT) >> + goto out_sem; >> + /* hole */ >> + segnum += n; >> + continue; > > Skipping segments within segment usage blocks which are hole, are OK > since they are never written after the previous mkfs and mkfs.nilfs2 > discards segments. > > However, note that this also separates the extent defined with (start, > nblocks). So, (start, nblocks) should be updated and > blkdev_issue_discard() should be called as needed. As far as I can tell this is not necessary here. But I checked the function again and I found a small bug further below. >> + } >> + >> + kaddr = kmap(su_bh->b_page); > > Avoid using kmap()/kunmap() (if possible), these are expensive for > architectures nowadays which shares virtual address space among > processors. I think we can replace them with kmap_atomic() and > kunmap_atomic() for this function by carefully releasing the highmem > before calling blocking operations such as blkdev_issue_discard() and > nilfs_sufile_get_segment_usage_block(). Ok. Yes that should be possible. >> + su = nilfs_sufile_block_get_segment_usage( >> + sufile, segnum, su_bh, kaddr); >> + for (i = 0; i < n; ++i, ++segnum, su = (void *)su + susz) { >> + if (!nilfs_segment_usage_clean(su)) >> + continue; >> + >> + nilfs_get_segment_range(nilfs, segnum, &seg_start, >> + &seg_end); >> + >> + if (!nblocks) { >> + start = seg_start; >> + nblocks = seg_end - seg_start + 1; >> + } else if (start + nblocks == seg_start) { >> + nblocks += seg_end - seg_start + 1; >> + } else { > > We should ignore the extent if the size is smaller than range->minlen. Agreed. >> + ret = blkdev_issue_discard(nilfs->ns_bdev, >> + start * sects_per_block, >> + nblocks * sects_per_block, >> + GFP_NOFS, 0); >> + if (ret < 0) { >> + kunmap(kaddr); >> + put_bh(su_bh); >> + goto out_sem; >> + } >> + >> + trimmed += nblocks; >> + nblocks = 0; This should be: + start = seg_start; + nblocks = seg_end - seg_start + 1; Regards, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <5301D130.9050000-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <5301D130.9050000-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 10:42 ` Ryusuke Konishi [not found] ` <20140217.194200.431478798.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 10:42 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Mon, 17 Feb 2014 10:06:56 +0100, Andreas Rohner wrote: > Hi Ryusuke, > > On 2014-02-17 04:00, Ryusuke Konishi wrote: >>> + segnum >= nilfs->ns_nsegments || >> >> This is bad too, because userland programs usually don't know the >> segment structure of NILFS. When we specify the partition size to >> range->len, FITRIM can fail due to this check. >> >> The upper limit of (range->start + range->len) should be >> the block device size. > > ext4 also checks the range structure like that. Besides couldn't it be > possible, that the block device is bigger than the file system? As I mentioned in my second mail, I misunderstood the meaning of this check at first. As you pointed out, the block device size may be bigger than the file system size after shrinking the file system. So, sbp->s_dev_size should be used for for this check. But it is a bit complicated since the current nilfs object doesn't have on-memory copy of it. It is OK to use nilfs->ns_nsegments as its alternative. >>> + ret = blkdev_issue_discard(nilfs->ns_bdev, >>> + start * sects_per_block, >>> + nblocks * sects_per_block, >>> + GFP_NOFS, 0); >>> + if (ret < 0) { >>> + kunmap(kaddr); >>> + put_bh(su_bh); >>> + goto out_sem; >>> + } >>> + >>> + trimmed += nblocks; >>> + nblocks = 0; > > This should be: > > + start = seg_start; > + nblocks = seg_end - seg_start + 1; Yes, that's right. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20140217.194200.431478798.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.194200.431478798.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-02-17 11:04 ` Andreas Rohner [not found] ` <5301ECD4.5070200-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 11:04 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-02-17 11:42, Ryusuke Konishi wrote: > On Mon, 17 Feb 2014 10:06:56 +0100, Andreas Rohner wrote: >> Hi Ryusuke, >> >> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>> + segnum >= nilfs->ns_nsegments || >>> >>> This is bad too, because userland programs usually don't know the >>> segment structure of NILFS. When we specify the partition size to >>> range->len, FITRIM can fail due to this check. >>> >>> The upper limit of (range->start + range->len) should be >>> the block device size. >> >> ext4 also checks the range structure like that. Besides couldn't it be >> possible, that the block device is bigger than the file system? > > As I mentioned in my second mail, I misunderstood the meaning of this > check at first. > > As you pointed out, the block device size may be bigger than the file > system size after shrinking the file system. So, sbp->s_dev_size > should be used for for this check. But it is a bit complicated since > the current nilfs object doesn't have on-memory copy of it. > > It is OK to use nilfs->ns_nsegments as its alternative. Ok, just to be clear. According to your second mail it should look something like this in nilfs_sufile_trim_fs(): if (segnum >= nilfs->ns_nsegments) goto out; And in nilfs_ioctl_trim_fs(): if (range.len < nilfs->ns_blocksize || range.start >= (nilfs->ns_nsegments * nilfs->ns_blocks_per_segment) << nilfs->ns_blocksize_bits) return -EINVAL; Is that correct? Regards, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <5301ECD4.5070200-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <5301ECD4.5070200-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 11:39 ` Ryusuke Konishi 0 siblings, 0 replies; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 11:39 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Mon, 17 Feb 2014 12:04:52 +0100, Andreas Rohner wrote: > On 2014-02-17 11:42, Ryusuke Konishi wrote: >> On Mon, 17 Feb 2014 10:06:56 +0100, Andreas Rohner wrote: >>> Hi Ryusuke, >>> >>> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>>> + segnum >= nilfs->ns_nsegments || >>>> >>>> This is bad too, because userland programs usually don't know the >>>> segment structure of NILFS. When we specify the partition size to >>>> range->len, FITRIM can fail due to this check. >>>> >>>> The upper limit of (range->start + range->len) should be >>>> the block device size. >>> >>> ext4 also checks the range structure like that. Besides couldn't it be >>> possible, that the block device is bigger than the file system? >> >> As I mentioned in my second mail, I misunderstood the meaning of this >> check at first. >> >> As you pointed out, the block device size may be bigger than the file >> system size after shrinking the file system. So, sbp->s_dev_size >> should be used for for this check. But it is a bit complicated since >> the current nilfs object doesn't have on-memory copy of it. >> >> It is OK to use nilfs->ns_nsegments as its alternative. > > Ok, just to be clear. According to your second mail it should look > something like this in nilfs_sufile_trim_fs(): > > if (segnum >= nilfs->ns_nsegments) > goto out; > > And in nilfs_ioctl_trim_fs(): > > if (range.len < nilfs->ns_blocksize || > range.start >= (nilfs->ns_nsegments * > nilfs->ns_blocks_per_segment) << > nilfs->ns_blocksize_bits) > return -EINVAL; > > > Is that correct? Yes, exactly. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.120000.397306175.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 4:27 ` Ryusuke Konishi 2014-02-17 9:06 ` Andreas Rohner @ 2014-02-17 9:17 ` Andreas Rohner [not found] ` <5301D3C5.6040704-hi6Y0CQ0nG0@public.gmane.org> 2 siblings, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 9:17 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-02-17 04:00, Ryusuke Konishi wrote: >> + if (end > nilfs->ns_nsegments) >> + end = nilfs->ns_nsegments; > > Yes, this adjustment is what we should do here, but 'end' segnum was > rounded down to segment alighment before. So, it should be: > > if (end >= nilfs->ns_nsegments) > end = nilfs->ns_nsegments - 1; > >> + if (end == segnum) >> + goto out; > > and > > if (end < segnum) > goto out; > >> + >> + down_read(&NILFS_MDT(sufile)->mi_sem); >> + >> + while (segnum < end) { > > and > > while (segnum <= end) { > >> + n = min_t(unsigned long, >> + segusages_per_block - >> + nilfs_sufile_get_offset(sufile, segnum), >> + end - segnum); > > Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate > 'n'. Actually I don't think that is correct. What if range->start = 0 and range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard segment 0 and segment 1, whereas my version would discard only segment 0, which seems to be more reasonable. br, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <5301D3C5.6040704-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <5301D3C5.6040704-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 10:06 ` Ryusuke Konishi [not found] ` <20140217.190610.135800072.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 10:06 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Mon, 17 Feb 2014 10:17:57 +0100, Andreas Rohner wrote: > On 2014-02-17 04:00, Ryusuke Konishi wrote: >>> + if (end > nilfs->ns_nsegments) >>> + end = nilfs->ns_nsegments; >> >> Yes, this adjustment is what we should do here, but 'end' segnum was >> rounded down to segment alighment before. So, it should be: >> >> if (end >= nilfs->ns_nsegments) >> end = nilfs->ns_nsegments - 1; >> >>> + if (end == segnum) >>> + goto out; >> >> and >> >> if (end < segnum) >> goto out; >> >>> + >>> + down_read(&NILFS_MDT(sufile)->mi_sem); >>> + >>> + while (segnum < end) { >> >> and >> >> while (segnum <= end) { >> >>> + n = min_t(unsigned long, >>> + segusages_per_block - >>> + nilfs_sufile_get_offset(sufile, segnum), >>> + end - segnum); >> >> Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate >> 'n'. > > Actually I don't think that is correct. What if range->start = 0 and > range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard > segment 0 and segment 1, whereas my version would discard only segment > 0, which seems to be more reasonable. The problem seems that 'end' is not calculated properly. I think it should be end = nilfs_get_segnum_of_block( nilfs, (range->start + range->len + <segment-size-in-bytes> - 1) >> nilfs->ns_blocksize_bits) - 1; or can be simplified to end = nilfs_get_segnum_of_block( nilfs, (range->start + range->len - 1) >> nilfs->ns_blocksize_bits); if range->len > 0 is guaranteed. The calculation of segnum extends the range to be discarded since it is rounded down to segment alignment, but that of the current implementation for 'end' truncates the range. Both should be extended. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20140217.190610.135800072.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.190610.135800072.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-02-17 10:34 ` Andreas Rohner [not found] ` <5301E5A8.3080701-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 10:34 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-02-17 11:06, Ryusuke Konishi wrote: > On Mon, 17 Feb 2014 10:17:57 +0100, Andreas Rohner wrote: >> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>> + if (end > nilfs->ns_nsegments) >>>> + end = nilfs->ns_nsegments; >>> >>> Yes, this adjustment is what we should do here, but 'end' segnum was >>> rounded down to segment alighment before. So, it should be: >>> >>> if (end >= nilfs->ns_nsegments) >>> end = nilfs->ns_nsegments - 1; >>> >>>> + if (end == segnum) >>>> + goto out; >>> >>> and >>> >>> if (end < segnum) >>> goto out; >>> >>>> + >>>> + down_read(&NILFS_MDT(sufile)->mi_sem); >>>> + >>>> + while (segnum < end) { >>> >>> and >>> >>> while (segnum <= end) { >>> >>>> + n = min_t(unsigned long, >>>> + segusages_per_block - >>>> + nilfs_sufile_get_offset(sufile, segnum), >>>> + end - segnum); >>> >>> Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate >>> 'n'. >> >> Actually I don't think that is correct. What if range->start = 0 and >> range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard >> segment 0 and segment 1, whereas my version would discard only segment >> 0, which seems to be more reasonable. > > The problem seems that 'end' is not calculated properly. > I think it should be > > end = nilfs_get_segnum_of_block( > nilfs, > (range->start + range->len + <segment-size-in-bytes> - 1) > >> nilfs->ns_blocksize_bits) - 1; > > or can be simplified to > > end = nilfs_get_segnum_of_block( > nilfs, > (range->start + range->len - 1) >> nilfs->ns_blocksize_bits); > > if range->len > 0 is guaranteed. > > > The calculation of segnum extends the range to be discarded since it > is rounded down to segment alignment, but that of the current > implementation for 'end' truncates the range. Both should be > extended. Then shouldn't both be truncated? The user expects that less bytes than range->len are discarded but not more. Maybe I am wrong and it is defined somewhere... Regards, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <5301E5A8.3080701-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <5301E5A8.3080701-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 11:21 ` Ryusuke Konishi [not found] ` <20140217.202143.05344630.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 11:21 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Mon, 17 Feb 2014 11:34:16 +0100, Andreas Rohner wrote: > On 2014-02-17 11:06, Ryusuke Konishi wrote: >> On Mon, 17 Feb 2014 10:17:57 +0100, Andreas Rohner wrote: >>> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>>> + if (end > nilfs->ns_nsegments) >>>>> + end = nilfs->ns_nsegments; >>>> >>>> Yes, this adjustment is what we should do here, but 'end' segnum was >>>> rounded down to segment alighment before. So, it should be: >>>> >>>> if (end >= nilfs->ns_nsegments) >>>> end = nilfs->ns_nsegments - 1; >>>> >>>>> + if (end == segnum) >>>>> + goto out; >>>> >>>> and >>>> >>>> if (end < segnum) >>>> goto out; >>>> >>>>> + >>>>> + down_read(&NILFS_MDT(sufile)->mi_sem); >>>>> + >>>>> + while (segnum < end) { >>>> >>>> and >>>> >>>> while (segnum <= end) { >>>> >>>>> + n = min_t(unsigned long, >>>>> + segusages_per_block - >>>>> + nilfs_sufile_get_offset(sufile, segnum), >>>>> + end - segnum); >>>> >>>> Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate >>>> 'n'. >>> >>> Actually I don't think that is correct. What if range->start = 0 and >>> range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard >>> segment 0 and segment 1, whereas my version would discard only segment >>> 0, which seems to be more reasonable. >> >> The problem seems that 'end' is not calculated properly. >> I think it should be >> >> end = nilfs_get_segnum_of_block( >> nilfs, >> (range->start + range->len + <segment-size-in-bytes> - 1) >> >> nilfs->ns_blocksize_bits) - 1; >> >> or can be simplified to >> >> end = nilfs_get_segnum_of_block( >> nilfs, >> (range->start + range->len - 1) >> nilfs->ns_blocksize_bits); >> >> if range->len > 0 is guaranteed. >> >> >> The calculation of segnum extends the range to be discarded since it >> is rounded down to segment alignment, but that of the current >> implementation for 'end' truncates the range. Both should be >> extended. > > Then shouldn't both be truncated? The user expects that less bytes than > range->len are discarded but not more. Maybe I am wrong and it is > defined somewhere... Uum, xfs looks to be truncating both. This seems to be authentic when we think the original meaning of the fitrim range. I first thought truncating the range can generate gaps between consecutive calls of FITRIM (if we use FITRIM like that). But now I'm tempted to take the xfs approach. Let's take this semantics. So, we now have to round up range->start to calculate (start) segnum. Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20140217.202143.05344630.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.202143.05344630.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-02-17 11:53 ` Andreas Rohner [not found] ` <5301F852.8010105-hi6Y0CQ0nG0@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 11:53 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-02-17 12:21, Ryusuke Konishi wrote: > On Mon, 17 Feb 2014 11:34:16 +0100, Andreas Rohner wrote: >> On 2014-02-17 11:06, Ryusuke Konishi wrote: >>> On Mon, 17 Feb 2014 10:17:57 +0100, Andreas Rohner wrote: >>>> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>>>> + if (end > nilfs->ns_nsegments) >>>>>> + end = nilfs->ns_nsegments; >>>>> >>>>> Yes, this adjustment is what we should do here, but 'end' segnum was >>>>> rounded down to segment alighment before. So, it should be: >>>>> >>>>> if (end >= nilfs->ns_nsegments) >>>>> end = nilfs->ns_nsegments - 1; >>>>> >>>>>> + if (end == segnum) >>>>>> + goto out; >>>>> >>>>> and >>>>> >>>>> if (end < segnum) >>>>> goto out; >>>>> >>>>>> + >>>>>> + down_read(&NILFS_MDT(sufile)->mi_sem); >>>>>> + >>>>>> + while (segnum < end) { >>>>> >>>>> and >>>>> >>>>> while (segnum <= end) { >>>>> >>>>>> + n = min_t(unsigned long, >>>>>> + segusages_per_block - >>>>>> + nilfs_sufile_get_offset(sufile, segnum), >>>>>> + end - segnum); >>>>> >>>>> Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate >>>>> 'n'. >>>> >>>> Actually I don't think that is correct. What if range->start = 0 and >>>> range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard >>>> segment 0 and segment 1, whereas my version would discard only segment >>>> 0, which seems to be more reasonable. >>> >>> The problem seems that 'end' is not calculated properly. >>> I think it should be >>> >>> end = nilfs_get_segnum_of_block( >>> nilfs, >>> (range->start + range->len + <segment-size-in-bytes> - 1) >>> >> nilfs->ns_blocksize_bits) - 1; >>> >>> or can be simplified to >>> >>> end = nilfs_get_segnum_of_block( >>> nilfs, >>> (range->start + range->len - 1) >> nilfs->ns_blocksize_bits); >>> >>> if range->len > 0 is guaranteed. >>> >>> >>> The calculation of segnum extends the range to be discarded since it >>> is rounded down to segment alignment, but that of the current >>> implementation for 'end' truncates the range. Both should be >>> extended. >> >> Then shouldn't both be truncated? The user expects that less bytes than >> range->len are discarded but not more. Maybe I am wrong and it is >> defined somewhere... > > Uum, xfs looks to be truncating both. This seems to be authentic > when we think the original meaning of the fitrim range. > > I first thought truncating the range can generate gaps between > consecutive calls of FITRIM (if we use FITRIM like that). Hmm good point I didn't think of that. Then by extending both sides the overlapping segments would get discarded twice with consecutive calls, which shouldn't be a problem. > But now I'm > tempted to take the xfs approach. Let's take this semantics. > > So, we now have to round up range->start to calculate (start) segnum. Ok. I think in the end it doesn't really matter, because most users will use the default values: range->start = 0 range->minlen = 0 range->len = ULLONG_MAX Regards, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <5301F852.8010105-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <5301F852.8010105-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 13:28 ` Ryusuke Konishi [not found] ` <20140217.222811.27809410.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 13:28 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Mon, 17 Feb 2014 12:53:54 +0100, Andreas Rohner wrote: > On 2014-02-17 12:21, Ryusuke Konishi wrote: >> On Mon, 17 Feb 2014 11:34:16 +0100, Andreas Rohner wrote: >>> On 2014-02-17 11:06, Ryusuke Konishi wrote: >>>> On Mon, 17 Feb 2014 10:17:57 +0100, Andreas Rohner wrote: >>>>> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>>>>> + if (end > nilfs->ns_nsegments) >>>>>>> + end = nilfs->ns_nsegments; >>>>>> >>>>>> Yes, this adjustment is what we should do here, but 'end' segnum was >>>>>> rounded down to segment alighment before. So, it should be: >>>>>> >>>>>> if (end >= nilfs->ns_nsegments) >>>>>> end = nilfs->ns_nsegments - 1; >>>>>> >>>>>>> + if (end == segnum) >>>>>>> + goto out; >>>>>> >>>>>> and >>>>>> >>>>>> if (end < segnum) >>>>>> goto out; >>>>>> >>>>>>> + >>>>>>> + down_read(&NILFS_MDT(sufile)->mi_sem); >>>>>>> + >>>>>>> + while (segnum < end) { >>>>>> >>>>>> and >>>>>> >>>>>> while (segnum <= end) { >>>>>> >>>>>>> + n = min_t(unsigned long, >>>>>>> + segusages_per_block - >>>>>>> + nilfs_sufile_get_offset(sufile, segnum), >>>>>>> + end - segnum); >>>>>> >>>>>> Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate >>>>>> 'n'. >>>>> >>>>> Actually I don't think that is correct. What if range->start = 0 and >>>>> range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard >>>>> segment 0 and segment 1, whereas my version would discard only segment >>>>> 0, which seems to be more reasonable. >>>> >>>> The problem seems that 'end' is not calculated properly. >>>> I think it should be >>>> >>>> end = nilfs_get_segnum_of_block( >>>> nilfs, >>>> (range->start + range->len + <segment-size-in-bytes> - 1) >>>> >> nilfs->ns_blocksize_bits) - 1; >>>> >>>> or can be simplified to >>>> >>>> end = nilfs_get_segnum_of_block( >>>> nilfs, >>>> (range->start + range->len - 1) >> nilfs->ns_blocksize_bits); >>>> >>>> if range->len > 0 is guaranteed. >>>> >>>> >>>> The calculation of segnum extends the range to be discarded since it >>>> is rounded down to segment alignment, but that of the current >>>> implementation for 'end' truncates the range. Both should be >>>> extended. >>> >>> Then shouldn't both be truncated? The user expects that less bytes than >>> range->len are discarded but not more. Maybe I am wrong and it is >>> defined somewhere... >> >> Uum, xfs looks to be truncating both. This seems to be authentic >> when we think the original meaning of the fitrim range. >> >> I first thought truncating the range can generate gaps between >> consecutive calls of FITRIM (if we use FITRIM like that). > > Hmm good point I didn't think of that. Then by extending both sides the > overlapping segments would get discarded twice with consecutive calls, > which shouldn't be a problem. > >> But now I'm >> tempted to take the xfs approach. Let's take this semantics. >> >> So, we now have to round up range->start to calculate (start) segnum. > > Ok. I think in the end it doesn't really matter, because most users will > use the default values: > > range->start = 0 > range->minlen = 0 > range->len = ULLONG_MAX Or, we have the third option to avoid this issue. It's simply truncating both to sector boundary. In this case, the gap problem will not arise since userland programs usually separate the range considering sector boundary or file system block size boundary. I think this is more preferable than truncating it to segment size. How do you think of this ? Regards, Ryusuke Konishi -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20140217.222811.27809410.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>]
* Re: [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs [not found] ` <20140217.222811.27809410.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> @ 2014-02-17 13:51 ` Andreas Rohner 0 siblings, 0 replies; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 13:51 UTC (permalink / raw) To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On 2014-02-17 14:28, Ryusuke Konishi wrote: > On Mon, 17 Feb 2014 12:53:54 +0100, Andreas Rohner wrote: >> On 2014-02-17 12:21, Ryusuke Konishi wrote: >>> On Mon, 17 Feb 2014 11:34:16 +0100, Andreas Rohner wrote: >>>> On 2014-02-17 11:06, Ryusuke Konishi wrote: >>>>> On Mon, 17 Feb 2014 10:17:57 +0100, Andreas Rohner wrote: >>>>>> On 2014-02-17 04:00, Ryusuke Konishi wrote: >>>>>>>> + if (end > nilfs->ns_nsegments) >>>>>>>> + end = nilfs->ns_nsegments; >>>>>>> >>>>>>> Yes, this adjustment is what we should do here, but 'end' segnum was >>>>>>> rounded down to segment alighment before. So, it should be: >>>>>>> >>>>>>> if (end >= nilfs->ns_nsegments) >>>>>>> end = nilfs->ns_nsegments - 1; >>>>>>> >>>>>>>> + if (end == segnum) >>>>>>>> + goto out; >>>>>>> >>>>>>> and >>>>>>> >>>>>>> if (end < segnum) >>>>>>> goto out; >>>>>>> >>>>>>>> + >>>>>>>> + down_read(&NILFS_MDT(sufile)->mi_sem); >>>>>>>> + >>>>>>>> + while (segnum < end) { >>>>>>> >>>>>>> and >>>>>>> >>>>>>> while (segnum <= end) { >>>>>>> >>>>>>>> + n = min_t(unsigned long, >>>>>>>> + segusages_per_block - >>>>>>>> + nilfs_sufile_get_offset(sufile, segnum), >>>>>>>> + end - segnum); >>>>>>> >>>>>>> Then, we can reuse nilfs_sufile_segment_usages_in_block() to calculate >>>>>>> 'n'. >>>>>> >>>>>> Actually I don't think that is correct. What if range->start = 0 and >>>>>> range->end = 8MB. Then segnum = 0 and end = 1. Your code would discard >>>>>> segment 0 and segment 1, whereas my version would discard only segment >>>>>> 0, which seems to be more reasonable. >>>>> >>>>> The problem seems that 'end' is not calculated properly. >>>>> I think it should be >>>>> >>>>> end = nilfs_get_segnum_of_block( >>>>> nilfs, >>>>> (range->start + range->len + <segment-size-in-bytes> - 1) >>>>> >> nilfs->ns_blocksize_bits) - 1; >>>>> >>>>> or can be simplified to >>>>> >>>>> end = nilfs_get_segnum_of_block( >>>>> nilfs, >>>>> (range->start + range->len - 1) >> nilfs->ns_blocksize_bits); >>>>> >>>>> if range->len > 0 is guaranteed. >>>>> >>>>> >>>>> The calculation of segnum extends the range to be discarded since it >>>>> is rounded down to segment alignment, but that of the current >>>>> implementation for 'end' truncates the range. Both should be >>>>> extended. >>>> >>>> Then shouldn't both be truncated? The user expects that less bytes than >>>> range->len are discarded but not more. Maybe I am wrong and it is >>>> defined somewhere... >>> >>> Uum, xfs looks to be truncating both. This seems to be authentic >>> when we think the original meaning of the fitrim range. >>> >>> I first thought truncating the range can generate gaps between >>> consecutive calls of FITRIM (if we use FITRIM like that). >> >> Hmm good point I didn't think of that. Then by extending both sides the >> overlapping segments would get discarded twice with consecutive calls, >> which shouldn't be a problem. >> >>> But now I'm >>> tempted to take the xfs approach. Let's take this semantics. >>> >>> So, we now have to round up range->start to calculate (start) segnum. >> >> Ok. I think in the end it doesn't really matter, because most users will >> use the default values: >> >> range->start = 0 >> range->minlen = 0 >> range->len = ULLONG_MAX > > Or, we have the third option to avoid this issue. It's simply > truncating both to sector boundary. In this case, the gap problem > will not arise since userland programs usually separate the range > considering sector boundary or file system block size boundary. I > think this is more preferable than truncating it to segment size. > > How do you think of this ? Yes I think that is a good idea. It shouldn't be hard to implement either. I will try it and then send you a new version of the patch. Regards, Andreas Rohner -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/2] nilfs2: add FITRIM ioctl support for nilfs2 [not found] ` <cover.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-02-15 13:34 ` [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs Andreas Rohner @ 2014-02-15 13:34 ` Andreas Rohner [not found] ` <c1b713f94e9ebdb093c539a4c79a0426c45639a9.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 1 sibling, 1 reply; 18+ messages in thread From: Andreas Rohner @ 2014-02-15 13:34 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner This patch adds support for the FITRIM ioctl, which enables user space tools to issue TRIM/DISCARD requests to the underlying device. Every clean segment within the specified range will be discarded. Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> --- fs/nilfs2/ioctl.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c index 2b34021..cb1bea4 100644 --- a/fs/nilfs2/ioctl.c +++ b/fs/nilfs2/ioctl.c @@ -1072,6 +1072,49 @@ out: } /** + * nilfs_ioctl_trim_fs() - trim ioctl handle function + * @inode: inode object + * @argp: pointer on argument from userspace + * + * Decription: nilfs_ioctl_trim_fs is the FITRIM ioctl handle function. It + * checks the arguments from userspace and calls nilfs_sufile_trim_fs, which + * performs the actual trim operation. + * + * Return Value: On success, 0 is returned or negative error code, otherwise. + */ +static int nilfs_ioctl_trim_fs(struct inode *inode, void __user *argp) +{ + struct the_nilfs *nilfs = inode->i_sb->s_fs_info; + struct request_queue *q = bdev_get_queue(nilfs->ns_bdev); + struct fstrim_range range; + int ret = 0; + + if (!capable(CAP_SYS_ADMIN)) + return -EPERM; + + if (!blk_queue_discard(q)) + return -EOPNOTSUPP; + + if (copy_from_user(&range, argp, sizeof(range))) + return -EFAULT; + + range.minlen = max_t(unsigned int, (unsigned int)range.minlen, + q->limits.discard_granularity); + + down_read(&nilfs->ns_segctor_sem); + ret = nilfs_sufile_trim_fs(nilfs->ns_sufile, &range); + up_read(&nilfs->ns_segctor_sem); + + if (ret < 0) + return ret; + + if (copy_to_user(argp, &range, sizeof(range))) + return -EFAULT; + + return 0; +} + +/** * nilfs_ioctl_set_alloc_range - limit range of segments to be allocated * @inode: inode object * @argp: pointer on argument from userspace @@ -1205,6 +1248,8 @@ long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) return nilfs_ioctl_resize(inode, filp, argp); case NILFS_IOCTL_SET_ALLOC_RANGE: return nilfs_ioctl_set_alloc_range(inode, argp); + case FITRIM: + return nilfs_ioctl_trim_fs(inode, argp); default: return -ENOTTY; } @@ -1235,6 +1280,7 @@ long nilfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) case NILFS_IOCTL_SYNC: case NILFS_IOCTL_RESIZE: case NILFS_IOCTL_SET_ALLOC_RANGE: + case FITRIM: break; default: return -ENOIOCTLCMD; -- 1.8.5.4 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <c1b713f94e9ebdb093c539a4c79a0426c45639a9.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>]
* Re: [PATCH 2/2] nilfs2: add FITRIM ioctl support for nilfs2 [not found] ` <c1b713f94e9ebdb093c539a4c79a0426c45639a9.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> @ 2014-02-17 3:01 ` Ryusuke Konishi 0 siblings, 0 replies; 18+ messages in thread From: Ryusuke Konishi @ 2014-02-17 3:01 UTC (permalink / raw) To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA On Sat, 15 Feb 2014 14:34:26 +0100, Andreas Rohner wrote: > This patch adds support for the FITRIM ioctl, which enables user space > tools to issue TRIM/DISCARD requests to the underlying device. Every > clean segment within the specified range will be discarded. > > Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> > --- > fs/nilfs2/ioctl.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 46 insertions(+) > > diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c > index 2b34021..cb1bea4 100644 > --- a/fs/nilfs2/ioctl.c > +++ b/fs/nilfs2/ioctl.c > @@ -1072,6 +1072,49 @@ out: > } > > /** > + * nilfs_ioctl_trim_fs() - trim ioctl handle function > + * @inode: inode object > + * @argp: pointer on argument from userspace > + * > + * Decription: nilfs_ioctl_trim_fs is the FITRIM ioctl handle function. It > + * checks the arguments from userspace and calls nilfs_sufile_trim_fs, which > + * performs the actual trim operation. > + * > + * Return Value: On success, 0 is returned or negative error code, otherwise. > + */ > +static int nilfs_ioctl_trim_fs(struct inode *inode, void __user *argp) > +{ > + struct the_nilfs *nilfs = inode->i_sb->s_fs_info; > + struct request_queue *q = bdev_get_queue(nilfs->ns_bdev); > + struct fstrim_range range; > + int ret = 0; This initialization is unnecessary. Other looks OK to me. Regards, Ryusuke Konishi > + > + if (!capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > + if (!blk_queue_discard(q)) > + return -EOPNOTSUPP; > + > + if (copy_from_user(&range, argp, sizeof(range))) > + return -EFAULT; > + > + range.minlen = max_t(unsigned int, (unsigned int)range.minlen, > + q->limits.discard_granularity); > + > + down_read(&nilfs->ns_segctor_sem); > + ret = nilfs_sufile_trim_fs(nilfs->ns_sufile, &range); > + up_read(&nilfs->ns_segctor_sem); > + > + if (ret < 0) > + return ret; > + > + if (copy_to_user(argp, &range, sizeof(range))) > + return -EFAULT; > + > + return 0; > +} > + > +/** > * nilfs_ioctl_set_alloc_range - limit range of segments to be allocated > * @inode: inode object > * @argp: pointer on argument from userspace > @@ -1205,6 +1248,8 @@ long nilfs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) > return nilfs_ioctl_resize(inode, filp, argp); > case NILFS_IOCTL_SET_ALLOC_RANGE: > return nilfs_ioctl_set_alloc_range(inode, argp); > + case FITRIM: > + return nilfs_ioctl_trim_fs(inode, argp); > default: > return -ENOTTY; > } > @@ -1235,6 +1280,7 @@ long nilfs_compat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg) > case NILFS_IOCTL_SYNC: > case NILFS_IOCTL_RESIZE: > case NILFS_IOCTL_SET_ALLOC_RANGE: > + case FITRIM: > break; > default: > return -ENOIOCTLCMD; > -- > 1.8.5.4 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 0/2] nilfs2: add support for FITRIM ioctl @ 2014-02-17 22:39 Andreas Rohner 0 siblings, 0 replies; 18+ messages in thread From: Andreas Rohner @ 2014-02-17 22:39 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner Hi, This patch adds support for the FITRIM ioctl, which allows user space tools like fstrim to issue TRIM/DISCARD requests to the underlying device. It takes a fstrim_range structure as a parameter and for every clean segment in the specified range the function blkdev_issue_discard is called. The range is truncated to sector boundaries. Best regards, Andreas Rohner --- v1->v2 (based on review by Ryusuke Konishi) * Remove upper limit of minlen * Add check for minlen * Round range to sector boundary instead of segment boundary * Fix minor bug * Use kmap_atomic instead of kmap * Move input checks to ioctl.c * Use nilfs_sufile_segment_usages_in_block() -- Andreas Rohner (2): nilfs2: add nilfs_sufile_trim_fs to trim clean segs nilfs2: add FITRIM ioctl support for nilfs2 fs/nilfs2/ioctl.c | 52 +++++++++++++++++++ fs/nilfs2/sufile.c | 144 +++++++++++++++++++++++++++++++++++++++++++++++++++++ fs/nilfs2/sufile.h | 1 + 3 files changed, 197 insertions(+) -- 1.9.0 -- To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2014-02-17 22:39 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-02-15 13:34 [PATCH 0/2] nilfs2: add support for FITRIM ioctl Andreas Rohner [not found] ` <cover.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-02-15 13:34 ` [PATCH 1/2] nilfs2: add nilfs_sufile_trim_fs to trim clean segs Andreas Rohner [not found] ` <02d463460388a9bdc1413aa4d6fdfa5a402b9452.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 3:00 ` Ryusuke Konishi [not found] ` <20140217.120000.397306175.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 4:27 ` Ryusuke Konishi 2014-02-17 9:06 ` Andreas Rohner [not found] ` <5301D130.9050000-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 10:42 ` Ryusuke Konishi [not found] ` <20140217.194200.431478798.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 11:04 ` Andreas Rohner [not found] ` <5301ECD4.5070200-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 11:39 ` Ryusuke Konishi 2014-02-17 9:17 ` Andreas Rohner [not found] ` <5301D3C5.6040704-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 10:06 ` Ryusuke Konishi [not found] ` <20140217.190610.135800072.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 10:34 ` Andreas Rohner [not found] ` <5301E5A8.3080701-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 11:21 ` Ryusuke Konishi [not found] ` <20140217.202143.05344630.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 11:53 ` Andreas Rohner [not found] ` <5301F852.8010105-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 13:28 ` Ryusuke Konishi [not found] ` <20140217.222811.27809410.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org> 2014-02-17 13:51 ` Andreas Rohner 2014-02-15 13:34 ` [PATCH 2/2] nilfs2: add FITRIM ioctl support for nilfs2 Andreas Rohner [not found] ` <c1b713f94e9ebdb093c539a4c79a0426c45639a9.1392470644.git.andreas.rohner-hi6Y0CQ0nG0@public.gmane.org> 2014-02-17 3:01 ` Ryusuke Konishi 2014-02-17 22:39 [PATCH 0/2] nilfs2: add support for FITRIM ioctl Andreas Rohner
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.