All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
@ 2014-08-31 15:47 Andreas Rohner
       [not found] ` <1409500033-30791-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Rohner @ 2014-08-31 15:47 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner

Hi,

I have looked a bit more into the semantics of the various flags
concerning block device caching behaviour. According to
"Documentation/block/writeback_cache_control.txt" a call to
blkdev_issue_flush() is equivalent to an empty bio with the
REQ_FLUSH flag set. So there is no need to call blkdev_issue_flush()
after a call to nilfs_commit_super(). But if there is no need to write
the super block an additional call to blkdev_issue_flush() is necessary.

To avoid an overhead I introduced the THE_NILFS_FLUSHED flag, which is
cleared whenever new logs are written and set whenever the block device
is flushed. If the super block was written during segment construction
or in nilfs_sync_fs(), then blkdev_issue_flush() is not called.

I am pretty sure, that there are no race conditions, but maybe someone
should check that possibility before merging this.

br,
Andreas Rohner

v1->v2
 * Add new flag THE_NILFS_FLUSHED

Andreas Rohner (1):
  nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()

 fs/nilfs2/file.c      | 3 ++-
 fs/nilfs2/ioctl.c     | 3 ++-
 fs/nilfs2/segment.c   | 2 ++
 fs/nilfs2/super.c     | 8 ++++++++
 fs/nilfs2/the_nilfs.h | 6 ++++++
 5 files changed, 20 insertions(+), 2 deletions(-)

-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found] ` <1409500033-30791-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-08-31 15:47   ` Andreas Rohner
       [not found]     ` <1409500033-30791-2-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Rohner @ 2014-08-31 15:47 UTC (permalink / raw)
  To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA; +Cc: Andreas Rohner

Under normal circumstances nilfs_sync_fs() writes out the super block,
which causes a flush of the underlying block device. But this depends on
the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
last segment crosses a segment boundary. So if only a small amount of
data is written before the call to nilfs_sync_fs(), no flush of the
block device occurs.

In the above case an additional call to blkdev_issue_flush() is needed.
To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
introduced, which is cleared whenever new logs are written and set
whenever the block device is flushed.

Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
---
 fs/nilfs2/file.c      | 3 ++-
 fs/nilfs2/ioctl.c     | 3 ++-
 fs/nilfs2/segment.c   | 2 ++
 fs/nilfs2/super.c     | 8 ++++++++
 fs/nilfs2/the_nilfs.h | 6 ++++++
 5 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
index 2497815..7857460 100644
--- a/fs/nilfs2/file.c
+++ b/fs/nilfs2/file.c
@@ -56,7 +56,8 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	mutex_unlock(&inode->i_mutex);
 
 	nilfs = inode->i_sb->s_fs_info;
-	if (!err && nilfs_test_opt(nilfs, BARRIER)) {
+	if (!err && nilfs_test_opt(nilfs, BARRIER) &&
+	    !test_and_set_nilfs_flushed(nilfs)) {
 		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 		if (err != -EIO)
 			err = 0;
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 422fb54..dc5d101 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1022,7 +1022,8 @@ static int nilfs_ioctl_sync(struct inode *inode, struct file *filp,
 		return ret;
 
 	nilfs = inode->i_sb->s_fs_info;
-	if (nilfs_test_opt(nilfs, BARRIER)) {
+	if (nilfs_test_opt(nilfs, BARRIER) &&
+	    !test_and_set_nilfs_flushed(nilfs)) {
 		ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 		if (ret == -EIO)
 			return ret;
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index a1a1916..54a6be1 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1842,6 +1842,8 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
 		nilfs_segctor_clear_metadata_dirty(sci);
 	} else
 		clear_bit(NILFS_SC_SUPER_ROOT, &sci->sc_flags);
+
+	clear_nilfs_flushed(nilfs);
 }
 
 static int nilfs_segctor_wait(struct nilfs_sc_info *sci)
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 228f5bd..332fdf0 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -310,6 +310,7 @@ int nilfs_commit_super(struct super_block *sb, int flag)
 					    nilfs->ns_sbsize));
 	}
 	clear_nilfs_sb_dirty(nilfs);
+	set_nilfs_flushed(nilfs);
 	return nilfs_sync_super(sb, flag);
 }
 
@@ -514,6 +515,13 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
 	}
 	up_write(&nilfs->ns_sem);
 
+	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
+	    !test_and_set_nilfs_flushed(nilfs)) {
+		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
+		if (err != -EIO)
+			err = 0;
+	}
+
 	return err;
 }
 
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index d01ead1..d12a8ce 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -41,6 +41,7 @@ enum {
 	THE_NILFS_DISCONTINUED,	/* 'next' pointer chain has broken */
 	THE_NILFS_GC_RUNNING,	/* gc process is running */
 	THE_NILFS_SB_DIRTY,	/* super block is dirty */
+	THE_NILFS_FLUSHED,	/* volatile data was flushed to disk */
 };
 
 /**
@@ -202,6 +203,10 @@ struct the_nilfs {
 };
 
 #define THE_NILFS_FNS(bit, name)					\
+static inline int test_and_set_nilfs_##name(struct the_nilfs *nilfs)	\
+{									\
+	return test_and_set_bit(THE_NILFS_##bit, &(nilfs)->ns_flags);	\
+}									\
 static inline void set_nilfs_##name(struct the_nilfs *nilfs)		\
 {									\
 	set_bit(THE_NILFS_##bit, &(nilfs)->ns_flags);			\
@@ -219,6 +224,7 @@ THE_NILFS_FNS(INIT, init)
 THE_NILFS_FNS(DISCONTINUED, discontinued)
 THE_NILFS_FNS(GC_RUNNING, gc_running)
 THE_NILFS_FNS(SB_DIRTY, sb_dirty)
+THE_NILFS_FNS(FLUSHED, flushed)
 
 /*
  * Mount option operations
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]     ` <1409500033-30791-2-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-09-01 17:59       ` Ryusuke Konishi
       [not found]         ` <20140902.025939.1812841141168890644.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2014-09-01 17:59 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Andreas,
On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
> Under normal circumstances nilfs_sync_fs() writes out the super block,
> which causes a flush of the underlying block device. But this depends on
> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
> last segment crosses a segment boundary. So if only a small amount of
> data is written before the call to nilfs_sync_fs(), no flush of the
> block device occurs.
> 
> In the above case an additional call to blkdev_issue_flush() is needed.
> To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
> introduced, which is cleared whenever new logs are written and set
> whenever the block device is flushed.
> 
> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>

The patch looks good to me except that I feel the use of atomic
test-and-set bitwise operations something unfavorable (though it's
logically correct).  I will try to send this to upstream as is unless
a comment comes to mind.

Thanks,
Ryusuke Konishi

> ---
>  fs/nilfs2/file.c      | 3 ++-
>  fs/nilfs2/ioctl.c     | 3 ++-
>  fs/nilfs2/segment.c   | 2 ++
>  fs/nilfs2/super.c     | 8 ++++++++
>  fs/nilfs2/the_nilfs.h | 6 ++++++
>  5 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nilfs2/file.c b/fs/nilfs2/file.c
> index 2497815..7857460 100644
> --- a/fs/nilfs2/file.c
> +++ b/fs/nilfs2/file.c
> @@ -56,7 +56,8 @@ int nilfs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  	mutex_unlock(&inode->i_mutex);
>  
>  	nilfs = inode->i_sb->s_fs_info;
> -	if (!err && nilfs_test_opt(nilfs, BARRIER)) {
> +	if (!err && nilfs_test_opt(nilfs, BARRIER) &&
> +	    !test_and_set_nilfs_flushed(nilfs)) {
>  		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
>  		if (err != -EIO)
>  			err = 0;
> diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
> index 422fb54..dc5d101 100644
> --- a/fs/nilfs2/ioctl.c
> +++ b/fs/nilfs2/ioctl.c
> @@ -1022,7 +1022,8 @@ static int nilfs_ioctl_sync(struct inode *inode, struct file *filp,
>  		return ret;
>  
>  	nilfs = inode->i_sb->s_fs_info;
> -	if (nilfs_test_opt(nilfs, BARRIER)) {
> +	if (nilfs_test_opt(nilfs, BARRIER) &&
> +	    !test_and_set_nilfs_flushed(nilfs)) {
>  		ret = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
>  		if (ret == -EIO)
>  			return ret;
> diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
> index a1a1916..54a6be1 100644
> --- a/fs/nilfs2/segment.c
> +++ b/fs/nilfs2/segment.c
> @@ -1842,6 +1842,8 @@ static void nilfs_segctor_complete_write(struct nilfs_sc_info *sci)
>  		nilfs_segctor_clear_metadata_dirty(sci);
>  	} else
>  		clear_bit(NILFS_SC_SUPER_ROOT, &sci->sc_flags);
> +
> +	clear_nilfs_flushed(nilfs);
>  }
>  
>  static int nilfs_segctor_wait(struct nilfs_sc_info *sci)
> diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
> index 228f5bd..332fdf0 100644
> --- a/fs/nilfs2/super.c
> +++ b/fs/nilfs2/super.c
> @@ -310,6 +310,7 @@ int nilfs_commit_super(struct super_block *sb, int flag)
>  					    nilfs->ns_sbsize));
>  	}
>  	clear_nilfs_sb_dirty(nilfs);
> +	set_nilfs_flushed(nilfs);
>  	return nilfs_sync_super(sb, flag);
>  }
>  
> @@ -514,6 +515,13 @@ static int nilfs_sync_fs(struct super_block *sb, int wait)
>  	}
>  	up_write(&nilfs->ns_sem);
>  
> +	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
> +	    !test_and_set_nilfs_flushed(nilfs)) {
> +		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
> +		if (err != -EIO)
> +			err = 0;
> +	}
> +
>  	return err;
>  }
>  
> diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
> index d01ead1..d12a8ce 100644
> --- a/fs/nilfs2/the_nilfs.h
> +++ b/fs/nilfs2/the_nilfs.h
> @@ -41,6 +41,7 @@ enum {
>  	THE_NILFS_DISCONTINUED,	/* 'next' pointer chain has broken */
>  	THE_NILFS_GC_RUNNING,	/* gc process is running */
>  	THE_NILFS_SB_DIRTY,	/* super block is dirty */
> +	THE_NILFS_FLUSHED,	/* volatile data was flushed to disk */
>  };
>  
>  /**
> @@ -202,6 +203,10 @@ struct the_nilfs {
>  };
>  
>  #define THE_NILFS_FNS(bit, name)					\
> +static inline int test_and_set_nilfs_##name(struct the_nilfs *nilfs)	\
> +{									\
> +	return test_and_set_bit(THE_NILFS_##bit, &(nilfs)->ns_flags);	\
> +}									\
>  static inline void set_nilfs_##name(struct the_nilfs *nilfs)		\
>  {									\
>  	set_bit(THE_NILFS_##bit, &(nilfs)->ns_flags);			\
> @@ -219,6 +224,7 @@ THE_NILFS_FNS(INIT, init)
>  THE_NILFS_FNS(DISCONTINUED, discontinued)
>  THE_NILFS_FNS(GC_RUNNING, gc_running)
>  THE_NILFS_FNS(SB_DIRTY, sb_dirty)
> +THE_NILFS_FNS(FLUSHED, flushed)
>  
>  /*
>   * Mount option operations
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]         ` <20140902.025939.1812841141168890644.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-09-01 18:43           ` Andreas Rohner
       [not found]             ` <5404BE5C.9080302-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Rohner @ 2014-09-01 18:43 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Ryusuke,
On 2014-09-01 19:59, Ryusuke Konishi wrote:
> On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
>> Under normal circumstances nilfs_sync_fs() writes out the super block,
>> which causes a flush of the underlying block device. But this depends on
>> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
>> last segment crosses a segment boundary. So if only a small amount of
>> data is written before the call to nilfs_sync_fs(), no flush of the
>> block device occurs.
>>
>> In the above case an additional call to blkdev_issue_flush() is needed.
>> To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
>> introduced, which is cleared whenever new logs are written and set
>> whenever the block device is flushed.
>>
>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
> 
> The patch looks good to me except that I feel the use of atomic
> test-and-set bitwise operations something unfavorable (though it's
> logically correct).  I will try to send this to upstream as is unless
> a comment comes to mind.

I originally thought, that it is necessary to do it atomically to avoid
a race condition, but I am not so sure about that any more. I think the
only case we have to avoid is, to call set_nilfs_flushed() after
blkdev_issue_flush(), because this could race with the
clear_nilfs_flushed() from the segment construction. So this should also
work:

 +	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
 +	    !nilfs_flushed(nilfs)) {
 +              set_nilfs_flushed(nilfs);
 +		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
 +		if (err != -EIO)
 +			err = 0;
 +	}
 +

What do you think?

br,
Andreas Rohner

> Thanks,
> Ryusuke Konishi

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]             ` <5404BE5C.9080302-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-09-01 19:18               ` Andreas Rohner
       [not found]                 ` <5404C686.4070800-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Rohner @ 2014-09-01 19:18 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 2014-09-01 20:43, Andreas Rohner wrote:
> Hi Ryusuke,
> On 2014-09-01 19:59, Ryusuke Konishi wrote:
>> On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
>>> Under normal circumstances nilfs_sync_fs() writes out the super block,
>>> which causes a flush of the underlying block device. But this depends on
>>> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
>>> last segment crosses a segment boundary. So if only a small amount of
>>> data is written before the call to nilfs_sync_fs(), no flush of the
>>> block device occurs.
>>>
>>> In the above case an additional call to blkdev_issue_flush() is needed.
>>> To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
>>> introduced, which is cleared whenever new logs are written and set
>>> whenever the block device is flushed.
>>>
>>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>>
>> The patch looks good to me except that I feel the use of atomic
>> test-and-set bitwise operations something unfavorable (though it's
>> logically correct).  I will try to send this to upstream as is unless
>> a comment comes to mind.
> 
> I originally thought, that it is necessary to do it atomically to avoid
> a race condition, but I am not so sure about that any more. I think the
> only case we have to avoid is, to call set_nilfs_flushed() after
> blkdev_issue_flush(), because this could race with the
> clear_nilfs_flushed() from the segment construction. So this should also
> work:
> 
>  +	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
>  +	    !nilfs_flushed(nilfs)) {
>  +              set_nilfs_flushed(nilfs);
>  +		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
>  +		if (err != -EIO)
>  +			err = 0;
>  +	}
>  +

On the other hand, it says in the comments to set_bit(), that it can be
reordered on architectures other than x86. test_and_set_bit() implies a
memory barrier on all architectures. But I don't think the processor
would reorder set_nilfs_flushed() after the external function call to
blkdev_issue_flush(), would it?

/**
 * set_bit - Atomically set a bit in memory
 * @nr: the bit to set
 * @addr: the address to start counting from
 *
 * This function is atomic and may not be reordered.  See __set_bit()
 * if you do not require the atomic guarantees.
 *
 * Note: there are no guarantees that this function will not be reordered
 * on non x86 architectures, so if you are writing portable code,
 * make sure not to rely on its reordering guarantees.
 */

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]                 ` <5404C686.4070800-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-09-03  0:35                   ` Ryusuke Konishi
       [not found]                     ` <20140903.093525.810247375407684014.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2014-09-03  0:35 UTC (permalink / raw)
  To: andreas.rohner-hi6Y0CQ0nG0; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 01 Sep 2014 21:18:30 +0200, Andreas Rohner wrote:
> On 2014-09-01 20:43, Andreas Rohner wrote:
>> Hi Ryusuke,
>> On 2014-09-01 19:59, Ryusuke Konishi wrote:
>>> On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
>>>> Under normal circumstances nilfs_sync_fs() writes out the super block,
>>>> which causes a flush of the underlying block device. But this depends on
>>>> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
>>>> last segment crosses a segment boundary. So if only a small amount of
>>>> data is written before the call to nilfs_sync_fs(), no flush of the
>>>> block device occurs.
>>>>
>>>> In the above case an additional call to blkdev_issue_flush() is needed.
>>>> To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
>>>> introduced, which is cleared whenever new logs are written and set
>>>> whenever the block device is flushed.
>>>>
>>>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>>>
>>> The patch looks good to me except that I feel the use of atomic
>>> test-and-set bitwise operations something unfavorable (though it's
>>> logically correct).  I will try to send this to upstream as is unless
>>> a comment comes to mind.
>> 
>> I originally thought, that it is necessary to do it atomically to avoid
>> a race condition, but I am not so sure about that any more. I think the
>> only case we have to avoid is, to call set_nilfs_flushed() after
>> blkdev_issue_flush(), because this could race with the
>> clear_nilfs_flushed() from the segment construction. So this should also
>> work:
>> 
>>  +	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
>>  +	    !nilfs_flushed(nilfs)) {
>>  +              set_nilfs_flushed(nilfs);
>>  +		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
>>  +		if (err != -EIO)
>>  +			err = 0;
>>  +	}
>>  +
> 
> On the other hand, it says in the comments to set_bit(), that it can be
> reordered on architectures other than x86. test_and_set_bit() implies a
> memory barrier on all architectures. But I don't think the processor
> would reorder set_nilfs_flushed() after the external function call to
> blkdev_issue_flush(), would it?

I believe compiler doesn't reorder set_bit() operation after an
external function call unless it knows the content of the function and
the function can be optimized.  But, yes, set_bit() doesn't imply
memory barrier unlike test_and_set_bit().  As for
blkdev_issue_flush(), it would imply memory barrier by some lock
functions or other primitive used inside it.  (I haven't actually
confirmed that the premise is true)

On the other hand, we need explicit barrier operation like
smp_mb__after_atomic() if a certain operation is performed after
set_bit() and the changed bit should be visible to other processors
before the operation.

Regards,
Ryusuke Konishi


> 
> /**
>  * set_bit - Atomically set a bit in memory
>  * @nr: the bit to set
>  * @addr: the address to start counting from
>  *
>  * This function is atomic and may not be reordered.  See __set_bit()
>  * if you do not require the atomic guarantees.
>  *
>  * Note: there are no guarantees that this function will not be reordered
>  * on non x86 architectures, so if you are writing portable code,
>  * make sure not to rely on its reordering guarantees.
>  */
> 
> br,
> Andreas Rohner
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]                     ` <20140903.093525.810247375407684014.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-09-03 12:32                       ` Andreas Rohner
       [not found]                         ` <54070A56.9050807-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Rohner @ 2014-09-03 12:32 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On 2014-09-03 02:35, Ryusuke Konishi wrote:
> On Mon, 01 Sep 2014 21:18:30 +0200, Andreas Rohner wrote:
>> On 2014-09-01 20:43, Andreas Rohner wrote:
>>> Hi Ryusuke,
>>> On 2014-09-01 19:59, Ryusuke Konishi wrote:
>>>> On Sun, 31 Aug 2014 17:47:13 +0200, Andreas Rohner wrote:
>>>>> Under normal circumstances nilfs_sync_fs() writes out the super block,
>>>>> which causes a flush of the underlying block device. But this depends on
>>>>> the THE_NILFS_SB_DIRTY flag, which is only set if the pointer to the
>>>>> last segment crosses a segment boundary. So if only a small amount of
>>>>> data is written before the call to nilfs_sync_fs(), no flush of the
>>>>> block device occurs.
>>>>>
>>>>> In the above case an additional call to blkdev_issue_flush() is needed.
>>>>> To prevent unnecessary overhead, the new flag THE_NILFS_FLUSHED is
>>>>> introduced, which is cleared whenever new logs are written and set
>>>>> whenever the block device is flushed.
>>>>>
>>>>> Signed-off-by: Andreas Rohner <andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
>>>>
>>>> The patch looks good to me except that I feel the use of atomic
>>>> test-and-set bitwise operations something unfavorable (though it's
>>>> logically correct).  I will try to send this to upstream as is unless
>>>> a comment comes to mind.
>>>
>>> I originally thought, that it is necessary to do it atomically to avoid
>>> a race condition, but I am not so sure about that any more. I think the
>>> only case we have to avoid is, to call set_nilfs_flushed() after
>>> blkdev_issue_flush(), because this could race with the
>>> clear_nilfs_flushed() from the segment construction. So this should also
>>> work:
>>>
>>>  +	if (wait && !err && nilfs_test_opt(nilfs, BARRIER) &&
>>>  +	    !nilfs_flushed(nilfs)) {
>>>  +              set_nilfs_flushed(nilfs);
>>>  +		err = blkdev_issue_flush(sb->s_bdev, GFP_KERNEL, NULL);
>>>  +		if (err != -EIO)
>>>  +			err = 0;
>>>  +	}
>>>  +
>>
>> On the other hand, it says in the comments to set_bit(), that it can be
>> reordered on architectures other than x86. test_and_set_bit() implies a
>> memory barrier on all architectures. But I don't think the processor
>> would reorder set_nilfs_flushed() after the external function call to
>> blkdev_issue_flush(), would it?
> 
> I believe compiler doesn't reorder set_bit() operation after an
> external function call unless it knows the content of the function and
> the function can be optimized.  But, yes, set_bit() doesn't imply
> memory barrier unlike test_and_set_bit().  As for
> blkdev_issue_flush(), it would imply memory barrier by some lock
> functions or other primitive used inside it.  (I haven't actually
> confirmed that the premise is true)

Yes blkdev_issue_flush() probably implies a memory barrier.

> On the other hand, we need explicit barrier operation like
> smp_mb__after_atomic() if a certain operation is performed after
> set_bit() and the changed bit should be visible to other processors
> before the operation.

Great suggestion. I didn't know about those functions. Do we also need a
call to smp_mb__before_atomic() before clear_nilfs_flushed(nilfs) in
segment.c?

I would be happy to provide another version of the patch with
set_nilfs_flushed(nilfs) and smp_mb__after_atomic() if you prefer that
version over the test_and_set_bit approach...

br,
Andreas Rohner
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]                         ` <54070A56.9050807-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-09-07  5:12                           ` Ryusuke Konishi
       [not found]                             ` <20140907.141232.2124135104047617747.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Ryusuke Konishi @ 2014-09-07  5:12 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

Hi Andreas,
On Wed, 03 Sep 2014 14:32:22 +0200, Andreas Rohner wrote:
> On 2014-09-03 02:35, Ryusuke Konishi wrote:
>> On Mon, 01 Sep 2014 21:18:30 +0200, Andreas Rohner wrote:
>> On the other hand, we need explicit barrier operation like
>> smp_mb__after_atomic() if a certain operation is performed after
>> set_bit() and the changed bit should be visible to other processors
>> before the operation.
> 
> Great suggestion. I didn't know about those functions.

I recommend you read Documentation/memory-barries.txt.  It's an
excellent document summarizing information on what we should know
about memory synchronization on smp.  Documentation/atomic_ops.txt
also contains some information on barriers related to atomic
operations.

> Do we also need a call to smp_mb__before_atomic() before
> clear_nilfs_flushed(nilfs) in segment.c?

I think the timing restrictions of this flag are not so serve.  The
only restrictions that the flag must ensure are:

 1) Bioes for log are completed before this flag is cleared.
 2) Clearing the flag is propagated to the processor executing
    nilfs_sync_fs() and nilfs_sync_file() before log writer returns.

The restriction (1) is guaranteed since nilfs_wait_on_logs() is called
before nilfs_segctor_complete_write().  This sequence appears at
nilfs_segctor_wait() function.

The restriction (2) looks to be satisfied by (at least)
nilfs_segctor_notify() that nilfs_segctor_construct() calls or
nilfs_transaction_unlock() that nilfs_construct_dsync_segment() calls.

> I would be happy to provide another version of the patch with
> set_nilfs_flushed(nilfs) and smp_mb__after_atomic() if you prefer that
> version over the test_and_set_bit approach...

Two additional comments:

- Splitting test_and_set_bit() into test_bit() and set_bit() can
  introduce a race condition.  Two processors can call test_bit() at
  the same time and both can call set_bit() and blkdev_issue_flush().
  But, this race is not critical.  It only allows duplicate
  blkdev_issue_flush() calls in the rare case, and I think it's
  ignorable.

- clear_nilfs_flushed() seems to be called more than necessary.
  Incomplete logs that the mount time recovery of nilfs doesn't
  salvage do not need to be flushed.  In this sense, it may be enough
  only for logs containing a super root and those for datasync
  nilfs_construct_dsync_segment() creates.

By the way, we are using atomic bit operations too much.  Even though
{set,clear}_bit() don't imply a memory barrier, they still imply a
lock prefix to protect the flag from other bit operations on ns_flags.
For load and store of integer variables which are properly aligned to
a cache line, modern processors naturally satisfy atomicity without
additional lock operations.  I think we can replace the flag with just
an integer variable like "int ns_flushed_device".  How do you think ?


Regards,
Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]                             ` <20140907.141232.2124135104047617747.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
@ 2014-09-08 19:03                               ` Andreas Rohner
       [not found]                                 ` <540DFD66.1090904-hi6Y0CQ0nG0@public.gmane.org>
  0 siblings, 1 reply; 10+ messages in thread
From: Andreas Rohner @ 2014-09-08 19:03 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA


Hi Ryusuke,

Sorry for the late response I was busy over the weekend.

On 2014-09-07 07:12, Ryusuke Konishi wrote:
> Hi Andreas,
> On Wed, 03 Sep 2014 14:32:22 +0200, Andreas Rohner wrote:
>> On 2014-09-03 02:35, Ryusuke Konishi wrote:
>>> On Mon, 01 Sep 2014 21:18:30 +0200, Andreas Rohner wrote:
>>> On the other hand, we need explicit barrier operation like
>>> smp_mb__after_atomic() if a certain operation is performed after
>>> set_bit() and the changed bit should be visible to other processors
>>> before the operation.
>>
>> Great suggestion. I didn't know about those functions.
> 
> I recommend you read Documentation/memory-barries.txt.  It's an
> excellent document summarizing information on what we should know
> about memory synchronization on smp.  Documentation/atomic_ops.txt
> also contains some information on barriers related to atomic
> operations.
> 
>> Do we also need a call to smp_mb__before_atomic() before
>> clear_nilfs_flushed(nilfs) in segment.c?
> 
> I think the timing restrictions of this flag are not so serve.  The
> only restrictions that the flag must ensure are:
> 
>  1) Bioes for log are completed before this flag is cleared.
>  2) Clearing the flag is propagated to the processor executing
>     nilfs_sync_fs() and nilfs_sync_file() before log writer returns.
> 
> The restriction (1) is guaranteed since nilfs_wait_on_logs() is called
> before nilfs_segctor_complete_write().  This sequence appears at
> nilfs_segctor_wait() function.
> 
> The restriction (2) looks to be satisfied by (at least)
> nilfs_segctor_notify() that nilfs_segctor_construct() calls or
> nilfs_transaction_unlock() that nilfs_construct_dsync_segment() calls.

I agree with both points.

>> I would be happy to provide another version of the patch with
>> set_nilfs_flushed(nilfs) and smp_mb__after_atomic() if you prefer that
>> version over the test_and_set_bit approach...
> 
> Two additional comments:
> 
> - Splitting test_and_set_bit() into test_bit() and set_bit() can
>   introduce a race condition.  Two processors can call test_bit() at
>   the same time and both can call set_bit() and blkdev_issue_flush().
>   But, this race is not critical.  It only allows duplicate
>   blkdev_issue_flush() calls in the rare case, and I think it's
>   ignorable.

I agree.

> - clear_nilfs_flushed() seems to be called more than necessary.
>   Incomplete logs that the mount time recovery of nilfs doesn't
>   salvage do not need to be flushed.  In this sense, it may be enough
>   only for logs containing a super root and those for datasync
>   nilfs_construct_dsync_segment() creates.

Yes you are right I will change that as well.

On the other hand it seems to me, that almost any file operation causes
a super root to be written. Even if you use fdatasync(). If the i_mtime
on the inode has to be changed, then NILFS_I_INODE_DIRTY is set and the
fdatasync() turns into a normal sync(), which always writes a super
root. Every write() to a file causes an update of i_mtime. I could only
make fdatasync() work as intended with mmap(), but maybe I am missing
something. So we may not save us a lot of updates by only updating the
flag in case the log contains a super root...

> By the way, we are using atomic bit operations too much.  Even though
> {set,clear}_bit() don't imply a memory barrier, they still imply a
> lock prefix to protect the flag from other bit operations on ns_flags.
> For load and store of integer variables which are properly aligned to
> a cache line, modern processors naturally satisfy atomicity without
> additional lock operations.  I think we can replace the flag with just
> an integer variable like "int ns_flushed_device".  How do you think ?

I think that is a good idea. I will implement that right away.

br,
Andreas Rohner

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 1/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs()
       [not found]                                 ` <540DFD66.1090904-hi6Y0CQ0nG0@public.gmane.org>
@ 2014-09-09 18:55                                   ` Ryusuke Konishi
  0 siblings, 0 replies; 10+ messages in thread
From: Ryusuke Konishi @ 2014-09-09 18:55 UTC (permalink / raw)
  To: Andreas Rohner; +Cc: linux-nilfs-u79uwXL29TY76Z2rM5mHXA

On Mon, 08 Sep 2014 21:03:02 +0200, Andreas Rohner wrote:
> 
> Hi Ryusuke,
> 
> Sorry for the late response I was busy over the weekend.
> 
> On 2014-09-07 07:12, Ryusuke Konishi wrote:
>> - clear_nilfs_flushed() seems to be called more than necessary.
>>   Incomplete logs that the mount time recovery of nilfs doesn't
>>   salvage do not need to be flushed.  In this sense, it may be enough
>>   only for logs containing a super root and those for datasync
>>   nilfs_construct_dsync_segment() creates.
> 
> Yes you are right I will change that as well.
> 
> On the other hand it seems to me, that almost any file operation causes
> a super root to be written. Even if you use fdatasync(). If the i_mtime
> on the inode has to be changed, then NILFS_I_INODE_DIRTY is set and the
> fdatasync() turns into a normal sync(), which always writes a super
> root. Every write() to a file causes an update of i_mtime. I could only
> make fdatasync() work as intended with mmap(), but maybe I am missing
> something. So we may not save us a lot of updates by only updating the
> flag in case the log contains a super root...

Uum, this looks another problem. We need efficient fsync/fdatasync/
msync operation.  The above legacy implementation should be replaced
with some new ideas in which metadata update is handled more
efficiently and checkpoint creation is suppressed...

Ryusuke Konishi
--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-09-09 18:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-31 15:47 [PATCH v2 0/1] nilfs2: add missing blkdev_issue_flush() to nilfs_sync_fs() Andreas Rohner
     [not found] ` <1409500033-30791-1-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-08-31 15:47   ` [PATCH v2 1/1] " Andreas Rohner
     [not found]     ` <1409500033-30791-2-git-send-email-andreas.rohner-hi6Y0CQ0nG0@public.gmane.org>
2014-09-01 17:59       ` Ryusuke Konishi
     [not found]         ` <20140902.025939.1812841141168890644.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-09-01 18:43           ` Andreas Rohner
     [not found]             ` <5404BE5C.9080302-hi6Y0CQ0nG0@public.gmane.org>
2014-09-01 19:18               ` Andreas Rohner
     [not found]                 ` <5404C686.4070800-hi6Y0CQ0nG0@public.gmane.org>
2014-09-03  0:35                   ` Ryusuke Konishi
     [not found]                     ` <20140903.093525.810247375407684014.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-09-03 12:32                       ` Andreas Rohner
     [not found]                         ` <54070A56.9050807-hi6Y0CQ0nG0@public.gmane.org>
2014-09-07  5:12                           ` Ryusuke Konishi
     [not found]                             ` <20140907.141232.2124135104047617747.konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
2014-09-08 19:03                               ` Andreas Rohner
     [not found]                                 ` <540DFD66.1090904-hi6Y0CQ0nG0@public.gmane.org>
2014-09-09 18:55                                   ` Ryusuke Konishi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.