linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] zonefs: introduce -o explicit-open mount option
@ 2020-09-04 11:23 Johannes Thumshirn
  2020-09-04 11:23 ` [PATCH 1/3] zonefs: introduce helper for zone management Johannes Thumshirn
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Johannes Thumshirn @ 2020-09-04 11:23 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-fsdevel, Johannes Thumshirn

Introduce a mount option for explicitly opening a device's zones when opening
the seq zone file for writing. This way we prevent resource exhaustion on
devices that export a maximum open zones limit. 

Johannes Thumshirn (3):
  zonefs: introduce helper for zone management
  zonefs: open/close zone on file open/close
  zonefs: document the explicit-open mount option

 Documentation/filesystems/zonefs.rst |  15 +++
 fs/zonefs/super.c                    | 179 +++++++++++++++++++++++++--
 fs/zonefs/zonefs.h                   |  10 ++
 3 files changed, 196 insertions(+), 8 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] zonefs: introduce helper for zone management
  2020-09-04 11:23 [PATCH 0/3] zonefs: introduce -o explicit-open mount option Johannes Thumshirn
@ 2020-09-04 11:23 ` Johannes Thumshirn
  2020-09-07  2:55   ` Damien Le Moal
  2020-09-04 11:23 ` [PATCH 2/3] zonefs: open/close zone on file open/close Johannes Thumshirn
  2020-09-04 11:23 ` [PATCH 3/3] zonefs: document the explicit-open mount option Johannes Thumshirn
  2 siblings, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2020-09-04 11:23 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-fsdevel, Johannes Thumshirn

Introduce a helper function for sending zone management commands to the
block device.

As zone management commands can change a zone write pointer position
reflected in the size of the zone file, this function expects the truncate
mutex to be held.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/zonefs/super.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 8ec7c8f109d7..9573aebee146 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -24,6 +24,26 @@
 
 #include "zonefs.h"
 
+static inline int zonefs_zone_mgmt(struct inode *inode,
+				   enum req_opf op)
+{
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	int ret;
+
+	lockdep_assert_held(&zi->i_truncate_mutex);
+
+	ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
+			       zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
+	if (ret) {
+		zonefs_err(inode->i_sb,
+			   "Zone management operation %d at %llu failed %d",
+			   op, zi->i_zsector, ret);
+		return ret;
+	}
+
+	return 0;
+}
+
 static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			      unsigned int flags, struct iomap *iomap,
 			      struct iomap *srcmap)
@@ -397,12 +417,11 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
 	if (isize == old_isize)
 		goto unlock;
 
-	ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
-			       zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
+	ret = zonefs_zone_mgmt(inode, op);
 	if (ret) {
 		zonefs_err(inode->i_sb,
-			   "Zone management operation at %llu failed %d",
-			   zi->i_zsector, ret);
+			   "Zone management operation %s at %llu failed %d",
+			   blk_op_str(op), zi->i_zsector, ret);
 		goto unlock;
 	}
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] zonefs: open/close zone on file open/close
  2020-09-04 11:23 [PATCH 0/3] zonefs: introduce -o explicit-open mount option Johannes Thumshirn
  2020-09-04 11:23 ` [PATCH 1/3] zonefs: introduce helper for zone management Johannes Thumshirn
@ 2020-09-04 11:23 ` Johannes Thumshirn
  2020-09-07  3:06   ` Damien Le Moal
  2020-09-07  4:32   ` Damien Le Moal
  2020-09-04 11:23 ` [PATCH 3/3] zonefs: document the explicit-open mount option Johannes Thumshirn
  2 siblings, 2 replies; 11+ messages in thread
From: Johannes Thumshirn @ 2020-09-04 11:23 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-fsdevel, Johannes Thumshirn

NVMe Zoned Namespace introduced the concept of active zones, which are
zones in the implicit open, explicit open or closed condition. Drives may
have a limit on the number of zones that can be simultaneously active.
This potential limitation translate into a risk for applications to see
write IO errors due to this limit if the zone of a file being written to is
not already active when a write request is issued.

To avoid these potential errors, the zone of a file can explicitly be made
active using an open zone command when the file is open for the first
time. If the zone open command succeeds, the application is then
guaranteed that write requests can be processed. This indirect management
of active zones relies on the maximum number of open zones of a drive,
which is always lower or equal to the maximum number of active zones.

On the first open of a sequential zone file, send a REQ_OP_ZONE_OPEN
command to the block device. Conversely, on the last release of a zone
file and send a REQ_OP_ZONE_CLOSE to the device if the zone is not full or
empty.

As truncating a zone file to 0 or max can deactivate a zone as well, we
need to serialize against truncates and also be careful not to close a
zone that has active writers.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/zonefs/super.c  | 152 +++++++++++++++++++++++++++++++++++++++++++--
 fs/zonefs/zonefs.h |  10 +++
 2 files changed, 158 insertions(+), 4 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 9573aebee146..3e32050d2de8 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -44,6 +44,80 @@ static inline int zonefs_zone_mgmt(struct inode *inode,
 	return 0;
 }
 
+static int zonefs_open_zone(struct inode *inode)
+{
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+	int ret = 0;
+
+	mutex_lock(&zi->i_truncate_mutex);
+
+	zi->i_wr_refcnt++;
+	if (zi->i_wr_refcnt == 1) {
+
+		if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) {
+			atomic_dec(&sbi->s_open_zones);
+			ret = -EBUSY;
+			goto unlock;
+		}
+
+		if (i_size_read(inode) < zi->i_max_size) {
+			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN);
+			if (ret) {
+				zi->i_wr_refcnt--;
+				atomic_dec(&sbi->s_open_zones);
+				goto unlock;
+			}
+			zi->i_flags |= ZONEFS_ZONE_OPEN;
+		}
+	}
+
+unlock:
+	mutex_unlock(&zi->i_truncate_mutex);
+
+	return ret;
+}
+
+static int zonefs_close_zone(struct inode *inode)
+{
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	int ret = 0;
+
+	mutex_lock(&zi->i_truncate_mutex);
+
+	zi->i_wr_refcnt--;
+	if (!zi->i_wr_refcnt) {
+		struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+
+		if (zi->i_flags & ZONEFS_ZONE_OPEN) {
+			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE);
+			if (ret)
+				goto unlock;
+			zi->i_flags &= ~ZONEFS_ZONE_OPEN;
+		}
+
+		atomic_dec(&sbi->s_open_zones);
+	}
+
+unlock:
+	mutex_unlock(&zi->i_truncate_mutex);
+
+	return ret;
+}
+
+static inline void zonefs_i_size_write(struct inode *inode, loff_t isize)
+{
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+
+	i_size_write(inode, isize);
+	/*
+	 * A full zone is no longer open/active and does not need
+	 * explicit closing.
+	 */
+	if (isize >= zi->i_max_size)
+		zi->i_flags &= ~ZONEFS_ZONE_OPEN;
+}
+
 static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 			      unsigned int flags, struct iomap *iomap,
 			      struct iomap *srcmap)
@@ -335,7 +409,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
 	 * invalid data.
 	 */
 	zonefs_update_stats(inode, data_size);
-	i_size_write(inode, data_size);
+	zonefs_i_size_write(inode, data_size);
 	zi->i_wpoffset = data_size;
 
 	return 0;
@@ -425,6 +499,25 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
 		goto unlock;
 	}
 
+	/*
+	 * If the mount option ZONEFS_MNTOPT_EXPLICIT_OPEN is set,
+	 * take care of open zones.
+	 */
+	if (zi->i_flags & ZONEFS_ZONE_OPEN) {
+		/*
+		 * Truncating a zone to EMPTY or FULL is the equivalent of
+		 * closing the zone. For a truncation to 0, we need to
+		 * re-open the zone to ensure new writes can be processed.
+		 * For a truncation to the maximum file size, the zone is
+		 * closed and writes cannot be accepted anymore, so clear
+		 * the open flag.
+		 */
+		if (!isize)
+			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN);
+		else
+			zi->i_flags &= ~ZONEFS_ZONE_OPEN;
+	}
+
 	zonefs_update_stats(inode, isize);
 	truncate_setsize(inode, isize);
 	zi->i_wpoffset = isize;
@@ -603,7 +696,7 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
 		mutex_lock(&zi->i_truncate_mutex);
 		if (i_size_read(inode) < iocb->ki_pos + size) {
 			zonefs_update_stats(inode, iocb->ki_pos + size);
-			i_size_write(inode, iocb->ki_pos + size);
+			zonefs_i_size_write(inode, iocb->ki_pos + size);
 		}
 		mutex_unlock(&zi->i_truncate_mutex);
 	}
@@ -884,8 +977,47 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return ret;
 }
 
+static inline bool zonefs_file_use_exp_open(struct inode *inode, struct file *file)
+{
+	struct zonefs_inode_info *zi = ZONEFS_I(inode);
+	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+
+	if (!(sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN))
+		return false;
+
+	if (zi->i_ztype != ZONEFS_ZTYPE_SEQ)
+		return false;
+
+	if (!(file->f_mode & FMODE_WRITE))
+		return false;
+
+	return true;
+}
+
+static int zonefs_file_open(struct inode *inode, struct file *file)
+{
+	int ret;
+
+	ret = generic_file_open(inode, file);
+	if (ret)
+		return ret;
+
+	if (zonefs_file_use_exp_open(inode, file))
+		return zonefs_open_zone(inode);
+
+	return 0;
+}
+
+static int zonefs_file_release(struct inode *inode, struct file *file)
+{
+	if (zonefs_file_use_exp_open(inode, file))
+		return zonefs_close_zone(inode);
+	return 0;
+}
+
 static const struct file_operations zonefs_file_operations = {
-	.open		= generic_file_open,
+	.open		= zonefs_file_open,
+	.release	= zonefs_file_release,
 	.fsync		= zonefs_file_fsync,
 	.mmap		= zonefs_file_mmap,
 	.llseek		= zonefs_file_llseek,
@@ -909,6 +1041,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
 	inode_init_once(&zi->i_vnode);
 	mutex_init(&zi->i_truncate_mutex);
 	init_rwsem(&zi->i_mmap_sem);
+	zi->i_wr_refcnt = 0;
 
 	return &zi->i_vnode;
 }
@@ -959,7 +1092,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
 
 enum {
 	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
-	Opt_err,
+	Opt_explicit_open, Opt_err,
 };
 
 static const match_table_t tokens = {
@@ -967,6 +1100,7 @@ static const match_table_t tokens = {
 	{ Opt_errors_zro,	"errors=zone-ro"},
 	{ Opt_errors_zol,	"errors=zone-offline"},
 	{ Opt_errors_repair,	"errors=repair"},
+	{ Opt_explicit_open,	"explicit-open" },
 	{ Opt_err,		NULL}
 };
 
@@ -1003,6 +1137,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
 			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_ERRORS_MASK;
 			sbi->s_mount_opts |= ZONEFS_MNTOPT_ERRORS_REPAIR;
 			break;
+		case Opt_explicit_open:
+			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
+			break;
 		default:
 			return -EINVAL;
 		}
@@ -1422,6 +1559,13 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
 	sbi->s_gid = GLOBAL_ROOT_GID;
 	sbi->s_perm = 0640;
 	sbi->s_mount_opts = ZONEFS_MNTOPT_ERRORS_RO;
+	sbi->s_max_open_zones = bdev_max_open_zones(sb->s_bdev);
+	atomic_set(&sbi->s_open_zones, 0);
+	if (!sbi->s_max_open_zones &&
+	    sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) {
+		zonefs_info(sb, "No open zones limit. Ignoring explicit_open mount option\n");
+		sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN;
+	}
 
 	ret = zonefs_read_super(sb);
 	if (ret)
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 55b39970acb2..51141907097c 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -38,6 +38,8 @@ static inline enum zonefs_ztype zonefs_zone_type(struct blk_zone *zone)
 	return ZONEFS_ZTYPE_SEQ;
 }
 
+#define ZONEFS_ZONE_OPEN	(1 << 0)
+
 /*
  * In-memory inode data.
  */
@@ -74,6 +76,10 @@ struct zonefs_inode_info {
 	 */
 	struct mutex		i_truncate_mutex;
 	struct rw_semaphore	i_mmap_sem;
+
+	/* guarded by i_truncate_mutex */
+	unsigned int		i_wr_refcnt;
+	unsigned int		i_flags;
 };
 
 static inline struct zonefs_inode_info *ZONEFS_I(struct inode *inode)
@@ -154,6 +160,7 @@ enum zonefs_features {
 #define ZONEFS_MNTOPT_ERRORS_MASK	\
 	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
 	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
+#define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
 
 /*
  * In-memory Super block information.
@@ -175,6 +182,9 @@ struct zonefs_sb_info {
 
 	loff_t			s_blocks;
 	loff_t			s_used_blocks;
+
+	unsigned int		s_max_open_zones;
+	atomic_t		s_open_zones;
 };
 
 static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] zonefs: document the explicit-open mount option
  2020-09-04 11:23 [PATCH 0/3] zonefs: introduce -o explicit-open mount option Johannes Thumshirn
  2020-09-04 11:23 ` [PATCH 1/3] zonefs: introduce helper for zone management Johannes Thumshirn
  2020-09-04 11:23 ` [PATCH 2/3] zonefs: open/close zone on file open/close Johannes Thumshirn
@ 2020-09-04 11:23 ` Johannes Thumshirn
  2020-09-04 15:17   ` Randy Dunlap
  2020-09-07  3:09   ` Damien Le Moal
  2 siblings, 2 replies; 11+ messages in thread
From: Johannes Thumshirn @ 2020-09-04 11:23 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-fsdevel, Johannes Thumshirn

Document the newly introduced explicit-open mount option.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 Documentation/filesystems/zonefs.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/Documentation/filesystems/zonefs.rst b/Documentation/filesystems/zonefs.rst
index 6c18bc8ce332..ff8bc3634bad 100644
--- a/Documentation/filesystems/zonefs.rst
+++ b/Documentation/filesystems/zonefs.rst
@@ -326,6 +326,21 @@ discover the amount of data that has been written to the zone. In the case of a
 read-only zone discovered at run-time, as indicated in the previous section.
 The size of the zone file is left unchanged from its last updated value.
 
+A zoned block device (e.g. a NVMe Zoned Namespace device) may have
+limits on the number of zones that can be active, that is, zones that
+are in the the implicit open, explicit open or closed conditions.
+This potential limitation translate into a risk for applications to see
+write IO errors due to this limit being exceeded if the zone of a file
+is not already active when a write request is issued by the user.
+
+To avoid these potential errors, the "explicit-open" mount option
+forces zones to be made active using an open zone command when a file
+is open for writing for the first time. If the zone open command
+succeeds, the application is then guaranteed that write requests can be
+processed. Conversely, the "explicit-open" mount option will result in
+a zone close command being issued to the device on the last close() of
+a zone file if the zone is not full nor empty.
+
 Zonefs User Space Tools
 =======================
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] zonefs: document the explicit-open mount option
  2020-09-04 11:23 ` [PATCH 3/3] zonefs: document the explicit-open mount option Johannes Thumshirn
@ 2020-09-04 15:17   ` Randy Dunlap
  2020-09-07  3:09   ` Damien Le Moal
  1 sibling, 0 replies; 11+ messages in thread
From: Randy Dunlap @ 2020-09-04 15:17 UTC (permalink / raw)
  To: Johannes Thumshirn, Damien Le Moal; +Cc: linux-fsdevel

Hi--

On 9/4/20 4:23 AM, Johannes Thumshirn wrote:
> Document the newly introduced explicit-open mount option.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  Documentation/filesystems/zonefs.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/Documentation/filesystems/zonefs.rst b/Documentation/filesystems/zonefs.rst
> index 6c18bc8ce332..ff8bc3634bad 100644
> --- a/Documentation/filesystems/zonefs.rst
> +++ b/Documentation/filesystems/zonefs.rst
> @@ -326,6 +326,21 @@ discover the amount of data that has been written to the zone. In the case of a
>  read-only zone discovered at run-time, as indicated in the previous section.
>  The size of the zone file is left unchanged from its last updated value.
>  
> +A zoned block device (e.g. a NVMe Zoned Namespace device) may have

I would say                   an NVMe

> +limits on the number of zones that can be active, that is, zones that
> +are in the the implicit open, explicit open or closed conditions.

          ^^ duplicate "the"

> +This potential limitation translate into a risk for applications to see

                             translates

> +write IO errors due to this limit being exceeded if the zone of a file
> +is not already active when a write request is issued by the user.
> +
> +To avoid these potential errors, the "explicit-open" mount option
> +forces zones to be made active using an open zone command when a file
> +is open for writing for the first time. If the zone open command

      opened

> +succeeds, the application is then guaranteed that write requests can be
> +processed. Conversely, the "explicit-open" mount option will result in
> +a zone close command being issued to the device on the last close() of
> +a zone file if the zone is not full nor empty.
> +
>  Zonefs User Space Tools
>  =======================
>  
> 

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] zonefs: introduce helper for zone management
  2020-09-04 11:23 ` [PATCH 1/3] zonefs: introduce helper for zone management Johannes Thumshirn
@ 2020-09-07  2:55   ` Damien Le Moal
  0 siblings, 0 replies; 11+ messages in thread
From: Damien Le Moal @ 2020-09-07  2:55 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel

On 2020/09/04 20:23, Johannes Thumshirn wrote:
> Introduce a helper function for sending zone management commands to the
> block device.
> 
> As zone management commands can change a zone write pointer position
> reflected in the size of the zone file, this function expects the truncate
> mutex to be held.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/zonefs/super.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 8ec7c8f109d7..9573aebee146 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -24,6 +24,26 @@
>  
>  #include "zonefs.h"
>  
> +static inline int zonefs_zone_mgmt(struct inode *inode,
> +				   enum req_opf op)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	int ret;
> +
> +	lockdep_assert_held(&zi->i_truncate_mutex);
> +
> +	ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
> +			       zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
> +	if (ret) {
> +		zonefs_err(inode->i_sb,
> +			   "Zone management operation %d at %llu failed %d",
> +			   op, zi->i_zsector, ret);

Printing blk_op_str() instead of the raw op value would make the message easier
to understand.

> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
>  static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  			      unsigned int flags, struct iomap *iomap,
>  			      struct iomap *srcmap)
> @@ -397,12 +417,11 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
>  	if (isize == old_isize)
>  		goto unlock;
>  
> -	ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
> -			       zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
> +	ret = zonefs_zone_mgmt(inode, op);
>  	if (ret) {
>  		zonefs_err(inode->i_sb,
> -			   "Zone management operation at %llu failed %d",
> -			   zi->i_zsector, ret);
> +			   "Zone management operation %s at %llu failed %d",
> +			   blk_op_str(op), zi->i_zsector, ret);

That repeats the error message already printed by zonefs_zone_mgmt(). There is
no need for this one.

>  		goto unlock;
>  	}
>  
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] zonefs: open/close zone on file open/close
  2020-09-04 11:23 ` [PATCH 2/3] zonefs: open/close zone on file open/close Johannes Thumshirn
@ 2020-09-07  3:06   ` Damien Le Moal
  2020-09-07  8:49     ` Johannes Thumshirn
  2020-09-07  4:32   ` Damien Le Moal
  1 sibling, 1 reply; 11+ messages in thread
From: Damien Le Moal @ 2020-09-07  3:06 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel

On 2020/09/04 20:23, Johannes Thumshirn wrote:
> NVMe Zoned Namespace introduced the concept of active zones, which are
> zones in the implicit open, explicit open or closed condition. Drives may
> have a limit on the number of zones that can be simultaneously active.
> This potential limitation translate into a risk for applications to see
> write IO errors due to this limit if the zone of a file being written to is
> not already active when a write request is issued.
> 
> To avoid these potential errors, the zone of a file can explicitly be made
> active using an open zone command when the file is open for the first
> time. If the zone open command succeeds, the application is then
> guaranteed that write requests can be processed. This indirect management
> of active zones relies on the maximum number of open zones of a drive,
> which is always lower or equal to the maximum number of active zones.
> 
> On the first open of a sequential zone file, send a REQ_OP_ZONE_OPEN
> command to the block device. Conversely, on the last release of a zone
> file and send a REQ_OP_ZONE_CLOSE to the device if the zone is not full or
> empty.
> 
> As truncating a zone file to 0 or max can deactivate a zone as well, we
> need to serialize against truncates and also be careful not to close a
> zone that has active writers.

May be you meant something like "leave a zone not active after a truncate when
the zone file is open for writing" ?

> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/zonefs/super.c  | 152 +++++++++++++++++++++++++++++++++++++++++++--
>  fs/zonefs/zonefs.h |  10 +++
>  2 files changed, 158 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 9573aebee146..3e32050d2de8 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -44,6 +44,80 @@ static inline int zonefs_zone_mgmt(struct inode *inode,
>  	return 0;
>  }
>  
> +static int zonefs_open_zone(struct inode *inode)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
> +	int ret = 0;
> +
> +	mutex_lock(&zi->i_truncate_mutex);
> +
> +	zi->i_wr_refcnt++;
> +	if (zi->i_wr_refcnt == 1) {
> +
> +		if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) {
> +			atomic_dec(&sbi->s_open_zones);
> +			ret = -EBUSY;
> +			goto unlock;
> +		}
> +
> +		if (i_size_read(inode) < zi->i_max_size) {
> +			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN);
> +			if (ret) {
> +				zi->i_wr_refcnt--;
> +				atomic_dec(&sbi->s_open_zones);
> +				goto unlock;
> +			}
> +			zi->i_flags |= ZONEFS_ZONE_OPEN;
> +		}
> +	}
> +
> +unlock:
> +	mutex_unlock(&zi->i_truncate_mutex);
> +
> +	return ret;
> +}
> +
> +static int zonefs_close_zone(struct inode *inode)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	int ret = 0;
> +
> +	mutex_lock(&zi->i_truncate_mutex);
> +
> +	zi->i_wr_refcnt--;
> +	if (!zi->i_wr_refcnt) {
> +		struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
> +
> +		if (zi->i_flags & ZONEFS_ZONE_OPEN) {
> +			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE);> +			if (ret)
> +				goto unlock;
> +			zi->i_flags &= ~ZONEFS_ZONE_OPEN;
> +		}
> +
> +		atomic_dec(&sbi->s_open_zones);
> +	}
> +
> +unlock:
> +	mutex_unlock(&zi->i_truncate_mutex);
> +
> +	return ret;
> +}
> +
> +static inline void zonefs_i_size_write(struct inode *inode, loff_t isize)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +
> +	i_size_write(inode, isize);
> +	/*
> +	 * A full zone is no longer open/active and does not need
> +	 * explicit closing.
> +	 */
> +	if (isize >= zi->i_max_size)
> +		zi->i_flags &= ~ZONEFS_ZONE_OPEN;
> +}
> +
>  static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  			      unsigned int flags, struct iomap *iomap,
>  			      struct iomap *srcmap)
> @@ -335,7 +409,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
>  	 * invalid data.
>  	 */
>  	zonefs_update_stats(inode, data_size);
> -	i_size_write(inode, data_size);
> +	zonefs_i_size_write(inode, data_size);
>  	zi->i_wpoffset = data_size;
>  
>  	return 0;
> @@ -425,6 +499,25 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
>  		goto unlock;
>  	}
>  
> +	/*
> +	 * If the mount option ZONEFS_MNTOPT_EXPLICIT_OPEN is set,
> +	 * take care of open zones.
> +	 */
> +	if (zi->i_flags & ZONEFS_ZONE_OPEN) {
> +		/*
> +		 * Truncating a zone to EMPTY or FULL is the equivalent of
> +		 * closing the zone. For a truncation to 0, we need to
> +		 * re-open the zone to ensure new writes can be processed.
> +		 * For a truncation to the maximum file size, the zone is
> +		 * closed and writes cannot be accepted anymore, so clear
> +		 * the open flag.
> +		 */
> +		if (!isize)
> +			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN);
> +		else
> +			zi->i_flags &= ~ZONEFS_ZONE_OPEN;
> +	}
> +
>  	zonefs_update_stats(inode, isize);
>  	truncate_setsize(inode, isize);
>  	zi->i_wpoffset = isize;
> @@ -603,7 +696,7 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
>  		mutex_lock(&zi->i_truncate_mutex);
>  		if (i_size_read(inode) < iocb->ki_pos + size) {
>  			zonefs_update_stats(inode, iocb->ki_pos + size);
> -			i_size_write(inode, iocb->ki_pos + size);
> +			zonefs_i_size_write(inode, iocb->ki_pos + size);
>  		}
>  		mutex_unlock(&zi->i_truncate_mutex);
>  	}
> @@ -884,8 +977,47 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	return ret;
>  }
>  
> +static inline bool zonefs_file_use_exp_open(struct inode *inode, struct file *file)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
> +
> +	if (!(sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN))
> +		return false;
> +
> +	if (zi->i_ztype != ZONEFS_ZTYPE_SEQ)
> +		return false;
> +
> +	if (!(file->f_mode & FMODE_WRITE))
> +		return false;
> +
> +	return true;
> +}
> +
> +static int zonefs_file_open(struct inode *inode, struct file *file)
> +{
> +	int ret;
> +
> +	ret = generic_file_open(inode, file);
> +	if (ret)
> +		return ret;
> +
> +	if (zonefs_file_use_exp_open(inode, file))
> +		return zonefs_open_zone(inode);
> +
> +	return 0;
> +}
> +
> +static int zonefs_file_release(struct inode *inode, struct file *file)
> +{
> +	if (zonefs_file_use_exp_open(inode, file))
> +		return zonefs_close_zone(inode);
> +	return 0;
> +}
> +
>  static const struct file_operations zonefs_file_operations = {
> -	.open		= generic_file_open,
> +	.open		= zonefs_file_open,
> +	.release	= zonefs_file_release,
>  	.fsync		= zonefs_file_fsync,
>  	.mmap		= zonefs_file_mmap,
>  	.llseek		= zonefs_file_llseek,
> @@ -909,6 +1041,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
>  	inode_init_once(&zi->i_vnode);
>  	mutex_init(&zi->i_truncate_mutex);
>  	init_rwsem(&zi->i_mmap_sem);
> +	zi->i_wr_refcnt = 0;
>  
>  	return &zi->i_vnode;
>  }
> @@ -959,7 +1092,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  
>  enum {
>  	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
> -	Opt_err,
> +	Opt_explicit_open, Opt_err,
>  };
>  
>  static const match_table_t tokens = {
> @@ -967,6 +1100,7 @@ static const match_table_t tokens = {
>  	{ Opt_errors_zro,	"errors=zone-ro"},
>  	{ Opt_errors_zol,	"errors=zone-offline"},
>  	{ Opt_errors_repair,	"errors=repair"},
> +	{ Opt_explicit_open,	"explicit-open" },
>  	{ Opt_err,		NULL}
>  };
>  
> @@ -1003,6 +1137,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
>  			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_ERRORS_MASK;
>  			sbi->s_mount_opts |= ZONEFS_MNTOPT_ERRORS_REPAIR;
>  			break;
> +		case Opt_explicit_open:
> +			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
> @@ -1422,6 +1559,13 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
>  	sbi->s_gid = GLOBAL_ROOT_GID;
>  	sbi->s_perm = 0640;
>  	sbi->s_mount_opts = ZONEFS_MNTOPT_ERRORS_RO;
> +	sbi->s_max_open_zones = bdev_max_open_zones(sb->s_bdev);
> +	atomic_set(&sbi->s_open_zones, 0);
> +	if (!sbi->s_max_open_zones &&
> +	    sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) {
> +		zonefs_info(sb, "No open zones limit. Ignoring explicit_open mount option\n");
> +		sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN;
> +	}
>  
>  	ret = zonefs_read_super(sb);
>  	if (ret)
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index 55b39970acb2..51141907097c 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -38,6 +38,8 @@ static inline enum zonefs_ztype zonefs_zone_type(struct blk_zone *zone)
>  	return ZONEFS_ZTYPE_SEQ;
>  }
>  
> +#define ZONEFS_ZONE_OPEN	(1 << 0)
> +
>  /*
>   * In-memory inode data.
>   */
> @@ -74,6 +76,10 @@ struct zonefs_inode_info {
>  	 */
>  	struct mutex		i_truncate_mutex;
>  	struct rw_semaphore	i_mmap_sem;
> +
> +	/* guarded by i_truncate_mutex */
> +	unsigned int		i_wr_refcnt;
> +	unsigned int		i_flags;
>  };
>  
>  static inline struct zonefs_inode_info *ZONEFS_I(struct inode *inode)
> @@ -154,6 +160,7 @@ enum zonefs_features {
>  #define ZONEFS_MNTOPT_ERRORS_MASK	\
>  	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
>  	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
> +#define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
>  
>  /*
>   * In-memory Super block information.
> @@ -175,6 +182,9 @@ struct zonefs_sb_info {
>  
>  	loff_t			s_blocks;
>  	loff_t			s_used_blocks;
> +
> +	unsigned int		s_max_open_zones;
> +	atomic_t		s_open_zones;
>  };
>  
>  static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb)
> 

Apart from the commit message nit, looks good to me.

Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] zonefs: document the explicit-open mount option
  2020-09-04 11:23 ` [PATCH 3/3] zonefs: document the explicit-open mount option Johannes Thumshirn
  2020-09-04 15:17   ` Randy Dunlap
@ 2020-09-07  3:09   ` Damien Le Moal
  1 sibling, 0 replies; 11+ messages in thread
From: Damien Le Moal @ 2020-09-07  3:09 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel

On 2020/09/04 20:23, Johannes Thumshirn wrote:
> Document the newly introduced explicit-open mount option.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  Documentation/filesystems/zonefs.rst | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/Documentation/filesystems/zonefs.rst b/Documentation/filesystems/zonefs.rst
> index 6c18bc8ce332..ff8bc3634bad 100644
> --- a/Documentation/filesystems/zonefs.rst
> +++ b/Documentation/filesystems/zonefs.rst
> @@ -326,6 +326,21 @@ discover the amount of data that has been written to the zone. In the case of a
>  read-only zone discovered at run-time, as indicated in the previous section.
>  The size of the zone file is left unchanged from its last updated value.
>  
> +A zoned block device (e.g. a NVMe Zoned Namespace device) may have
> +limits on the number of zones that can be active, that is, zones that
> +are in the the implicit open, explicit open or closed conditions.
> +This potential limitation translate into a risk for applications to see
> +write IO errors due to this limit being exceeded if the zone of a file
> +is not already active when a write request is issued by the user.
> +
> +To avoid these potential errors, the "explicit-open" mount option
> +forces zones to be made active using an open zone command when a file
> +is open for writing for the first time. If the zone open command
> +succeeds, the application is then guaranteed that write requests can be
> +processed. Conversely, the "explicit-open" mount option will result in
> +a zone close command being issued to the device on the last close() of
> +a zone file if the zone is not full nor empty.
> +
>  Zonefs User Space Tools
>  =======================
>  
> 

Looks good. Please resend with Randy's comments addressed.
Thanks !

-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] zonefs: open/close zone on file open/close
  2020-09-04 11:23 ` [PATCH 2/3] zonefs: open/close zone on file open/close Johannes Thumshirn
  2020-09-07  3:06   ` Damien Le Moal
@ 2020-09-07  4:32   ` Damien Le Moal
  1 sibling, 0 replies; 11+ messages in thread
From: Damien Le Moal @ 2020-09-07  4:32 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel

On 2020/09/04 20:23, Johannes Thumshirn wrote:
> NVMe Zoned Namespace introduced the concept of active zones, which are
> zones in the implicit open, explicit open or closed condition. Drives may
> have a limit on the number of zones that can be simultaneously active.
> This potential limitation translate into a risk for applications to see
> write IO errors due to this limit if the zone of a file being written to is
> not already active when a write request is issued.
> 
> To avoid these potential errors, the zone of a file can explicitly be made
> active using an open zone command when the file is open for the first
> time. If the zone open command succeeds, the application is then
> guaranteed that write requests can be processed. This indirect management
> of active zones relies on the maximum number of open zones of a drive,
> which is always lower or equal to the maximum number of active zones.
> 
> On the first open of a sequential zone file, send a REQ_OP_ZONE_OPEN
> command to the block device. Conversely, on the last release of a zone
> file and send a REQ_OP_ZONE_CLOSE to the device if the zone is not full or
> empty.
> 
> As truncating a zone file to 0 or max can deactivate a zone as well, we
> need to serialize against truncates and also be careful not to close a
> zone that has active writers.
> 
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> ---
>  fs/zonefs/super.c  | 152 +++++++++++++++++++++++++++++++++++++++++++--
>  fs/zonefs/zonefs.h |  10 +++
>  2 files changed, 158 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
> index 9573aebee146..3e32050d2de8 100644
> --- a/fs/zonefs/super.c
> +++ b/fs/zonefs/super.c
> @@ -44,6 +44,80 @@ static inline int zonefs_zone_mgmt(struct inode *inode,
>  	return 0;
>  }
>  
> +static int zonefs_open_zone(struct inode *inode)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
> +	int ret = 0;
> +
> +	mutex_lock(&zi->i_truncate_mutex);
> +
> +	zi->i_wr_refcnt++;
> +	if (zi->i_wr_refcnt == 1) {
> +
> +		if (atomic_inc_return(&sbi->s_open_zones) > sbi->s_max_open_zones) {
> +			atomic_dec(&sbi->s_open_zones);
> +			ret = -EBUSY;
> +			goto unlock;
> +		}
> +
> +		if (i_size_read(inode) < zi->i_max_size) {
> +			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN);
> +			if (ret) {
> +				zi->i_wr_refcnt--;
> +				atomic_dec(&sbi->s_open_zones);
> +				goto unlock;
> +			}
> +			zi->i_flags |= ZONEFS_ZONE_OPEN;
> +		}
> +	}
> +
> +unlock:
> +	mutex_unlock(&zi->i_truncate_mutex);
> +
> +	return ret;
> +}
> +
> +static int zonefs_close_zone(struct inode *inode)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	int ret = 0;
> +
> +	mutex_lock(&zi->i_truncate_mutex);
> +
> +	zi->i_wr_refcnt--;
> +	if (!zi->i_wr_refcnt) {
> +		struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
> +
> +		if (zi->i_flags & ZONEFS_ZONE_OPEN) {
> +			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE);
> +			if (ret)
> +				goto unlock;
> +			zi->i_flags &= ~ZONEFS_ZONE_OPEN;
> +		}
> +
> +		atomic_dec(&sbi->s_open_zones);
> +	}
> +
> +unlock:
> +	mutex_unlock(&zi->i_truncate_mutex);
> +
> +	return ret;
> +}
> +
> +static inline void zonefs_i_size_write(struct inode *inode, loff_t isize)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +
> +	i_size_write(inode, isize);
> +	/*
> +	 * A full zone is no longer open/active and does not need
> +	 * explicit closing.
> +	 */
> +	if (isize >= zi->i_max_size)
> +		zi->i_flags &= ~ZONEFS_ZONE_OPEN;
> +}
> +
>  static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
>  			      unsigned int flags, struct iomap *iomap,
>  			      struct iomap *srcmap)
> @@ -335,7 +409,7 @@ static int zonefs_io_error_cb(struct blk_zone *zone, unsigned int idx,
>  	 * invalid data.
>  	 */
>  	zonefs_update_stats(inode, data_size);
> -	i_size_write(inode, data_size);
> +	zonefs_i_size_write(inode, data_size);
>  	zi->i_wpoffset = data_size;
>  
>  	return 0;
> @@ -425,6 +499,25 @@ static int zonefs_file_truncate(struct inode *inode, loff_t isize)
>  		goto unlock;
>  	}
>  
> +	/*
> +	 * If the mount option ZONEFS_MNTOPT_EXPLICIT_OPEN is set,
> +	 * take care of open zones.
> +	 */
> +	if (zi->i_flags & ZONEFS_ZONE_OPEN) {
> +		/*
> +		 * Truncating a zone to EMPTY or FULL is the equivalent of
> +		 * closing the zone. For a truncation to 0, we need to
> +		 * re-open the zone to ensure new writes can be processed.
> +		 * For a truncation to the maximum file size, the zone is
> +		 * closed and writes cannot be accepted anymore, so clear
> +		 * the open flag.
> +		 */
> +		if (!isize)
> +			ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_OPEN);
> +		else
> +			zi->i_flags &= ~ZONEFS_ZONE_OPEN;
> +	}
> +
>  	zonefs_update_stats(inode, isize);
>  	truncate_setsize(inode, isize);
>  	zi->i_wpoffset = isize;
> @@ -603,7 +696,7 @@ static int zonefs_file_write_dio_end_io(struct kiocb *iocb, ssize_t size,
>  		mutex_lock(&zi->i_truncate_mutex);
>  		if (i_size_read(inode) < iocb->ki_pos + size) {
>  			zonefs_update_stats(inode, iocb->ki_pos + size);
> -			i_size_write(inode, iocb->ki_pos + size);
> +			zonefs_i_size_write(inode, iocb->ki_pos + size);
>  		}
>  		mutex_unlock(&zi->i_truncate_mutex);
>  	}
> @@ -884,8 +977,47 @@ static ssize_t zonefs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  	return ret;
>  }
>  
> +static inline bool zonefs_file_use_exp_open(struct inode *inode, struct file *file)
> +{
> +	struct zonefs_inode_info *zi = ZONEFS_I(inode);
> +	struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
> +
> +	if (!(sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN))
> +		return false;
> +
> +	if (zi->i_ztype != ZONEFS_ZTYPE_SEQ)
> +		return false;
> +
> +	if (!(file->f_mode & FMODE_WRITE))
> +		return false;
> +
> +	return true;
> +}
> +
> +static int zonefs_file_open(struct inode *inode, struct file *file)
> +{
> +	int ret;
> +
> +	ret = generic_file_open(inode, file);
> +	if (ret)
> +		return ret;
> +
> +	if (zonefs_file_use_exp_open(inode, file))
> +		return zonefs_open_zone(inode);
> +
> +	return 0;
> +}
> +
> +static int zonefs_file_release(struct inode *inode, struct file *file)
> +{
> +	if (zonefs_file_use_exp_open(inode, file))
> +		return zonefs_close_zone(inode);

Thinking more about this, the main reason a zone close would fail is if the zone
went into read-only or offline condition. For these, close will fail and should
thus not be attempted or the user will never be able to close the file, and to
unmount the file system for recovery.

So in case of failure here, zonefs_io_error() should be called to detect
read-only offline zones and the close error ignored so that the file is indeed
closed.

Conversely, in zonefs_io_error(), if an open zone (ZONEFS_ZONE_OPEN flag is set)
is detected as being read-only or offline, the ZONEFS_ZONE_OPEN flag should be
cleared so that the close zone command is not attempted when that zone file is
closed.

To avoid complicating this patch or making it too large, maybe add another patch
to handle error path ?

> +	return 0;
> +}
> +
>  static const struct file_operations zonefs_file_operations = {
> -	.open		= generic_file_open,
> +	.open		= zonefs_file_open,
> +	.release	= zonefs_file_release,
>  	.fsync		= zonefs_file_fsync,
>  	.mmap		= zonefs_file_mmap,
>  	.llseek		= zonefs_file_llseek,
> @@ -909,6 +1041,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
>  	inode_init_once(&zi->i_vnode);
>  	mutex_init(&zi->i_truncate_mutex);
>  	init_rwsem(&zi->i_mmap_sem);
> +	zi->i_wr_refcnt = 0;
>  
>  	return &zi->i_vnode;
>  }
> @@ -959,7 +1092,7 @@ static int zonefs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  
>  enum {
>  	Opt_errors_ro, Opt_errors_zro, Opt_errors_zol, Opt_errors_repair,
> -	Opt_err,
> +	Opt_explicit_open, Opt_err,
>  };
>  
>  static const match_table_t tokens = {
> @@ -967,6 +1100,7 @@ static const match_table_t tokens = {
>  	{ Opt_errors_zro,	"errors=zone-ro"},
>  	{ Opt_errors_zol,	"errors=zone-offline"},
>  	{ Opt_errors_repair,	"errors=repair"},
> +	{ Opt_explicit_open,	"explicit-open" },
>  	{ Opt_err,		NULL}
>  };
>  
> @@ -1003,6 +1137,9 @@ static int zonefs_parse_options(struct super_block *sb, char *options)
>  			sbi->s_mount_opts &= ~ZONEFS_MNTOPT_ERRORS_MASK;
>  			sbi->s_mount_opts |= ZONEFS_MNTOPT_ERRORS_REPAIR;
>  			break;
> +		case Opt_explicit_open:
> +			sbi->s_mount_opts |= ZONEFS_MNTOPT_EXPLICIT_OPEN;
> +			break;
>  		default:
>  			return -EINVAL;
>  		}
> @@ -1422,6 +1559,13 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
>  	sbi->s_gid = GLOBAL_ROOT_GID;
>  	sbi->s_perm = 0640;
>  	sbi->s_mount_opts = ZONEFS_MNTOPT_ERRORS_RO;
> +	sbi->s_max_open_zones = bdev_max_open_zones(sb->s_bdev);
> +	atomic_set(&sbi->s_open_zones, 0);
> +	if (!sbi->s_max_open_zones &&
> +	    sbi->s_mount_opts & ZONEFS_MNTOPT_EXPLICIT_OPEN) {
> +		zonefs_info(sb, "No open zones limit. Ignoring explicit_open mount option\n");
> +		sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN;
> +	}
>  
>  	ret = zonefs_read_super(sb);
>  	if (ret)
> diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
> index 55b39970acb2..51141907097c 100644
> --- a/fs/zonefs/zonefs.h
> +++ b/fs/zonefs/zonefs.h
> @@ -38,6 +38,8 @@ static inline enum zonefs_ztype zonefs_zone_type(struct blk_zone *zone)
>  	return ZONEFS_ZTYPE_SEQ;
>  }
>  
> +#define ZONEFS_ZONE_OPEN	(1 << 0)
> +
>  /*
>   * In-memory inode data.
>   */
> @@ -74,6 +76,10 @@ struct zonefs_inode_info {
>  	 */
>  	struct mutex		i_truncate_mutex;
>  	struct rw_semaphore	i_mmap_sem;
> +
> +	/* guarded by i_truncate_mutex */
> +	unsigned int		i_wr_refcnt;
> +	unsigned int		i_flags;
>  };
>  
>  static inline struct zonefs_inode_info *ZONEFS_I(struct inode *inode)
> @@ -154,6 +160,7 @@ enum zonefs_features {
>  #define ZONEFS_MNTOPT_ERRORS_MASK	\
>  	(ZONEFS_MNTOPT_ERRORS_RO | ZONEFS_MNTOPT_ERRORS_ZRO | \
>  	 ZONEFS_MNTOPT_ERRORS_ZOL | ZONEFS_MNTOPT_ERRORS_REPAIR)
> +#define ZONEFS_MNTOPT_EXPLICIT_OPEN	(1 << 4) /* Explicit open/close of zones on open/close */
>  
>  /*
>   * In-memory Super block information.
> @@ -175,6 +182,9 @@ struct zonefs_sb_info {
>  
>  	loff_t			s_blocks;
>  	loff_t			s_used_blocks;
> +
> +	unsigned int		s_max_open_zones;
> +	atomic_t		s_open_zones;
>  };
>  
>  static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb)
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] zonefs: open/close zone on file open/close
  2020-09-07  3:06   ` Damien Le Moal
@ 2020-09-07  8:49     ` Johannes Thumshirn
  2020-09-07  9:52       ` Damien Le Moal
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Thumshirn @ 2020-09-07  8:49 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: linux-fsdevel

On 07/09/2020 05:06, Damien Le Moal wrote:
> May be you meant something like "leave a zone not active after a truncate when
> the zone file is open for writing" ?

No I meant, we shouldn't decrement the 's_open_zones' count on truncate to 0
or full, when a zone is still opened for write. Because if we do, another thread
could open the last available open zone and the application won't be able to write
to a previously opened zone, if that makes sense.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] zonefs: open/close zone on file open/close
  2020-09-07  8:49     ` Johannes Thumshirn
@ 2020-09-07  9:52       ` Damien Le Moal
  0 siblings, 0 replies; 11+ messages in thread
From: Damien Le Moal @ 2020-09-07  9:52 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: linux-fsdevel

On Mon, 2020-09-07 at 08:49 +0000, Johannes Thumshirn wrote:
> On 07/09/2020 05:06, Damien Le Moal wrote:
> > May be you meant something like "leave a zone not active after a truncate when
> > the zone file is open for writing" ?
> 
> No I meant, we shouldn't decrement the 's_open_zones' count on truncate to 0
> or full, when a zone is still opened for write. Because if we do, another thread
> could open the last available open zone and the application won't be able to write
> to a previously opened zone, if that makes sense.

I understood that, but the "active writers" is hard to
decode/understand. So may be just replace that with "as the file may
still be open for writing, e.g. the user called ftruncate(). If the
zone file is not open and a process does a truncate(), then no close
operation is needed." 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-09-07  9:52 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-04 11:23 [PATCH 0/3] zonefs: introduce -o explicit-open mount option Johannes Thumshirn
2020-09-04 11:23 ` [PATCH 1/3] zonefs: introduce helper for zone management Johannes Thumshirn
2020-09-07  2:55   ` Damien Le Moal
2020-09-04 11:23 ` [PATCH 2/3] zonefs: open/close zone on file open/close Johannes Thumshirn
2020-09-07  3:06   ` Damien Le Moal
2020-09-07  8:49     ` Johannes Thumshirn
2020-09-07  9:52       ` Damien Le Moal
2020-09-07  4:32   ` Damien Le Moal
2020-09-04 11:23 ` [PATCH 3/3] zonefs: document the explicit-open mount option Johannes Thumshirn
2020-09-04 15:17   ` Randy Dunlap
2020-09-07  3:09   ` Damien Le Moal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).