linux-xfs.vger.kernel.org archive mirror
From: Randy Dunlap <rdunlap@infradead.org>
To: Damien Le Moal <damien.lemoal@wdc.com>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Johannes Thumshirn <jth@kernel.org>,
	Naohiro Aota <naohiro.aota@wdc.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Hannes Reinecke <hare@suse.de>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH v13 2/2] zonefs: Add documentation
Date: Wed, 19 Feb 2020 16:55:17 -0800
Message-ID: <a6f0eaf4-933f-8c15-6f0c-18400204791f@infradead.org>
In-Reply-To: <20200207031606.641231-3-damien.lemoal@wdc.com>

Hi Damien,

Typo etc. corrections below:

On 2/6/20 7:16 PM, Damien Le Moal wrote:
> Add the new file Documentation/filesystems/zonefs.txt to document
> zonefs principles and user-space tool usage.
> 
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> ---
>  Documentation/filesystems/zonefs.txt | 404 +++++++++++++++++++++++++++
>  MAINTAINERS                          |   1 +
>  2 files changed, 405 insertions(+)
>  create mode 100644 Documentation/filesystems/zonefs.txt
> 
> diff --git a/Documentation/filesystems/zonefs.txt b/Documentation/filesystems/zonefs.txt
> new file mode 100644
> index 000000000000..935bf22031ca
> --- /dev/null
> +++ b/Documentation/filesystems/zonefs.txt
> @@ -0,0 +1,404 @@
> +ZoneFS - Zone filesystem for Zoned block devices
> +
> +Introduction
> +============
> +
...
> +
> +Zoned block devices
> +-------------------
> +
...
> +
> +Zonefs Overview
> +===============
> +
...

> +
> +On-disk metadata
> +----------------
> +
...

> +
> +Zone type sub-directories
> +-------------------------
> +
...

> +
> +Zone files
> +----------
> +
...

> +
> +Conventional zone files
> +-----------------------
> +
...

> +
> +Sequential zone files
> +---------------------
> +
> +The size of sequential zone files grouped in the "seq" sub-directory represents
> +the file's zone write pointer position relative to the zone start sector.
> +
> +Sequential zone files can only be written sequentially, starting from the file
> +end, that is, write operations can only be append writes. Zonefs makes no
> +attempt at accepting random writes and will fail any write request that has a
> +start offset not corresponding to the end of the file, or to the end of the last
> +write issued and still in-flight (for asynchrnous I/O operations).
                                         asynchronous

> +
> +Since dirty page writeback by the page cache does not guarantee a sequential
> +write pattern, zonefs prevents buffered writes and writeable shared mappings
> +on sequential files. Only direct I/O writes are accepted for these files.
> +zonefs relies on the sequential delivery of write I/O requests to the device
> +implemented by the block layer elevator. An elevator implementing the sequential
> +write feature for zoned block device (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
> +must be used. This type of elevator (e.g. mq-deadline) is the set by default

                                                          is set by default

> +for zoned block devices on device initialization.
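
As an illustration of the append-only rule quoted above, here is a minimal
Python model of the offset check. It is a sketch only, not zonefs code: the
class SeqZoneFile and its methods are hypothetical names, and the model
ignores direct I/O alignment and the elevator entirely.

```python
class SeqZoneFile:
    """Toy model of a sequential zone file (hypothetical, not kernel code)."""

    def __init__(self):
        self.size = 0          # file size, i.e. data completed so far
        self.inflight_end = 0  # end of the last write issued and still in flight

    def submit_write(self, offset, length):
        # A write is accepted only if it starts at the end of the file or,
        # when writes are still in flight, at the end of the last one issued.
        if offset != self.inflight_end:
            raise ValueError("write does not start at the end of the file")
        self.inflight_end = offset + length

    def complete_writes(self):
        # All in-flight writes finished: the file size catches up.
        self.size = self.inflight_end


f = SeqZoneFile()
f.submit_write(0, 4096)      # starts at the end of the (empty) file
f.submit_write(4096, 4096)   # starts at the end of the in-flight write
f.complete_writes()

try:
    f.submit_write(0, 4096)  # random write into already written data
    rejected = False
except ValueError:
    rejected = True
```

In this model the third write is rejected because its offset (0) is not the
current end of the file (8192).
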
> +
...

> +
> +Format options
> +--------------
> +
...

> +
> +IO error handling
> +-----------------
> +
...

> +
> +
> +* Unaligned write errors: These errors result from the host issuing write
> +  requests with a start sector that does not correspond to a zone write pointer
> +  position when the write request is executed by the device. Even though zonefs
> +  enforces sequential file write for sequential zones, unaligned write errors
> +  may still happen in the case of a partial failure of a very large direct I/O
> +  operation split into multiple BIOs/requests or asynchronous I/O operations.
> +  If one of the write request within the set of sequential write requests
> +  issued to the device fails, all write requests after queued after it will

                                           requests queued after it

> +  become unaligned and fail.
> +
...

> +
> +All I/O errors detected by zonefs are notified to the user with an error code
> +return for the system call that trigered or detected the error. The recovery

                                   triggered

> +actions taken by zonefs in response to I/O errors depend on the I/O type (read
> +vs write) and on the reason for the error (bad sector, unaligned writes or zone
> +condition change).
> +
...

> +
> +Zonefs minimal I/O error recovery may change a file size and a file access

                                                            and file access

> +permissions.
> +
> +* File size changes:
> +  Immediate or delayed write errors in a sequential zone file may cause the file
> +  inode size to be inconsistent with the amount of data successfully written in
> +  the file zone. For instance, the partial failure of a multi-BIO large write
> +  operation will cause the zone write pointer to advance partially, even though
> +  the entire write operation will be reported as failed to the user. In such
> +  case, the file inode size must be advanced to reflect the zone write pointer
> +  change and eventually allow the user to restart writing at the end of the
> +  file.
> +  A file size may also be reduced to reflect a delayed write error detected on
> +  fsync(): in this case, the amount of data effectively written in the zone may
> +  be less than originally indicated by the file inode size. After such I/O
> +  error, zonefs always fixes a file inode size to reflect the amount of data

                          fixes the file inode size

> +  persistently stored in the file zone.
> +
> +* Access permission changes:
...
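
The two cases in the "File size changes" item above can be sketched as
follows. This is only a model under stated assumptions: recover_file_size,
ZONE_START and BIO are hypothetical names and values, and offsets are
expressed in bytes for simplicity.

```python
def recover_file_size(inode_size, zone_start, write_pointer):
    """Return (new_size, action) after an I/O error on a sequential zone file.

    The new size is the amount of data persistently stored in the zone,
    that is, the write pointer position relative to the zone start.
    """
    on_disk = write_pointer - zone_start
    if on_disk > inode_size:
        action = "grown"
    elif on_disk < inode_size:
        action = "shrunk"
    else:
        action = "unchanged"
    return on_disk, action


ZONE_START = 256 * 1024 * 1024  # hypothetical zone start offset, in bytes
BIO = 64 * 1024                 # hypothetical size of one BIO, in bytes

# Partial failure of a multi-BIO write: the whole write is reported as
# failed (inode size still 0), but the first BIO advanced the write pointer.
partial = recover_file_size(0, ZONE_START, ZONE_START + BIO)

# Delayed write error detected on fsync(): the inode size claims two BIOs,
# but only one is persistently stored in the zone.
delayed = recover_file_size(2 * BIO, ZONE_START, ZONE_START + BIO)
```
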

> +
> +Further notes:
> +* The "errors=remount-ro" mount option is the default behavior of zonefs I/O
> +  error processing if no errors mount option is specified.
> +* With the "errors=remount-ro" mount option, the change of the file access
> +  permissions to read-only applies to all files. The file system is remounted
> +  read-only.
> +* Access permission and file size changes due to the device transitioning zones
> +  to the offline condition are permanent. Remounting or reformating the device

                                             usually:      reformatting

> +  with mkfs.zonefs (mkzonefs) will not change back offline zone files to a good
> +  state.
> +* File access permission changes to read-only due to the device transitioning
> +  zones to the read-only condition are permanent. Remounting or reformating

                                                                   reformatting

> +  the device will not re-enable file write access.
> +* File access permission changes implied by the remount-ro, zone-ro and
> +  zone-offline mount options are temporary for zones in a good condition.
> +  Unmounting and remounting the file system will restore the previous default
> +  (format time values) access rights to the files affected.
> +* The repair mount option triggers only the minimal set of I/O error recovery
> +  actions, that is, file size fixes for zones in a good condition. Zones
> +  indicated as being read-only or offline by the device still imply changes to
> +  the zone file access permissions as noted in the table above.
> +
> +Mount options
> +-------------
> +
> +zonefs define the "errors=<behavior>" mount option to allow the user to specify
> +zonefs behavior in response to I/O errors, inode size inconsistencies or zone
> +condition chages. The defined behaviors are as follow:

             changes.

> +* remount-ro (default)
> +* zone-ro
> +* zone-offline
> +* repair
> +
> +The I/O error actions defined for each behavior is detailed in the previous

                                                   are

> +section.
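
For readers who want the option handling spelled out, here is a small Python
sketch of how an "errors=<behavior>" option string could be interpreted,
with remount-ro as the default. The function parse_errors_option is a
hypothetical helper for illustration and says nothing about how zonefs
itself parses mount options.

```python
VALID_BEHAVIORS = ("remount-ro", "zone-ro", "zone-offline", "repair")


def parse_errors_option(options):
    """Return the errors= behavior found in a comma-separated option string."""
    behavior = "remount-ro"  # default when no errors= option is given
    for opt in options.split(","):
        if opt.startswith("errors="):
            value = opt[len("errors="):]
            if value not in VALID_BEHAVIORS:
                raise ValueError("unknown errors= behavior: " + value)
            behavior = value
    return behavior


default_behavior = parse_errors_option("")
explicit_behavior = parse_errors_option("errors=zone-ro")
```
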
> +
> +Zonefs User Space Tools
> +=======================
> +
...
> +
> +Examples
> +--------
> +
...


HTH.
-- 
~Randy

Thread overview: 7+ messages
2020-02-07  3:16 [PATCH v13 0/2] New zonefs file system Damien Le Moal
2020-02-07  3:16 ` [PATCH v13 1/2] fs: " Damien Le Moal
2020-02-07  4:06   ` Dave Chinner
2020-02-07  3:16 ` [PATCH v13 2/2] zonefs: Add documentation Damien Le Moal
2020-02-20  0:55   ` Randy Dunlap [this message]
2020-02-20  0:59     ` Damien Le Moal
2020-02-20  1:15       ` Randy Dunlap
