From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: Parav Pandit
Subject: [PATCH v4 03/20] virtio-blk: Maintain block device spec in separate directory
Date: Thu, 12 Jan 2023 23:22:37 +0200
Message-ID: <20230112212254.763783-4-parav@nvidia.com>
In-Reply-To: <20230112212254.763783-1-parav@nvidia.com>
References: <20230112212254.763783-1-parav@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
To: mst@redhat.com, virtio-dev@lists.oasis-open.org, cohuck@redhat.com
Cc: virtio-comment@lists.oasis-open.org, david@redhat.com, Parav Pandit
List-ID:

Move the virtio block device specification to its own file, as was
done for the recent virtio devices.

While at it, place the device specification and its driver and device
conformance sections into their own directory, so that the device
specification is self-contained.

Fixes: https://github.com/oasis-tcs/virtio-spec/issues/153
Signed-off-by: Parav Pandit
---
changelog:
v3->v4:
- rename device-types/virtio-block/ to device-types/blk/
v2->v3:
- file name changed from device.tex to description.tex
- use input instead of import to insert a file
v1->v2:
- removed extra blank lines at end of file
v0->v1:
- moved to device specific directory
---
 conformance.tex                         |   20 +-
 content.tex                             | 1315 +----------------------
 device-types/blk/description.tex        | 1313 ++++++++++++++++++++++
 device-types/blk/device-conformance.tex |    8 +
 device-types/blk/driver-conformance.tex |    8 +
 5 files changed, 1332 insertions(+), 1332 deletions(-)
 create mode 100644 device-types/blk/description.tex
 create mode 100644 device-types/blk/device-conformance.tex
 create mode 100644 device-types/blk/driver-conformance.tex

diff --git a/conformance.tex b/conformance.tex
index 32fe124..7e6f6e0 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -135,15 +135,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \end{itemize}
 
 \input{device-types/net/driver-conformance.tex}
-
-\conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
-
-A block driver MUST conform to the following normative statements:
-
-\begin{itemize}
-\item \ref{drivernormative:Device Types / Block Device / Device Initialization}
-\item \ref{drivernormative:Device Types / Block Device / Device Operation}
-\end{itemize}
+\input{device-types/blk/driver-conformance.tex}
 
 \conformance{\subsection}{Console Driver Conformance}\label{sec:Conformance / Driver Conformance / Console Driver Conformance}
 
@@ -386,15 +378,7 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \end{itemize}
 
 \input{device-types/net/device-conformance.tex}
-
-\conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
-
-A block device MUST conform to the following normative statements:
-
-\begin{itemize}
-\item \ref{devicenormative:Device Types / Block Device / Device Initialization}
-\item \ref{devicenormative:Device Types / Block Device / Device Operation}
-\end{itemize}
+\input{device-types/blk/device-conformance.tex}
 
 \conformance{\subsection}{Console Device Conformance}\label{sec:Conformance / Device Conformance / Console Device Conformance}
 
diff --git a/content.tex b/content.tex
index a3a31ab..3be9d4e 100644
--- a/content.tex
+++ b/content.tex
@@ -3004,1320 +3004,7 @@ \chapter{Device Types}\label{sec:Device Types}
 them no further.
 
 \input{device-types/net/description.tex}
-
-\section{Block Device}\label{sec:Device Types / Block Device}
-
-The virtio block device is a simple virtual block device (ie.
-disk). Read and write requests (and other exotic requests) are
-placed in one of its queues, and serviced (probably out of order) by the
-device except where noted.
-
-\subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
- 2
-
-\subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
-\begin{description}
-\item[0] requestq1
-\item[\ldots]
-\item[N-1] requestqN
-\end{description}
-
- N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by
- \field{num_queues}.
-
-\subsection{Feature bits}\label{sec:Device Types / Block Device / Feature bits}
-
-\begin{description}
-\item[VIRTIO_BLK_F_SIZE_MAX (1)] Maximum size of any single segment is
- in \field{size_max}.
-
-\item[VIRTIO_BLK_F_SEG_MAX (2)] Maximum number of segments in a
- request is in \field{seg_max}.
-
-\item[VIRTIO_BLK_F_GEOMETRY (4)] Disk-style geometry specified in
- \field{geometry}.
-
-\item[VIRTIO_BLK_F_RO (5)] Device is read-only.
-
-\item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
-
-\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
-
-\item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
- alignment.
-
-\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
- and writethrough modes.
-
-\item[VIRTIO_BLK_F_MQ (12)] Device supports multiqueue.
-
-\item[VIRTIO_BLK_F_DISCARD (13)] Device can support discard command, maximum
- discard sectors size in \field{max_discard_sectors} and maximum discard
- segment number in \field{max_discard_seg}.
-
-\item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device can support write zeroes command,
- maximum write zeroes sectors size in \field{max_write_zeroes_sectors} and
- maximum write zeroes segment number in \field{max_write_zeroes_seg}.
-
-\item[VIRTIO_BLK_F_LIFETIME (15)] Device supports providing storage lifetime
- information.
-
-\item[VIRTIO_BLK_F_SECURE_ERASE (16)] Device supports secure erase command,
- maximum erase sectors count in \field{max_secure_erase_sectors} and
- maximum erase segment number in \field{max_secure_erase_seg}.
-
-\item[VIRTIO_BLK_F_ZONED(17)] Device is a Zoned Block Device, that is, a device
-	that follows the zoned storage device behavior that is also supported by
-	industry standards such as the T10 Zoned Block Command standard (ZBC r05) or
-	the NVMe(TM) NVM Express Zoned Namespace Command Set Specification 1.1b
-	(ZNS). For brevity, these standard documents are referred as "ZBD standards"
-	from this point on in the text.
-
-\end{description}
-
-\subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
-
-\begin{description}
-\item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
-
-\item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
-\end{description}
-
-\begin{note}
- In the legacy interface, VIRTIO_BLK_F_FLUSH was also
- called VIRTIO_BLK_F_WCE.
-\end{note}
-
-\subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
-
-The \field{capacity} of the device (expressed in 512-byte sectors) is always
-present. The availability of the others all depend on various feature
-bits as indicated above.
-
-The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
-the number of queues.
-
-The parameters in the configuration space of the device \field{max_discard_sectors}
-\field{discard_sector_alignment} are expressed in 512-byte units if the
-VIRTIO_BLK_F_DISCARD feature bit is negotiated. The \field{max_write_zeroes_sectors}
-is expressed in 512-byte units if the VIRTIO_BLK_F_WRITE_ZEROES feature
-bit is negotiated. The parameters in the configuration space of the device
-\field{max_secure_erase_sectors} \field{secure_erase_sector_alignment} are expressed
-in 512-byte units if the VIRTIO_BLK_F_SECURE_ERASE feature bit is negotiated.
-
-If the VIRTIO_BLK_F_ZONED feature is negotiated, then in
-\field{virtio_blk_zoned_characteristics},
-\begin{itemize}
-\item \field{zone_sectors} value is expressed in 512-byte sectors.
-\item \field{max_append_sectors} value is expressed in 512-byte sectors.
-\item \field{write_granularity} value is expressed in bytes.
-\end{itemize}
-
-The \field{model} field in \field{zoned} may have the following values:
-
-\begin{lstlisting}
-#define VIRTIO_BLK_Z_NONE 0
-#define VIRTIO_BLK_Z_HM 1
-#define VIRTIO_BLK_Z_HA 2
-\end{lstlisting}
-
-Depending on their design, zoned block devices may follow several possible
-models of operation. The three models that are standardized for ZBDs are
-drive-managed, host-managed and host-aware.
-
-While being zoned internally, drive-managed ZBDs behave exactly like regular,
-non-zoned block devices. For the purposes of virtio standardization,
-drive-managed ZBDs can always be treated as non-zoned devices. These devices
-have the VIRTIO_BLK_Z_NONE model value set in the \field{model} field in
-\field{zoned}.
-
-Devices that offer the VIRTIO_BLK_F_ZONED feature while reporting the
-VIRTIO_BLK_Z_NONE zoned model are drive-managed zoned block devices. In this
-case, the driver treats the device as a regular non-zoned block device.
-
-Host-managed zoned block devices have their LBA range divided into Sequential
-Write Required (SWR) zones that require some additional handling by the host
-for correct operation. All write requests to SWR zones are required be
-sequential and zones containing some written data need to be reset before that
-data can be rewritten. Host-managed devices support a set of ZBD-specific I/O
-requests that can be used by the host to manage device zones. Host-managed
-devices report VIRTIO_BLK_Z_HM in the \field{model} field in \field{zoned}.
-
-Host-aware zoned block devices have their LBA range divided to Sequential
-Write Preferred (SWP) zones that support random write access, similar to
-regular non-zoned devices. However, the device I/O performance might not be
-optimal if SWP zones are used in a random I/O pattern. SWP zones also support
-the same set of ZBD-specific I/O requests as host-managed devices that allow
-host-aware devices to be managed by any host that supports zoned block devices
-to achieve its optimum performance. Host-aware devices report VIRTIO_BLK_Z_HA
-in the \field{model} field in \field{zoned}.
-
-Both SWR zones and SWP zones are sometimes referred as sequential zones.
-
-During device operation, sequential zones can be in one of the following states:
-empty, implicitly-open, explicitly-open, closed and full. The state machine that
-governs the transitions between these states is described later in this document.
-
-SWR and SWP zones consume volatile device resources while being in certain
-states and the device may set limits on the number of zones that can be in these
-states simultaneously.
-
-Zoned block devices use two internal counters to account for the device
-resources in use, the number of currently open zones and the number of currently
-active zones.
-
-Any zone state transition from a state that doesn't consume a zone resource to a
-state that consumes the same resource increments the internal device counter for
-that resource. Any zone transition out of a state that consumes a zone resource
-to a state that doesn't consume the same resource decrements the counter. Any
-request that causes the device to exceed the reported zone resource limits is
-terminated by the device with a "zone resources exceeded" error as defined for
-specific commands later.
-
-\begin{lstlisting}
-struct virtio_blk_config {
-        le64 capacity;
-        le32 size_max;
-        le32 seg_max;
-        struct virtio_blk_geometry {
-                le16 cylinders;
-                u8 heads;
-                u8 sectors;
-        } geometry;
-        le32 blk_size;
-        struct virtio_blk_topology {
-                // # of logical blocks per physical block (log2)
-                u8 physical_block_exp;
-                // offset of first aligned logical block
-                u8 alignment_offset;
-                // suggested minimum I/O size in blocks
-                le16 min_io_size;
-                // optimal (suggested maximum) I/O size in blocks
-                le32 opt_io_size;
-        } topology;
-        u8 writeback;
-        u8 unused0;
-        u16 num_queues;
-        le32 max_discard_sectors;
-        le32 max_discard_seg;
-        le32 discard_sector_alignment;
-        le32 max_write_zeroes_sectors;
-        le32 max_write_zeroes_seg;
-        u8 write_zeroes_may_unmap;
-        u8 unused1[3];
-        le32 max_secure_erase_sectors;
-        le32 max_secure_erase_seg;
-        le32 secure_erase_sector_alignment;
-        struct virtio_blk_zoned_characteristics {
-                le32 zone_sectors;
-                le32 max_open_zones;
-                le32 max_active_zones;
-                le32 max_append_sectors;
-                le32 write_granularity;
-                u8 model;
-                u8 unused2[3];
-        } zoned;
-};
-\end{lstlisting}
-
-
-\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout / Legacy Interface: Device configuration layout}
-When using the legacy interface, transitional devices and drivers
-MUST format the fields in struct virtio_blk_config
-according to the native endian of the guest rather than
-(necessarily when not using the legacy interface) little-endian.
-
-
-\subsection{Device Initialization}\label{sec:Device Types / Block Device / Device Initialization}
-
-\begin{enumerate}
-\item The device size can be read from \field{capacity}.
-
-\item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated,
- \field{blk_size} can be read to determine the optimal sector size
- for the driver to use. This does not affect the units used in
- the protocol (always 512 bytes), but awareness of the correct
- value can affect performance.
-
-\item If the VIRTIO_BLK_F_RO feature is set by the device, any write
- requests will fail.
-
-\item If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the
- \field{topology} struct can be read to determine the physical block size and optimal
- I/O lengths for the driver to use. This also does not affect the units
- in the protocol, only performance.
-
-\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
- mode can be read or set through the \field{writeback} field. 0 corresponds
- to a writethrough cache, 1 to a writeback cache\footnote{Consistent with
- \ref{devicenormative:Device Types / Block Device / Device Operation},
- a writethrough cache can be defined broadly as a cache that commits
- writes to persistent device backend storage before reporting their
- completion. For example, a battery-backed writeback cache actually
- counts as writethrough according to this definition.}. The cache mode
- after reset can be either writeback or writethrough. The actual
- mode can be determined by reading \field{writeback} after feature
- negotiation.
-
-\item If the VIRTIO_BLK_F_DISCARD feature is negotiated,
- \field{max_discard_sectors} and \field{max_discard_seg} can be read
- to determine the maximum discard sectors and maximum number of discard
- segments for the block driver to use. \field{discard_sector_alignment}
- can be used by OS when splitting a request based on alignment.
-
-\item If the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated,
- \field{max_write_zeroes_sectors} and \field{max_write_zeroes_seg} can
- be read to determine the maximum write zeroes sectors and maximum
- number of write zeroes segments for the block driver to use.
-
-\item If the VIRTIO_BLK_F_MQ feature is negotiated, \field{num_queues} field
- can be read to determine the number of queues.
-
-\item If the VIRTIO_BLK_F_SECURE_ERASE feature is negotiated,
- \field{max_secure_erase_sectors} and \field{max_secure_erase_seg} can be read
- to determine the maximum secure erase sectors and maximum number of
- secure erase segments for the block driver to use.
- \field{secure_erase_sector_alignment} can be used by OS when splitting a
- request based on alignment.
-
-\item If the VIRTIO_BLK_F_ZONED feature is negotiated, the fields in
- \field{zoned} can be read by the driver to determine the zone
- characteristics of the device. All \field{zoned} fields are read-only.
-
-\end{enumerate}
-
-\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
-
-Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of
-sending VIRTIO_BLK_T_FLUSH commands.
-
-If neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH are
-negotiated, the driver MAY deduce the presence of a writethrough cache.
-If VIRTIO_BLK_F_CONFIG_WCE was not negotiated but VIRTIO_BLK_F_FLUSH was,
-the driver SHOULD assume presence of a writeback cache.
-
-The driver MUST NOT read \field{writeback} before setting
-the FEATURES_OK \field{device status} bit.
-
-Drivers MUST NOT negotiate the VIRTIO_BLK_F_ZONED feature if they are incapable
-of supporting devices with the VIRTIO_BLK_Z_HM, VIRTIO_BLK_Z_HA or
-VIRTIO_BLK_Z_NONE zoned model.
-
-If the VIRTIO_BLK_F_ZONED feature is offered by the device with the
-VIRTIO_BLK_Z_HM zone model, then the VIRTIO_BLK_F_DISCARD feature MUST NOT be
-offered by the driver.
-
-If the VIRTIO_BLK_F_ZONED feature and VIRTIO_BLK_F_DISCARD feature are both
-offered by the device with the VIRTIO_BLK_Z_HA or VIRTIO_BLK_Z_NONE zone model,
-then the driver MAY negotiate these two bits independently.
-
-If the VIRTIO_BLK_F_ZONED feature is negotiated, then
-\begin{itemize}
-\item if the driver that can not support host-managed zoned devices
- reads VIRTIO_BLK_Z_HM from the \field{model} field of \field{zoned}, the
- driver MUST NOT set FEATURES_OK flag and instead set the FAILED bit.
-
-\item if the driver that can not support zoned devices reads VIRTIO_BLK_Z_HA
- from the \field{model} field of \field{zoned}, the driver
- MAY handle the device as a non-zoned device. In this case, the
- driver SHOULD ignore all other fields in \field{zoned}.
-\end{itemize}
-
-\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
-
-Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it
-if they offer VIRTIO_BLK_F_CONFIG_WCE.
-
-If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH
-is not, the device MUST initialize \field{writeback} to 0.
-
-The device MUST initialize padding bytes \field{unused0} and
-\field{unused1} to 0.
-
-If the device that is being initialized is a not a zoned device, the device
-SHOULD NOT offer the VIRTIO_BLK_F_ZONED feature.
-
-The VIRTIO_BLK_F_ZONED feature cannot be properly negotiated without
-FEATURES_OK bit. Legacy devices MUST NOT offer VIRTIO_BLK_F_ZONED feature bit.
-
-If the VIRTIO_BLK_F_ZONED feature is not accepted by the driver,
-\begin{itemize}
-\item the device with the VIRTIO_BLK_Z_HA or VIRTIO_BLK_Z_NONE zone model SHOULD
- proceed with the initialization while setting all zoned characteristics
- fields to zero.
-
-\item the device with the VIRTIO_BLK_Z_HM zone model MUST fail to set the
- FEATURES_OK device status bit when the driver writes the Device Status
- field.
-\end{itemize}
-
-If the VIRTIO_BLK_F_ZONED feature is negotiated, then the \field{model} field in
-\field{zoned} struct in the configuration space MUST be set by the device
-\begin{itemize}
-\item to the value of VIRTIO_BLK_Z_NONE if it operates as a drive-managed
- zoned block device or a non-zoned block device.
-
-\item to the value of VIRTIO_BLK_Z_HM if it operates as a host-managed zoned
- block device.
-
-\item to the value of VIRTIO_BLK_Z_HA if it operates as a host-aware zoned
- block device.
-\end{itemize}
-
-If the VIRTIO_BLK_F_ZONED feature is negotiated and the device \field{model}
-field in \field{zoned} struct is VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA,
-
-\begin{itemize}
-\item the \field{zone_sectors} field of \field{zoned} MUST be set by the device
- to the size of a single zone on the device. All zones of the device have the
- same size indicated by \field{zone_sectors} except for the last zone that
- MAY be smaller than all other zones. The driver can calculate the number of
- zones on the device as
- \begin{lstlisting}
- nr_zones = (capacity + zone_sectors - 1) / zone_sectors;
- \end{lstlisting}
- and the size of the last zone as
- \begin{lstlisting}
- zs_last = capacity - (nr_zones - 1) * zone_sectors;
- \end{lstlisting}
-
-\item The \field{max_open_zones} field of the \field{zoned} structure MUST be
- set by the device to the maximum number of zones that can be open on the
- device (zones in the implicit open or explicit open state). A value
- of zero indicates that the device does not have any limit on the number of
- open zones.
-
-\item The \field{max_active_zones} field of the \field{zoned} structure MUST
- be set by the device to the maximum number zones that can be active on the
- device (zones in the implicit open, explicit open or closed state). A value
- of zero indicates that the device does not have any limit on the number of
- active zones.
-
-\item the \field{max_append_sectors} field of \field{zoned} MUST be set by
- the device to the maximum data size of a VIRTIO_BLK_T_ZONE_APPEND request
- that can be successfully issued to the device. The value of this field MUST
- NOT exceed the \field{seg_max} * \field{size_max} value. A device MAY set
- the \field{max_append_sectors} to zero if it doesn't support
- VIRTIO_BLK_T_ZONE_APPEND requests.
-
-\item the \field{write_granularity} field of \field{zoned} MUST be set by the
- device to the offset and size alignment constraint for VIRTIO_BLK_T_OUT
- and VIRTIO_BLK_T_ZONE_APPEND requests issued to a sequential zone of the
- device.
-
-\item the device MUST initialize padding bytes \field{unused2} to 0.
-\end{itemize}
-
-\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
-
-Because legacy devices do not have FEATURES_OK, transitional devices
-MUST implement slightly different behavior around feature negotiation
-when used through the legacy interface. In particular, when using the
-legacy interface:
-
-\begin{itemize}
-\item the driver MAY read or write \field{writeback} before setting
- the DRIVER or DRIVER_OK \field{device status} bit
-
-\item the device MUST NOT modify the cache mode (and \field{writeback})
- as a result of a driver setting a status bit, unless
- the DRIVER_OK bit is being set and the driver has not set the
- VIRTIO_BLK_F_CONFIG_WCE driver feature bit.
-
-\item the device MUST NOT modify the cache mode (and \field{writeback})
- as a result of a driver modifying the driver feature bits, for example
- if the driver sets the VIRTIO_BLK_F_CONFIG_WCE driver feature bit but
- does not set the VIRTIO_BLK_F_FLUSH bit.
-\end{itemize}
-
-
-\subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
-
-The driver queues requests to the virtqueues, and they are used by
-the device (not necessarily in order). Each request except
-VIRTIO_BLK_T_ZONE_APPEND is of form:
-
-\begin{lstlisting}
-struct virtio_blk_req {
-        le32 type;
-        le32 reserved;
-        le64 sector;
-        u8 data[];
-        u8 status;
-};
-\end{lstlisting}
-
-The type of the request is either a read (VIRTIO_BLK_T_IN), a write
-(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes
-(VIRTIO_BLK_T_WRITE_ZEROES), a flush (VIRTIO_BLK_T_FLUSH), a get device ID
-string command (VIRTIO_BLK_T_GET_ID), a secure erase
-(VIRTIO_BLK_T_SECURE_ERASE), or a get device lifetime command
-(VIRTIO_BLK_T_GET_LIFETIME).
-
-\begin{lstlisting}
-#define VIRTIO_BLK_T_IN 0
-#define VIRTIO_BLK_T_OUT 1
-#define VIRTIO_BLK_T_FLUSH 4
-#define VIRTIO_BLK_T_GET_ID 8
-#define VIRTIO_BLK_T_GET_LIFETIME 10
-#define VIRTIO_BLK_T_DISCARD 11
-#define VIRTIO_BLK_T_WRITE_ZEROES 13
-#define VIRTIO_BLK_T_SECURE_ERASE 14
-\end{lstlisting}
-
-The \field{sector} number indicates the offset (multiplied by 512) where
-the read or write is to occur. This field is unused and set to 0 for
-commands other than read, write and some zone operations.
-
-VIRTIO_BLK_T_IN requests populate \field{data} with the contents of sectors
-read from the block device (in multiples of 512 bytes). VIRTIO_BLK_T_OUT
-requests write the contents of \field{data} to the block device (in multiples
-of 512 bytes).
-
-The \field{data} used for discard, secure erase or write zeroes commands
-consists of one or more segments. The maximum number of segments is
-\field{max_discard_seg} for discard commands, \field{max_secure_erase_seg} for
-secure erase commands and \field{max_write_zeroes_seg} for write zeroes
-commands.
-Each segment is of form:
-
-\begin{lstlisting}
-struct virtio_blk_discard_write_zeroes {
-        le64 sector;
-        le32 num_sectors;
-        struct {
-                le32 unmap:1;
-                le32 reserved:31;
-        } flags;
-};
-\end{lstlisting}
-
-\field{sector} indicates the starting offset (in 512-byte units) of the
-segment, while \field{num_sectors} indicates the number of sectors in each
-discarded range. \field{unmap} is only used in write zeroes commands and allows
-the device to discard the specified range, provided that following reads return
-zeroes.
-
-VIRTIO_BLK_T_GET_ID requests fetch the device ID string from the device into
-\field{data}. The device ID string is a NUL-padded ASCII string up to 20 bytes
-long. If the string is 20 bytes long then there is no NUL terminator.
-
-The \field{data} used for VIRTIO_BLK_T_GET_LIFETIME requests is populated
-by the device, and is of the form
-
-\begin{lstlisting}
-struct virtio_blk_lifetime {
-        le16 pre_eol_info;
-        le16 device_lifetime_est_typ_a;
-        le16 device_lifetime_est_typ_b;
-};
-\end{lstlisting}
-
-The \field{pre_eol_info} specifies the percentage of reserved blocks
-that are consumed and will have one of these values:
-
-\begin{lstlisting}
-/* Value not available */
-#define VIRTIO_BLK_PRE_EOL_INFO_UNDEFINED 0
-/* < 80% of reserved blocks are consumed */
-#define VIRTIO_BLK_PRE_EOL_INFO_NORMAL 1
-/* 80% of reserved blocks are consumed */
-#define VIRTIO_BLK_PRE_EOL_INFO_WARNING 2
-/* 90% of reserved blocks are consumed */
-#define VIRTIO_BLK_PRE_EOL_INFO_URGENT 3
-/* All others values are reserved */
-\end{lstlisting}
-
-The \field{device_lifetime_est_typ_a} refers to wear of SLC cells and is provided
-in increments of 10%, with 0 meaning undefined, 1 meaning up-to 10% of lifetime
-used, and so on, thru to 11 meaning estimated lifetime exceeded.
-All values above 11 are reserved.
-
-The \field{device_lifetime_est_typ_b} refers to wear of MLC cells and is provided
-with the same semantics as \field{device_lifetime_est_typ_a}.
-
-The final \field{status} byte is written by the device: either
-VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
-error or VIRTIO_BLK_S_UNSUPP for a request unsupported by device:
-
-\begin{lstlisting}
-#define VIRTIO_BLK_S_OK 0
-#define VIRTIO_BLK_S_IOERR 1
-#define VIRTIO_BLK_S_UNSUPP 2
-\end{lstlisting}
-
-The status of individual segments is indeterminate when a discard or write zero
-command produces VIRTIO_BLK_S_IOERR. A segment may have completed
-successfully, failed, or not been processed by the device.
-
-The following requirements only apply if the VIRTIO_BLK_F_ZONED feature is
-negotiated.
-
-In addition to the request types defined for non-zoned devices, the type of the
-request can be a zone report (VIRTIO_BLK_T_ZONE_REPORT), an explicit zone open
-(VIRTIO_BLK_T_ZONE_OPEN), a zone close (VIRTIO_BLK_T_ZONE_CLOSE), a zone finish
-(VIRTIO_BLK_T_ZONE_FINISH), a zone_append (VIRTIO_BLK_T_ZONE_APPEND), a zone
-reset (VIRTIO_BLK_T_ZONE_RESET) or a zone reset all
-(VIRTIO_BLK_T_ZONE_RESET_ALL).
-
-\begin{lstlisting}
-#define VIRTIO_BLK_T_ZONE_APPEND 15
-#define VIRTIO_BLK_T_ZONE_REPORT 16
-#define VIRTIO_BLK_T_ZONE_OPEN 18
-#define VIRTIO_BLK_T_ZONE_CLOSE 20
-#define VIRTIO_BLK_T_ZONE_FINISH 22
-#define VIRTIO_BLK_T_ZONE_RESET 24
-#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
-\end{lstlisting}
-
-Requests of type VIRTIO_BLK_T_OUT, VIRTIO_BLK_T_ZONE_OPEN,
-VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH, VIRTIO_BLK_T_ZONE_APPEND,
-VIRTIO_BLK_T_ZONE_RESET or VIRTIO_BLK_T_ZONE_RESET_ALL may be completed by the
-device with VIRTIO_BLK_S_OK, VIRTIO_BLK_S_IOERR or VIRTIO_BLK_S_UNSUPP
-\field{status}, or, additionally, with VIRTIO_BLK_S_ZONE_INVALID_CMD,
-VIRTIO_BLK_S_ZONE_UNALIGNED_WP, VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or
-VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE ZBD-specific status codes.
-
-Besides the request status, VIRTIO_BLK_T_ZONE_APPEND requests return the
-starting sector of the appended data back to the driver. For this reason,
-the VIRTIO_BLK_T_ZONE_APPEND request has the layout that is extended to have
-the \field{append_sector} field to carry this value:
-
-\begin{lstlisting}
-struct virtio_blk_req_za {
-        le32 type;
-        le32 reserved;
-        le64 sector;
-        u8 data[];
-        le64 append_sector;
-        u8 status;
-};
-\end{lstlisting}
-
-\begin{lstlisting}
-#define VIRTIO_BLK_S_ZONE_INVALID_CMD 3
-#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP 4
-#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE 5
-#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
-\end{lstlisting}
-
-Requests of the type VIRTIO_BLK_T_ZONE_REPORT are reads and requests of the type
-VIRTIO_BLK_T_ZONE_APPEND are writes. VIRTIO_BLK_T_ZONE_OPEN,
-VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH, VIRTIO_BLK_T_ZONE_RESET and
-VIRTIO_BLK_T_ZONE_RESET_ALL are non-data requests.
-
-Zone sector address is a 64-bit address of the first 512-byte sector of the
-zone.
-
-VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
-VIRTIO_BLK_T_ZONE_RESET requests make the zone operation to act on a particular
-zone specified by the zone sector address in the \field{sector} of the request.
-
-VIRTIO_BLK_T_ZONE_RESET_ALL request acts upon all applicable zones of the
-device. The \field{sector} value is not used for this request.
-
-In ZBD standards, the VIRTIO_BLK_T_ZONE_REPORT request belongs to "Zone
-Management Receive" command category and VIRTIO_BLK_T_ZONE_OPEN,
-VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
-VIRTIO_BLK_T_ZONE_RESET/VIRTIO_BLK_T_ZONE_RESET_ALL requests are categorized as
-"Zone Management Send" commands. VIRTIO_BLK_T_ZONE_APPEND is categorized
-separately from zone management commands and is the only request that uses
-the \field{append_secctor} field \field{virtio_blk_req_za} to return
-to the driver the sector at which the data has been appended to the zone.
-
-VIRTIO_BLK_T_ZONE_REPORT is a read request that returns the information about
-the current state of zones on the device starting from the zone containing the
-\field{sector} of the request. The report consists of a header followed by zero
-or more zone descriptors.
-
-A zone report reply has the following structure:
-
-\begin{lstlisting}
-struct virtio_blk_zone_report {
-        le64 nr_zones;
-        u8 reserved[56];
-        struct virtio_blk_zone_descriptor zones[];
-};
-\end{lstlisting}
-
-The device sets the \field{nr_zones} field in the report header to the number of
-fully transferred zone descriptors in the data buffer.
-
-A zone descriptor has the following structure:
-
-\begin{lstlisting}
-struct virtio_blk_zone_descriptor {
-        le64 z_cap;
-        le64 z_start;
-        le64 z_wp;
-        u8 z_type;
-        u8 z_state;
-        u8 reserved[38];
-};
-\end{lstlisting}
-
-The zone descriptor field \field{z_type} \field{virtio_blk_zone_descriptor}
-indicates the type of the zone.
-
-The following zone types are available:
-
-\begin{lstlisting}
-#define VIRTIO_BLK_ZT_CONV 1
-#define VIRTIO_BLK_ZT_SWR 2
-#define VIRTIO_BLK_ZT_SWP 3
-\end{lstlisting}
-
-Read and write operations into zones with the VIRTIO_BLK_ZT_CONV (Conventional)
-type have the same behavior as read and write operations on a regular block
-device. Any block in a conventional zone can be read or written at any time and
-in any order.
-
-Zones with VIRTIO_BLK_ZT_SWR can be read randomly, but must be written
-sequentially at a certain point in the zone called the Write Pointer (WP). With
-every write, the Write Pointer is incremented by the number of sectors written.
-
-Zones with VIRTIO_BLK_ZT_SWP can be read randomly and should be written
-sequentially, similarly to SWR zones. However, SWP zones can accept random write
-operations, that is, VIRTIO_BLK_T_OUT requests with a start sector different
-from the zone write pointer position.
-
-The field \field{z_state} of \field{virtio_blk_zone_descriptor} indicates the
-state of the device zone.
-
-The following zone states are available:
-
-\begin{lstlisting}
-#define VIRTIO_BLK_ZS_NOT_WP 0
-#define VIRTIO_BLK_ZS_EMPTY 1
-#define VIRTIO_BLK_ZS_IOPEN 2
-#define VIRTIO_BLK_ZS_EOPEN 3
-#define VIRTIO_BLK_ZS_CLOSED 4
-#define VIRTIO_BLK_ZS_RDONLY 13
-#define VIRTIO_BLK_ZS_FULL 14
-#define VIRTIO_BLK_ZS_OFFLINE 15
-\end{lstlisting}
-
-Zones of the type VIRTIO_BLK_ZT_CONV are always reported by the device to be in
-the VIRTIO_BLK_ZS_NOT_WP state. Zones of the types VIRTIO_BLK_ZT_SWR and
-VIRTIO_BLK_ZT_SWP can not transition to the VIRTIO_BLK_ZS_NOT_WP state.
-
-Zones in VIRTIO_BLK_ZS_EMPTY (Empty), VIRTIO_BLK_ZS_IOPEN (Implicitly Open),
-VIRTIO_BLK_ZS_EOPEN (Explicitly Open) and VIRTIO_BLK_ZS_CLOSED (Closed) state
-are writable, but zones in VIRTIO_BLK_ZS_RDONLY (Read-Only), VIRTIO_BLK_ZS_FULL
-(Full) and VIRTIO_BLK_ZS_OFFLINE (Offline) state are not. The write pointer
-value (\field{z_wp}) is not valid for Read-Only, Full and Offline zones.
-
-The zone descriptor field \field{z_cap} contains the maximum number of 512-byte
-sectors that are available to be written with user data when the zone is in the
-Empty state. This value shall be less than or equal to the \field{zone_sectors}
-value in \field{virtio_blk_zoned_characteristics} structure in the device
-configuration space.
-
-The zone descriptor field \field{z_start} contains the zone sector address.
-
-The zone descriptor field \field{z_wp} contains the sector address where the
-next write operation for this zone should be issued. This value is undefined
-for conventional zones and for zones in VIRTIO_BLK_ZS_RDONLY,
-VIRTIO_BLK_ZS_FULL and VIRTIO_BLK_ZS_OFFLINE state.
-
-Depending on their state, zones consume resources as follows:
-\begin{itemize}
-\item a zone in VIRTIO_BLK_ZS_IOPEN and VIRTIO_BLK_ZS_EOPEN state consumes one
-    open zone resource and, additionally,
-
-\item a zone in VIRTIO_BLK_ZS_IOPEN, VIRTIO_BLK_ZS_EOPEN and
-    VIRTIO_BLK_ZS_CLOSED state consumes one active resource.
-\end{itemize}
-
-Attempts for zone transitions that violate zone resource limits must fail with
-VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE
-\field{status}.
-
-Zones in the VIRTIO_BLK_ZS_EMPTY (Empty) state have the write pointer value
-equal to the sector address of the zone. In this state, the entire capacity of
-the zone is available for writing. A zone can transition from this state to
-\begin{itemize}
-\item VIRTIO_BLK_ZS_IOPEN when a successful VIRTIO_BLK_T_OUT request or
-    VIRTIO_BLK_T_ZONE_APPEND with a non-zero data size is received for the zone.
-
-\item VIRTIO_BLK_ZS_EOPEN when a successful VIRTIO_BLK_T_ZONE_OPEN request is
-    received for the zone
-\end{itemize}
-
-When a VIRTIO_BLK_T_ZONE_RESET request is issued to an Empty zone, the request
-is completed successfully and the zone stays in the VIRTIO_BLK_ZS_EMPTY state.
-
-Zones in the VIRTIO_BLK_ZS_IOPEN (Implicitly Open) state transition from
-this state to
-\begin{itemize}
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
-    received for the zone,
-
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET_ALL request
-    is received by the device,
-
-\item VIRTIO_BLK_ZS_EOPEN when a successful VIRTIO_BLK_T_ZONE_OPEN request is
-    received for the zone,
-
-\item VIRTIO_BLK_ZS_CLOSED when a successful VIRTIO_BLK_T_ZONE_CLOSE request is
-    received for the zone,
-
-\item VIRTIO_BLK_ZS_CLOSED implicitly by the device when another zone is
-    entering the VIRTIO_BLK_ZS_IOPEN or VIRTIO_BLK_ZS_EOPEN state and the number
-    of currently open zones is at \field{max_open_zones} limit,
-
-\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_ZONE_FINISH request is
-    received for the zone.
-
-\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_OUT or
-    VIRTIO_BLK_T_ZONE_APPEND request that causes the zone to reach its writable
-    capacity is received for the zone.
-\end{itemize}
-
-Zones in the VIRTIO_BLK_ZS_EOPEN (Explicitly Open) state transition from
-this state to
-\begin{itemize}
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
-    received for the zone,
-
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET_ALL request
-    is received by the device,
-
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_CLOSE request is
-    received for the zone and the write pointer of the zone has the value equal
-    to the start sector of the zone,
-
-\item VIRTIO_BLK_ZS_CLOSED when a successful VIRTIO_BLK_T_ZONE_CLOSE request is
-    received for the zone and the zone write pointer is larger then the start
-    sector of the zone,
-
-\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_ZONE_FINISH request is
-    received for the zone,
-
-\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_OUT or
-    VIRTIO_BLK_T_ZONE_APPEND request that causes the zone to reach its writable
-    capacity is received for the zone.
-\end{itemize}
-
-When a VIRTIO_BLK_T_ZONE_EOPEN request is issued to an Explicitly Open zone, the
-request is completed successfully and the zone stays in the VIRTIO_BLK_ZS_EOPEN
-state.
-
-Zones in the VIRTIO_BLK_ZS_CLOSED (Closed) state transition from this state
-to
-\begin{itemize}
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
-    received for the zone,
-
-\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET_ALL request
-    is received by the device,
-
-\item VIRTIO_BLK_ZS_IOPEN when a successful VIRTIO_BLK_T_OUT request or
-    VIRTIO_BLK_T_ZONE_APPEND with a non-zero data size is received for the zone.
-
-\item VIRTIO_BLK_ZS_EOPEN when a successful VIRTIO_BLK_T_ZONE_OPEN request is
-    received for the zone,
-\end{itemize}
-
-When a VIRTIO_BLK_T_ZONE_CLOSE request is issued to a Closed zone, the request
-is completed successfully and the zone stays in the VIRTIO_BLK_ZS_CLOSED state.
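The per-zone management transitions listed above (open, close, finish and reset) can be collected into one small function. This sketch encodes only the transitions named in this text. It returns -1 where the transition cannot be performed, which is the case the device completes with VIRTIO_BLK_S_ZONE_INVALID_CMD. Resource-limit checks are deliberately left out. enum zone_op and zone_mgmt_next_state() are hypothetical names.

```c
#include <stdbool.h>

#define VIRTIO_BLK_ZS_EMPTY    1
#define VIRTIO_BLK_ZS_IOPEN    2
#define VIRTIO_BLK_ZS_EOPEN    3
#define VIRTIO_BLK_ZS_CLOSED   4
#define VIRTIO_BLK_ZS_FULL    14

/* Hypothetical labels for the four per-zone management requests. */
enum zone_op { ZOP_OPEN, ZOP_CLOSE, ZOP_FINISH, ZOP_RESET };

/* Next state of a sequential zone after a successful management request,
 * or -1 if the transition is not listed (INVALID_CMD). wp_at_start says
 * whether the zone write pointer equals the zone start sector. */
static int zone_mgmt_next_state(int state, enum zone_op op, bool wp_at_start)
{
    switch (op) {
    case ZOP_RESET:   /* Empty stays Empty; Open, Closed, Full become Empty */
        if (state == VIRTIO_BLK_ZS_EMPTY || state == VIRTIO_BLK_ZS_IOPEN ||
            state == VIRTIO_BLK_ZS_EOPEN || state == VIRTIO_BLK_ZS_CLOSED ||
            state == VIRTIO_BLK_ZS_FULL)
            return VIRTIO_BLK_ZS_EMPTY;
        return -1;
    case ZOP_OPEN:    /* Explicitly Open stays Explicitly Open */
        if (state == VIRTIO_BLK_ZS_EMPTY || state == VIRTIO_BLK_ZS_IOPEN ||
            state == VIRTIO_BLK_ZS_EOPEN || state == VIRTIO_BLK_ZS_CLOSED)
            return VIRTIO_BLK_ZS_EOPEN;
        return -1;
    case ZOP_CLOSE:   /* Closed stays Closed */
        if (state == VIRTIO_BLK_ZS_IOPEN || state == VIRTIO_BLK_ZS_CLOSED)
            return VIRTIO_BLK_ZS_CLOSED;
        if (state == VIRTIO_BLK_ZS_EOPEN)   /* nothing written: back to Empty */
            return wp_at_start ? VIRTIO_BLK_ZS_EMPTY : VIRTIO_BLK_ZS_CLOSED;
        return -1;
    case ZOP_FINISH:  /* Full stays Full */
        if (state == VIRTIO_BLK_ZS_IOPEN || state == VIRTIO_BLK_ZS_EOPEN ||
            state == VIRTIO_BLK_ZS_FULL)
            return VIRTIO_BLK_ZS_FULL;
        return -1;
    }
    return -1;
}
```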
-
-Zones in the VIRTIO_BLK_ZS_FULL (Full) state transition from this state to
-VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
-received for the zone or a successful VIRTIO_BLK_T_ZONE_RESET_ALL request is
-received by the device.
-
-When a VIRTIO_BLK_T_ZONE_FINISH request is issued to a Full zone, the request
-is completed successfully and the zone stays in the VIRTIO_BLK_ZS_FULL state.
-
-The device may automatically transition zones to VIRTIO_BLK_ZS_RDONLY
-(Read-Only) or VIRTIO_BLK_ZS_OFFLINE (Offline) state from any other state. The
-device may also automatically transition zones in the Read-Only state to the
-Offline state. Zones in the Offline state may not transition to any other state.
-Such automatic transitions usually indicate hardware failures. The previously
-written data may only be read from zones in the Read-Only state. Zones in the
-Offline state can not be read or written.
-
-VIRTIO_BLK_S_ZONE_UNALIGNED_WP is set by the device when the request received
-from the driver attempts to perform a write to an SWR zone and at least one of
-the following conditions is met:
-
-\begin{itemize}
-\item the starting sector of the request is not equal to the current value of
-    the zone write pointer.
-
-\item the ending sector of the request data multiplied by 512 is not a multiple
-    of the value reported by the device in the field \field{write_granularity}
-    in the device configuration space.
-\end{itemize}
-
-VIRTIO_BLK_S_ZONE_OPEN_RESOURCE is set by the device when a zone operation or
-write request received from the driver can not be handled without exceeding the
-\field{max_open_zones} limit value reported by the device in the configuration
-space.
-
-VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE is set by the device when a zone operation or
-write request received from the driver can not be handled without exceeding the
-\field{max_active_zones} limit value reported by the device in the configuration
-space.
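The two VIRTIO_BLK_S_ZONE_UNALIGNED_WP conditions for a write to an SWR zone can be checked as shown below. This is a sketch under one stated assumption: it reads "ending sector" as the exclusive end, that is, sector + number of sectors. write_granularity is in bytes, as in virtio_blk_zoned_characteristics. swr_write_unaligned() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an SWR-zone write would be completed with
 * VIRTIO_BLK_S_ZONE_UNALIGNED_WP. Sectors are 512-byte units.
 * Assumption: the "ending sector" is sector + nr_sectors (exclusive). */
static bool swr_write_unaligned(uint64_t sector, uint64_t nr_sectors,
                                uint64_t zone_wp, uint32_t write_granularity)
{
    uint64_t end_sector = sector + nr_sectors;

    if (sector != zone_wp)
        return true;                 /* does not start at the write pointer */
    if (write_granularity != 0 &&
        (end_sector * 512) % write_granularity != 0)
        return true;                 /* end is not granularity-aligned */
    return false;
}
```

For example, with a 4096-byte write_granularity and the write pointer at sector 1000, an 8-sector write starting at sector 1000 passes. A 1-sector write from the same position is flagged, because 1001 * 512 is not a multiple of 4096.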
-
-A zone transition request that leads to both the \field{max_open_zones} and the
-\field{max_active_zones} limits to be exceeded is terminated by the device with
-VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE \field{status} value.
-
-The device reports all other error conditions related to zoned block model
-operation by setting the VIRTIO_BLK_S_ZONE_INVALID_CMD value in
-\field{status} of \field{virtio_blk_req} structure.
-
-\drivernormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
-
-The driver SHOULD check if the content of the \field{capacity} field has
-changed upon receiving a configuration change notification.
-
-A driver MUST NOT submit a request which would cause a read or write
-beyond \field{capacity}.
-
-A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
-
-A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
-A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
-
-The length of \field{data} MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN
-and VIRTIO_BLK_T_OUT requests.
-
-The length of \field{data} MUST be a multiple of the size of struct
-virtio_blk_discard_write_zeroes for VIRTIO_BLK_T_DISCARD,
-VIRTIO_BLK_T_SECURE_ERASE and VIRTIO_BLK_T_WRITE_ZEROES requests.
-
-The length of \field{data} MUST be 20 bytes for VIRTIO_BLK_T_GET_ID requests.
-
-VIRTIO_BLK_T_DISCARD requests MUST NOT contain more than
-\field{max_discard_seg} struct virtio_blk_discard_write_zeroes segments in
-\field{data}.
-
-VIRTIO_BLK_T_SECURE_ERASE requests MUST NOT contain more than
-\field{max_secure_erase_seg} struct virtio_blk_discard_write_zeroes segments in
-\field{data}.
-
-VIRTIO_BLK_T_WRITE_ZEROES requests MUST NOT contain more than
-\field{max_write_zeroes_seg} struct virtio_blk_discard_write_zeroes segments in
-\field{data}.
-
-If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
-switch to writethrough or writeback mode by writing respectively 0 and
-1 to the \field{writeback} field. After writing a 0 to \field{writeback},
-the driver MUST NOT assume that any volatile writes have been committed
-to persistent device backend storage.
-
-The \field{unmap} bit MUST be zero for discard commands. The driver
-MUST NOT assume anything about the data returned by read requests after
-a range of sectors has been discarded.
-
-A driver MUST NOT assume that individual segments in a multi-segment
-VIRTIO_BLK_T_DISCARD or VIRTIO_BLK_T_WRITE_ZEROES request completed
-successfully, failed, or were processed by the device at all if the request
-failed with VIRTIO_BLK_S_IOERR.
-
-The following requirements only apply if the VIRTIO_BLK_F_ZONED feature is
-negotiated.
-
-A zone sector address provided by the driver MUST be a multiple of 512 bytes.
-
-When forming a VIRTIO_BLK_T_ZONE_REPORT request, the driver MUST set a sector
-within the sector range of the starting zone to report to \field{sector} field.
-It MAY be a sector that is different from the zone sector address.
-
-In VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
-VIRTIO_BLK_T_ZONE_RESET requests, the driver MUST set \field{sector} field to
-point at the first sector in the target zone.
-
-In VIRTIO_BLK_T_ZONE_RESET_ALL request, the driver MUST set the field
-\field{sector} to zero value.
-
-The \field{sector} field of the VIRTIO_BLK_T_ZONE_APPEND request MUST specify
-the zone sector address of the zone to which data is to be appended at the
-position of the write pointer. The size of the data that is appended MUST be a
-multiple of \field{write_granularity} bytes and MUST NOT exceed the
-\field{max_append_sectors} value provided by the device in
-\field{virtio_blk_zoned_characteristics} configuration space structure.
-
-Upon a successful completion of a VIRTIO_BLK_T_ZONE_APPEND request, the driver
-MAY read the starting sector location of the written data from the request
-field \field{append_sector}.
-
-All VIRTIO_BLK_T_OUT requests issued by the driver to sequential zones and
-VIRTIO_BLK_T_ZONE_APPEND requests MUST have:
-
-\begin{enumerate}
-\item the data size that is a multiple of the number of bytes reported
-    by the device in the field \field{write_granularity} in the
-    \field{virtio_blk_zoned_characteristics} configuration space structure.
-
-\item the value of the field \field{sector} that is a multiple of the number of
-    bytes reported by the device in the field \field{write_granularity} in the
-    \field{virtio_blk_zoned_characteristics} configuration space structure.
-
-\item the data size that will not exceed the writable zone capacity when its
-    value is added to the current value of the write pointer of the zone.
-
-\end{enumerate}
-
-\devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
-
-The device MAY change the content of the \field{capacity} field during
-operation of the device. When this happens, the device SHOULD trigger a
-configuration change notification.
-
-A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
-for a write request if the VIRTIO_BLK_F_RO feature if offered, and MUST NOT
-write any data.
-
-The device MUST set the \field{status} byte to VIRTIO_BLK_S_UNSUPP for
-discard, secure erase and write zeroes commands if any unknown flag is set.
-Furthermore, the device MUST set the \field{status} byte to
-VIRTIO_BLK_S_UNSUPP for discard commands if the \field{unmap} flag is set.
-
-For discard commands, the device MAY deallocate the specified range of
-sectors in the device backend storage.
-
-For write zeroes commands, if the \field{unmap} is set, the device MAY
-deallocate the specified range of sectors in the device backend storage,
-as if the discard command had been sent. After a write zeroes command
-is completed, reads of the specified ranges of sectors MUST return
-zeroes. This is true independent of whether \field{unmap} was set or clear.
-
-The device SHOULD clear the \field{write_zeroes_may_unmap} field of the
-virtio configuration space if and only if a write zeroes request cannot
-result in deallocating one or more sectors. The device MAY change the
-content of the field during operation of the device; when this happens,
-the device SHOULD trigger a configuration change notification.
-
-A write is considered volatile when it is submitted; the contents of
-sectors covered by a volatile write are undefined in persistent device
-backend storage until the write becomes stable. A write becomes stable
-once it is completed and one or more of the following conditions is true:
-
-\begin{enumerate}
-\item\label{item:flush1} neither VIRTIO_BLK_F_CONFIG_WCE nor
-    VIRTIO_BLK_F_FLUSH feature were negotiated, but VIRTIO_BLK_F_FLUSH was
-    offered by the device;
-
-\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the
-    \field{writeback} field in configuration space was 0 \textbf{all the time between
-    the submission of the write and its completion};
-
-\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
-    completed} and is completed itself.
-\end{enumerate}
-
-If the device is backed by persistent storage, the device MUST ensure that
-stable writes are committed to it, before reporting completion of the write
-(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush
-(case~\ref{item:flush3}). Failure to do so can cause data loss
-in case of a crash.
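The three stability cases can be written as a predicate over per-write bookkeeping. The following is a minimal sketch. struct write_track, its fields and write_is_stable() are hypothetical; a real device would derive these flags from its negotiated features and its request history.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping a device model might keep for one write. */
struct write_track {
    bool completed;
    bool flush_offered;              /* VIRTIO_BLK_F_FLUSH offered */
    bool flush_negotiated;           /* VIRTIO_BLK_F_FLUSH negotiated */
    bool wce_negotiated;             /* VIRTIO_BLK_F_CONFIG_WCE negotiated */
    bool writeback_zero_throughout;  /* writeback == 0 from submit to completion */
    bool flush_completed_after;      /* a flush sent after completion has completed */
};

/* A write is stable once it is completed and one of the three cases holds. */
static bool write_is_stable(const struct write_track *w)
{
    if (!w->completed)
        return false;
    if (!w->wce_negotiated && !w->flush_negotiated && w->flush_offered)
        return true;                              /* case 1 */
    if (w->wce_negotiated && w->writeback_zero_throughout)
        return true;                              /* case 2 */
    return w->flush_completed_after;              /* case 3 */
}
```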
-
-If the driver changes \field{writeback} between the submission of the write
-and its completion, the write could be either volatile or stable when
-its completion is reported; in other words, the exact behavior is undefined.
-
-% According to the device requirements for device initialization:
-% Offer(CONFIG_WCE) => Offer(FLUSH).
-%
-% After reversing the implication:
-% not Offer(FLUSH) => not Offer(CONFIG_WCE).
-
-If VIRTIO_BLK_F_FLUSH was not offered by the
-  device\footnote{Note that in this case, according to
-  \ref{devicenormative:Device Types / Block Device / Device Initialization},
-  the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the
-device MAY also commit writes to persistent device backend storage before
-reporting their completion. Unlike case~\ref{item:flush1}, however, this
-is not an absolute requirement of the specification.
-
-\begin{note}
-  An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit
-  completed writes will not be resilient to data loss in case of crashes.
-  Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement
-  for implementations that do not wish to be safe against such data losses.
-\end{note}
-
-If the device is backed by storage providing lifetime metrics (such as eMMC
-or UFS persistent storage), the device SHOULD offer the VIRTIO_BLK_F_LIFETIME
-flag. The flag MUST NOT be offered if the device is backed by storage for which
-the lifetime metrics described in this document cannot be obtained or for which
-such metrics have no useful meaning. If the metrics are offered, the device MUST NOT
-send any reserved values, as defined in this specification.
-
-\begin{note}
-  The device lifetime metrics \field{pre_eol_info}, \field{device_lifetime_est_a}
-  and \field{device_lifetime_est_b} are discussed in the JESD84-B50 specification.
-
-  The complete JESD84-B50 is available at the JEDEC website (https://www.jedec.org)
-  pursuant to JEDEC's licensing terms and conditions. This information is provided to
-  simplfy passthrough implementations from eMMC devices.
-\end{note}
-
-If the VIRTIO_BLK_F_ZONED feature is not negotiated, the device MUST reject
-VIRTIO_BLK_T_ZONE_REPORT, VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE,
-VIRTIO_BLK_T_ZONE_FINISH, VIRTIO_BLK_T_ZONE_APPEND, VIRTIO_BLK_T_ZONE_RESET and
-VIRTIO_BLK_T_ZONE_RESET_ALL requests with VIRTIO_BLK_S_UNSUPP status.
-
-The following device requirements only apply if the VIRTIO_BLK_F_ZONED feature
-is negotiated.
-
-If a request of type VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE,
-VIRTIO_BLK_T_ZONE_FINISH or VIRTIO_BLK_T_ZONE_RESET is issued for a Conventional
-zone (type VIRTIO_BLK_ZT_CONV), the device MUST complete the request with
-VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}.
-
-If the zone specified by the VIRTIO_BLK_T_ZONE_APPEND request is not a SWR zone,
-then the request SHALL be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD
-\field{status}.
-
-The device handles a VIRTIO_BLK_T_ZONE_OPEN request by attempting to change the
-state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_EOPEN. If the
-transition to this state can not be performed, the request MUST be completed
-with VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}. If, while processing this
-request, the available zone resources are insufficient, then the zone state does
-not change and the request MUST be completed with
-VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE value in
-the field \field{status}.
-
-The device handles a VIRTIO_BLK_T_ZONE_CLOSE request by attempting to change the
-state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_CLOSED. If
-the transition to this state can not be performed, the request MUST be completed
-with VIRTIO_BLK_S_ZONE_INVALID_CMD value in the field \field{status}.
-
-The device handles a VIRTIO_BLK_T_ZONE_FINISH request by attempting to change
-the state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_FULL. If
-the transition to this state can not be performed, the zone state does not
-change and the request MUST be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD
-value in the field \field{status}.
-
-The device handles a VIRTIO_BLK_T_ZONE_RESET request by attempting to change the
-state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_EMPTY state.
-If the transition to this state can not be performed, the zone state does not
-change and the request MUST be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD
-value in the field \field{status}.
-
-The device handles a VIRTIO_BLK_T_ZONE_RESET_ALL request by transitioning all
-sequential device zones in VIRTIO_BLK_ZS_IOPEN, VIRTIO_BLK_ZS_EOPEN,
-VIRTIO_BLK_ZS_CLOSED and VIRTIO_BLK_ZS_FULL state to VIRTIO_BLK_ZS_EMPTY state.
-
-Upon receiving a VIRTIO_BLK_T_ZONE_APPEND request or a VIRTIO_BLK_T_OUT
-request issued to a SWR zone in VIRTIO_BLK_ZS_EMPTY or VIRTIO_BLK_ZS_CLOSED
-state, the device attempts to perform the transition of the zone to
-VIRTIO_BLK_ZS_IOPEN state before writing data. This transition may fail due to
-insufficient open and/or active zone resources available on the device. In this
-case, the request MUST be completed with VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or
-VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE value in the \field{status}.
-
-If the \field{sector} field in the VIRTIO_BLK_T_ZONE_APPEND request does not
-specify the lowest sector for a zone, then the request SHALL be completed with
-VIRTIO_BLK_S_ZONE_INVALID_CMD value in \field{status}.
-
-A VIRTIO_BLK_T_ZONE_APPEND request or a VIRTIO_BLK_T_OUT request that has the
-data range that exceeds the remaining writable capacity for the zone, then the
-request SHALL be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD value in
-\field{status}.
-
-If a request of the type VIRTIO_BLK_T_ZONE_APPEND is completed with
-VIRTIO_BLK_S_OK status, the field \field{append_sector} in
-\field{virtio_blk_req_za} MUST be set by the device to contain the first sector
-of the data written to the zone.
-
-If a request of the type VIRTIO_BLK_T_ZONE_APPEND is completed with a status
-other than VIRTIO_BLK_S_OK, the value of \field{append_sector} field in
-\field{virtio_blk_req_za} is undefined.
-
-A VIRTIO_BLK_T_ZONE_APPEND request that has the data size that exceeds
-\field{max_append_sectors} configuration space value, then,
-\begin{itemize}
-\item if \field{max_append_sectors} configuration space value is reported as
-    zero by the device, the request SHALL be completed with VIRTIO_BLK_S_UNSUPP
-    \field{status}.
-
-\item if \field{max_append_sectors} configuration space value is reported as
-    a non-zero value by the device, the request SHALL be completed with
-    VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}.
-\end{itemize}
-
-If a VIRTIO_BLK_T_ZONE_APPEND request, a VIRTIO_BLK_T_IN request or a
-VIRTIO_BLK_T_OUT request issued to a SWR zone has the range that has sectors in
-more than one zone, then the request SHALL be completed with
-VIRTIO_BLK_S_ZONE_INVALID_CMD value in the field \field{status}.
-
-A VIRTIO_BLK_T_OUT request that has the \field{sector} value that is not aligned
-with the write pointer for the zone, then the request SHALL be completed with
-VIRTIO_BLK_S_ZONE_UNALIGNED_WP value in the field \field{status}.
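The following device-side sketch covers the VIRTIO_BLK_T_ZONE_APPEND checks described above. The text does not specify the order in which the checks run, so the order here is an arbitrary choice. Zone-state checks (Read-Only, Offline) and resource checks are omitted. The status enum is a local stand-in for the named VIRTIO_BLK_S_* values, and struct zone_info and zone_append_check() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for VIRTIO_BLK_S_OK, _UNSUPP and _ZONE_INVALID_CMD. */
enum za_status { ZA_OK, ZA_UNSUPP, ZA_INVALID_CMD };

/* Hypothetical snapshot of the target zone; sectors are 512-byte units. */
struct zone_info {
    bool     is_swr;
    uint64_t z_start, z_cap, z_wp;
};

/* On ZA_OK, *append_sector receives the first sector of the appended data. */
static enum za_status zone_append_check(const struct zone_info *z,
                                        uint64_t sector, uint64_t nr_sectors,
                                        uint32_t max_append_sectors,
                                        uint64_t *append_sector)
{
    if (nr_sectors > max_append_sectors)         /* size limit exceeded */
        return max_append_sectors == 0 ? ZA_UNSUPP : ZA_INVALID_CMD;
    if (!z->is_swr || sector != z->z_start)      /* must name a SWR zone start */
        return ZA_INVALID_CMD;
    if (z->z_wp + nr_sectors > z->z_start + z->z_cap)
        return ZA_INVALID_CMD;                   /* exceeds writable capacity */
    *append_sector = z->z_wp;                    /* data lands at the WP */
    return ZA_OK;
}
```

Because the capacity check keeps the data inside [z_start, z_start + z_cap), a request that passes it cannot span two zones.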
-
-In order to avoid resource-related errors while opening zones implicitly, the
-device MAY automatically transition zones in VIRTIO_BLK_ZS_IOPEN state to
-VIRTIO_BLK_ZS_CLOSED state.
-
-All VIRTIO_BLK_T_OUT requests or VIRTIO_BLK_T_ZONE_APPEND requests issued
-to a zone in the VIRTIO_BLK_ZS_RDONLY state SHALL be completed with
-VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}.
-
-All requests issued to a zone in the VIRTIO_BLK_ZS_OFFLINE state SHALL be
-completed with VIRTIO_BLK_S_ZONE_INVALID_CMD value in the field \field{status}.
-
-The device MUST consider the sectors that are read between the write pointer
-position of a zone and the end of the last sector of the zone as unwritten data.
-The sectors between the write pointer position and the end of the last sector
-within the zone capacity during VIRTIO_BLK_T_ZONE_FINISH request processing are
-also considered unwritten data.
-
-When unwritten data is present in the sector range of a read request, the device
-MUST process this data in one of the following ways -
-
-\begin{enumerate}
-\item Fill the unwritten data with a device-specific byte pattern. The
-configuration, control and reporting of this byte pattern is beyond the scope
-of this standard. This is the preferred approach.
-
-\item Fail the request. Depending on the driver implementation, this may prevent
-the device from becoming operational.
-\end{enumerate}
-
-If both the VIRTIO_BLK_F_ZONED and VIRTIO_BLK_F_SECURE_ERASE features are
-negotiated, then
-
-\begin{enumerate}
-\item the field \field{secure_erase_sector_alignment} in the configuration space
-of the device MUST be a multiple of \field{zone_sectors} value reported in the
-device configuration space.
-
-\item the data size in VIRTIO_BLK_T_SECURE_ERASE requests MUST be a multiple of
-\field{zone_sectors} value in the device configuration space.
-\end{enumerate}
-
-The device MUST handle a VIRTIO_BLK_T_SECURE_ERASE request in the same way it
-handles VIRTIO_BLK_T_ZONE_RESET request for the zone range specified in the
-VIRTIO_BLK_T_SECURE_ERASE request.
-
-\subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
-When using the legacy interface, transitional devices and drivers
-MUST format the fields in struct virtio_blk_req
-according to the native endian of the guest rather than
-(necessarily when not using the legacy interface) little-endian.
-
-When using the legacy interface, transitional drivers
-SHOULD ignore the used length values.
-\begin{note}
-Historically, some devices put the total descriptor length,
-or the total length of device-writable buffers there,
-even when only the status byte was actually written.
-\end{note}
-
-The \field{reserved} field was previously called \field{ioprio}. \field{ioprio}
-is a hint about the relative priorities of requests to the device:
-higher numbers indicate more important requests.
-
-\begin{lstlisting}
-#define VIRTIO_BLK_T_FLUSH_OUT 5
-\end{lstlisting}
-
-The command VIRTIO_BLK_T_FLUSH_OUT was a synonym for VIRTIO_BLK_T_FLUSH;
-a driver MUST treat it as a VIRTIO_BLK_T_FLUSH command.
-
-\begin{lstlisting}
-#define VIRTIO_BLK_T_BARRIER 0x80000000
-\end{lstlisting}
-
-If the device has VIRTIO_BLK_F_BARRIER
-feature the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
-request acts as a barrier and that all preceding requests SHOULD be
-complete before this one, and all following requests SHOULD NOT be
-started until this is complete.
-
-\begin{note} A barrier does not flush
-caches in the underlying backend device in host, and thus does not
-serve as data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request
-does that.
-\end{note}
-
-Some older legacy devices did not commit completed writes to persistent
-device backend storage when VIRTIO_BLK_F_FLUSH was offered but not
-negotiated. In order to work around this, the driver MAY set the
-\field{writeback} to 0 (if available) or it MAY send an explicit flush
-request after every completed write.
-
-If the device has VIRTIO_BLK_F_SCSI feature, it can also support
-scsi packet command requests, each of these requests is of form:
-
-\begin{lstlisting}
-/* All fields are in guest's native endian. */
-struct virtio_scsi_pc_req {
-        u32 type;
-        u32 ioprio;
-        u64 sector;
-        u8 cmd[];
-        u8 data[][512];
-#define SCSI_SENSE_BUFFERSIZE 96
-        u8 sense[SCSI_SENSE_BUFFERSIZE];
-        u32 errors;
-        u32 data_len;
-        u32 sense_len;
-        u32 residual;
-        u8 status;
-};
-\end{lstlisting}
-
-A request type can also be a scsi packet command (VIRTIO_BLK_T_SCSI_CMD or
-VIRTIO_BLK_T_SCSI_CMD_OUT). The two types are equivalent, the device
-does not distinguish between them:
-
-\begin{lstlisting}
-#define VIRTIO_BLK_T_SCSI_CMD 2
-#define VIRTIO_BLK_T_SCSI_CMD_OUT 3
-\end{lstlisting}
-
-The \field{cmd} field is only present for scsi packet command requests,
-and indicates the command to perform. This field MUST reside in a
-single, separate device-readable buffer; command length can be derived
-from the length of this buffer.
-
-Note that these first three (four for scsi packet commands)
-fields are always device-readable: \field{data} is either device-readable
-or device-writable, depending on the request. The size of the read or
-write can be derived from the total size of the request buffers.
-
-\field{sense} is only present for scsi packet command requests,
-and indicates the buffer for scsi sense data.
-
-\field{data_len} is only present for scsi packet command
-requests, this field is deprecated, and SHOULD be ignored by the
-driver. Historically, devices copied data length there.
-
-\field{sense_len} is only present for scsi packet command
-requests and indicates the number of bytes actually written to
-the \field{sense} buffer.
-
-\field{residual} field is only present for scsi packet command
-requests and indicates the residual size, calculated as data
-length - number of bytes actually transferred.
-
-\subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device Types / Block Device / Legacy Interface: Framing Requirements}
-
-When using legacy interfaces, transitional drivers which have not
-negotiated VIRTIO_F_ANY_LAYOUT:
-
-\begin{itemize}
-\item MUST use a single 8-byte descriptor containing \field{type},
-    \field{reserved} and \field{sector}, followed by descriptors
-    for \field{data}, then finally a separate 1-byte descriptor
-    for \field{status}.
-
-\item For SCSI commands there are additional constraints.
-    \field{sense} MUST reside in a
-    single separate device-writable descriptor of size 96 bytes,
-    and \field{errors}, \field{data_len}, \field{sense_len} and
-    \field{residual} MUST reside a single separate
-    device-writable descriptor.
-\end{itemize}
-
-See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}.
+\input{device-types/blk/description.tex}
 
 \section{Console Device}\label{sec:Device Types / Console Device}
 
diff --git a/device-types/blk/description.tex b/device-types/blk/description.tex
new file mode 100644
index 0000000..20007e3
--- /dev/null
+++ b/device-types/blk/description.tex
@@ -0,0 +1,1313 @@
+\section{Block Device}\label{sec:Device Types / Block Device}
+
+The virtio block device is a simple virtual block device (i.e.
+disk). Read and write requests (and other exotic requests) are
+placed in one of its queues, and serviced (probably out of order) by the
+device except where noted.
+
+\subsection{Device ID}\label{sec:Device Types / Block Device / Device ID}
+  2
+
+\subsection{Virtqueues}\label{sec:Device Types / Block Device / Virtqueues}
+\begin{description}
+\item[0] requestq1
+\item[\ldots]
+\item[N-1] requestqN
+\end{description}
+
+  N=1 if VIRTIO_BLK_F_MQ is not negotiated, otherwise N is set by
+  \field{num_queues}.
+
+\subsection{Feature bits}\label{sec:Device Types / Block Device / Feature bits}
+
+\begin{description}
+\item[VIRTIO_BLK_F_SIZE_MAX (1)] Maximum size of any single segment is
+    in \field{size_max}.
+
+\item[VIRTIO_BLK_F_SEG_MAX (2)] Maximum number of segments in a
+    request is in \field{seg_max}.
+
+\item[VIRTIO_BLK_F_GEOMETRY (4)] Disk-style geometry specified in
+    \field{geometry}.
+
+\item[VIRTIO_BLK_F_RO (5)] Device is read-only.
+
+\item[VIRTIO_BLK_F_BLK_SIZE (6)] Block size of disk is in \field{blk_size}.
+
+\item[VIRTIO_BLK_F_FLUSH (9)] Cache flush command support.
+
+\item[VIRTIO_BLK_F_TOPOLOGY (10)] Device exports information on optimal I/O
+    alignment.
+
+\item[VIRTIO_BLK_F_CONFIG_WCE (11)] Device can toggle its cache between writeback
+    and writethrough modes.
+
+\item[VIRTIO_BLK_F_MQ (12)] Device supports multiqueue.
+
+\item[VIRTIO_BLK_F_DISCARD (13)] Device can support discard command, maximum
+    discard sectors size in \field{max_discard_sectors} and maximum discard
+    segment number in \field{max_discard_seg}.
+
+\item[VIRTIO_BLK_F_WRITE_ZEROES (14)] Device can support write zeroes command,
+    maximum write zeroes sectors size in \field{max_write_zeroes_sectors} and
+    maximum write zeroes segment number in \field{max_write_zeroes_seg}.
+
+\item[VIRTIO_BLK_F_LIFETIME (15)] Device supports providing storage lifetime
+    information.
+
+\item[VIRTIO_BLK_F_SECURE_ERASE (16)] Device supports secure erase command,
+    maximum erase sectors count in \field{max_secure_erase_sectors} and
+    maximum erase segment number in \field{max_secure_erase_seg}.
+
+\item[VIRTIO_BLK_F_ZONED (17)] Device is a Zoned Block Device, that is, a device
+	that follows the zoned storage device behavior that is also supported by
+	industry standards such as the T10 Zoned Block Command standard (ZBC r05) or
+	the NVMe(TM) NVM Express Zoned Namespace Command Set Specification 1.1b
+	(ZNS). For brevity, these standard documents are referred to as "ZBD standards"
+	from this point on in the text.
+
+\end{description}
+
+\subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Block Device / Feature bits / Legacy Interface: Feature bits}
+
+\begin{description}
+\item[VIRTIO_BLK_F_BARRIER (0)] Device supports request barriers.
+
+\item[VIRTIO_BLK_F_SCSI (7)] Device supports scsi packet commands.
+\end{description}
+
+\begin{note}
+  In the legacy interface, VIRTIO_BLK_F_FLUSH was also
+  called VIRTIO_BLK_F_WCE.
+\end{note}
+
+\subsection{Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout}
+
+The \field{capacity} of the device (expressed in 512-byte sectors) is always
+present. The availability of the other fields depends on various feature
+bits as indicated above.
+
+The field \field{num_queues} only exists if VIRTIO_BLK_F_MQ is set. This field specifies
+the number of queues.
+
+The parameters in the configuration space of the device \field{max_discard_sectors} and
+\field{discard_sector_alignment} are expressed in 512-byte units if the
+VIRTIO_BLK_F_DISCARD feature bit is negotiated. The \field{max_write_zeroes_sectors}
+is expressed in 512-byte units if the VIRTIO_BLK_F_WRITE_ZEROES feature
+bit is negotiated. The parameters in the configuration space of the device
+\field{max_secure_erase_sectors} and \field{secure_erase_sector_alignment} are expressed
+in 512-byte units if the VIRTIO_BLK_F_SECURE_ERASE feature bit is negotiated.
+
+If the VIRTIO_BLK_F_ZONED feature is negotiated, then in
+\field{virtio_blk_zoned_characteristics},
+\begin{itemize}
+\item the \field{zone_sectors} value is expressed in 512-byte sectors.
+\item the \field{max_append_sectors} value is expressed in 512-byte sectors.
+\item the \field{write_granularity} value is expressed in bytes.
+\end{itemize}
+
+The \field{model} field in \field{zoned} may have the following values:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_Z_NONE      0
+#define VIRTIO_BLK_Z_HM        1
+#define VIRTIO_BLK_Z_HA        2
+\end{lstlisting}
+
+Depending on their design, zoned block devices may follow several possible
+models of operation. The three models that are standardized for ZBDs are
+drive-managed, host-managed and host-aware.
+
+While being zoned internally, drive-managed ZBDs behave exactly like regular,
+non-zoned block devices. For the purposes of virtio standardization,
+drive-managed ZBDs can always be treated as non-zoned devices. These devices
+have the VIRTIO_BLK_Z_NONE model value set in the \field{model} field in
+\field{zoned}.
+
+Devices that offer the VIRTIO_BLK_F_ZONED feature while reporting the
+VIRTIO_BLK_Z_NONE zoned model are drive-managed zoned block devices. In this
+case, the driver treats the device as a regular non-zoned block device.
+
+Host-managed zoned block devices have their LBA range divided into Sequential
+Write Required (SWR) zones that require some additional handling by the host
+for correct operation. All write requests to SWR zones are required to be
+sequential, and zones containing some written data need to be reset before that
+data can be rewritten. Host-managed devices support a set of ZBD-specific I/O
+requests that can be used by the host to manage device zones. Host-managed
+devices report VIRTIO_BLK_Z_HM in the \field{model} field in \field{zoned}.
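The following minimal sketch shows how a driver might map the \field{model} value to its handling of the device. It assumes a boolean that records whether the driver implements the ZBD-specific requests. The host-aware fallback mirrors the driver requirements given under Device Initialization. Treating unknown model values as unusable is an assumption of this sketch, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_Z_NONE 0
#define VIRTIO_BLK_Z_HM   1
#define VIRTIO_BLK_Z_HA   2

/* Hypothetical classification of how the driver will treat the device. */
enum blk_treatment {
    BLK_TREAT_REGULAR,   /* plain, non-zoned block device */
    BLK_TREAT_ZONED,     /* use the ZBD-specific requests */
    BLK_TREAT_UNUSABLE,  /* driver cannot operate this device */
};

static enum blk_treatment blk_classify_model(uint8_t model, bool driver_supports_zoned)
{
    switch (model) {
    case VIRTIO_BLK_Z_NONE:
        /* Drive-managed: behaves exactly like a regular block device. */
        return BLK_TREAT_REGULAR;
    case VIRTIO_BLK_Z_HM:
        /* SWR zones require host-side handling. */
        return driver_supports_zoned ? BLK_TREAT_ZONED : BLK_TREAT_UNUSABLE;
    case VIRTIO_BLK_Z_HA:
        /* SWP zones accept random writes, so a non-zoned fallback works. */
        return driver_supports_zoned ? BLK_TREAT_ZONED : BLK_TREAT_REGULAR;
    default:
        /* Assumption: unknown models are not driven at all. */
        return BLK_TREAT_UNUSABLE;
    }
}
```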
+
+Host-aware zoned block devices have their LBA range divided into Sequential
+Write Preferred (SWP) zones that support random write access, similar to
+regular non-zoned devices. However, the device I/O performance might not be
+optimal if SWP zones are used in a random I/O pattern. SWP zones also support
+the same set of ZBD-specific I/O requests as host-managed devices; these
+requests allow any host that supports zoned block devices to manage host-aware
+devices for optimum performance. Host-aware devices report VIRTIO_BLK_Z_HA
+in the \field{model} field in \field{zoned}.
+
+Both SWR zones and SWP zones are sometimes referred to as sequential zones.
+
+During device operation, sequential zones can be in one of the following states:
+empty, implicitly-open, explicitly-open, closed and full. The state machine that
+governs the transitions between these states is described later in this document.
+
+SWR and SWP zones consume volatile device resources while being in certain
+states, and the device may set limits on the number of zones that can be in these
+states simultaneously.
+
+Zoned block devices use two internal counters to account for the device
+resources in use: the number of currently open zones and the number of currently
+active zones.
+
+Any zone state transition from a state that doesn't consume a zone resource to a
+state that consumes the same resource increments the internal device counter for
+that resource. Any zone transition out of a state that consumes a zone resource
+to a state that doesn't consume the same resource decrements the counter. Any
+request that causes the device to exceed the reported zone resource limits is
+terminated by the device with a ``zone resources exceeded'' error, as defined for
+specific commands later.
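The counter rule above can be sketched for a single resource. The structure and function names are hypothetical. The convention that a limit of zero means "no limit" is taken from the \field{max_open_zones} and \field{max_active_zones} descriptions under Device Initialization. This sketch fails the transition outright; it does not model the implicit close of another zone that a device may perform when the open-zone limit is reached.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-resource counter (open zones or active zones). */
struct zone_resource {
    uint32_t in_use;  /* zones currently consuming this resource */
    uint32_t limit;   /* device-reported limit; 0 means no limit */
};

/*
 * Apply one zone transition to a resource counter. consumed_before and
 * consumed_after say whether the old and new states consume the resource.
 * Returns false, leaving the counter unchanged, if the transition would
 * exceed the limit.
 */
static bool zone_resource_transition(struct zone_resource *r,
                                     bool consumed_before, bool consumed_after)
{
    if (!consumed_before && consumed_after) {
        if (r->limit != 0 && r->in_use >= r->limit)
            return false;           /* "zone resources exceeded" */
        r->in_use++;
    } else if (consumed_before && !consumed_after) {
        r->in_use--;
    }
    return true;                    /* unchanged if both flags are equal */
}
```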
+
+\begin{lstlisting}
+struct virtio_blk_config {
+        le64 capacity;
+        le32 size_max;
+        le32 seg_max;
+        struct virtio_blk_geometry {
+                le16 cylinders;
+                u8 heads;
+                u8 sectors;
+        } geometry;
+        le32 blk_size;
+        struct virtio_blk_topology {
+                // # of logical blocks per physical block (log2)
+                u8 physical_block_exp;
+                // offset of first aligned logical block
+                u8 alignment_offset;
+                // suggested minimum I/O size in blocks
+                le16 min_io_size;
+                // optimal (suggested maximum) I/O size in blocks
+                le32 opt_io_size;
+        } topology;
+        u8 writeback;
+        u8 unused0;
+        u16 num_queues;
+        le32 max_discard_sectors;
+        le32 max_discard_seg;
+        le32 discard_sector_alignment;
+        le32 max_write_zeroes_sectors;
+        le32 max_write_zeroes_seg;
+        u8 write_zeroes_may_unmap;
+        u8 unused1[3];
+        le32 max_secure_erase_sectors;
+        le32 max_secure_erase_seg;
+        le32 secure_erase_sector_alignment;
+        struct virtio_blk_zoned_characteristics {
+                le32 zone_sectors;
+                le32 max_open_zones;
+                le32 max_active_zones;
+                le32 max_append_sectors;
+                le32 write_granularity;
+                u8 model;
+                u8 unused2[3];
+        } zoned;
+};
+\end{lstlisting}
+
+
+\subsubsection{Legacy Interface: Device configuration layout}\label{sec:Device Types / Block Device / Device configuration layout / Legacy Interface: Device configuration layout}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_blk_config
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+
+\subsection{Device Initialization}\label{sec:Device Types / Block Device / Device Initialization}
+
+\begin{enumerate}
+\item The device size can be read from \field{capacity}.
+
+\item If the VIRTIO_BLK_F_BLK_SIZE feature is negotiated,
+    \field{blk_size} can be read to determine the optimal sector size
+    for the driver to use. This does not affect the units used in
+    the protocol (always 512 bytes), but awareness of the correct
+    value can affect performance.
+
+\item If the VIRTIO_BLK_F_RO feature is set by the device, any write
+    requests will fail.
+
+\item If the VIRTIO_BLK_F_TOPOLOGY feature is negotiated, the fields in the
+    \field{topology} struct can be read to determine the physical block size and optimal
+    I/O lengths for the driver to use. This also does not affect the units
+    in the protocol, only performance.
+
+\item If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the cache
+    mode can be read or set through the \field{writeback} field. 0 corresponds
+    to a writethrough cache, 1 to a writeback cache\footnote{Consistent with
+    \ref{devicenormative:Device Types / Block Device / Device Operation},
+    a writethrough cache can be defined broadly as a cache that commits
+    writes to persistent device backend storage before reporting their
+    completion. For example, a battery-backed writeback cache actually
+    counts as writethrough according to this definition.}. The cache mode
+    after reset can be either writeback or writethrough. The actual
+    mode can be determined by reading \field{writeback} after feature
+    negotiation.
+
+\item If the VIRTIO_BLK_F_DISCARD feature is negotiated,
+    \field{max_discard_sectors} and \field{max_discard_seg} can be read
+    to determine the maximum discard sectors and maximum number of discard
+    segments for the block driver to use. \field{discard_sector_alignment}
+    can be used by the OS when splitting a request based on alignment.
+
+\item If the VIRTIO_BLK_F_WRITE_ZEROES feature is negotiated,
+    \field{max_write_zeroes_sectors} and \field{max_write_zeroes_seg} can
+    be read to determine the maximum write zeroes sectors and maximum
+    number of write zeroes segments for the block driver to use.
+
+\item If the VIRTIO_BLK_F_MQ feature is negotiated, the \field{num_queues} field
+    can be read to determine the number of queues.
+
+\item If the VIRTIO_BLK_F_SECURE_ERASE feature is negotiated,
+    \field{max_secure_erase_sectors} and \field{max_secure_erase_seg} can be read
+    to determine the maximum secure erase sectors and maximum number of
+    secure erase segments for the block driver to use.
+    \field{secure_erase_sector_alignment} can be used by the OS when splitting a
+    request based on alignment.
+
+\item If the VIRTIO_BLK_F_ZONED feature is negotiated, the fields in
+    \field{zoned} can be read by the driver to determine the zone
+    characteristics of the device. All \field{zoned} fields are read-only.
+
+\end{enumerate}
+
+\drivernormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
+
+Drivers SHOULD NOT negotiate VIRTIO_BLK_F_FLUSH if they are incapable of
+sending VIRTIO_BLK_T_FLUSH commands.
+
+If neither VIRTIO_BLK_F_CONFIG_WCE nor VIRTIO_BLK_F_FLUSH is
+negotiated, the driver MAY deduce the presence of a writethrough cache.
+If VIRTIO_BLK_F_CONFIG_WCE was not negotiated but VIRTIO_BLK_F_FLUSH was,
+the driver SHOULD assume the presence of a writeback cache.
+
+The driver MUST NOT read \field{writeback} before setting
+the FEATURES_OK \field{device status} bit.
+
+Drivers MUST NOT negotiate the VIRTIO_BLK_F_ZONED feature if they are incapable
+of supporting devices with the VIRTIO_BLK_Z_HM, VIRTIO_BLK_Z_HA or
+VIRTIO_BLK_Z_NONE zoned model.
+
+If the VIRTIO_BLK_F_ZONED feature is offered by the device with the
+VIRTIO_BLK_Z_HM zone model, then the VIRTIO_BLK_F_DISCARD feature MUST NOT be
+negotiated by the driver.
+
+If the VIRTIO_BLK_F_ZONED feature and VIRTIO_BLK_F_DISCARD feature are both
+offered by the device with the VIRTIO_BLK_Z_HA or VIRTIO_BLK_Z_NONE zone model,
+then the driver MAY negotiate these two bits independently.
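The two feature-dependent decisions above can be sketched in a few lines. The first is the cache mode a driver may assume from the negotiated bits. The second is dropping VIRTIO_BLK_F_DISCARD when the device reports the host-managed model. The sketch assumes host-endian feature masks and a \field{model} value already read from the configuration space. Starting from the full offered set is a simplification, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_F_FLUSH       9
#define VIRTIO_BLK_F_CONFIG_WCE 11
#define VIRTIO_BLK_F_DISCARD    13
#define VIRTIO_BLK_F_ZONED      17
#define VIRTIO_BLK_Z_HM          1

enum blk_cache_guess {
    BLK_CACHE_WRITETHROUGH,  /* MAY be deduced: neither bit negotiated */
    BLK_CACHE_WRITEBACK,     /* SHOULD be assumed: FLUSH without CONFIG_WCE */
    BLK_CACHE_READ_FIELD,    /* CONFIG_WCE: read writeback after FEATURES_OK */
};

static bool has_feature(uint64_t features, unsigned bit)
{
    return (features >> bit) & 1u;
}

static enum blk_cache_guess blk_cache_mode(uint64_t negotiated)
{
    if (has_feature(negotiated, VIRTIO_BLK_F_CONFIG_WCE))
        return BLK_CACHE_READ_FIELD;
    if (has_feature(negotiated, VIRTIO_BLK_F_FLUSH))
        return BLK_CACHE_WRITEBACK;
    return BLK_CACHE_WRITETHROUGH;
}

/* Accept the offered bits, but never DISCARD together with a host-managed model. */
static uint64_t blk_accept_features(uint64_t offered, uint8_t model)
{
    uint64_t accepted = offered;

    if (has_feature(offered, VIRTIO_BLK_F_ZONED) && model == VIRTIO_BLK_Z_HM)
        accepted &= ~(1ull << VIRTIO_BLK_F_DISCARD);
    return accepted;
}
```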
+
+If the VIRTIO_BLK_F_ZONED feature is negotiated, then
+\begin{itemize}
+\item if a driver that cannot support host-managed zoned devices
+    reads VIRTIO_BLK_Z_HM from the \field{model} field of \field{zoned}, the
+    driver MUST NOT set the FEATURES_OK flag and instead set the FAILED bit.
+
+\item if a driver that cannot support zoned devices reads VIRTIO_BLK_Z_HA
+    from the \field{model} field of \field{zoned}, the driver
+    MAY handle the device as a non-zoned device. In this case, the
+    driver SHOULD ignore all other fields in \field{zoned}.
+\end{itemize}
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / Block Device / Device Initialization}
+
+Devices SHOULD always offer VIRTIO_BLK_F_FLUSH, and MUST offer it
+if they offer VIRTIO_BLK_F_CONFIG_WCE.
+
+If VIRTIO_BLK_F_CONFIG_WCE is negotiated but VIRTIO_BLK_F_FLUSH
+is not, the device MUST initialize \field{writeback} to 0.
+
+The device MUST initialize padding bytes \field{unused0} and
+\field{unused1} to 0.
+
+If the device that is being initialized is not a zoned device, the device
+SHOULD NOT offer the VIRTIO_BLK_F_ZONED feature.
+
+The VIRTIO_BLK_F_ZONED feature cannot be properly negotiated without the
+FEATURES_OK bit. Legacy devices MUST NOT offer the VIRTIO_BLK_F_ZONED feature bit.
+
+If the VIRTIO_BLK_F_ZONED feature is not accepted by the driver,
+\begin{itemize}
+\item the device with the VIRTIO_BLK_Z_HA or VIRTIO_BLK_Z_NONE zone model SHOULD
+    proceed with the initialization while setting all zoned characteristics
+    fields to zero.
+
+\item the device with the VIRTIO_BLK_Z_HM zone model MUST fail to set the
+    FEATURES_OK device status bit when the driver writes the Device Status
+    field.
+\end{itemize}
+
+If the VIRTIO_BLK_F_ZONED feature is negotiated, then the \field{model} field in
+the \field{zoned} struct in the configuration space MUST be set by the device
+\begin{itemize}
+\item to the value of VIRTIO_BLK_Z_NONE if it operates as a drive-managed
+    zoned block device or a non-zoned block device.
+
+\item to the value of VIRTIO_BLK_Z_HM if it operates as a host-managed zoned
+    block device.
+
+\item to the value of VIRTIO_BLK_Z_HA if it operates as a host-aware zoned
+    block device.
+\end{itemize}
+
+If the VIRTIO_BLK_F_ZONED feature is negotiated and the device \field{model}
+field in the \field{zoned} struct is VIRTIO_BLK_Z_HM or VIRTIO_BLK_Z_HA,
+
+\begin{itemize}
+\item the \field{zone_sectors} field of \field{zoned} MUST be set by the device
+    to the size of a single zone on the device. All zones of the device have the
+    same size indicated by \field{zone_sectors} except for the last zone that
+    MAY be smaller than all other zones. The driver can calculate the number of
+    zones on the device as
+    \begin{lstlisting}
+        nr_zones = (capacity + zone_sectors - 1) / zone_sectors;
+    \end{lstlisting}
+    and the size of the last zone as
+    \begin{lstlisting}
+        zs_last = capacity - (nr_zones - 1) * zone_sectors;
+    \end{lstlisting}
+
+\item the \field{max_open_zones} field of the \field{zoned} structure MUST be
+    set by the device to the maximum number of zones that can be open on the
+    device (zones in the implicit open or explicit open state). A value
+    of zero indicates that the device does not have any limit on the number of
+    open zones.
+
+\item the \field{max_active_zones} field of the \field{zoned} structure MUST
+    be set by the device to the maximum number of zones that can be active on the
+    device (zones in the implicit open, explicit open or closed state). A value
+    of zero indicates that the device does not have any limit on the number of
+    active zones.
+
+\item the \field{max_append_sectors} field of \field{zoned} MUST be set by
+    the device to the maximum data size of a VIRTIO_BLK_T_ZONE_APPEND request
+    that can be successfully issued to the device. The value of this field MUST
+    NOT exceed the \field{seg_max} * \field{size_max} value. A device MAY set
+    \field{max_append_sectors} to zero if it doesn't support
+    VIRTIO_BLK_T_ZONE_APPEND requests.
+
+\item the \field{write_granularity} field of \field{zoned} MUST be set by the
+    device to the offset and size alignment constraint for VIRTIO_BLK_T_OUT
+    and VIRTIO_BLK_T_ZONE_APPEND requests issued to a sequential zone of the
+    device.
+
+\item the device MUST initialize padding bytes \field{unused2} to 0.
+\end{itemize}
+
+\subsubsection{Legacy Interface: Device Initialization}\label{sec:Device Types / Block Device / Device Initialization / Legacy Interface: Device Initialization}
+
+Because legacy devices do not have FEATURES_OK, transitional devices
+MUST implement slightly different behavior around feature negotiation
+when used through the legacy interface. In particular, when using the
+legacy interface:
+
+\begin{itemize}
+\item the driver MAY read or write \field{writeback} before setting
+    the DRIVER or DRIVER_OK \field{device status} bit.
+
+\item the device MUST NOT modify the cache mode (and \field{writeback})
+    as a result of a driver setting a status bit, unless
+    the DRIVER_OK bit is being set and the driver has not set the
+    VIRTIO_BLK_F_CONFIG_WCE driver feature bit.
+
+\item the device MUST NOT modify the cache mode (and \field{writeback})
+    as a result of a driver modifying the driver feature bits, for example
+    if the driver sets the VIRTIO_BLK_F_CONFIG_WCE driver feature bit but
+    does not set the VIRTIO_BLK_F_FLUSH bit.
+\end{itemize}
+
+
+\subsection{Device Operation}\label{sec:Device Types / Block Device / Device Operation}
+
+The driver queues requests to the virtqueues, and they are used by
+the device (not necessarily in order). Each request except
+VIRTIO_BLK_T_ZONE_APPEND is of the form:
+
+\begin{lstlisting}
+struct virtio_blk_req {
+        le32 type;
+        le32 reserved;
+        le64 sector;
+        u8 data[];
+        u8 status;
+};
+\end{lstlisting}
+
+The type of the request is either a read (VIRTIO_BLK_T_IN), a write
+(VIRTIO_BLK_T_OUT), a discard (VIRTIO_BLK_T_DISCARD), a write zeroes
+(VIRTIO_BLK_T_WRITE_ZEROES), a flush (VIRTIO_BLK_T_FLUSH), a get device ID
+string command (VIRTIO_BLK_T_GET_ID), a secure erase
+(VIRTIO_BLK_T_SECURE_ERASE), or a get device lifetime command
+(VIRTIO_BLK_T_GET_LIFETIME).
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_IN           0
+#define VIRTIO_BLK_T_OUT          1
+#define VIRTIO_BLK_T_FLUSH        4
+#define VIRTIO_BLK_T_GET_ID       8
+#define VIRTIO_BLK_T_GET_LIFETIME 10
+#define VIRTIO_BLK_T_DISCARD      11
+#define VIRTIO_BLK_T_WRITE_ZEROES 13
+#define VIRTIO_BLK_T_SECURE_ERASE 14
+\end{lstlisting}
+
+The \field{sector} number indicates the offset (multiplied by 512) where
+the read or write is to occur. This field is unused and set to 0 for
+commands other than read, write and some zone operations.
+
+VIRTIO_BLK_T_IN requests populate \field{data} with the contents of sectors
+read from the block device (in multiples of 512 bytes). VIRTIO_BLK_T_OUT
+requests write the contents of \field{data} to the block device (in multiples
+of 512 bytes).
+
+The \field{data} used for discard, secure erase or write zeroes commands
+consists of one or more segments. The maximum number of segments is
+\field{max_discard_seg} for discard commands, \field{max_secure_erase_seg} for
+secure erase commands and \field{max_write_zeroes_seg} for write zeroes
+commands.
+Each segment is of the form:
+
+\begin{lstlisting}
+struct virtio_blk_discard_write_zeroes {
+       le64 sector;
+       le32 num_sectors;
+       struct {
+               le32 unmap:1;
+               le32 reserved:31;
+       } flags;
+};
+\end{lstlisting}
+
+\field{sector} indicates the starting offset (in 512-byte units) of the
+segment, while \field{num_sectors} indicates the number of sectors in each
+discarded range. \field{unmap} is only used in write zeroes commands and allows
+the device to discard the specified range, provided that subsequent reads return
+zeroes.
+
+VIRTIO_BLK_T_GET_ID requests fetch the device ID string from the device into
+\field{data}. The device ID string is a NUL-padded ASCII string up to 20 bytes
+long. If the string is 20 bytes long then there is no NUL terminator.
+
+The \field{data} used for VIRTIO_BLK_T_GET_LIFETIME requests is populated
+by the device, and is of the form
+
+\begin{lstlisting}
+struct virtio_blk_lifetime {
+  le16 pre_eol_info;
+  le16 device_lifetime_est_typ_a;
+  le16 device_lifetime_est_typ_b;
+};
+\end{lstlisting}
+
+The \field{pre_eol_info} specifies the percentage of reserved blocks
+that are consumed and will have one of these values:
+
+\begin{lstlisting}
+/* Value not available */
+#define VIRTIO_BLK_PRE_EOL_INFO_UNDEFINED    0
+/* < 80% of reserved blocks are consumed */
+#define VIRTIO_BLK_PRE_EOL_INFO_NORMAL       1
+/* 80% of reserved blocks are consumed */
+#define VIRTIO_BLK_PRE_EOL_INFO_WARNING      2
+/* 90% of reserved blocks are consumed */
+#define VIRTIO_BLK_PRE_EOL_INFO_URGENT       3
+/* All other values are reserved */
+\end{lstlisting}
+
+The \field{device_lifetime_est_typ_a} refers to wear of SLC cells and is provided
+in increments of 10\%, with 0 meaning undefined, 1 meaning up to 10\% of lifetime
+used, and so on, through to 11 meaning estimated lifetime exceeded.
+All values above 11 are reserved.
+
+The \field{device_lifetime_est_typ_b} refers to wear of MLC cells and is provided
+with the same semantics as \field{device_lifetime_est_typ_a}.
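The following sketch consumes the GET_ID and GET_LIFETIME payloads described above. It assumes the lifetime fields have already been converted from little-endian. The helper names and the VIRTIO_BLK_ID_BYTES constant are hypothetical. The ID copy always appends a terminator, because a full 20-byte ID carries none.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_BLK_ID_BYTES 20   /* fixed length of the GET_ID payload */

/* The ID is NUL-padded; a 20-byte ID has no terminator, so always add one. */
static void blk_id_to_cstr(const uint8_t raw[VIRTIO_BLK_ID_BYTES],
                           char out[VIRTIO_BLK_ID_BYTES + 1])
{
    memcpy(out, raw, VIRTIO_BLK_ID_BYTES);
    out[VIRTIO_BLK_ID_BYTES] = '\0';
}

/*
 * Upper bound, in percent, of lifetime used for device_lifetime_est_typ_a/b:
 * 1 -> 10, 2 -> 20, ..., 10 -> 100. Returns -1 for 0 (undefined),
 * 11 (lifetime exceeded) and the reserved values above 11.
 */
static int blk_lifetime_used_pct_max(uint16_t est)
{
    return (est >= 1 && est <= 10) ? est * 10 : -1;
}

/* 11 means the estimated lifetime has been exceeded. */
static bool blk_lifetime_exceeded(uint16_t est)
{
    return est == 11;
}
```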
+
+The final \field{status} byte is written by the device: either
+VIRTIO_BLK_S_OK for success, VIRTIO_BLK_S_IOERR for device or driver
+error or VIRTIO_BLK_S_UNSUPP for a request unsupported by the device:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_S_OK        0
+#define VIRTIO_BLK_S_IOERR     1
+#define VIRTIO_BLK_S_UNSUPP    2
+\end{lstlisting}
+
+The status of individual segments is indeterminate when a discard or write zeroes
+command produces VIRTIO_BLK_S_IOERR. A segment may have completed
+successfully, failed, or not been processed by the device.
+
+The following requirements only apply if the VIRTIO_BLK_F_ZONED feature is
+negotiated.
+
+In addition to the request types defined for non-zoned devices, the type of the
+request can be a zone report (VIRTIO_BLK_T_ZONE_REPORT), an explicit zone open
+(VIRTIO_BLK_T_ZONE_OPEN), a zone close (VIRTIO_BLK_T_ZONE_CLOSE), a zone finish
+(VIRTIO_BLK_T_ZONE_FINISH), a zone append (VIRTIO_BLK_T_ZONE_APPEND), a zone
+reset (VIRTIO_BLK_T_ZONE_RESET) or a zone reset all
+(VIRTIO_BLK_T_ZONE_RESET_ALL).
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_ZONE_APPEND    15
+#define VIRTIO_BLK_T_ZONE_REPORT    16
+#define VIRTIO_BLK_T_ZONE_OPEN      18
+#define VIRTIO_BLK_T_ZONE_CLOSE     20
+#define VIRTIO_BLK_T_ZONE_FINISH    22
+#define VIRTIO_BLK_T_ZONE_RESET     24
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
+\end{lstlisting}
+
+Requests of type VIRTIO_BLK_T_OUT, VIRTIO_BLK_T_ZONE_OPEN,
+VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH, VIRTIO_BLK_T_ZONE_APPEND,
+VIRTIO_BLK_T_ZONE_RESET or VIRTIO_BLK_T_ZONE_RESET_ALL may be completed by the
+device with VIRTIO_BLK_S_OK, VIRTIO_BLK_S_IOERR or VIRTIO_BLK_S_UNSUPP
+\field{status}, or, additionally, with VIRTIO_BLK_S_ZONE_INVALID_CMD,
+VIRTIO_BLK_S_ZONE_UNALIGNED_WP, VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or
+VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE ZBD-specific status codes.
+
+Besides the request status, VIRTIO_BLK_T_ZONE_APPEND requests return the
+starting sector of the appended data back to the driver.
+For this reason, the VIRTIO_BLK_T_ZONE_APPEND request uses an extended layout
+with an additional \field{append_sector} field that carries this value:
+
+\begin{lstlisting}
+struct virtio_blk_req_za {
+        le32 type;
+        le32 reserved;
+        le64 sector;
+        u8 data[];
+        le64 append_sector;
+        u8 status;
+};
+\end{lstlisting}
+
+\begin{lstlisting}
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD     3
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP    4
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE   5
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
+\end{lstlisting}
+
+Requests of the type VIRTIO_BLK_T_ZONE_REPORT are reads and requests of the type
+VIRTIO_BLK_T_ZONE_APPEND are writes. VIRTIO_BLK_T_ZONE_OPEN,
+VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH, VIRTIO_BLK_T_ZONE_RESET and
+VIRTIO_BLK_T_ZONE_RESET_ALL are non-data requests.
+
+The zone sector address is a 64-bit address of the first 512-byte sector of the
+zone.
+
+VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
+VIRTIO_BLK_T_ZONE_RESET requests perform the zone operation on the particular
+zone specified by the zone sector address in the \field{sector} field of the request.
+
+The VIRTIO_BLK_T_ZONE_RESET_ALL request acts upon all applicable zones of the
+device. The \field{sector} value is not used for this request.
+
+In ZBD standards, the VIRTIO_BLK_T_ZONE_REPORT request belongs to the ``Zone
+Management Receive'' command category and VIRTIO_BLK_T_ZONE_OPEN,
+VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
+VIRTIO_BLK_T_ZONE_RESET/VIRTIO_BLK_T_ZONE_RESET_ALL requests are categorized as
+``Zone Management Send'' commands. VIRTIO_BLK_T_ZONE_APPEND is categorized
+separately from zone management commands and is the only request that uses
+the \field{append_sector} field of \field{virtio_blk_req_za} to return
+to the driver the sector at which the data has been appended to the zone.
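A small helper can tabulate the data direction of each zoned request type and the size of the device-writable trailer that follows \field{data}. This is a sketch, and the enum and function names are hypothetical. The trailer sizes follow from the two layouts above: a single status byte in struct virtio_blk_req, and le64 \field{append_sector} plus the status byte in struct virtio_blk_req_za.

```c
#include <stdint.h>

#define VIRTIO_BLK_T_ZONE_APPEND    15
#define VIRTIO_BLK_T_ZONE_REPORT    16
#define VIRTIO_BLK_T_ZONE_OPEN      18
#define VIRTIO_BLK_T_ZONE_CLOSE     20
#define VIRTIO_BLK_T_ZONE_FINISH    22
#define VIRTIO_BLK_T_ZONE_RESET     24
#define VIRTIO_BLK_T_ZONE_RESET_ALL 26

enum zone_req_dir {
    ZREQ_DEVICE_WRITES_DATA,  /* read: data flows device -> driver */
    ZREQ_DRIVER_WRITES_DATA,  /* write: data flows driver -> device */
    ZREQ_NO_DATA,             /* zone management send, no payload */
    ZREQ_NOT_ZONED,           /* not one of the zoned request types */
};

static enum zone_req_dir zone_req_direction(uint32_t type)
{
    switch (type) {
    case VIRTIO_BLK_T_ZONE_REPORT:      /* Zone Management Receive */
        return ZREQ_DEVICE_WRITES_DATA;
    case VIRTIO_BLK_T_ZONE_APPEND:      /* write that also returns append_sector */
        return ZREQ_DRIVER_WRITES_DATA;
    case VIRTIO_BLK_T_ZONE_OPEN:        /* Zone Management Send */
    case VIRTIO_BLK_T_ZONE_CLOSE:
    case VIRTIO_BLK_T_ZONE_FINISH:
    case VIRTIO_BLK_T_ZONE_RESET:
    case VIRTIO_BLK_T_ZONE_RESET_ALL:
        return ZREQ_NO_DATA;
    default:
        return ZREQ_NOT_ZONED;
    }
}

/* Device-writable bytes after data[]: 8 (append_sector) + 1 (status) for
 * VIRTIO_BLK_T_ZONE_APPEND, only the status byte for every other request. */
static unsigned zone_req_tail_bytes(uint32_t type)
{
    return type == VIRTIO_BLK_T_ZONE_APPEND ? 8u + 1u : 1u;
}
```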
+
+VIRTIO_BLK_T_ZONE_REPORT is a read request that returns information about
+the current state of zones on the device, starting from the zone that contains
+the sector given in the \field{sector} field of the request. The report consists
+of a header followed by zero or more zone descriptors.
+
+A zone report reply has the following structure:
+
+\begin{lstlisting}
+struct virtio_blk_zone_report {
+        le64   nr_zones;
+        u8     reserved[56];
+        struct virtio_blk_zone_descriptor zones[];
+};
+\end{lstlisting}
+
+The device sets the \field{nr_zones} field in the report header to the number of
+fully transferred zone descriptors in the data buffer.
+
+A zone descriptor has the following structure:
+
+\begin{lstlisting}
+struct virtio_blk_zone_descriptor {
+        le64   z_cap;
+        le64   z_start;
+        le64   z_wp;
+        u8     z_type;
+        u8     z_state;
+        u8     reserved[38];
+};
+\end{lstlisting}
+
+The field \field{z_type} of \field{virtio_blk_zone_descriptor}
+indicates the type of the zone.
+
+The following zone types are available:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_ZT_CONV 1
+#define VIRTIO_BLK_ZT_SWR  2
+#define VIRTIO_BLK_ZT_SWP  3
+\end{lstlisting}
+
+Read and write operations into zones with the VIRTIO_BLK_ZT_CONV (Conventional)
+type have the same behavior as read and write operations on a regular block
+device. Any block in a conventional zone can be read or written at any time and
+in any order.
+
+Zones with VIRTIO_BLK_ZT_SWR can be read randomly, but must be written
+sequentially at a certain point in the zone called the Write Pointer (WP). With
+every write, the Write Pointer is incremented by the number of sectors written.
+
+Zones with VIRTIO_BLK_ZT_SWP can be read randomly and should be written
+sequentially, similarly to SWR zones. However, SWP zones can accept random write
+operations, that is, VIRTIO_BLK_T_OUT requests with a start sector different
+from the zone write pointer position.
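The following minimal sketch parses a zone report buffer. It assumes the driver holds the raw little-endian bytes returned by VIRTIO_BLK_T_ZONE_REPORT. According to the structures above, the header and each descriptor are 64 bytes long. struct zone_desc and parse_zone_report are hypothetical names. Only fully transferred descriptors are decoded, so the count is capped by both \field{nr_zones} and the buffer length.

```c
#include <stddef.h>
#include <stdint.h>

#define ZONE_REPORT_HDR_BYTES 64   /* le64 nr_zones + u8 reserved[56] */
#define ZONE_DESC_BYTES       64   /* 3 x le64 + 2 x u8 + u8 reserved[38] */

/* Host-endian copy of one virtio_blk_zone_descriptor. */
struct zone_desc {
    uint64_t z_cap, z_start, z_wp;
    uint8_t z_type, z_state;
};

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns the number of descriptors decoded into out[]. */
static size_t parse_zone_report(const uint8_t *buf, size_t buf_len,
                                struct zone_desc *out, size_t max_out)
{
    if (buf_len < ZONE_REPORT_HDR_BYTES)
        return 0;

    uint64_t nr = get_le64(buf);
    size_t fit = (buf_len - ZONE_REPORT_HDR_BYTES) / ZONE_DESC_BYTES;
    size_t n = nr < fit ? (size_t)nr : fit;
    if (n > max_out)
        n = max_out;

    for (size_t i = 0; i < n; i++) {
        const uint8_t *d = buf + ZONE_REPORT_HDR_BYTES + i * ZONE_DESC_BYTES;
        out[i].z_cap   = get_le64(d);
        out[i].z_start = get_le64(d + 8);
        out[i].z_wp    = get_le64(d + 16);
        out[i].z_type  = d[24];
        out[i].z_state = d[25];
    }
    return n;
}
```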
+
+The field \field{z_state} of \field{virtio_blk_zone_descriptor} indicates the
+state of the device zone.
+
+The following zone states are available:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_ZS_NOT_WP   0
+#define VIRTIO_BLK_ZS_EMPTY    1
+#define VIRTIO_BLK_ZS_IOPEN    2
+#define VIRTIO_BLK_ZS_EOPEN    3
+#define VIRTIO_BLK_ZS_CLOSED   4
+#define VIRTIO_BLK_ZS_RDONLY   13
+#define VIRTIO_BLK_ZS_FULL     14
+#define VIRTIO_BLK_ZS_OFFLINE  15
+\end{lstlisting}
+
+Zones of the type VIRTIO_BLK_ZT_CONV are always reported by the device to be in
+the VIRTIO_BLK_ZS_NOT_WP state. Zones of the types VIRTIO_BLK_ZT_SWR and
+VIRTIO_BLK_ZT_SWP cannot transition to the VIRTIO_BLK_ZS_NOT_WP state.
+
+Zones in VIRTIO_BLK_ZS_EMPTY (Empty), VIRTIO_BLK_ZS_IOPEN (Implicitly Open),
+VIRTIO_BLK_ZS_EOPEN (Explicitly Open) and VIRTIO_BLK_ZS_CLOSED (Closed) state
+are writable, but zones in VIRTIO_BLK_ZS_RDONLY (Read-Only), VIRTIO_BLK_ZS_FULL
+(Full) and VIRTIO_BLK_ZS_OFFLINE (Offline) state are not. The write pointer
+value (\field{z_wp}) is not valid for Read-Only, Full and Offline zones.
+
+The zone descriptor field \field{z_cap} contains the maximum number of 512-byte
+sectors that are available to be written with user data when the zone is in the
+Empty state. This value shall be less than or equal to the \field{zone_sectors}
+value in the \field{virtio_blk_zoned_characteristics} structure in the device
+configuration space.
+
+The zone descriptor field \field{z_start} contains the zone sector address.
+
+The zone descriptor field \field{z_wp} contains the sector address where the
+next write operation for this zone should be issued. This value is undefined
+for conventional zones and for zones in VIRTIO_BLK_ZS_RDONLY,
+VIRTIO_BLK_ZS_FULL and VIRTIO_BLK_ZS_OFFLINE state.
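The writability and write-pointer rules above reduce to two predicates over the descriptor fields. This sketch treats conventional (VIRTIO_BLK_ZS_NOT_WP) zones as writable, because conventional zones can be written at any time. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_ZT_CONV     1
#define VIRTIO_BLK_ZT_SWR      2

#define VIRTIO_BLK_ZS_NOT_WP   0
#define VIRTIO_BLK_ZS_EMPTY    1
#define VIRTIO_BLK_ZS_IOPEN    2
#define VIRTIO_BLK_ZS_EOPEN    3
#define VIRTIO_BLK_ZS_CLOSED   4
#define VIRTIO_BLK_ZS_RDONLY   13
#define VIRTIO_BLK_ZS_FULL     14
#define VIRTIO_BLK_ZS_OFFLINE  15

/* Empty, open and closed sequential zones, plus conventional zones, accept writes. */
static bool zone_state_writable(uint8_t z_state)
{
    switch (z_state) {
    case VIRTIO_BLK_ZS_NOT_WP:
    case VIRTIO_BLK_ZS_EMPTY:
    case VIRTIO_BLK_ZS_IOPEN:
    case VIRTIO_BLK_ZS_EOPEN:
    case VIRTIO_BLK_ZS_CLOSED:
        return true;
    default:                     /* Read-Only, Full, Offline, reserved */
        return false;
    }
}

/* z_wp is undefined for conventional zones and Read-Only/Full/Offline zones. */
static bool zone_wp_valid(uint8_t z_type, uint8_t z_state)
{
    if (z_type == VIRTIO_BLK_ZT_CONV)
        return false;
    return z_state != VIRTIO_BLK_ZS_RDONLY && z_state != VIRTIO_BLK_ZS_FULL &&
           z_state != VIRTIO_BLK_ZS_OFFLINE;
}
```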
+
+Depending on their state, zones consume resources as follows:
+\begin{itemize}
+\item a zone in the VIRTIO_BLK_ZS_IOPEN or VIRTIO_BLK_ZS_EOPEN state consumes one
+    open zone resource and, additionally,
+
+\item a zone in the VIRTIO_BLK_ZS_IOPEN, VIRTIO_BLK_ZS_EOPEN or
+    VIRTIO_BLK_ZS_CLOSED state consumes one active resource.
+\end{itemize}
+
+Attempted zone transitions that would violate zone resource limits must fail with
+VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE
+\field{status}.
+
+Zones in the VIRTIO_BLK_ZS_EMPTY (Empty) state have the write pointer value
+equal to the sector address of the zone. In this state, the entire capacity of
+the zone is available for writing. A zone can transition from this state to
+\begin{itemize}
+\item VIRTIO_BLK_ZS_IOPEN when a successful VIRTIO_BLK_T_OUT request or
+    VIRTIO_BLK_T_ZONE_APPEND with a non-zero data size is received for the zone.
+
+\item VIRTIO_BLK_ZS_EOPEN when a successful VIRTIO_BLK_T_ZONE_OPEN request is
+    received for the zone.
+\end{itemize}
+
+When a VIRTIO_BLK_T_ZONE_RESET request is issued to an Empty zone, the request
+is completed successfully and the zone stays in the VIRTIO_BLK_ZS_EMPTY state.
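The following sketch shows which status a device might report for a single zone transition under the resource rules above. It assumes the caller already tracks the current open and active counts. The active limit is checked first, because this section later states that a transition exceeding both limits is reported with VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE. The sketch does not model the implicit close of another zone that a device may perform at the open limit. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_ZS_EMPTY   1
#define VIRTIO_BLK_ZS_IOPEN   2
#define VIRTIO_BLK_ZS_EOPEN   3
#define VIRTIO_BLK_ZS_CLOSED  4

#define VIRTIO_BLK_S_OK                   0
#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE   5
#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6

/* Implicitly and explicitly open zones hold an open resource. */
static bool uses_open(uint8_t s)
{
    return s == VIRTIO_BLK_ZS_IOPEN || s == VIRTIO_BLK_ZS_EOPEN;
}

/* Open and closed zones hold an active resource. */
static bool uses_active(uint8_t s)
{
    return uses_open(s) || s == VIRTIO_BLK_ZS_CLOSED;
}

/* Limits of 0 mean "no limit". */
static uint8_t zone_transition_status(uint8_t from, uint8_t to,
                                      uint32_t nr_open, uint32_t max_open,
                                      uint32_t nr_active, uint32_t max_active)
{
    bool need_active = !uses_active(from) && uses_active(to);
    bool need_open = !uses_open(from) && uses_open(to);

    if (need_active && max_active != 0 && nr_active >= max_active)
        return VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE;
    if (need_open && max_open != 0 && nr_open >= max_open)
        return VIRTIO_BLK_S_ZONE_OPEN_RESOURCE;
    return VIRTIO_BLK_S_OK;
}
```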
+
+Zones in the VIRTIO_BLK_ZS_IOPEN (Implicitly Open) state transition from
+this state to
+\begin{itemize}
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET_ALL request
+    is received by the device,
+
+\item VIRTIO_BLK_ZS_EOPEN when a successful VIRTIO_BLK_T_ZONE_OPEN request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_CLOSED when a successful VIRTIO_BLK_T_ZONE_CLOSE request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_CLOSED implicitly by the device when another zone is
+    entering the VIRTIO_BLK_ZS_IOPEN or VIRTIO_BLK_ZS_EOPEN state and the number
+    of currently open zones is at the \field{max_open_zones} limit,
+
+\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_ZONE_FINISH request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_OUT or
+    VIRTIO_BLK_T_ZONE_APPEND request that causes the zone to reach its writable
+    capacity is received for the zone.
+\end{itemize}
+
+Zones in the VIRTIO_BLK_ZS_EOPEN (Explicitly Open) state transition from
+this state to
+\begin{itemize}
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET_ALL request
+    is received by the device,
+
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_CLOSE request is
+    received for the zone and the write pointer of the zone is equal to the
+    start sector of the zone,
+
+\item VIRTIO_BLK_ZS_CLOSED when a successful VIRTIO_BLK_T_ZONE_CLOSE request is
+    received for the zone and the zone write pointer is larger than the start
+    sector of the zone,
+
+\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_ZONE_FINISH request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_FULL when a successful VIRTIO_BLK_T_OUT or
+    VIRTIO_BLK_T_ZONE_APPEND request that causes the zone to reach its writable
+    capacity is received for the zone.
+\end{itemize}
+
+When a VIRTIO_BLK_T_ZONE_OPEN request is issued to an Explicitly Open zone, the
+request is completed successfully and the zone stays in the VIRTIO_BLK_ZS_EOPEN
+state.
+
+Zones in the VIRTIO_BLK_ZS_CLOSED (Closed) state transition from this state
+to
+\begin{itemize}
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
+    received for the zone,
+
+\item VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET_ALL request
+    is received by the device,
+
+\item VIRTIO_BLK_ZS_IOPEN when a successful VIRTIO_BLK_T_OUT request or
+    VIRTIO_BLK_T_ZONE_APPEND with a non-zero data size is received for the zone,
+
+\item VIRTIO_BLK_ZS_EOPEN when a successful VIRTIO_BLK_T_ZONE_OPEN request is
+    received for the zone.
+\end{itemize}
+
+When a VIRTIO_BLK_T_ZONE_CLOSE request is issued to a Closed zone, the request
+is completed successfully and the zone stays in the VIRTIO_BLK_ZS_CLOSED state.
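The explicit zone management transitions listed above for the Empty, Implicitly Open, Explicitly Open and Closed states can be collected in one function. This is a sketch with a hypothetical name. It covers only successful OPEN, CLOSE, FINISH and RESET requests. Writes, RESET_ALL and device-initiated transitions are left out, and the function returns -1 for combinations that the text above does not describe.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_ZS_EMPTY   1
#define VIRTIO_BLK_ZS_IOPEN   2
#define VIRTIO_BLK_ZS_EOPEN   3
#define VIRTIO_BLK_ZS_CLOSED  4
#define VIRTIO_BLK_ZS_FULL    14

#define VIRTIO_BLK_T_ZONE_OPEN    18
#define VIRTIO_BLK_T_ZONE_CLOSE   20
#define VIRTIO_BLK_T_ZONE_FINISH  22
#define VIRTIO_BLK_T_ZONE_RESET   24

/* wp_at_start: the zone write pointer equals the zone start sector. */
static int zone_next_state(uint8_t state, uint32_t req, bool wp_at_start)
{
    switch (state) {
    case VIRTIO_BLK_ZS_EMPTY:
        if (req == VIRTIO_BLK_T_ZONE_OPEN)   return VIRTIO_BLK_ZS_EOPEN;
        if (req == VIRTIO_BLK_T_ZONE_RESET)  return VIRTIO_BLK_ZS_EMPTY;
        break;
    case VIRTIO_BLK_ZS_IOPEN:
        if (req == VIRTIO_BLK_T_ZONE_OPEN)   return VIRTIO_BLK_ZS_EOPEN;
        if (req == VIRTIO_BLK_T_ZONE_CLOSE)  return VIRTIO_BLK_ZS_CLOSED;
        if (req == VIRTIO_BLK_T_ZONE_FINISH) return VIRTIO_BLK_ZS_FULL;
        if (req == VIRTIO_BLK_T_ZONE_RESET)  return VIRTIO_BLK_ZS_EMPTY;
        break;
    case VIRTIO_BLK_ZS_EOPEN:
        if (req == VIRTIO_BLK_T_ZONE_OPEN)   return VIRTIO_BLK_ZS_EOPEN;
        if (req == VIRTIO_BLK_T_ZONE_CLOSE)
            return wp_at_start ? VIRTIO_BLK_ZS_EMPTY : VIRTIO_BLK_ZS_CLOSED;
        if (req == VIRTIO_BLK_T_ZONE_FINISH) return VIRTIO_BLK_ZS_FULL;
        if (req == VIRTIO_BLK_T_ZONE_RESET)  return VIRTIO_BLK_ZS_EMPTY;
        break;
    case VIRTIO_BLK_ZS_CLOSED:
        if (req == VIRTIO_BLK_T_ZONE_OPEN)   return VIRTIO_BLK_ZS_EOPEN;
        if (req == VIRTIO_BLK_T_ZONE_CLOSE)  return VIRTIO_BLK_ZS_CLOSED;
        if (req == VIRTIO_BLK_T_ZONE_RESET)  return VIRTIO_BLK_ZS_EMPTY;
        break;
    }
    return -1;  /* not covered by the transitions listed above */
}
```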
+
+Zones in the VIRTIO_BLK_ZS_FULL (Full) state transition from this state to
+VIRTIO_BLK_ZS_EMPTY when a successful VIRTIO_BLK_T_ZONE_RESET request is
+received for the zone or a successful VIRTIO_BLK_T_ZONE_RESET_ALL request is
+received by the device.
+
+When a VIRTIO_BLK_T_ZONE_FINISH request is issued to a Full zone, the request
+is completed successfully and the zone stays in the VIRTIO_BLK_ZS_FULL state.
+
+The device may automatically transition zones to VIRTIO_BLK_ZS_RDONLY
+(Read-Only) or VIRTIO_BLK_ZS_OFFLINE (Offline) state from any other state. The
+device may also automatically transition zones in the Read-Only state to the
+Offline state. Zones in the Offline state may not transition to any other state.
+Such automatic transitions usually indicate hardware failures. The previously
+written data may only be read from zones in the Read-Only state. Zones in the
+Offline state cannot be read or written.
+
+VIRTIO_BLK_S_ZONE_UNALIGNED_WP is set by the device when the request received
+from the driver attempts to perform a write to an SWR zone and at least one of
+the following conditions is met:
+
+\begin{itemize}
+\item the starting sector of the request is not equal to the current value of
+    the zone write pointer.
+
+\item the ending sector of the request data multiplied by 512 is not a multiple
+    of the value reported by the device in the field \field{write_granularity}
+    in the device configuration space.
+\end{itemize}
+
+VIRTIO_BLK_S_ZONE_OPEN_RESOURCE is set by the device when a zone operation or
+write request received from the driver cannot be handled without exceeding the
+\field{max_open_zones} limit value reported by the device in the configuration
+space.
+
+VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE is set by the device when a zone operation or
+write request received from the driver cannot be handled without exceeding the
+\field{max_active_zones} limit value reported by the device in the configuration
+space.
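The following sketch checks the VIRTIO_BLK_S_ZONE_UNALIGNED_WP conditions for a write to an SWR zone. It interprets the "ending sector" as the first sector after the written data (sector + nr_sectors), and it skips the size check when write_granularity is zero. Both are assumptions of this sketch rather than statements of the text, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if a write of nr_sectors starting at `sector` to an SWR zone whose
 * write pointer is `wp` would be completed with VIRTIO_BLK_S_ZONE_UNALIGNED_WP.
 * write_granularity is in bytes, as reported in the zoned characteristics.
 */
static bool swr_write_unaligned(uint64_t sector, uint64_t nr_sectors,
                                uint64_t wp, uint32_t write_granularity)
{
    uint64_t end_bytes = (sector + nr_sectors) * 512u;

    if (sector != wp)
        return true;                         /* not at the write pointer */
    return write_granularity != 0 && end_bytes % write_granularity != 0;
}
```

With a 4096-byte granularity, for example, an 8-sector write at the write pointer of an empty zone starting at sector 0 is aligned. A single-sector write at the same position is not.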
+
+A zone transition request that would cause both the \field{max_open_zones} and
+the \field{max_active_zones} limits to be exceeded is terminated by the device
+with the VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE \field{status} value.
+
+The device reports all other error conditions related to zoned block model
+operation by setting the VIRTIO_BLK_S_ZONE_INVALID_CMD value in
+\field{status} of the \field{virtio_blk_req} structure.
+
+\drivernormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
+
+The driver SHOULD check if the content of the \field{capacity} field has
+changed upon receiving a configuration change notification.
+
+A driver MUST NOT submit a request which would cause a read or write
+beyond \field{capacity}.
+
+A driver SHOULD accept the VIRTIO_BLK_F_RO feature if offered.
+
+A driver MUST set \field{sector} to 0 for a VIRTIO_BLK_T_FLUSH request.
+A driver SHOULD NOT include any data in a VIRTIO_BLK_T_FLUSH request.
+
+The length of \field{data} MUST be a multiple of 512 bytes for VIRTIO_BLK_T_IN
+and VIRTIO_BLK_T_OUT requests.
+
+The length of \field{data} MUST be a multiple of the size of struct
+virtio_blk_discard_write_zeroes for VIRTIO_BLK_T_DISCARD,
+VIRTIO_BLK_T_SECURE_ERASE and VIRTIO_BLK_T_WRITE_ZEROES requests.
+
+The length of \field{data} MUST be 20 bytes for VIRTIO_BLK_T_GET_ID requests.
+
+VIRTIO_BLK_T_DISCARD requests MUST NOT contain more than
+\field{max_discard_seg} struct virtio_blk_discard_write_zeroes segments in
+\field{data}.
+
+VIRTIO_BLK_T_SECURE_ERASE requests MUST NOT contain more than
+\field{max_secure_erase_seg} struct virtio_blk_discard_write_zeroes segments in
+\field{data}.
+
+VIRTIO_BLK_T_WRITE_ZEROES requests MUST NOT contain more than
+\field{max_write_zeroes_seg} struct virtio_blk_discard_write_zeroes segments in
+\field{data}.
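A driver could gate each request on the data length rules above before submitting it. The C sketch below is one possible shape for such a check, not a normative algorithm: the enum, struct blk_limits and data_len_ok() are hypothetical, the numeric request type values are deliberately omitted, and the 16-byte segment layout assumes struct virtio_blk_discard_write_zeroes is a 64-bit sector followed by 32-bit num_sectors and flags fields. It also treats the VIRTIO_BLK_T_FLUSH "SHOULD NOT include any data" rule as strict.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Request types covered by the length rules (local stand-ins, no wire values). */
enum blk_req_type {
    REQ_IN, REQ_OUT, REQ_FLUSH, REQ_GET_ID,
    REQ_DISCARD, REQ_WRITE_ZEROES, REQ_SECURE_ERASE
};

/* Assumed segment layout of struct virtio_blk_discard_write_zeroes. */
struct dwz_segment {
    uint64_t sector;
    uint32_t num_sectors;
    uint32_t flags;
};
_Static_assert(sizeof(struct dwz_segment) == 16, "segment is 16 bytes");

/* Hypothetical subset of the configuration space limits. */
struct blk_limits {
    uint32_t max_discard_seg;
    uint32_t max_secure_erase_seg;
    uint32_t max_write_zeroes_seg;
};

/* Returns true if data_len satisfies the driver rules for request type t. */
static bool data_len_ok(enum blk_req_type t, size_t data_len,
                        const struct blk_limits *lim)
{
    const size_t seg = sizeof(struct dwz_segment);
    uint32_t max_seg;

    switch (t) {
    case REQ_IN:
    case REQ_OUT:
        return data_len % 512 == 0;   /* whole 512-byte sectors only */
    case REQ_FLUSH:
        return data_len == 0;         /* SHOULD NOT carry data; strict here */
    case REQ_GET_ID:
        return data_len == 20;        /* fixed-size identifier buffer */
    case REQ_DISCARD:
        max_seg = lim->max_discard_seg;
        break;
    case REQ_SECURE_ERASE:
        max_seg = lim->max_secure_erase_seg;
        break;
    case REQ_WRITE_ZEROES:
        max_seg = lim->max_write_zeroes_seg;
        break;
    default:
        return false;
    }
    /* Segment-based requests: whole segments, no more than the limit. */
    return data_len % seg == 0 && data_len / seg <= max_seg;
}
```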
+
+If the VIRTIO_BLK_F_CONFIG_WCE feature is negotiated, the driver MAY
+switch to writethrough or writeback mode by writing 0 or 1, respectively,
+to the \field{writeback} field. After writing a 0 to \field{writeback},
+the driver MUST NOT assume that any volatile writes have been committed
+to persistent device backend storage.
+
+The \field{unmap} bit MUST be zero for discard commands. The driver
+MUST NOT assume anything about the data returned by read requests after
+a range of sectors has been discarded.
+
+A driver MUST NOT assume that individual segments in a multi-segment
+VIRTIO_BLK_T_DISCARD or VIRTIO_BLK_T_WRITE_ZEROES request completed
+successfully, failed, or were processed by the device at all if the request
+failed with VIRTIO_BLK_S_IOERR.
+
+The following requirements only apply if the VIRTIO_BLK_F_ZONED feature is
+negotiated.
+
+A zone sector address provided by the driver MUST be a multiple of 512 bytes.
+
+When forming a VIRTIO_BLK_T_ZONE_REPORT request, the driver MUST set the
+\field{sector} field to a sector within the sector range of the starting zone
+to report. It MAY be a sector that is different from the zone sector address.
+
+In VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE, VIRTIO_BLK_T_ZONE_FINISH and
+VIRTIO_BLK_T_ZONE_RESET requests, the driver MUST set the \field{sector} field
+to point at the first sector in the target zone.
+
+In a VIRTIO_BLK_T_ZONE_RESET_ALL request, the driver MUST set the
+\field{sector} field to zero.
+
+The \field{sector} field of the VIRTIO_BLK_T_ZONE_APPEND request MUST specify
+the zone sector address of the zone to which data is to be appended at the
+position of the write pointer. The size of the data that is appended MUST be a
+multiple of \field{write_granularity} bytes and MUST NOT exceed the
+\field{max_append_sectors} value provided by the device in the
+\field{virtio_blk_zoned_characteristics} configuration space structure.
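The driver-side VIRTIO_BLK_T_ZONE_APPEND rules can be checked before submission. This C sketch assumes that all zones have the same size of zone_sectors sectors, so a zone sector address is a multiple of zone_sectors, and that max_append_sectors counts 512-byte sectors; struct zoned_chars and zone_append_ok() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the zoned characteristics used by this check. */
struct zoned_chars {
    uint64_t zone_sectors;       /* zone size in 512-byte sectors */
    uint32_t write_granularity;  /* in bytes */
    uint32_t max_append_sectors; /* assumed to count 512-byte sectors */
};

/* Returns true if a ZONE_APPEND of data_bytes at sector meets the driver rules. */
static bool zone_append_ok(const struct zoned_chars *zc, uint64_t sector,
                           uint64_t data_bytes)
{
    /* sector must be a zone sector address; assumes equally sized zones. */
    if (zc->zone_sectors == 0 || sector % zc->zone_sectors != 0)
        return false;

    /* Data size must be a multiple of write_granularity bytes. */
    if (zc->write_granularity == 0 || data_bytes % zc->write_granularity != 0)
        return false;

    /* Data size must not exceed max_append_sectors. */
    return data_bytes <= (uint64_t)zc->max_append_sectors * 512u;
}
```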
+
+Upon successful completion of a VIRTIO_BLK_T_ZONE_APPEND request, the driver
+MAY read the starting sector location of the written data from the request
+field \field{append_sector}.
+
+All VIRTIO_BLK_T_OUT requests issued by the driver to sequential zones and
+VIRTIO_BLK_T_ZONE_APPEND requests MUST have:
+
+\begin{enumerate}
+\item a data size that is a multiple of the number of bytes reported
+      by the device in the field \field{write_granularity} in the
+      \field{virtio_blk_zoned_characteristics} configuration space structure.
+
+\item a value of the field \field{sector} that is a multiple of the number of
+      bytes reported by the device in the field \field{write_granularity} in the
+      \field{virtio_blk_zoned_characteristics} configuration space structure.
+
+\item a data size that, when added to the current value of the write pointer
+      of the zone, does not exceed the writable zone capacity.
+
+\end{enumerate}
+
+\devicenormative{\subsubsection}{Device Operation}{Device Types / Block Device / Device Operation}
+
+The device MAY change the content of the \field{capacity} field during
+operation of the device. When this happens, the device SHOULD trigger a
+configuration change notification.
+
+A device MUST set the \field{status} byte to VIRTIO_BLK_S_IOERR
+for a write request if the VIRTIO_BLK_F_RO feature is offered, and MUST NOT
+write any data.
+
+The device MUST set the \field{status} byte to VIRTIO_BLK_S_UNSUPP for
+discard, secure erase and write zeroes commands if any unknown flag is set.
+Furthermore, the device MUST set the \field{status} byte to
+VIRTIO_BLK_S_UNSUPP for discard commands if the \field{unmap} flag is set.
+
+For discard commands, the device MAY deallocate the specified range of
+sectors in the device backend storage.
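On the device side, the flag rules for discard, secure erase and write zeroes segments reduce to two mask tests. In this C sketch the unmap flag is assumed to be bit 0 of the segment flags word, with all other bits treated as unknown; unmap is treated as a known flag for all three commands and rejected only for discard. The status and command enumerators are local stand-ins whose values are not the VIRTIO_BLK_S_* or VIRTIO_BLK_T_* wire values, and check_segment_flags() is a hypothetical name.

```c
#include <stdint.h>

/* Local stand-ins; values are not the on-wire status or type codes. */
enum blk_status { BLK_S_OK, BLK_S_UNSUPP };
enum seg_cmd { CMD_DISCARD, CMD_WRITE_ZEROES, CMD_SECURE_ERASE };

/* Assumption: unmap is bit 0 of the flags word; other bits are unknown. */
#define SEG_FLAG_UNMAP 0x1u

static enum blk_status check_segment_flags(enum seg_cmd cmd, uint32_t flags)
{
    /* Any unknown flag set: UNSUPP for discard, secure erase, write zeroes. */
    if (flags & ~SEG_FLAG_UNMAP)
        return BLK_S_UNSUPP;

    /* unmap is additionally rejected for discard commands. */
    if (cmd == CMD_DISCARD && (flags & SEG_FLAG_UNMAP))
        return BLK_S_UNSUPP;

    return BLK_S_OK;
}
```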
+
+For write zeroes commands, if \field{unmap} is set, the device MAY
+deallocate the specified range of sectors in the device backend storage,
+as if a discard command had been sent. After a write zeroes command
+is completed, reads of the specified ranges of sectors MUST return
+zeroes. This is true independent of whether \field{unmap} was set or clear.
+
+The device SHOULD clear the \field{write_zeroes_may_unmap} field of the
+virtio configuration space if and only if a write zeroes request cannot
+result in deallocating one or more sectors. The device MAY change the
+content of the field during operation of the device; when this happens,
+the device SHOULD trigger a configuration change notification.
+
+A write is considered volatile when it is submitted; the contents of
+sectors covered by a volatile write are undefined in persistent device
+backend storage until the write becomes stable. A write becomes stable
+once it is completed and one or more of the following conditions is true:
+
+\begin{enumerate}
+\item\label{item:flush1} neither the VIRTIO_BLK_F_CONFIG_WCE nor the
+  VIRTIO_BLK_F_FLUSH feature was negotiated, but VIRTIO_BLK_F_FLUSH was
+  offered by the device;
+
+\item\label{item:flush2} the VIRTIO_BLK_F_CONFIG_WCE feature was negotiated and the
+  \field{writeback} field in configuration space was 0 \textbf{all the time between
+  the submission of the write and its completion};
+
+\item\label{item:flush3} a VIRTIO_BLK_T_FLUSH request is sent \textbf{after the write is
+  completed} and is completed itself.
+\end{enumerate}
+
+If the device is backed by persistent storage, the device MUST ensure that
+stable writes are committed to it before reporting completion of the write
+(cases~\ref{item:flush1} and~\ref{item:flush2}) or the flush
+(case~\ref{item:flush3}). Failure to do so can cause data loss
+in case of a crash.
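The three stability cases can be read as a predicate over facts the device tracks for each write. This C sketch encodes them literally; struct write_ctx and write_is_stable() are hypothetical names, and the sketch does not model any commit behavior beyond the three listed cases.

```c
#include <stdbool.h>

/* Hypothetical record of what is known about one write request. */
struct write_ctx {
    bool completed;                 /* the write has completed */
    bool flush_offered;             /* device offered VIRTIO_BLK_F_FLUSH */
    bool flush_negotiated;          /* VIRTIO_BLK_F_FLUSH was negotiated */
    bool config_wce_negotiated;     /* VIRTIO_BLK_F_CONFIG_WCE was negotiated */
    bool writeback_zero_throughout; /* writeback was 0 from submission to completion */
    bool later_flush_completed;     /* a FLUSH sent after completion has completed */
};

static bool write_is_stable(const struct write_ctx *w)
{
    /* A write stays volatile until it has completed. */
    if (!w->completed)
        return false;

    /* Case 1: FLUSH offered, but neither CONFIG_WCE nor FLUSH negotiated. */
    if (!w->config_wce_negotiated && !w->flush_negotiated && w->flush_offered)
        return true;

    /* Case 2: CONFIG_WCE negotiated and writeback held at 0 throughout. */
    if (w->config_wce_negotiated && w->writeback_zero_throughout)
        return true;

    /* Case 3: a flush issued after the write completed has itself completed. */
    return w->later_flush_completed;
}
```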
+
+If the driver changes \field{writeback} between the submission of the write
+and its completion, the write could be either volatile or stable when
+its completion is reported; in other words, the exact behavior is undefined.
+
+% According to the device requirements for device initialization:
+%   Offer(CONFIG_WCE) => Offer(FLUSH).
+%
+% After reversing the implication:
+%   not Offer(FLUSH) => not Offer(CONFIG_WCE).
+
+If VIRTIO_BLK_F_FLUSH was not offered by the
+  device\footnote{Note that in this case, according to
+  \ref{devicenormative:Device Types / Block Device / Device Initialization},
+  the device will not have offered VIRTIO_BLK_F_CONFIG_WCE either.}, the
+device MAY also commit writes to persistent device backend storage before
+reporting their completion. Unlike case~\ref{item:flush1}, however, this
+is not an absolute requirement of the specification.
+
+\begin{note}
+  An implementation that does not offer VIRTIO_BLK_F_FLUSH and does not commit
+  completed writes will not be resilient to data loss in case of crashes.
+  Not offering VIRTIO_BLK_F_FLUSH is an absolute requirement
+  for implementations that do not wish to be safe against such data losses.
+\end{note}
+
+If the device is backed by storage providing lifetime metrics (such as eMMC
+or UFS persistent storage), the device SHOULD offer the VIRTIO_BLK_F_LIFETIME
+feature. The feature MUST NOT be offered if the device is backed by storage for
+which the lifetime metrics described in this document cannot be obtained or for
+which such metrics have no useful meaning. If the metrics are offered, the device
+MUST NOT send any reserved values, as defined in this specification.
+
+\begin{note}
+  The device lifetime metrics \field{pre_eol_info}, \field{device_lifetime_est_a}
+  and \field{device_lifetime_est_b} are discussed in the JESD84-B50 specification.
+
+  The complete JESD84-B50 is available at the JEDEC website (https://www.jedec.org)
+  pursuant to JEDEC's licensing terms and conditions. This information is provided to
+  simplify passthrough implementations from eMMC devices.
+\end{note}
+
+If the VIRTIO_BLK_F_ZONED feature is not negotiated, the device MUST reject
+VIRTIO_BLK_T_ZONE_REPORT, VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE,
+VIRTIO_BLK_T_ZONE_FINISH, VIRTIO_BLK_T_ZONE_APPEND, VIRTIO_BLK_T_ZONE_RESET and
+VIRTIO_BLK_T_ZONE_RESET_ALL requests with VIRTIO_BLK_S_UNSUPP status.
+
+The following device requirements only apply if the VIRTIO_BLK_F_ZONED feature
+is negotiated.
+
+If a request of type VIRTIO_BLK_T_ZONE_OPEN, VIRTIO_BLK_T_ZONE_CLOSE,
+VIRTIO_BLK_T_ZONE_FINISH or VIRTIO_BLK_T_ZONE_RESET is issued for a Conventional
+zone (type VIRTIO_BLK_ZT_CONV), the device MUST complete the request with
+VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}.
+
+If the zone specified by the VIRTIO_BLK_T_ZONE_APPEND request is not an SWR zone,
+then the request SHALL be completed with VIRTIO_BLK_S_ZONE_INVALID_CMD
+\field{status}.
+
+The device handles a VIRTIO_BLK_T_ZONE_OPEN request by attempting to change the
+state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_EOPEN. If the
+transition to this state cannot be performed, the request MUST be completed
+with VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}. If, while processing this
+request, the available zone resources are insufficient, then the zone state does
+not change and the request MUST be completed with the
+VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE value in
+the field \field{status}.
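The first checks a device performs on a zone request can be written as a pre-check that runs before any state transition is attempted. In this C sketch the enumerators are local stand-ins (not the VIRTIO_BLK_T_*, VIRTIO_BLK_ZT_* or VIRTIO_BLK_S_* values), ZT_SWP stands for any sequential zone type other than SWR, and zone_request_precheck() is a hypothetical name.

```c
#include <stdbool.h>

/* Local stand-ins; values are illustrative, not the wire encodings. */
enum blk_status { S_OK, S_UNSUPP, S_ZONE_INVALID_CMD };
enum zone_type { ZT_CONV, ZT_SWR, ZT_SWP };
enum zone_op { OP_REPORT, OP_OPEN, OP_CLOSE, OP_FINISH, OP_RESET,
               OP_RESET_ALL, OP_APPEND };

/*
 * Hypothetical pre-check. target_type is the type of the zone addressed by
 * the request's sector field (ignored for REPORT and RESET_ALL).
 */
static enum blk_status zone_request_precheck(bool zoned_negotiated,
                                             enum zone_op op,
                                             enum zone_type target_type)
{
    /* Without VIRTIO_BLK_F_ZONED, every zone request is unsupported. */
    if (!zoned_negotiated)
        return S_UNSUPP;

    switch (op) {
    case OP_OPEN:
    case OP_CLOSE:
    case OP_FINISH:
    case OP_RESET:
        /* Explicit zone management is invalid on Conventional zones. */
        if (target_type == ZT_CONV)
            return S_ZONE_INVALID_CMD;
        break;
    case OP_APPEND:
        /* Appends are only valid for SWR zones. */
        if (target_type != ZT_SWR)
            return S_ZONE_INVALID_CMD;
        break;
    default:
        break;
    }
    return S_OK; /* proceed to the state transition itself */
}
```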
+
+The device handles a VIRTIO_BLK_T_ZONE_CLOSE request by attempting to change the
+state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_CLOSED. If
+the transition to this state cannot be performed, the request MUST be completed
+with the VIRTIO_BLK_S_ZONE_INVALID_CMD value in the field \field{status}.
+
+The device handles a VIRTIO_BLK_T_ZONE_FINISH request by attempting to change
+the state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_FULL. If
+the transition to this state cannot be performed, the zone state does not
+change and the request MUST be completed with the VIRTIO_BLK_S_ZONE_INVALID_CMD
+value in the field \field{status}.
+
+The device handles a VIRTIO_BLK_T_ZONE_RESET request by attempting to change the
+state of the zone with the \field{sector} address to VIRTIO_BLK_ZS_EMPTY.
+If the transition to this state cannot be performed, the zone state does not
+change and the request MUST be completed with the VIRTIO_BLK_S_ZONE_INVALID_CMD
+value in the field \field{status}.
+
+The device handles a VIRTIO_BLK_T_ZONE_RESET_ALL request by transitioning all
+sequential device zones in the VIRTIO_BLK_ZS_IOPEN, VIRTIO_BLK_ZS_EOPEN,
+VIRTIO_BLK_ZS_CLOSED and VIRTIO_BLK_ZS_FULL states to VIRTIO_BLK_ZS_EMPTY.
+
+Upon receiving a VIRTIO_BLK_T_ZONE_APPEND request or a VIRTIO_BLK_T_OUT
+request issued to an SWR zone in the VIRTIO_BLK_ZS_EMPTY or VIRTIO_BLK_ZS_CLOSED
+state, the device attempts to transition the zone to the
+VIRTIO_BLK_ZS_IOPEN state before writing data. This transition may fail due to
+insufficient open and/or active zone resources available on the device. In this
+case, the request MUST be completed with the VIRTIO_BLK_S_ZONE_OPEN_RESOURCE or
+VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE value in \field{status}.
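Two of the rules above lend themselves to short code: VIRTIO_BLK_T_ZONE_RESET_ALL only touches zones in four specific states, and when an implicit open fails because both limits would be exceeded, VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE takes precedence (as stated earlier in this section). This C sketch assumes the caller has already computed the open and active zone counts the transition would produce, and it does not model any special meaning a zero limit might have; zone_reset_all(), implicit_open_status() and all enumerator values are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the zone states and statuses used here. */
enum zone_state { ZS_NOT_WP, ZS_EMPTY, ZS_IOPEN, ZS_EOPEN, ZS_CLOSED,
                  ZS_RDONLY, ZS_FULL, ZS_OFFLINE };
enum blk_status { S_OK, S_ZONE_OPEN_RESOURCE, S_ZONE_ACTIVE_RESOURCE };

/* RESET_ALL: only IOPEN, EOPEN, CLOSED and FULL zones become EMPTY. */
static void zone_reset_all(enum zone_state *zones, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        switch (zones[i]) {
        case ZS_IOPEN:
        case ZS_EOPEN:
        case ZS_CLOSED:
        case ZS_FULL:
            zones[i] = ZS_EMPTY;
            break;
        default:
            break; /* EMPTY, NOT_WP, RDONLY and OFFLINE are left unchanged */
        }
    }
}

/* Status for an implicit open, given the counts the transition would produce.
 * If both limits would be exceeded, ACTIVE_RESOURCE is reported. */
static enum blk_status implicit_open_status(uint32_t open_after,
                                            uint32_t active_after,
                                            uint32_t max_open,
                                            uint32_t max_active)
{
    if (active_after > max_active)
        return S_ZONE_ACTIVE_RESOURCE;
    if (open_after > max_open)
        return S_ZONE_OPEN_RESOURCE;
    return S_OK;
}
```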
+
+If the \field{sector} field in the VIRTIO_BLK_T_ZONE_APPEND request does not
+specify the lowest sector for a zone, then the request SHALL be completed with
+the VIRTIO_BLK_S_ZONE_INVALID_CMD value in \field{status}.
+
+If a VIRTIO_BLK_T_ZONE_APPEND request or a VIRTIO_BLK_T_OUT request has a
+data range that exceeds the remaining writable capacity of the zone, the
+request SHALL be completed with the VIRTIO_BLK_S_ZONE_INVALID_CMD value in
+\field{status}.
+
+If a request of the type VIRTIO_BLK_T_ZONE_APPEND is completed with
+VIRTIO_BLK_S_OK status, the field \field{append_sector} in
+\field{virtio_blk_req_za} MUST be set by the device to contain the first sector
+of the data written to the zone.
+
+If a request of the type VIRTIO_BLK_T_ZONE_APPEND is completed with a status
+other than VIRTIO_BLK_S_OK, the value of the \field{append_sector} field in
+\field{virtio_blk_req_za} is undefined.
+
+If the data size of a VIRTIO_BLK_T_ZONE_APPEND request exceeds the
+\field{max_append_sectors} configuration space value, then:
+\begin{itemize}
+\item if the \field{max_append_sectors} configuration space value is reported as
+      zero by the device, the request SHALL be completed with VIRTIO_BLK_S_UNSUPP
+      \field{status}.
+
+\item if the \field{max_append_sectors} configuration space value is reported as
+      a non-zero value by the device, the request SHALL be completed with
+      VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}.
+\end{itemize}
+
+If a VIRTIO_BLK_T_ZONE_APPEND request, a VIRTIO_BLK_T_IN request or a
+VIRTIO_BLK_T_OUT request issued to an SWR zone has a range that includes sectors
+in more than one zone, then the request SHALL be completed with the
+VIRTIO_BLK_S_ZONE_INVALID_CMD value in the field \field{status}.
+
+If the \field{sector} value of a VIRTIO_BLK_T_OUT request is not aligned
+with the write pointer of the zone, the request SHALL be completed with the
+VIRTIO_BLK_S_ZONE_UNALIGNED_WP value in the field \field{status}.
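Device-side validation of VIRTIO_BLK_T_ZONE_APPEND and VIRTIO_BLK_T_OUT requests to an SWR zone might be sketched in C as follows. The order of the checks is a choice made for this sketch, since the requirements above do not prescribe one. struct swr_zone and zone_write() are hypothetical, max_append_sectors is assumed to count 512-byte sectors, and the zone capacity is assumed not to extend past the end of the zone, so the capacity check also rejects ranges that spill into the next zone. On success the write pointer advances, and for an append the old write pointer is reported as append_sector.

```c
#include <stdint.h>

/* Local stand-ins; values are not the on-wire status codes. */
enum blk_status { S_OK, S_UNSUPP, S_ZONE_INVALID_CMD, S_ZONE_UNALIGNED_WP };
enum wr_kind { WR_OUT, WR_APPEND };

/* Hypothetical SWR zone view, all in 512-byte sectors.
 * Invariant assumed: start <= wp <= start + capacity. */
struct swr_zone {
    uint64_t start;    /* zone sector address */
    uint64_t capacity; /* writable capacity */
    uint64_t wp;       /* current write pointer */
};

/* append_sector must be non-NULL for WR_APPEND; it is unused for WR_OUT. */
static enum blk_status zone_write(enum wr_kind kind, struct swr_zone *z,
                                  uint64_t sector, uint64_t nr_sectors,
                                  uint32_t max_append_sectors,
                                  uint64_t *append_sector)
{
    if (kind == WR_APPEND) {
        /* An append must name the lowest sector of the zone. */
        if (sector != z->start)
            return S_ZONE_INVALID_CMD;
        /* Over the size limit: UNSUPP if the limit is zero, else INVALID_CMD. */
        if (nr_sectors > max_append_sectors)
            return max_append_sectors == 0 ? S_UNSUPP : S_ZONE_INVALID_CMD;
    } else if (sector != z->wp) {
        /* A regular write must start exactly at the write pointer. */
        return S_ZONE_UNALIGNED_WP;
    }

    /* The data must fit in the remaining writable capacity. */
    if (nr_sectors > z->start + z->capacity - z->wp)
        return S_ZONE_INVALID_CMD;

    if (kind == WR_APPEND)
        *append_sector = z->wp; /* first sector of the appended data */
    z->wp += nr_sectors;        /* data lands at the old write pointer */
    return S_OK;
}
```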
+
+In order to avoid resource-related errors while opening zones implicitly, the
+device MAY automatically transition zones in the VIRTIO_BLK_ZS_IOPEN state to
+the VIRTIO_BLK_ZS_CLOSED state.
+
+All VIRTIO_BLK_T_OUT or VIRTIO_BLK_T_ZONE_APPEND requests issued
+to a zone in the VIRTIO_BLK_ZS_RDONLY state SHALL be completed with
+VIRTIO_BLK_S_ZONE_INVALID_CMD \field{status}.
+
+All requests issued to a zone in the VIRTIO_BLK_ZS_OFFLINE state SHALL be
+completed with the VIRTIO_BLK_S_ZONE_INVALID_CMD value in the field \field{status}.
+
+The device MUST consider the sectors that are read between the write pointer
+position of a zone and the end of the last sector of the zone as unwritten data.
+The sectors between the write pointer position and the end of the last sector
+within the zone capacity during VIRTIO_BLK_T_ZONE_FINISH request processing are
+also considered unwritten data.
+
+When unwritten data is present in the sector range of a read request, the device
+MUST process this data in one of the following ways:
+
+\begin{enumerate}
+\item Fill the unwritten data with a device-specific byte pattern. The
+configuration, control and reporting of this byte pattern is beyond the scope
+of this standard. This is the preferred approach.
+
+\item Fail the request. Depending on the driver implementation, this may prevent
+the device from becoming operational.
+\end{enumerate}
+
+If both the VIRTIO_BLK_F_ZONED and VIRTIO_BLK_F_SECURE_ERASE features are
+negotiated, then:
+
+\begin{enumerate}
+\item the field \field{secure_erase_sector_alignment} in the configuration space
+of the device MUST be a multiple of the \field{zone_sectors} value reported in the
+device configuration space.
+
+\item the data size in VIRTIO_BLK_T_SECURE_ERASE requests MUST be a multiple of
+the \field{zone_sectors} value in the device configuration space.
+\end{enumerate}
+
+The device MUST handle a VIRTIO_BLK_T_SECURE_ERASE request in the same way it
+handles a VIRTIO_BLK_T_ZONE_RESET request for the zone range specified in the
+VIRTIO_BLK_T_SECURE_ERASE request.
+
+\subsubsection{Legacy Interface: Device Operation}\label{sec:Device Types / Block Device / Device Operation / Legacy Interface: Device Operation}
+When using the legacy interface, transitional devices and drivers
+MUST format the fields in struct virtio_blk_req
+according to the native endian of the guest rather than
+(necessarily when not using the legacy interface) little-endian.
+
+When using the legacy interface, transitional drivers
+SHOULD ignore the used length values.
+\begin{note}
+Historically, some devices put the total descriptor length,
+or the total length of device-writable buffers, there,
+even when only the status byte was actually written.
+\end{note}
+
+The \field{reserved} field was previously called \field{ioprio}. \field{ioprio}
+is a hint about the relative priorities of requests to the device:
+higher numbers indicate more important requests.
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_FLUSH_OUT      5
+\end{lstlisting}
+
+The command VIRTIO_BLK_T_FLUSH_OUT was a synonym for VIRTIO_BLK_T_FLUSH;
+a driver MUST treat it as a VIRTIO_BLK_T_FLUSH command.
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_BARRIER        0x80000000
+\end{lstlisting}
+
+If the device has the VIRTIO_BLK_F_BARRIER
+feature, the high bit (VIRTIO_BLK_T_BARRIER) indicates that this
+request acts as a barrier and that all preceding requests SHOULD be
+complete before this one, and all following requests SHOULD NOT be
+started until this is complete.
+
+\begin{note} A barrier does not flush
+caches in the underlying backend device in the host, and thus does not
+serve as a data consistency guarantee. Only a VIRTIO_BLK_T_FLUSH request
+does that.
+\end{note}
+
+Some older legacy devices did not commit completed writes to persistent
+device backend storage when VIRTIO_BLK_F_FLUSH was offered but not
+negotiated. In order to work around this, the driver MAY set
+\field{writeback} to 0 (if available) or it MAY send an explicit flush
+request after every completed write.
+
+If the device has the VIRTIO_BLK_F_SCSI feature, it can also support
+SCSI packet command requests, each of the following form:
+
+\begin{lstlisting}
+/* All fields are in guest's native endian. */
+struct virtio_scsi_pc_req {
+        u32 type;
+        u32 ioprio;
+        u64 sector;
+        u8 cmd[];
+        u8 data[][512];
+#define SCSI_SENSE_BUFFERSIZE   96
+        u8 sense[SCSI_SENSE_BUFFERSIZE];
+        u32 errors;
+        u32 data_len;
+        u32 sense_len;
+        u32 residual;
+        u8 status;
+};
+\end{lstlisting}
+
+A request type can also be a SCSI packet command (VIRTIO_BLK_T_SCSI_CMD or
+VIRTIO_BLK_T_SCSI_CMD_OUT). The two types are equivalent; the device
+does not distinguish between them:
+
+\begin{lstlisting}
+#define VIRTIO_BLK_T_SCSI_CMD       2
+#define VIRTIO_BLK_T_SCSI_CMD_OUT   3
+\end{lstlisting}
+
+The \field{cmd} field is only present for SCSI packet command requests,
+and indicates the command to perform. This field MUST reside in a
+single, separate device-readable buffer; the command length can be derived
+from the length of this buffer.
+
+Note that these first three (four for SCSI packet commands)
+fields are always device-readable: \field{data} is either device-readable
+or device-writable, depending on the request. The size of the read or
+write can be derived from the total size of the request buffers.
+
+\field{sense} is only present for SCSI packet command requests,
+and indicates the buffer for SCSI sense data.
+
+\field{data_len} is only present for SCSI packet command
+requests; this field is deprecated and SHOULD be ignored by the
+driver. Historically, devices copied the data length there.
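A legacy device front end might normalize the request type before dispatching it: strip VIRTIO_BLK_T_BARRIER when VIRTIO_BLK_F_BARRIER is in use, map VIRTIO_BLK_T_FLUSH_OUT to VIRTIO_BLK_T_FLUSH, and map VIRTIO_BLK_T_SCSI_CMD_OUT to VIRTIO_BLK_T_SCSI_CMD. This C sketch assumes the type has already been converted from the guest's native endian and that VIRTIO_BLK_T_FLUSH has the value 4; legacy_normalize_type() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_BLK_T_SCSI_CMD     2
#define VIRTIO_BLK_T_SCSI_CMD_OUT 3
#define VIRTIO_BLK_T_FLUSH        4 /* assumed value */
#define VIRTIO_BLK_T_FLUSH_OUT    5
#define VIRTIO_BLK_T_BARRIER      0x80000000u

/*
 * Hypothetical helper: returns the canonical request type and reports
 * through *is_barrier whether the request acts as a barrier. The high bit
 * is only interpreted when VIRTIO_BLK_F_BARRIER is in use.
 */
static uint32_t legacy_normalize_type(uint32_t type, bool barrier_feature,
                                      bool *is_barrier)
{
    *is_barrier = false;
    if (barrier_feature && (type & VIRTIO_BLK_T_BARRIER)) {
        *is_barrier = true;
        type &= ~VIRTIO_BLK_T_BARRIER;
    }

    switch (type) {
    case VIRTIO_BLK_T_FLUSH_OUT:
        return VIRTIO_BLK_T_FLUSH;    /* synonym for FLUSH */
    case VIRTIO_BLK_T_SCSI_CMD_OUT:
        return VIRTIO_BLK_T_SCSI_CMD; /* the two SCSI types are equivalent */
    default:
        return type;
    }
}
```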
+
+\field{sense_len} is only present for SCSI packet command
+requests and indicates the number of bytes actually written to
+the \field{sense} buffer.
+
+The \field{residual} field is only present for SCSI packet command
+requests and indicates the residual size, calculated as the data
+length minus the number of bytes actually transferred.
+
+\subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device Types / Block Device / Legacy Interface: Framing Requirements}
+
+When using legacy interfaces, transitional drivers which have not
+negotiated VIRTIO_F_ANY_LAYOUT:
+
+\begin{itemize}
+\item MUST use a single 16-byte descriptor containing \field{type},
+  \field{reserved} and \field{sector}, followed by descriptors
+  for \field{data}, then finally a separate 1-byte descriptor
+  for \field{status}.
+
+\item For SCSI commands there are additional constraints.
+  \field{sense} MUST reside in a
+  single separate device-writable descriptor of size 96 bytes,
+  and \field{errors}, \field{data_len}, \field{sense_len} and
+  \field{residual} MUST reside in a single separate
+  device-writable descriptor.
+\end{itemize}
+
+See \ref{sec:Basic Facilities of a Virtio Device / Virtqueues / Message Framing}.
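The legacy framing rules fix the descriptor boundaries, so a driver can compute the descriptor lengths up front. This C sketch derives the header length from the field sizes of type, reserved and sector (two 32-bit fields and one 64-bit field) and places the SCSI cmd buffer in its own descriptor right after the header, following the field order of struct virtio_scsi_pc_req. legacy_desc_lens() and the helper structs are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Legacy request header: type, reserved (formerly ioprio), sector. */
struct legacy_hdr {
    uint32_t type;
    uint32_t reserved;
    uint64_t sector;
};

/* errors, data_len, sense_len and residual share one descriptor. */
struct legacy_scsi_tail {
    uint32_t errors;
    uint32_t data_len;
    uint32_t sense_len;
    uint32_t residual;
};

#define LEGACY_SENSE_BYTES 96u

/*
 * Hypothetical helper: writes the descriptor lengths, in order, for a legacy
 * request sent without VIRTIO_F_ANY_LAYOUT. Returns the number of
 * descriptors, or 0 if out[] is too small.
 */
static size_t legacy_desc_lens(bool scsi, uint32_t cmd_len,
                               const uint32_t *data_lens, size_t n_data,
                               uint32_t *out, size_t out_max)
{
    size_t need = 1 + n_data + 1 + (scsi ? 3 : 0);
    size_t i = 0;

    if (need > out_max)
        return 0;

    out[i++] = (uint32_t)sizeof(struct legacy_hdr); /* header, 16 bytes */
    if (scsi)
        out[i++] = cmd_len;                         /* separate cmd buffer */
    for (size_t k = 0; k < n_data; k++)
        out[i++] = data_lens[k];                    /* data descriptors */
    if (scsi) {
        out[i++] = LEGACY_SENSE_BYTES;              /* sense, 96 bytes */
        out[i++] = (uint32_t)sizeof(struct legacy_scsi_tail); /* 16 bytes */
    }
    out[i++] = 1;                                   /* status byte */
    return i;
}
```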
diff --git a/device-types/blk/device-conformance.tex b/device-types/blk/device-conformance.tex
new file mode 100644
index 0000000..b4fbc8b
--- /dev/null
+++ b/device-types/blk/device-conformance.tex
@@ -0,0 +1,8 @@
+\conformance{\subsection}{Block Device Conformance}\label{sec:Conformance / Device Conformance / Block Device Conformance}
+
+A block device MUST conform to the following normative statements:
+
+\begin{itemize}
+\item \ref{devicenormative:Device Types / Block Device / Device Initialization}
+\item \ref{devicenormative:Device Types / Block Device / Device Operation}
+\end{itemize}
diff --git a/device-types/blk/driver-conformance.tex b/device-types/blk/driver-conformance.tex
new file mode 100644
index 0000000..0f69866
--- /dev/null
+++ b/device-types/blk/driver-conformance.tex
@@ -0,0 +1,8 @@
+\conformance{\subsection}{Block Driver Conformance}\label{sec:Conformance / Driver Conformance / Block Driver Conformance}
+
+A block driver MUST conform to the following normative statements:
+
+\begin{itemize}
+\item \ref{drivernormative:Device Types / Block Device / Device Initialization}
+\item \ref{drivernormative:Device Types / Block Device / Device Operation}
+\end{itemize}
-- 
2.26.2