* [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device
@ 2019-02-20 12:46 Stefan Hajnoczi
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-02-20 12:46 UTC
  To: virtio-dev
  Cc: Miklos Szeredi, Sage Weil, Vivek Goyal, Steven Whitehouse,
	Dr. David Alan Gilbert, Paolo Bonzini, Stefan Hajnoczi

v3:
 * Remove notifications virtqueue, it's unimplemented and can be added when
   needed [Miklos]
 * Add Security Considerations and Live Migration Considerations sections
   [Michael]
v2:
 * Clean up core virtio file system device spec
 * Add DAX window

These patches add the virtio file system device, which is based on Linux FUSE
but includes the DAX window extension.  Similar to virtio-scsi, which
transports SCSI commands, virtio-fs transports FUSE requests and the protocol
documentation is not duplicated here.
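
As a rough illustration of the transport side, here is a sketch of how a driver might route requests across the queues defined in patch 1/2: hiprio at index 0 and request queues at 1..num_queues. The opcode values are taken from Linux's fuse.h. The function name and the `hint` parameter used to spread load across request queues are hypothetical and do not come from any real driver.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values as defined in Linux include/uapi/linux/fuse.h. */
enum {
    FUSE_FORGET       = 2,
    FUSE_INTERRUPT    = 36,
    FUSE_BATCH_FORGET = 42,
};

/* Virtqueue 0 is hiprio; request queues are numbered 1..num_queues. */
#define VIRTIO_FS_HIPRIO_VQ 0u

/*
 * Pick the virtqueue for a FUSE request (hypothetical helper).
 * Interrupt and forget requests go on hiprio.  All other requests are
 * spread over the request queues using 'hint', e.g. a CPU number.
 */
static unsigned int virtio_fs_select_vq(uint32_t opcode,
                                        uint32_t num_queues,
                                        unsigned int hint)
{
    assert(num_queues >= 1); /* the device must report at least one */

    switch (opcode) {
    case FUSE_FORGET:
    case FUSE_INTERRUPT:
    case FUSE_BATCH_FORGET:
        return VIRTIO_FS_HIPRIO_VQ;
    default:
        return 1u + hint % num_queues;
    }
}
```

Because the device consumes the request queues without ordering constraints, a driver using a scheme like this still has to enforce any ordering it needs between requests that land on different queues.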

The DAX window allows file contents to be accessed directly from shared memory.
This eliminates copying of data, reduces the number of vmexits, and reduces the
guest's memory footprint.  It also allows coherent mmap MAP_SHARED semantics
between guests on the same host.

Stefan Hajnoczi (2):
  content: add virtio file system device
  virtio-fs: add DAX window

 content.tex      |   3 +
 introduction.tex |   3 +
 virtio-fs.tex    | 221 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 227 insertions(+)
 create mode 100644 virtio-fs.tex

-- 
2.20.1


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org

* [virtio-dev] [PATCH v3 1/2] content: add virtio file system device
  2019-02-20 12:46 [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Stefan Hajnoczi
@ 2019-02-20 12:46 ` Stefan Hajnoczi
  2019-02-22 14:31   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window Stefan Hajnoczi
  2019-06-19  1:30 ` [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Michael S. Tsirkin
  2 siblings, 3 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-02-20 12:46 UTC
  To: virtio-dev
  Cc: Miklos Szeredi, Sage Weil, Vivek Goyal, Steven Whitehouse,
	Dr. David Alan Gilbert, Paolo Bonzini, Stefan Hajnoczi

The virtio file system device transports Linux FUSE requests between a
FUSE daemon running on the host and the FUSE driver inside the guest.

The actual FUSE request definitions are not duplicated in the virtio
specification, similar to how virtio-scsi does not document SCSI
command details.  FUSE request definitions are available here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
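
For illustration, here is a minimal sketch of the device-readable half of a FUSE_READ request in the layout this patch describes: fuse_in_header followed by fuse_read_in. The structures are local mirrors of the fuse.h definitions as of early 2019, included only so the example compiles on its own. `build_read_req` and `read_reply_buf_size` are hypothetical helpers. Fields are filled in host byte order; the byte order of FUSE fields is discussed later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of fuse.h structures (early-2019 layout). */
struct fuse_in_header {
    uint32_t len;
    uint32_t opcode;
    uint64_t unique;
    uint64_t nodeid;
    uint32_t uid;
    uint32_t gid;
    uint32_t pid;
    uint32_t padding;
};

struct fuse_out_header {
    uint32_t len;
    int32_t  error;
    uint64_t unique;
};

struct fuse_read_in {
    uint64_t fh;
    uint64_t offset;
    uint32_t size;
    uint32_t read_flags;
    uint64_t lock_owner;
    uint32_t flags;
    uint32_t padding;
};

#define FUSE_READ 15

/* Device-readable part of a FUSE_READ request: header + fuse_read_in. */
struct read_req_in {
    struct fuse_in_header in;
    struct fuse_read_in readin;
};

_Static_assert(sizeof(struct fuse_in_header) == 40, "fuse_in_header size");
_Static_assert(sizeof(struct fuse_read_in) == 40, "fuse_read_in size");

/* Fill in the device-readable part of a FUSE_READ request. */
static void build_read_req(struct read_req_in *req, uint64_t unique,
                           uint64_t nodeid, uint64_t fh,
                           uint64_t offset, uint32_t size)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->in.len = (uint32_t)sizeof(*req);
    req->in.opcode = FUSE_READ;
    req->in.unique = unique;
    req->in.nodeid = nodeid;
    req->readin.fh = fh;
    req->readin.offset = offset;
    req->readin.size = size;
}

/* Device-writable buffer: fuse_out_header followed by up to 'size' bytes. */
static size_t read_reply_buf_size(uint32_t size)
{
    return sizeof(struct fuse_out_header) + size;
}
```

The two halves would be placed in separate descriptors: the `read_req_in` buffer as device-readable and a buffer of `read_reply_buf_size(size)` bytes as device-writable.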

This patch documents the core virtio file system device, which is
functional but lacks the DAX feature introduced in the next patch.
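
One detail of the core device is the `tag` field of `struct virtio_fs_config`. It is NUL-padded, but it is not NUL-terminated when the UTF-8 tag fills all 36 bytes. The following sketch shows one way a driver might copy it into a C string. `virtio_fs_copy_tag` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VIRTIO_FS_TAG_LEN 36 /* size of virtio_fs_config.tag */

/*
 * Copy the tag into a NUL-terminated string (hypothetical helper).
 * 'out' must hold VIRTIO_FS_TAG_LEN + 1 bytes.  Returns the tag length
 * in bytes; UTF-8 content is copied unchanged.
 */
static size_t virtio_fs_copy_tag(const char tag[VIRTIO_FS_TAG_LEN],
                                 char out[VIRTIO_FS_TAG_LEN + 1])
{
    /* The first NUL ends the tag; no NUL means the field is full. */
    const char *nul = memchr(tag, '\0', VIRTIO_FS_TAG_LEN);
    size_t len = nul ? (size_t)(nul - tag) : VIRTIO_FS_TAG_LEN;

    assert(len <= VIRTIO_FS_TAG_LEN);
    memcpy(out, tag, len);
    out[len] = '\0';
    return len;
}
```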

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 content.tex      |   3 +
 introduction.tex |   3 +
 virtio-fs.tex    | 196 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 202 insertions(+)
 create mode 100644 virtio-fs.tex

diff --git a/content.tex b/content.tex
index 836ee52..ac41fdb 100644
--- a/content.tex
+++ b/content.tex
@@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
 \hline
 24         &   Memory device \\
 \hline
+26         &   file system device \\
+\hline
 \end{tabular}
 
 Some of the devices above are unspecified by this document,
@@ -5559,6 +5561,7 @@ descriptor for the \field{sense_len}, \field{residual},
 \input{virtio-input.tex}
 \input{virtio-crypto.tex}
 \input{virtio-vsock.tex}
+\input{virtio-fs.tex}
 
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
diff --git a/introduction.tex b/introduction.tex
index a4ac01d..6eeda5d 100644
--- a/introduction.tex
+++ b/introduction.tex
@@ -60,6 +60,9 @@ Levels'', BCP 14, RFC 2119, March 1997. \newline\url{http://www.ietf.org/rfc/rfc
 	\phantomsection\label{intro:SCSI MMC}\textbf{[SCSI MMC]} &
         SCSI Multimedia Commands,
         \newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=mmc6r00.pdf}\\
+	\phantomsection\label{intro:FUSE}\textbf{[FUSE]} &
+	Linux FUSE interface,
+	\newline\url{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h}\\
 
 \end{longtable}
 
diff --git a/virtio-fs.tex b/virtio-fs.tex
new file mode 100644
index 0000000..5df5b9c
--- /dev/null
+++ b/virtio-fs.tex
@@ -0,0 +1,196 @@
+\section{File System Device}\label{sec:Device Types / File System Device}
+
+The virtio file system device provides file system access.  The device may
+directly manage a file system or act as a gateway to a remote file system.  The
+details of how files are accessed are hidden by the device interface, allowing
+for a range of use cases.
+
+Unlike block-level storage devices such as virtio block and SCSI, the virtio
+file system device provides file-level access to data.  The device interface is
+based on the Linux Filesystem in Userspace (FUSE) protocol.  This consists of
+requests for file system traversal and access to the files and directories
+within it.  The protocol details are defined by \hyperref[intro:FUSE]{FUSE}.
+
+The device acts as the FUSE file system daemon and the driver acts as the FUSE
+client mounting the file system.  The virtio file system device provides the
+mechanism for transporting FUSE requests, much like /dev/fuse in a traditional
+FUSE application.
+
+This section relies on definitions from \hyperref[intro:FUSE]{FUSE}.
+
+\subsection{Device ID}\label{sec:Device Types / File System Device / Device ID}
+  26
+
+\subsection{Virtqueues}\label{sec:Device Types / File System Device / Virtqueues}
+
+\begin{description}
+\item[0] hiprio
+\item[1\ldots n] request queues
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / File System Device / Feature bits}
+
+There are currently no feature bits defined.
+
+\subsection{Device configuration layout}\label{sec:Device Types / File System Device / Device configuration layout}
+
+All fields of this configuration are always available.
+
+\begin{lstlisting}
+struct virtio_fs_config {
+        char tag[36];
+        le32 num_queues;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{tag}] is the name associated with this file system.  The tag is
+    encoded in UTF-8 and padded with NUL bytes if shorter than the
+    available space.  This field is not NUL-terminated if the encoded bytes
+    take up the entire field.
+\item[\field{num_queues}] is the total number of request virtqueues exposed by
+    the device. The driver MAY use only one request queue,
+    or it can use more to achieve better performance.
+\end{description}
+
+\drivernormative{\subsubsection}{Device configuration layout}{Device Types / File System Device / Device configuration layout}
+
+The driver MUST NOT write to device configuration fields.
+
+\devicenormative{\subsubsection}{Device configuration layout}{Device Types / File System Device / Device configuration layout}
+
+The device MUST set \field{num_queues} to 1 or greater.
+
+\drivernormative{\subsection}{Device Initialization}{Device Types / File System Device / Device Initialization}
+
+On initialization the driver MUST first discover the
+device's virtqueues.
+
+\subsection{Device Operation}\label{sec:Device Types / File System Device / Device Operation}
+
+Device operation consists of operating the virtqueues to facilitate file system
+access.
+
+The FUSE request types are as follows:
+\begin{itemize}
+\item Normal requests are submitted by the driver and completed by the device.
+\item Interrupt requests are submitted by the driver to abort requests that the
+      device may have yet to complete.
+\end{itemize}
+
+Note that FUSE notification requests are not supported.
+
+\subsubsection{Device Operation: Request Queues}\label{sec:Device Types / File System Device / Device Operation / Device Operation: Request Queues}
+
+The driver enqueues normal requests on an arbitrary request queue and the
+device completes them on that same queue.  The device consumes requests from
+different queues without ordering constraints, so the driver is responsible
+for enforcing any required ordering between requests on different queues.
+
+Requests have the following format:
+
+\begin{lstlisting}
+struct virtio_fs_req {
+        // Device-readable part
+        struct fuse_in_header in;
+        u8 datain[];
+
+        // Device-writable part
+        struct fuse_out_header out;
+        u8 dataout[];
+};
+\end{lstlisting}
+
+Note that the words ``in'' and ``out'' follow the FUSE meaning and do not
+indicate the direction of data transfer under VIRTIO.  ``In'' means input to
+a request and ``out'' means output from processing a request.
+
+\field{in} is the common header for all types of FUSE requests.
+
+\field{datain} consists of request-specific data, if any.  This is identical to
+the data read from the /dev/fuse device by a FUSE daemon.
+
+\field{out} is the completion header common to all types of FUSE requests.
+
+\field{dataout} consists of request-specific data, if any.  This is identical
+to the data written to the /dev/fuse device by a FUSE daemon.
+
+For example, the full layout of a FUSE\_READ request is as follows:
+
+\begin{lstlisting}
+struct virtio_fs_read_req {
+        // Device-readable part
+        struct fuse_in_header in;
+        union {
+                struct fuse_read_in readin;
+                u8 datain[sizeof(struct fuse_read_in)];
+        };
+
+        // Device-writable part
+        struct fuse_out_header out;
+        u8 dataout[out.len - sizeof(struct fuse_out_header)];
+};
+\end{lstlisting}
+
+The FUSE protocol documented in \hyperref[intro:FUSE]{FUSE} specifies the set
+of request types and their contents.  All request fields are little-endian.
+
+\subsubsection{Device Operation: High Priority Queue}\label{sec:Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
+
+The hiprio queue follows the same request format as the request queues.  This
+queue only contains FUSE\_INTERRUPT, FUSE\_FORGET, and FUSE\_BATCH\_FORGET
+requests.
+
+Interrupt and forget requests have a higher priority than normal requests.  In
+order to ensure that they can always be delivered, even if all request queues
+are full, a separate queue is used.
+
+\devicenormative{\paragraph}{Device Operation: High Priority Queue}{Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
+
+The device SHOULD attempt to process the hiprio queue promptly.
+
+The device MAY process request queues concurrently with the hiprio queue.
+
+\drivernormative{\paragraph}{Device Operation: High Priority Queue}{Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
+
+The driver MUST submit FUSE\_INTERRUPT, FUSE\_FORGET, and FUSE\_BATCH\_FORGET requests solely on the hiprio queue.
+
+The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
+
+\subsubsection{Security Considerations}\label{sec:Device Types / File System Device / Security Considerations}
+
+The device provides access to a file system that may contain files owned by
+different POSIX user ids and group ids.  The device has no secure way of
+differentiating between users originating requests via the driver.  Therefore
+the device accepts the POSIX user ids and group ids provided by the driver and
+security is enforced by the driver rather than the device.  It is nevertheless
+possible for devices to implement POSIX user id and group id mapping or
+whitelisting to control the ownership and access available to the driver.
+
+The file system may contain special files including device nodes and setuid
+executable files.  These properties are defined by the file type and mode,
+which may be set by the driver when creating new files or changed at a later
+time.  These special files present a security risk when the file system is
+shared with another system, such as the host or another guest.  This issue can
+be solved on some operating systems using mount options that ignore special
+files.  It is also possible for devices to implement restrictions on special
+files by refusing their creation.
+
+When the device provides shared access to a file system the possibility of
+symlink race conditions, exhausting file system capacity, and overwriting or
+deleting files used by others must be taken into account.  These issues have a
+long history in multi-user operating systems and should not be overlooked with
+virtio devices.
+
+\subsubsection{Live Migration Considerations}\label{sec:Device Types / File System Device / Live Migration Considerations}
+
+When a guest is migrated to a new host it is necessary to consider the FUSE
+session and its state.  The continuity of FUSE inode numbers (also known as
+nodeids) and fh values is necessary so the driver can continue operation
+without disruption.  Migration is trivial before a FUSE session has been
+started with FUSE\_INIT, because no such state exists yet.
+
+It is possible to maintain the FUSE session across live migration either by
+transferring the state or by redirecting requests from the new host to the old
+host where the state resides.  The details of how to achieve this are
+implementation-dependent and are not visible at the device interface level.
-- 
2.20.1

* [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-02-20 12:46 [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Stefan Hajnoczi
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
@ 2019-02-20 12:46 ` Stefan Hajnoczi
  2019-06-19  1:41   ` Michael S. Tsirkin
  2019-06-19  1:30 ` [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Michael S. Tsirkin
  2 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-02-20 12:46 UTC
  To: virtio-dev
  Cc: Miklos Szeredi, Sage Weil, Vivek Goyal, Steven Whitehouse,
	Dr. David Alan Gilbert, Paolo Bonzini, Stefan Hajnoczi

Describe how shared memory region ID 0 is the DAX window and how
FUSE_SETUPMAPPING maps file ranges into the window.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
Note that this depends on the shared memory resource specification
extension that David Gilbert is working on.
https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html

The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
---
 virtio-fs.tex | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/virtio-fs.tex b/virtio-fs.tex
index 5df5b9c..abb1e48 100644
--- a/virtio-fs.tex
+++ b/virtio-fs.tex
@@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
 
 The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
 
+\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
+
+FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
+driver-provided buffer and the device.  In cases where data transfer is
+undesirable, the device can map file contents into the DAX window shared memory
+region.  The driver then accesses file contents directly in device-owned memory
+without a data transfer.
+
+Shared memory region ID 0 is called the DAX window.  The driver maps a file
+range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
+removed using the FUSE\_REMOVEMAPPING request.
+
+After FUSE\_SETUPMAPPING has completed successfully, the file range is accessible
+from the DAX window at the offset provided by the driver in the request.
+
+\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
+
+The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
+
+The device MUST reject mappings that would go beyond the end of the DAX window.
+
+\drivernormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
+
+The driver SHOULD be prepared to find shared memory region ID 0 absent and fall back to FUSE\_READ and FUSE\_WRITE requests.
+
 \subsubsection{Security Considerations}\label{sec:Device Types / File System Device / Security Considerations}
 
 The device provides access to a file system that may contain files owned by
-- 
2.20.1

* Re: [virtio-dev] [PATCH v3 1/2] content: add virtio file system device
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
@ 2019-02-22 14:31   ` Dr. David Alan Gilbert
  2019-02-25 15:54     ` Stefan Hajnoczi
  2019-02-25 16:11   ` [virtio-dev] " Dr. David Alan Gilbert
  2019-06-19  1:29   ` [virtio-dev] " Michael S. Tsirkin
  2 siblings, 1 reply; 21+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-22 14:31 UTC
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Paolo Bonzini

* Stefan Hajnoczi (stefanha@redhat.com) wrote:

> +It is possible to maintain the FUSE session across live migration either by
> +transferring the state or by redirecting requests from the new host to the old
> +host where the state resides.  The details of how to achieve this are
> +implementation-dependent and are not visible at the device interface level.

I wonder how 'transferring the state' will work;  it is tricky to
specify because that state could be filesystem dependent so wouldn't
really fit at this level of spec.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [virtio-dev] [PATCH v3 1/2] content: add virtio file system device
  2019-02-22 14:31   ` Dr. David Alan Gilbert
@ 2019-02-25 15:54     ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-02-25 15:54 UTC
  To: Dr. David Alan Gilbert
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Paolo Bonzini

On Fri, Feb 22, 2019 at 02:31:02PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> 
> > +It is possible to maintain the FUSE session across live migration either by
> > +transferring the state or by redirecting requests from the new host to the old
> > +host where the state resides.  The details of how to achieve this are
> > +implementation-dependent and are not visible at the device interface level.
> 
> I wonder how 'transferring the state' will work;  it is tricky to
> specify because that state could be filesystem dependent so wouldn't
> really fit at this level of spec.

Yes, the state is file system daemon-specific and beyond the scope of
the VIRTIO device spec.

Stefan

* [virtio-dev] Re: [PATCH v3 1/2] content: add virtio file system device
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
  2019-02-22 14:31   ` Dr. David Alan Gilbert
@ 2019-02-25 16:11   ` Dr. David Alan Gilbert
  2019-02-27 16:19     ` Stefan Hajnoczi
  2019-06-19  1:29   ` [virtio-dev] " Michael S. Tsirkin
  2 siblings, 1 reply; 21+ messages in thread
From: Dr. David Alan Gilbert @ 2019-02-25 16:11 UTC
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Paolo Bonzini

* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> The virtio file system device transports Linux FUSE requests between a
> FUSE daemon running on the host and the FUSE driver inside the guest.
> 
> The actual FUSE request definitions are not duplicated in the virtio
> specification, similar to how virtio-scsi does not document SCSI
> command details.  FUSE request definitions are available here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> 
> This patch documents the core virtio file system device, which is
> functional but lacks the DAX feature introduced in the next patch.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  content.tex      |   3 +
>  introduction.tex |   3 +
>  virtio-fs.tex    | 196 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 202 insertions(+)
>  create mode 100644 virtio-fs.tex
> 
> diff --git a/content.tex b/content.tex
> index 836ee52..ac41fdb 100644

> +The FUSE protocol documented in \hyperref[intro:FUSE]{FUSE} specifies the set
> +of request types and their contents.  All request fields are little-endian.

FUSE doesn't seem to define its endianness - and I'm not sure it's
worth it for us to define it and do the work of adding a byteswapping
shim.  It would be reasonably invasive in the kernel code to do that
if we're the only user, and really I think the chances of having a
cross-endian user are pretty slim; so adding invasive code for a
non-existent user seems a bad thing.
IMHO we should just stick with the existing FUSE definition; I believe
it's possible to detect a byteswapped interface by validating the
'opcode' field of the fuse_in_header; existing daemons should just error
on this; in the unlikely event that someone discovers they really want
a cross endian implementation then they can add that byteswapping in
their daemon.

We should still keep the virtio level little endian of course.
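
Here is a minimal sketch of the opcode check suggested above. It assumes a daemon that knows opcodes 1 through 47 (FUSE_LOOKUP through FUSE_COPY_FILE_RANGE in fuse.h at the time). The range test is approximate because the opcode space has gaps, so a real daemon would consult its own dispatch table. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed range of opcodes known to this daemon. */
#define FUSE_OPCODE_MIN 1u
#define FUSE_OPCODE_MAX 47u

enum hdr_order { HDR_NATIVE, HDR_SWAPPED, HDR_INVALID };

/* Reverse the byte order of a 32-bit value. */
static uint32_t swap_u32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

static int opcode_known(uint32_t op)
{
    return op >= FUSE_OPCODE_MIN && op <= FUSE_OPCODE_MAX;
}

/* Classify the byte order of fuse_in_header.opcode as received. */
static enum hdr_order classify_opcode(uint32_t opcode)
{
    if (opcode_known(opcode))
        return HDR_NATIVE;
    if (opcode_known(swap_u32(opcode)))
        return HDR_SWAPPED;
    return HDR_INVALID;
}
```

Every known opcode is below 2^24, so its byte-swapped form is at least 2^24 and cannot be mistaken for a valid native opcode. This makes the check unambiguous.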

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* [virtio-dev] Re: [PATCH v3 1/2] content: add virtio file system device
  2019-02-25 16:11   ` [virtio-dev] " Dr. David Alan Gilbert
@ 2019-02-27 16:19     ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-02-27 16:19 UTC
  To: Dr. David Alan Gilbert
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Paolo Bonzini

On Mon, Feb 25, 2019 at 04:11:49PM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > The virtio file system device transports Linux FUSE requests between a
> > FUSE daemon running on the host and the FUSE driver inside the guest.
> > 
> > The actual FUSE request definitions are not duplicated in the virtio
> > specification, similar to how virtio-scsi does not document SCSI
> > command details.  FUSE request definitions are available here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > 
> > This patch documents the core virtio file system device, which is
> > functional but lacks the DAX feature introduced in the next patch.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  content.tex      |   3 +
> >  introduction.tex |   3 +
> >  virtio-fs.tex    | 196 +++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 202 insertions(+)
> >  create mode 100644 virtio-fs.tex
> > 
> > diff --git a/content.tex b/content.tex
> > index 836ee52..ac41fdb 100644
> 
> > +The FUSE protocol documented in \hyperref[intro:FUSE]{FUSE} specifies the set
> > +of request types and their contents.  All request fields are little-endian.
> 
> FUSE doesn't seem to define it's endianness - and I'm not sure it's
> worth it for us to define it and do the work of adding a byteswapping
> shim.  It would be reasonably invasive in the kernel code to do that
> if we're the only user, and really I think the chances of having a
> cross-endian user are pretty slim; so adding invasive code for a
> non-existent user seems a bad thing.
> IMHO we should just stick with the existing FUSE definition; I believe
> it's possible to detect a byteswapped interface by validating the
> 'opcode' field of the fuse_in_header; existing daemons should just error
> on this; in the unlikely event that someone discovers they really want
> a cross endian implementation then they can add that byteswapping in
> their daemon.
> 
> We should still keep the virtio level little endian of course.

Sounds good to me but I will double-check the FUSE protocol to make sure
using the opcode field as a byte order mark will work.

Thanks, will fix in v4.

Stefan

* Re: [virtio-dev] [PATCH v3 1/2] content: add virtio file system device
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
  2019-02-22 14:31   ` Dr. David Alan Gilbert
  2019-02-25 16:11   ` [virtio-dev] " Dr. David Alan Gilbert
@ 2019-06-19  1:29   ` Michael S. Tsirkin
  2019-07-23 15:58     ` Stefan Hajnoczi
  2 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2019-06-19  1:29 UTC
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini

On Wed, Feb 20, 2019 at 12:46:12PM +0000, Stefan Hajnoczi wrote:
> The virtio file system device transports Linux FUSE requests between a
> FUSE daemon running on the host and the FUSE driver inside the guest.
> 
> The actual FUSE request definitions are not duplicated in the virtio
> specification, similar to how virtio-scsi does not document SCSI
> command details.  FUSE request definitions are available here:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> 
> This patch documents the core virtio file system device, which is
> functional but lacks the DAX feature introduced in the next patch.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  content.tex      |   3 +
>  introduction.tex |   3 +
>  virtio-fs.tex    | 196 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 202 insertions(+)
>  create mode 100644 virtio-fs.tex
> 
> diff --git a/content.tex b/content.tex
> index 836ee52..ac41fdb 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -2634,6 +2634,8 @@ Device ID  &  Virtio Device    \\
>  \hline
>  24         &   Memory device \\
>  \hline
> +26         &   file system device \\
> +\hline
>  \end{tabular}
>  
>  Some of the devices above are unspecified by this document,
> @@ -5559,6 +5561,7 @@ descriptor for the \field{sense_len}, \field{residual},
>  \input{virtio-input.tex}
>  \input{virtio-crypto.tex}
>  \input{virtio-vsock.tex}
> +\input{virtio-fs.tex}
>  
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
> diff --git a/introduction.tex b/introduction.tex
> index a4ac01d..6eeda5d 100644
> --- a/introduction.tex
> +++ b/introduction.tex
> @@ -60,6 +60,9 @@ Levels'', BCP 14, RFC 2119, March 1997. \newline\url{http://www.ietf.org/rfc/rfc
>  	\phantomsection\label{intro:SCSI MMC}\textbf{[SCSI MMC]} &
>          SCSI Multimedia Commands,
>          \newline\url{http://www.t10.org/cgi-bin/ac.pl?t=f&f=mmc6r00.pdf}\\
> +	\phantomsection\label{intro:FUSE}\textbf{[FUSE]} &
> +	Linux FUSE interface,
> +	\newline\url{https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h}\\
>  
>  \end{longtable}
>  
> diff --git a/virtio-fs.tex b/virtio-fs.tex
> new file mode 100644
> index 0000000..5df5b9c
> --- /dev/null
> +++ b/virtio-fs.tex
> @@ -0,0 +1,196 @@
> +\section{File System Device}\label{sec:Device Types / File System Device}
> +
> +The virtio file system device provides file system access.  The device may
> +directly manage a file system or act as a gateway to a remote file system.  The
> +details of how files are accessed are hidden by the device interface, allowing
> +for a range of use cases.
> +
> +Unlike block-level storage devices such as virtio block and SCSI, the virtio
> +file system device provides file-level access to data.  The device interface is
> +based on the Linux Filesystem in Userspace (FUSE) protocol.  This consists of
> +requests for file system traversal and access the files and directories within
> +it.  The protocol details are defined by \hyperref[intro:FUSE]{FUSE}.
> +
> +The device acts as the FUSE file system daemon and the driver acts as the FUSE
> +client mounting the file system.  The virtio file system device provides the
> +mechanism for transporting FUSE requests, much like /dev/fuse in a traditional
> +FUSE application.
> +
> +This section relies on definitions from \hyperref[intro:FUSE]{FUSE}.
> +
> +\subsection{Device ID}\label{sec:Device Types / File System Device / Device ID}
> +  26
> +
> +\subsection{Virtqueues}\label{sec:Device Types / File System Device / Virtqueues}
> +
> +\begin{description}
> +\item[0] hiprio
> +\item[1\ldots n] request queues
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / File System Device / Feature bits}
> +
> +There are currently no feature bits defined.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / File System Device / Device configuration layout}
> +
> +All fields of this configuration are always available.
> +
> +\begin{lstlisting}
> +struct virtio_fs_config {
> +        char tag[36];
> +        le32 num_queues;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{tag}] is the name associated with this file system.  The tag is
> +    encoded in UTF-8 and padded with NUL bytes if shorter than the
> +    available space.  This field is not NUL-terminated if the encoded bytes
> +    take up the entire field.
> +\item[\field{num_queues}] is the total number of request virtqueues exposed by
> +    the device. The driver MAY use only one request queue,
> +    or it can use more to achieve better performance.

Pls copy instances of MAY,SHOULD,MUST into conformance sections.
Pls convert text outside of conformance sections to "can"
or "is allowed to" etc.

> +\end{description}
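A minimal C sketch of how a driver might read this configuration. It copies `tag` into a NUL-terminated string without overrunning the 36-byte field, since the field has no terminator when the name fills it completely, and it decodes `num_queues` from little-endian. The raw struct layout and the helper names are illustrative assumptions, and UTF-8 validation of the tag is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_FS_TAG_LEN 36

/* Mirrors struct virtio_fs_config; num_queues is stored little-endian. */
struct virtio_fs_config_raw {
    char tag[VIRTIO_FS_TAG_LEN];
    uint8_t num_queues_le[4];
};

/* Decode a little-endian 32-bit value independent of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy the tag into out. The tag is NUL-padded when shorter than the
 * field but unterminated when it uses all 36 bytes, so bound the scan. */
static size_t virtio_fs_copy_tag(const struct virtio_fs_config_raw *cfg,
                                 char out[VIRTIO_FS_TAG_LEN + 1])
{
    size_t len = 0;
    while (len < VIRTIO_FS_TAG_LEN && cfg->tag[len] != '\0')
        len++;
    memcpy(out, cfg->tag, len);
    out[len] = '\0';
    return len;
}

/* Return the number of request queues, or 0 if the device violated the
 * requirement that num_queues be 1 or greater. */
static uint32_t virtio_fs_num_request_queues(const struct virtio_fs_config_raw *cfg)
{
    uint32_t n = get_le32(cfg->num_queues_le);
    return n >= 1 ? n : 0;
}
```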
> +
> +\drivernormative{\subsubsection}{Device configuration layout}{Device Types / File System Device / Device configuration layout}
> +
> +The driver MUST NOT write to device configuration fields.
> +
> +\devicenormative{\subsubsection}{Device configuration layout}{Device Types / File System Device / Device configuration layout}
> +
> +The device MUST set \field{num_queues} to 1 or greater.
> +
> +\devicenormative{\subsection}{Device Initialization}{Device Types / File System Device / Device Initialization}
> +
> +On initialization the driver MUST first discover the
> +device's virtqueues.
> +
> +\subsection{Device Operation}\label{sec:Device Types / File System Device / Device Operation}
> +
> +Device operation consists of operating the virtqueues to facilitate file system
> +access.
> +
> +The FUSE request types are as follows:
> +\begin{itemize}
> +\item Normal requests are submitted by the driver and completed by the device.

made available and used, right?

> +\item Interrupt requests are submitted by the driver

again made available?

> to abort requests that the
> +      device may have yet to complete.

... did not use yet

> +\end{itemize}
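To make the interrupt mechanism concrete, here is a hedged sketch of building a FUSE_INTERRUPT request. Its fuse_interrupt_in body carries the `unique` value of the earlier request that the device has not used yet. The structure layouts follow fuse.h as we understand it. For brevity the fields are kept host-endian, although the patch requires little-endian on the wire. `build_interrupt` is a hypothetical helper, and the interrupt's own `unique` value is left to driver policy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FUSE_INTERRUPT 36 /* opcode value from fuse.h */

struct fuse_in_header {
    uint32_t len;     /* total length of header plus datain */
    uint32_t opcode;
    uint64_t unique;  /* request identifier */
    uint64_t nodeid;
    uint32_t uid;
    uint32_t gid;
    uint32_t pid;
    uint32_t padding;
};

struct fuse_interrupt_in {
    uint64_t unique;  /* unique of the request being interrupted */
};

struct interrupt_req {
    struct fuse_in_header in;
    struct fuse_interrupt_in arg;
};

/* Build a FUSE_INTERRUPT aimed at an outstanding normal request.
 * own_unique identifies this interrupt; target_unique names the victim. */
static void build_interrupt(struct interrupt_req *r,
                            uint64_t own_unique, uint64_t target_unique)
{
    memset(r, 0, sizeof *r);
    r->in.len = (uint32_t)sizeof *r;
    r->in.opcode = FUSE_INTERRUPT;
    r->in.unique = own_unique;
    r->arg.unique = target_unique;
}
```

Per the hiprio section of this patch, the driver would place such a request only on the hiprio queue.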
> +
> +Note that FUSE notification requests are not supported.
> +
> +\subsubsection{Device Operation: Request Queues}\label{sec:Device Types / File System Device / Device Operation / Device Operation: Request Queues}
> +
> +The driver enqueues normal requests on an arbitrary request queue and they are
> +completed by the device on that same queue. It is the responsibility of the
> +driver to ensure strict request ordering for commands placed on different
> +queues, because they are consumed with no order constraints.

again available/used?

> +
> +Requests have the following format:
> +
> +\begin{lstlisting}
> +struct virtio_fs_req {
> +        // Device-readable part
> +        struct fuse_in_header in;
> +        u8 datain[];
> +
> +        // Device-writable part
> +        struct fuse_out_header out;
> +        u8 dataout[];
> +};
> +\end{lstlisting}
> +
> +Note that the words "in" and "out" follow the FUSE meaning and do not indicate
> +the direction of data transfer under VIRTIO.  "In" means input to a request and
> +"out" means output from processing a request.
> +
> +\field{in} is the common header for all types of FUSE requests.
> +
> +\field{datain} consists of request-specific data, if any.  This is identical to
> +the data read from the /dev/fuse device by a FUSE daemon.
> +
> +\field{out} is the completion header common to all types of FUSE requests.
> +
> +\field{dataout} consists of request-specific data, if any.  This is identical
> +to the data written to the /dev/fuse device by a FUSE daemon.
> +
> +For example, the full layout of a FUSE_READ request is as follows:
> +
> +\begin{lstlisting}
> +struct virtio_fs_read_req {
> +        // Device-readable part
> +        struct fuse_in_header in;
> +        union {
> +                struct fuse_read_in readin;
> +                u8 datain[sizeof(struct fuse_read_in)];
> +        };
> +
> +        // Device-writable part
> +        struct fuse_out_header out;
> +        u8 dataout[out.len - sizeof(struct fuse_out_header)];
> +};
> +\end{lstlisting}
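The following sketch shows how a driver might validate the `out.len` value the device returns for FUSE_READ before it trusts `dataout`. It assumes a 16-byte fuse_out_header (len, error, unique). It also assumes the Linux FUSE client's rule, as we understand it, that an error reply carries only the header. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FUSE_OUT_HEADER_SIZE 16 /* u32 len, s32 error, u64 unique */

/* Decode a little-endian 32-bit value independent of host byte order. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* writable: the device-writable bytes, starting with fuse_out_header.
 * used: bytes the device reported as written.
 * requested: fuse_read_in.size sent in the request.
 * Returns the number of dataout bytes, or -1 if the reply is malformed. */
static long fuse_read_dataout_len(const uint8_t *writable, size_t used,
                                  size_t requested)
{
    if (used < FUSE_OUT_HEADER_SIZE)
        return -1;
    uint32_t len = get_le32(writable);                /* out.len */
    int32_t error = (int32_t)get_le32(writable + 4);  /* out.error */
    if (len < FUSE_OUT_HEADER_SIZE || len > used)
        return -1;
    size_t n = len - FUSE_OUT_HEADER_SIZE;
    if (error != 0)
        return n == 0 ? 0 : -1;   /* error replies carry no payload */
    if (n > requested)
        return -1;                /* more data than the driver asked for */
    return (long)n;
}
```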
> +
> +The FUSE protocol documented in \hyperref[intro:FUSE]{FUSE} specifies the set
> +of request types and their contents.  All request fields are little-endian.

I think this bears stressing some more.
Maybe "note that standard FUSE format does not specify endian-ness.
for virtio-fs, all fields are little-endian".
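Because /dev/fuse exchanges structs in the host's native byte order, a driver on a big-endian guest cannot simply copy them. The following minimal sketch serializes the device-readable part of a FUSE_READ request field by field in little-endian. The offsets assume the 40-byte fuse_in_header and fuse_read_in layouts in fuse.h, and `encode_fuse_read` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FUSE_READ 15 /* opcode value from fuse.h */
#define FUSE_IN_HEADER_SIZE 40
#define FUSE_READ_IN_SIZE 40

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Serialize in + readin into buf (at least 80 bytes), little-endian
 * regardless of native byte order. Returns the number of bytes written. */
static size_t encode_fuse_read(uint8_t *buf, uint64_t unique, uint64_t nodeid,
                               uint64_t fh, uint64_t offset, uint32_t size)
{
    uint8_t *in = buf, *arg = buf + FUSE_IN_HEADER_SIZE;

    memset(buf, 0, FUSE_IN_HEADER_SIZE + FUSE_READ_IN_SIZE);
    put_le32(in + 0, FUSE_IN_HEADER_SIZE + FUSE_READ_IN_SIZE); /* in.len */
    put_le32(in + 4, FUSE_READ);                               /* in.opcode */
    put_le64(in + 8, unique);                                  /* in.unique */
    put_le64(in + 16, nodeid);                                 /* in.nodeid */
    /* uid, gid, pid, padding (offsets 24..39) left zero in this sketch */
    put_le64(arg + 0, fh);        /* fuse_read_in.fh */
    put_le64(arg + 8, offset);    /* fuse_read_in.offset */
    put_le32(arg + 16, size);     /* fuse_read_in.size */
    /* read_flags, lock_owner, flags, padding left zero */
    return FUSE_IN_HEADER_SIZE + FUSE_READ_IN_SIZE;
}
```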

> +
> +\subsubsection{Device Operation: High Priority Queue}\label{sec:Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
> +
> +The hiprio queue follows the same request format as the requests queue.

As request queues?

>  This
> +queue only contains FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET
> +requests.
> +
> +Interrupt and forget requests have a higher priority than normal requests.  In
> +order to ensure that they can always be delivered, even if all request queues
> +are full,


> a separate queue is used.

.. the separate hiprio queue is used for these requests; otherwise one wonders what
that separate queue is.

and I would change the order of this sentence; otherwise it's unclear
whether the queue is only used when the request queues are full.
E.g.

	The separate hiprio queue is used for these requests in order to
	ensure that they can be delivered even if all request queues
	are full.



> +
> +\devicenormative{\paragraph}{Device Operation: High Priority Queue}{Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
> +
> +The device SHOULD attempt to process the hiprio queue promptly.
> +
> +The device MAY process request queues concurrently with the hiprio queue.


I think one can make a stronger requirement: device must not
block processing hiprio because of a request queue - is
that right?
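The following sketch shows one possible device-side policy using toy in-memory queues. All names are hypothetical. The device drains hiprio before taking each request-queue entry and visits the request queues round-robin. This single-threaded loop only bounds how long hiprio waits between requests. A request that blocks while being processed would still stall it, so a real device might instead service hiprio from a separate thread, which fits the stronger requirement suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy queue standing in for a virtqueue (hypothetical). */
struct toy_queue {
    const int *items;  /* request identifiers */
    size_t len;
    size_t head;
};

static int toy_pop(struct toy_queue *q, int *item)
{
    if (q->head == q->len)
        return 0;
    *item = q->items[q->head++];
    return 1;
}

/* Process all queues, draining hiprio before each request-queue entry.
 * "Processing" is modeled as appending to order[]; returns the count. */
static size_t run_device(struct toy_queue *hiprio, struct toy_queue *reqs,
                         size_t nreqs, int *order)
{
    size_t n = 0;
    int item, progress = 1;

    while (progress) {
        progress = 0;
        for (size_t q = 0; q < nreqs; q++) {
            while (toy_pop(hiprio, &item))
                order[n++] = item;
            if (toy_pop(&reqs[q], &item)) {
                order[n++] = item;
                progress = 1;
            }
        }
    }
    /* covers nreqs == 0 and any hiprio work left after the final pass */
    while (toy_pop(hiprio, &item))
        order[n++] = item;
    return n;
}
```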

> +
> +\drivernormative{\paragraph}{Device Operation: High Priority Queue}{Device Types / File System Device / Device Operation / Device Operation: High Priority Queue}
> +
> +The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET requests solely on the hiprio queue.
> +
> +The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
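Here is a hedged sketch of driver-side queue selection that meets the requirement above. FUSE_INTERRUPT, FUSE_FORGET and FUSE_BATCH_FORGET always go to virtqueue 0. Other requests are spread over request queues 1..num_queues using a caller-supplied hint, such as the CPU number. The spreading policy is an assumption. The device consumes different queues with no order constraints, so a driver that uses this policy still has to serialize dependent requests itself.

```c
#include <assert.h>
#include <stdint.h>

#define FUSE_FORGET        2  /* opcode values from fuse.h */
#define FUSE_INTERRUPT    36
#define FUSE_BATCH_FORGET 42

#define VIRTIO_FS_HIPRIO_VQ 0 /* virtqueue 0 is hiprio */

/* Pick a virtqueue index for a request with the given opcode.
 * num_queues must be >= 1, which the device is required to guarantee. */
static uint32_t virtio_fs_select_vq(uint32_t opcode, uint32_t num_queues,
                                    uint32_t hint)
{
    switch (opcode) {
    case FUSE_INTERRUPT:
    case FUSE_FORGET:
    case FUSE_BATCH_FORGET:
        return VIRTIO_FS_HIPRIO_VQ;
    default:
        return 1 + hint % num_queues;  /* request queues are 1..n */
    }
}
```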
> +
> +\subsubsection{Security Considerations}\label{sec:Device Types / File System Device / Security Considerations}
> +
> +The device provides access to a file system that may contain files owned by
> +different POSIX user ids and group ids.  The device has no secure way of
> +differentiating between users originating requests via the driver.  Therefore
> +the device accepts the POSIX user ids and group ids provided by the driver and
> +security is enforced by the driver rather than the device.  It is nevertheless
> +possible for devices to implement POSIX user id and group id mapping or
> +whitelisting to control the ownership and access available to the driver.
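To illustrate the id mapping mentioned above, a device could pass the uid and gid from each fuse_in_header through range maps. These would work like Linux user namespace uid_map entries, and ids outside every range would be rejected. This is only a sketch, and `struct id_map` and `map_id` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guest ids [guest_start, guest_start + count) map to host ids
 * [host_start, host_start + count). Hypothetical device configuration. */
struct id_map {
    uint32_t guest_start;
    uint32_t host_start;
    uint32_t count;
};

/* Translate a guest-supplied uid or gid. Returns 0 and stores the host id
 * on success, or -1 if the id is not whitelisted, in which case the device
 * would fail the request (e.g. with -EPERM). */
static int map_id(const struct id_map *maps, size_t nmaps,
                  uint32_t guest_id, uint32_t *host_id)
{
    for (size_t i = 0; i < nmaps; i++) {
        /* unsigned wraparound makes ids below guest_start fail this test */
        uint32_t delta = guest_id - maps[i].guest_start;
        if (delta < maps[i].count) {
            *host_id = maps[i].host_start + delta;
            return 0;
        }
    }
    return -1;
}
```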
> +
> +The file system may contain special files including device nodes and setuid
> +executable files.  These properties are defined by the file type and mode,
> +which may be set by the driver when creating new files or changed at a later
> +time.  These special files present a security risk when the file system is
> +shared with another system, such as the host or another guest.  This issue can
> +be solved on some operating systems using mount options that ignore special
> +files.  It is also possible for devices to implement restrictions on special
> +files by refusing their creation.
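Here is a minimal sketch of the "refuse their creation" restriction. It assumes the device inspects the mode requested in FUSE_MKNOD, FUSE_CREATE or FUSE_SETATTR and fails the request with a negative errno in fuse_out_header.error. The exact policy is an assumption for illustration: no device nodes, no setuid, and setgid allowed only on directories, where it controls group inheritance.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Return 0 to allow the requested mode, or a negative errno to place in
 * fuse_out_header.error. Hypothetical restrictive policy. */
static int check_mode_policy(mode_t mode)
{
    if (S_ISCHR(mode) || S_ISBLK(mode))
        return -EPERM;                 /* refuse device nodes */
    if (mode & S_ISUID)
        return -EPERM;                 /* refuse setuid files */
    if ((mode & S_ISGID) && !S_ISDIR(mode))
        return -EPERM;                 /* setgid only on directories */
    return 0;
}
```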
> +
> +When the device provides shared access to a file system the possibility of
> +symlink race conditions, exhausting file system capacity, and overwriting or
> +deleting files used by others must be taken into account.  These issues have a
> +long history in multi-user operating systems and should not be overlooked with
> +virtio devices.
> +
> +\subsubsection{Live migration considerations}\label{sec:Device Types / File System Device / Live Migration Considerations}
> +
> +When a guest is migrated to a new host it is necessary to consider the FUSE
> +session and its state.  The continuity of FUSE inode numbers (also known as
> +nodeids) and fh values is necessary so the driver can continue operation
> +without disruption.  Therefore it is trivial to migrate before a FUSE session
> +has been started with FUSE_INIT.


The last sentence is unclear. Where does it follow from? Did you mean
"however"?

> +
> +It is possible to maintain the FUSE session across live migration either by
> +transferring the state or by redirecting requests from the new host to the old
> +host where the state resides.  The details of how to achieve this are
> +implementation-dependent and are not visible at the device interface level.

One of the questions around transferring state is how to handle
version compatibility.
Linux does not need to worry because it never moves processes
to a different kernel without killing all userspace processes.


FUSE has version negotiation, so I think it's solvable:
when the device is instantiated, it can be forced to
downgrade to a specific protocol version
that works on both sides.

But I think it's worth describing here at this high level.
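Here is a sketch of the downgrade idea. It assumes a hypothetical `migration_cap` configured on every host in the migration pool. The device answers FUSE_INIT with the smallest of three values: the driver's minor version, its own minor version, and the cap. That way the negotiated protocol stays supported after migration. Major version 7 is the current FUSE_KERNEL_VERSION. The rest of FUSE_INIT (flags, max_write, and so on) is ignored here.

```c
#include <assert.h>
#include <stdint.h>

#define FUSE_KERNEL_VERSION 7 /* FUSE major version */

/* Hypothetical device-side FUSE_INIT negotiation. Returns the minor
 * version to reply with, or -1 if the driver's major does not match. */
static int32_t negotiate_minor(uint32_t drv_major, uint32_t drv_minor,
                               uint32_t dev_minor, uint32_t migration_cap)
{
    if (drv_major != FUSE_KERNEL_VERSION)
        return -1;
    uint32_t m = drv_minor < dev_minor ? drv_minor : dev_minor;
    if (m > migration_cap)
        m = migration_cap;   /* pin to what every destination supports */
    return (int32_t)m;
}
```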

> -- 
> 2.20.1
> 
> 




* Re: [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device
  2019-02-20 12:46 [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Stefan Hajnoczi
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window Stefan Hajnoczi
@ 2019-06-19  1:30 ` Michael S. Tsirkin
  2019-06-24 12:23   ` Stefan Hajnoczi
  2 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2019-06-19  1:30 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini

On Wed, Feb 20, 2019 at 12:46:11PM +0000, Stefan Hajnoczi wrote:
> v3:
>  * Remove notifications virtqueue, it's unimplemented and can be added when
>    needed [Miklos]
>  * Add Security Considerations and Live Migration Considerations sections
>    [Michael]
> v2:
>  * Clean up core virtio file system device spec
>  * Add DAX window
> 
> These patches add the virtio file system device, which is based on Linux FUSE
> but includes the DAX window extension.  Similar to virtio-scsi, which
> transports SCSI commands, virtio-fs transports FUSE requests and the protocol
> documentation is not duplicated here.

I think I prefer virtio-fuse as a name. Let's be a bit more
specific: we might want to add more filesystem devices later on.


> The DAX window allows file contents to be accessed directly from shared memory.
> This eliminates copying of data, reduces the number of vmexits, and reduces the
> guest's memory footprint.  It also allows coherent mmap MAP_SHARED semantics
> between guests on the same host.
> 
> Stefan Hajnoczi (2):
>   content: add virtio file system device
>   virtio-fs: add DAX window
> 
>  content.tex      |   3 +
>  introduction.tex |   3 +
>  virtio-fs.tex    | 221 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 227 insertions(+)
>  create mode 100644 virtio-fs.tex
> 
> -- 
> 2.20.1
> 
> 




* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-02-20 12:46 ` [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window Stefan Hajnoczi
@ 2019-06-19  1:41   ` Michael S. Tsirkin
  2019-06-24 13:58     ` Stefan Hajnoczi
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2019-06-19  1:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini

On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> Describe how shared memory region ID 0 is the DAX window and how
> FUSE_SETUPMAPPING maps file ranges into the window.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> Note that this depends on the shared memory resource specification
> extension that David Gilbert is working on.
> https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> 
> The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> ---
>  virtio-fs.tex | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/virtio-fs.tex b/virtio-fs.tex
> index 5df5b9c..abb1e48 100644
> --- a/virtio-fs.tex
> +++ b/virtio-fs.tex
> @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
>  
>  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
>  
> +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> +
> +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> +driver-provided buffer and the device.  In cases where data transfer is
> +undesirable, the device can map file contents into the DAX window shared memory
> +region.  The driver then accesses file contents directly in device-owned memory
> +without a data transfer.
> +
> +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> +removed using the FUSE\_REMOVEMAPPING request.

I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
Is it just me?

> +
> +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> +from the DAX window at the offset provided by the driver in the request.

Dgilbert's patches describing shared memory say that
the legal ways to set up mappings are all implementation-dependent.
How does driver know which attributes to use for the
mapping?

Also, we recently had a discussion about DAX support on hosts
and safety wrt crashes. Do we need to expose this
information to guests maybe?


Finally, do we want to have a way to express that the filesystem
only allows RO mappings?


> +
> +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> +
> +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.


Any alignment requirements?

Also, with no limit on mappings, it looks like guest can use up lots of
host VMAs quickly. Shouldn't there be a limit on # of mappings?


> +
> +The device MUST reject mappings that would go beyond the end of the DAX window.
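Here is a device-side sketch of the window-bounds requirement. The `align` parameter stands in for the alignment requirement that, as discussed later in this thread, the driver currently has no way to learn. Rejecting zero-length mappings is an extra assumption. The comparison is written so it cannot wrap when `moffset + len` exceeds 64 bits. Overlap with existing mappings is deliberately not checked, because the device must allow it. The field names are modeled loosely on the out-of-tree fuse_setupmapping_in and may not match it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the FUSE_SETUPMAPPING arguments. */
struct setupmapping_args {
    uint64_t moffset;  /* offset within the DAX window */
    uint64_t len;      /* length of the mapping */
};

/* Return 0 if [moffset, moffset + len) lies inside a window of
 * window_size bytes and is aligned to align (a power of two), else -1. */
static int check_dax_mapping(const struct setupmapping_args *a,
                             uint64_t window_size, uint64_t align)
{
    if (a->len == 0)
        return -1;
    if ((a->moffset | a->len) & (align - 1))
        return -1;                      /* misaligned offset or length */
    if (a->moffset > window_size || a->len > window_size - a->moffset)
        return -1;                      /* beyond the end of the window */
    return 0;
}
```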
> +
> +\drivernormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> +
> +The driver SHOULD be prepared to find shared memory region ID 0 absent and fall back to FUSE\_READ and FUSE\_WRITE requests.
> +
>  \subsubsection{Security Considerations}\label{sec:Device Types / File System Device / Security Considerations}
>  
>  The device provides access to a file system that may contain files owned by
> -- 
> 2.20.1
> 
> 




* Re: [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device
  2019-06-19  1:30 ` [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Michael S. Tsirkin
@ 2019-06-24 12:23   ` Stefan Hajnoczi
  2019-06-24 13:57     ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-06-24 12:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini


On Tue, Jun 18, 2019 at 09:30:54PM -0400, Michael S. Tsirkin wrote:
> On Wed, Feb 20, 2019 at 12:46:11PM +0000, Stefan Hajnoczi wrote:
> > v3:
> >  * Remove notifications virtqueue, it's unimplemented and can be added when
> >    needed [Miklos]
> >  * Add Security Considerations and Live Migration Considerations sections
> >    [Michael]
> > v2:
> >  * Clean up core virtio file system device spec
> >  * Add DAX window
> > 
> > These patches add the virtio file system device, which is based on Linux FUSE
> > but includes the DAX window extension.  Similar to virtio-scsi, which
> > transports SCSI commands, virtio-fs transports FUSE requests and the protocol
> > documentation is not duplicated here.
> 
> I think I prefer virtio-fuse as a name. Let's be a bit more
> specific: we might want to add more filesystem devices later on.

virtio-fs is not FUSE.  Existing FUSE file system daemons cannot be
used.

It would be confusing to call it FUSE.  The wire protocol is indeed
based on FUSE but from a user perspective it's a completely different
system.

The virtio-fs device extends the FUSE protocol with the
virtualization-specific DAX window (and the in-development shared
metadata versioning data structures that will be added to a future
spec).

Finally, users are likely to be confused and associate virtio-fuse with
outdated FUSE issues that have long since been solved.  virtio-fs
performs really well and that's not something that people associate with
FUSE :-).

For these reasons I think we should continue to call it virtio-fs.

Stefan



* Re: [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device
  2019-06-24 12:23   ` Stefan Hajnoczi
@ 2019-06-24 13:57     ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2019-06-24 13:57 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini

On Mon, Jun 24, 2019 at 01:23:18PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 18, 2019 at 09:30:54PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Feb 20, 2019 at 12:46:11PM +0000, Stefan Hajnoczi wrote:
> > > v3:
> > >  * Remove notifications virtqueue, it's unimplemented and can be added when
> > >    needed [Miklos]
> > >  * Add Security Considerations and Live Migration Considerations sections
> > >    [Michael]
> > > v2:
> > >  * Clean up core virtio file system device spec
> > >  * Add DAX window
> > > 
> > > These patches add the virtio file system device, which is based on Linux FUSE
> > > but includes the DAX window extension.  Similar to virtio-scsi, which
> > > transports SCSI commands, virtio-fs transports FUSE requests and the protocol
> > > documentation is not duplicated here.
> > 
> > I think I prefer virtio-fuse as a name. Let's be a bit more
> > specific: we might want to add more filesystem devices later on.
> 
> virtio-fs is not FUSE.  Existing FUSE file system daemons cannot be
> used.

Right.

> It would be confusing to call it FUSE.  The wire protocol is indeed
> based on FUSE but from a user perspective it's a completely different
> system.

I agree.

> The virtio-fs device extends the FUSE protocol with the
> virtualization-specific DAX window (and the in-development shared
> metadata versioning data structures that will be added to a future
> spec).
> 
> Finally, users are likely to be confused and associate virtio-fuse with
> outdated FUSE issues that have long since been solved.  virtio-fs
> performs really well and that's not something that people associate with
> FUSE :-).
> 
> For these reasons I think we should continue to call it virtio-fs.
> 
> Stefan

I just wish we came up with something more specific than
"a filesystem". Any ideas?

-- 
MST




* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-06-19  1:41   ` Michael S. Tsirkin
@ 2019-06-24 13:58     ` Stefan Hajnoczi
  2019-06-24 14:10       ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-06-24 13:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini


On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > Describe how shared memory region ID 0 is the DAX window and how
> > FUSE_SETUPMAPPING maps file ranges into the window.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > Note that this depends on the shared memory resource specification
> > extension that David Gilbert is working on.
> > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > 
> > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > ---
> >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > index 5df5b9c..abb1e48 100644
> > --- a/virtio-fs.tex
> > +++ b/virtio-fs.tex
> > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> >  
> >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> >  
> > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > +
> > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > +driver-provided buffer and the device.  In cases where data transfer is
> > +undesirable, the device can map file contents into the DAX window shared memory
> > +region.  The driver then accesses file contents directly in device-owned memory
> > +without a data transfer.
> > +
> > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > +removed using the FUSE\_REMOVEMAPPING request.
> 
> I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> Is it just me?

They are not upstream yet and can be found here:

https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384

There is a chicken-and-egg problem.  Linux should merge this once the
spec has been accepted.  The spec makes reference to a new FUSE command
that is being added to Linux.  :D

I suggest we break it by merging the VIRTIO spec change first.  There
won't be a spec release so soon anyway and we can revert it in case
there are issues in Linux.  Miklos, the FUSE maintainer, is well aware of
virtio-fs and contributes to it, so it's unlikely that Linux will reject
these commands.

> > +
> > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > +from the DAX window at the offset provided by the driver in the request.
> 
> Dgilbert's patches describing shared memory say that
> the legal ways to set up mappings are all implementation-dependent.
> How does driver know which attributes to use for the
> mapping?

Two different types of mappings:
1. The DAX window shared memory region described by DaveG's spec.
2. The file mappings established using FUSE_SETUPMAPPING.

The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
implementation-defined way.  virtio_pci_*.c in Linux will have to help
out with the implementation-specific details here.

The only flags currently supported by FUSE_SETUPMAPPING are READ and
WRITE.  Which of them is used depends on the file's access mode.  There is nothing
implementation-specific in FUSE_SETUPMAPPING.
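Here is a minimal sketch showing how the FUSE_SETUPMAPPING flags can follow from the file's access mode. The flag names and values are hypothetical placeholders, not the ones defined in the out-of-tree fuse.h.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical placeholders for the READ and WRITE mapping flags. */
#define SETUPMAPPING_WRITE (1u << 0)
#define SETUPMAPPING_READ  (1u << 1)

/* Derive mapping flags from the open flags of the file being mapped. */
static uint64_t mapping_flags_for(int open_flags)
{
    switch (open_flags & O_ACCMODE) {
    case O_RDONLY: return SETUPMAPPING_READ;
    case O_WRONLY: return SETUPMAPPING_WRITE;
    case O_RDWR:   return SETUPMAPPING_READ | SETUPMAPPING_WRITE;
    default:       return 0;  /* unknown access mode: map nothing */
    }
}
```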

> Also, we recently had a discussion about DAX support on hosts
> and safety wrt crashes. Do we need to expose this
> information to guests maybe?

No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
sent when persistence is required.  Therefore virtio-fs is still using
the traditional file/block persistence model.  No changes necessary for
power failure, etc.

> Finally, do we want to have a way to express that the filesystem
> only allows RO mappings?

Thanks for this idea.  I'm discussing it with the FUSE community because
mount -o ro with FUSE currently doesn't involve the file system daemon.

> > +
> > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > +
> > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> 
> 
> Any alignment requirements?

Good point.  There are alignment requirements and the driver has no way
of knowing what they are.  I'll find a way to communicate them into the
guest, either via virtio or via FUSE.

> Also, with no limit on mappings, it looks like guest can use up lots of
> host VMAs quickly. Shouldn't there be a limit on # of mappings?

The VM can only deteriorate its own performance, right?

We haven't seen catastrophic problems that bring the system to its
knees.  But we're aware that increasing the number of VMAs slows down
VMA lookup.  There is currently no imposed limit.

Ideas have been discussed to avoid using (so many) VMAs but it seems
like that will take some time to develop and get upstream.  This will
not affect the virtio specification because the device interface doesn't
need to know about this.

Stefan



* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-06-24 13:58     ` Stefan Hajnoczi
@ 2019-06-24 14:10       ` Michael S. Tsirkin
  2019-06-25  9:55         ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2019-06-24 14:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini

On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > Describe how shared memory region ID 0 is the DAX window and how
> > > FUSE_SETUPMAPPING maps file ranges into the window.
> > > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > ---
> > > Note that this depends on the shared memory resource specification
> > > extension that David Gilbert is working on.
> > > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > > 
> > > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > > ---
> > >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> > >  1 file changed, 25 insertions(+)
> > > 
> > > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > > index 5df5b9c..abb1e48 100644
> > > --- a/virtio-fs.tex
> > > +++ b/virtio-fs.tex
> > > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> > >  
> > >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> > >  
> > > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > +
> > > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > > +driver-provided buffer and the device.  In cases where data transfer is
> > > +undesirable, the device can map file contents into the DAX window shared memory
> > > +region.  The driver then accesses file contents directly in device-owned memory
> > > +without a data transfer.
> > > +
> > > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > > +removed using the FUSE\_REMOVEMAPPING request.
> > 
> > I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > Is it just me?
> 
> They are not upstream yet and can be found here:
> 
> https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384
> 
> There is a chicken-and-egg problem.  Linux should merge this once the
> spec has been accepted.  The spec makes reference to a new FUSE command
> that is being added to Linux.  :D
> 
> I suggest we break it by merging the VIRTIO spec change first.  There
> won't be a spec release so soon anyway and we can revert it in case
> there are issues in Linux.  Miklos, the FUSE maintainer, is well aware of
> virtio-fs and contributes to it, so it's unlikely that Linux will reject
> these commands.
> 
> > > +
> > > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > > +from the DAX window at the offset provided by the driver in the request.
> > 
> > Dgilbert's patches describing shared memory say that
> > the legal ways to set up mappings are all implementation-dependent.
> > How does driver know which attributes to use for the
> > mapping?
> 
> Two different types of mappings:
> 1. The DAX window shared memory region described by DaveG's spec.
> 2. The file mappings established using FUSE_SETUPMAPPING.
> 
> The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
> implementation-defined way.  virtio_pci_*.c in Linux will have to help
> out with the implementation-specific details here.
> 
> The only flags currently supported by FUSE_SETUPMAPPING are READ and
> WRITE.  This depends on the file's access mode.  There is nothing
> implementation-specific in FUSE_SETUPMAPPING.

Sorry - I'm being unclear.
The guest driver maps parts of the PCI BAR.
What are the attributes of this mapping?
This is unrelated to FUSE_SETUPMAPPING things -
mapping is created by creating PTEs and such
within guest, not by virtio things.


> > Also, we recently had a discussion about DAX support on hosts
> > and safety wrt crashes. Do we need to expose this
> > information to guests maybe?
> 
> No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
> persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
> sent when persistence is required.  Therefore virtio-fs is still using
> the traditional file/block persistence model.  No changes necessary for
> power failure, etc.
> 
> > Finally, do we want to have a way to express that the filesystem
> > only allows RO mappings?
> 
> Thanks for this idea.  I'm discussing it with the FUSE community because
> mount -o ro with FUSE currently doesn't involve the file system daemon.
> 
> > > +
> > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > +
> > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > 
> > 
> > Any alignment requirements?
> 
> Good point.  There are alignment requirements and the driver has no way
> of knowing what they are.  I'll find a way to communicate them into the
> guest, either via virtio or via FUSE.
> 
> > Also, with no limit on mappings, it looks like guest can use up lots of
> > host VMAs quickly. Shouldn't there be a limit on # of mappings?
> 
> The VM can only deteriorate its own performance, right?

Only if QEMU is put in a container where virtual memory is
limited.
It's generally not a good idea to get into a state where the only way for the
host to make progress is to allocate more memory
without any limit.

If we are in a situation where we need to either kill
the guest or hit swap, none of the choices is good.




> We haven't seen catastrophic problems that bring the system to its
> knees.

Because you are not running malicious guests?

>  But we're aware that increasing the number of VMAs slows down the
> lookup.  There is currently no imposed limit.
> 
> Ideas have been discussed to avoid using (so many) VMAs but it seems
> like that will take some time to develop and get upstream.  This will
> not affect the virtio specification because the device interface doesn't
> need to know about this.
> 
> Stefan


One way to address this is to expose the # of mappings
in the config space.
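
A rough sketch of what that could look like, purely for illustration: the
tag and num_queues fields follow patch 1/2 (the 36-byte tag size is an
assumption), while max_mappings, struct fs_config_sketch and the helper
names are hypothetical and not part of any posted patch.  The driver reads
the integer fields as little-endian regardless of guest byte order.

```c
#include <stdint.h>
#include <string.h>

#define FS_TAG_SIZE 36  /* assumed tag field size */

/* Hypothetical config layout; max_mappings is NOT in the posted patch. */
struct fs_config_sketch {
    uint8_t  tag[FS_TAG_SIZE];
    uint32_t num_queues;    /* le32 on the wire */
    uint32_t max_mappings;  /* hypothetical: 0 = no device-imposed limit */
};

/* Decode a little-endian 32-bit field independent of host byte order. */
static uint32_t le32_decode(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse raw config-space bytes as the driver would read them. */
static void fs_config_parse(const uint8_t raw[FS_TAG_SIZE + 8],
                            struct fs_config_sketch *cfg)
{
    memcpy(cfg->tag, raw, FS_TAG_SIZE);
    cfg->num_queues   = le32_decode(raw + FS_TAG_SIZE);
    cfg->max_mappings = le32_decode(raw + FS_TAG_SIZE + 4);
}
```

Using 0 to mean "no device-imposed limit" is only one possible convention;
a feature bit would be another way to signal that the field is present.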

* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-06-24 14:10       ` Michael S. Tsirkin
@ 2019-06-25  9:55         ` Dr. David Alan Gilbert
  2019-06-27 14:09           ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: Dr. David Alan Gilbert @ 2019-06-25  9:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Stefan Hajnoczi, virtio-dev, Miklos Szeredi, Sage Weil,
	Vivek Goyal, Steven Whitehouse, Paolo Bonzini

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > > Describe how shared memory region ID 0 is the DAX window and how
> > > > FUSE_SETUPMAPPING maps file ranges into the window.
> > > > 
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > ---
> > > > Note that this depends on the shared memory resource specification
> > > > extension that David Gilbert is working on.
> > > > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > > > 
> > > > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > > > ---
> > > >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> > > >  1 file changed, 25 insertions(+)
> > > > 
> > > > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > > > index 5df5b9c..abb1e48 100644
> > > > --- a/virtio-fs.tex
> > > > +++ b/virtio-fs.tex
> > > > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> > > >  
> > > >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> > > >  
> > > > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > +
> > > > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > > > +driver-provided buffer and the device.  In cases where data transfer is
> > > > +undesirable, the device can map file contents into the DAX window shared memory
> > > > +region.  The driver then accesses file contents directly in device-owned memory
> > > > +without a data transfer.
> > > > +
> > > > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > > > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > > > +removed using the FUSE\_REMOVEMAPPING request.
> > > 
> > > I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > > Is it just me?
> > 
> > They are not upstream yet and can be found here:
> > 
> > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384
> > 
> > There is a chicken-and-egg problem.  Linux should merge this once the
> > spec has been accepted.  The spec makes reference to a new FUSE command
> > that is being added to Linux.  :D
> > 
> > I suggest we break it by merging the VIRTIO spec change first.  There
> > won't be a spec release so soon anyway and we can revert it in case
> > there are issues in Linux.  Miklos, the FUSE maintainer, is well aware of
> > virtio-fs and contributes to it, so it's unlikely that Linux will reject
> > these commands.
> > 
> > > > +
> > > > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > > > +from the DAX window at the offset provided by the driver in the request.
> > > 
> > > Dgilbert's patches describing shared memory say that
> > > the legal ways to set up mappings are all implementation-dependent.
> > > How does driver know which attributes to use for the
> > > mapping?
> > 
> > Two different types of mappings:
> > 1. The DAX window shared memory region described by DaveG's spec.
> > 2. The file mappings established using FUSE_SETUPMAPPING.
> > 
> > The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
> > implementation-defined way.  virtio_pci_*.c in Linux will have to help
> > out with the implementation-specific details here.
> > 
> > The only flags currently supported by FUSE_SETUPMAPPING are READ and
> > WRITE.  This depends on the file's access mode.  There is nothing
> > implementation-specific in FUSE_SETUPMAPPING.
> 
> Sorry - I'm being unclear.
> The guest driver maps parts of the PCI BAR.
> What are the attributes of this mapping?
> This is unrelated to FUSE_SETUPMAPPING things -
> mapping is created by creating PTEs and such
> within guest, not by virtio things.

By attributes you mean... memory ordering, cachability etc?

> 
> > > Also, we recently had a discussion about DAX support on hosts
> > > and safety wrt crashes. Do we need to expose this
> > > information to guests maybe?
> > 
> > No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
> > persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
> > sent when persistence is required.  Therefore virtio-fs is still using
> > the traditional file/block persistence model.  No changes necessary for
> > power failure, etc.
> > 
> > > Finally, do we want to have a way to express that the filesystem
> > > only allows RO mappings?
> > 
> > Thanks for this idea.  I'm discussing it with the FUSE community because
> > mount -o ro with FUSE currently doesn't involve the file system daemon.
> > 
> > > > +
> > > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > +
> > > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > > 
> > > 
> > > Any alignment requirements?
> > 
> > Good point.  There are alignment requirements and the driver has no way
> > of knowing what they are.  I'll find a way to communicate them into the
> > guest, either via virtio or via FUSE.
> > 
> > > Also, with no limit on mappings, it looks like guest can use up lots of
> > > host VMAs quickly. Shouldn't there be a limit on # of mappings?
> > 
> > The VM can only deteriorate its own performance, right?
> 
> Only if QEMU is put in a container where virtual memory is
> limited.
> It's generally not a good idea to get into a state where the only way for the
> host to make progress is to allocate more memory
> without any limit.
> 
> If we are in a situation where we need to either kill
> the guest or hit swap, none of the choices is good.

There is a bound; it's the cache region size / page size, so
that's ~1M mappings worst case (e.g. 4GB cache, 4kB page size).
That limit can be brought down if we impose a larger granularity
somewhere (and the reality is our kernel uses 2MB mapping chunks I
think).
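
To put numbers on that bound, a small C sketch (the helper name is made up
for illustration, and the 2MB chunk size is the estimate above rather than
a spec value): with a 4GB window, 4kB granules allow 1,048,576 mappings,
while 2MB chunks allow only 2,048.

```c
#include <stdint.h>

#define KiB (UINT64_C(1) << 10)
#define MiB (UINT64_C(1) << 20)
#define GiB (UINT64_C(1) << 30)

/* Worst-case number of distinct mappings a DAX window of window_size
 * bytes can hold when every mapping covers exactly one granule. */
static uint64_t dax_worst_case_mappings(uint64_t window_size, uint64_t granule)
{
    return window_size / granule;
}
```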

> > We haven't seen catastrophic problems that bring the system to its
> > knees.
> 
> Because you are not running malicious guests?

Hmm, I didn't realise a process having an excessive number of mappings
could harm any other process.

Dave

> >  But we're aware that increasing the number of VMAs slows down the
> > lookup.  There is currently no imposed limit.
> > 
> > Ideas have been discussed to avoid using (so many) VMAs but it seems
> > like that will take some time to develop and get upstream.  This will
> > not affect the virtio specification because the device interface doesn't
> > need to know about this.
> > 
> > Stefan
> 
> 
> One way to address this is to expose the # of mappings
> in the config space.
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-06-25  9:55         ` Dr. David Alan Gilbert
@ 2019-06-27 14:09           ` Michael S. Tsirkin
  2019-07-17 10:48             ` Stefan Hajnoczi
       [not found]             ` <20190717124258.GA13761@redhat.com>
  0 siblings, 2 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2019-06-27 14:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Stefan Hajnoczi, virtio-dev, Miklos Szeredi, Sage Weil,
	Vivek Goyal, Steven Whitehouse, Paolo Bonzini

On Tue, Jun 25, 2019 at 10:55:15AM +0100, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> > > On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > > > Describe how shared memory region ID 0 is the DAX window and how
> > > > > FUSE_SETUPMAPPING maps file ranges into the window.
> > > > > 
> > > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > > > ---
> > > > > Note that this depends on the shared memory resource specification
> > > > > extension that David Gilbert is working on.
> > > > > https://lists.oasis-open.org/archives/virtio-comment/201901/msg00000.html
> > > > > 
> > > > > The FUSE_SETUPMAPPING message is part of the virtio-fs Linux patches:
> > > > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h
> > > > > ---
> > > > >  virtio-fs.tex | 25 +++++++++++++++++++++++++
> > > > >  1 file changed, 25 insertions(+)
> > > > > 
> > > > > diff --git a/virtio-fs.tex b/virtio-fs.tex
> > > > > index 5df5b9c..abb1e48 100644
> > > > > --- a/virtio-fs.tex
> > > > > +++ b/virtio-fs.tex
> > > > > @@ -157,6 +157,31 @@ The driver MUST submit FUSE_INTERRUPT, FUSE_FORGET, and FUSE_BATCH_FORGET reques
> > > > >  
> > > > >  The driver MUST anticipate that request queues are processed concurrently with the hiprio queue.
> > > > >  
> > > > > +\subsubsection{Device Operation: DAX Window}\label{sec:Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > > +
> > > > > +FUSE\_READ and FUSE\_WRITE requests transfer file contents between the
> > > > > +driver-provided buffer and the device.  In cases where data transfer is
> > > > > +undesirable, the device can map file contents into the DAX window shared memory
> > > > > +region.  The driver then accesses file contents directly in device-owned memory
> > > > > +without a data transfer.
> > > > > +
> > > > > +Shared memory region ID 0 is called the DAX window.  The driver maps a file
> > > > > +range into the DAX window using the FUSE\_SETUPMAPPING request.  The mapping is
> > > > > +removed using the FUSE\_REMOVEMAPPING request.
> > > > 
> > > > I don't see FUSE\_SETUPMAPPING or FUSE\_REMOVEMAPPING  under
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/fuse.h
> > > > Is it just me?
> > > 
> > > They are not upstream yet and can be found here:
> > > 
> > > https://gitlab.com/virtio-fs/linux/blob/virtio-fs/include/uapi/linux/fuse.h#L384
> > > 
> > > There is a chicken-and-egg problem.  Linux should merge this once the
> > > spec has been accepted.  The spec makes reference to a new FUSE command
> > > that is being added to Linux.  :D
> > > 
> > > I suggest we break it by merging the VIRTIO spec change first.  There
> > > won't be a spec release so soon anyway and we can revert it in case
> > > there are issues in Linux.  Miklos, the FUSE maintainer, is well aware of
> > > virtio-fs and contributes to it, so it's unlikely that Linux will reject
> > > these commands.
> > > 
> > > > > +
> > > > > +After FUSE\_SETUPMAPPING has completed successfully the file range is accessible
> > > > > +from the DAX window at the offset provided by the driver in the request.
> > > > 
> > > > Dgilbert's patches describing shared memory say that
> > > > the legal ways to set up mappings are all implementation-dependent.
> > > > How does driver know which attributes to use for the
> > > > mapping?
> > > 
> > > Two different types of mappings:
> > > 1. The DAX window shared memory region described by DaveG's spec.
> > > 2. The file mappings established using FUSE_SETUPMAPPING.
> > > 
> > > The virtio_fs.ko driver maps the DAX window, e.g. from a PCI BAR in an
> > > implementation-defined way.  virtio_pci_*.c in Linux will have to help
> > > out with the implementation-specific details here.
> > > 
> > > The only flags currently supported by FUSE_SETUPMAPPING are READ and
> > > WRITE.  This depends on the file's access mode.  There is nothing
> > > implementation-specific in FUSE_SETUPMAPPING.
> > 
> > Sorry - I'm being unclear.
> > The guest driver maps parts of the PCI BAR.
> > What are the attributes of this mapping?
> > This is unrelated to FUSE_SETUPMAPPING things -
> > mapping is created by creating PTEs and such
> > within guest, not by virtio things.
> 
> By attributes you mean... memory ordering, cachability etc?

I mean non-cacheable, writeback, write combining.
A standard mmap of a PCI BAR is non-cacheable.

> > 
> > > > Also, we recently had a discussion about DAX support on hosts
> > > > and safety wrt crashes. Do we need to expose this
> > > > information to guests maybe?
> > > 
> > > No.  Although virtio-fs uses the DAX subsystem, it does not use NVDIMM's
> > > persistence model (e.g. CPU cache flush for persistence).  FUSE_FSYNC is
> > > sent when persistence is required.  Therefore virtio-fs is still using
> > > the traditional file/block persistence model.  No changes necessary for
> > > power failure, etc.
> > > 
> > > > Finally, do we want to have a way to express that the filesystem
> > > > only allows RO mappings?
> > > 
> > > Thanks for this idea.  I'm discussing it with the FUSE community because
> > > mount -o ro with FUSE currently doesn't involve the file system daemon.
> > > 
> > > > > +
> > > > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > > +
> > > > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > > > 
> > > > 
> > > > Any alignment requirements?
> > > 
> > > Good point.  There are alignment requirements and the driver has no way
> > > of knowing what they are.  I'll find a way to communicate them into the
> > > guest, either via virtio or via FUSE.
> > > 
> > > > Also, with no limit on mappings, it looks like guest can use up lots of
> > > > host VMAs quickly. Shouldn't there be a limit on # of mappings?
> > > 
> > > The VM can only deteriorate its own performance, right?
> > 
> > Only if QEMU is put in a container where virtual memory is
> > limited.
> > It's generally not a good idea to get into a state where the only way for the
> > host to make progress is to allocate more memory
> > without any limit.
> > 
> > If we are in a situation where we need to either kill
> > the guest or hit swap, none of the choices is good.
> 
> There is a bound; it's the cache region size / page size, so
> that's ~1M mappings worst case (e.g. 4GB cache, 4kB page size).
> That limit can be brought down if we impose a larger granularity
> somewhere (and the reality is our kernel uses 2MB mapping chunks I
> think).
> 
> > > We haven't seen catastrophic problems that bring the system to its
> > > knees.
> > 
> > Because you are not running malicious guests?
> 
> Hmm, I didn't realise a process having an excessive number of mappings
> could harm any other process.
> 
> Dave

Well it allocates resources on the host. If you don't
contain qemu then even just allocating virtual memory
can make host swap, right? If you contain it then
qemu will get killed instead but then you need to tell
guest what not to do so as not to get qemu killed.


> > >  But we're aware that increasing the number of VMAs slows down the
> > > lookup.  There is currently no imposed limit.
> > > 
> > > Ideas have been discussed to avoid using (so many) VMAs but it seems
> > > like that will take some time to develop and get upstream.  This will
> > > not affect the virtio specification because the device interface doesn't
> > > need to know about this.
> > > 
> > > Stefan
> > 
> > 
> > One way to address this is to expose the # of mappings
> > in the config space.
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
  2019-06-27 14:09           ` Michael S. Tsirkin
@ 2019-07-17 10:48             ` Stefan Hajnoczi
       [not found]             ` <20190717124258.GA13761@redhat.com>
  1 sibling, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-07-17 10:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Dr. David Alan Gilbert, virtio-dev, Miklos Szeredi, Sage Weil,
	Vivek Goyal, Steven Whitehouse, Paolo Bonzini

On Thu, Jun 27, 2019 at 10:09:16AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jun 25, 2019 at 10:55:15AM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Mon, Jun 24, 2019 at 02:58:08PM +0100, Stefan Hajnoczi wrote:
> > > > On Tue, Jun 18, 2019 at 09:41:25PM -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Feb 20, 2019 at 12:46:13PM +0000, Stefan Hajnoczi wrote:
> > > > > > +
> > > > > > +\devicenormative{\paragraph}{Device Operation: DAX Window}{Device Types / File System Device / Device Operation / Device Operation: DAX Window}
> > > > > > +
> > > > > > +The device MUST allow mappings that completely or partially overlap existing mappings within the DAX window.
> > > > > 
> > > > > 
> > > > > Any alignment requirements?
> > > > 
> > > > Good point.  There are alignment requirements and the driver has no way
> > > > of knowing what they are.  I'll find a way to communicate them into the
> > > > guest, either via virtio or via FUSE.
> > > > 
> > > > > Also, with no limit on mappings, it looks like guest can use up lots of
> > > > > host VMAs quickly. Shouldn't there be a limit on # of mappings?
> > > > 
> > > > The VM can only deteriorate its own performance, right?
> > > 
> > > Only if QEMU is put in a container where virtual memory is
> > > limited.
> > > It's generally not a good idea to get into a state where the only way for the
> > > host to make progress is to allocate more memory
> > > without any limit.
> > > 
> > > If we are in a situation where we need to either kill
> > > the guest or hit swap, none of the choices is good.
> > 
> > There is a bound; it's the cache region size / page size, so
> > that's ~1M mappings worst case (e.g. 4GB cache, 4kB page size).
> > That limit can be brought down if we impose a larger granularity
> > somewhere (and the reality is our kernel uses 2MB mapping chunks I
> > think).
> > 
> > > > We haven't seen catastrophic problems that bring the system to its
> > > > knees.
> > > 
> > > Because you are not running malicious guests?
> > 
> > Hmm, I didn't realise a process having an excessive number of mappings
> > could harm any other process.
> > 
> > Dave
> 
> Well it allocates resources on the host. If you don't
> contain qemu then even just allocating virtual memory
> can make host swap, right? If you contain it then
> qemu will get killed instead but then you need to tell
> guest what not to do so as not to get qemu killed.

I investigated a little.  Linux has a maximum VMA count sysctl that is
affected by mmap and any other places that add/split VMAs:

  vm.max_map_count = 65530

This is a sysctl tunable and is kept below 65536 for legacy reasons.
ELF coredumps used to only support ~65536 sections.

The QEMU process needs its own VMAs for shared libraries and other
purposes, so each virtio-fs device should expose a significantly lower
DAX Window mapping limit to the driver.  Let's add a configuration space
field as Michael has suggested.

Regarding denial of service, the DAX Window size determines the overall
amount of host page cache that is accessible by the driver.  Together
with an enforced maximum map count we can allow the administrator to
configure devices so they only provide access to a fraction of the host
page cache, mitigating denial of service issues.
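
A host-side sketch of how a management tool might derive such a limit on
Linux.  It assumes the process's VMA count can be approximated by the line
count of /proc/<pid>/maps and that each DAX mapping costs at most one VMA
(splits can cost more).  The reserve value and all function names are
hypothetical; only the procfs paths are real.

```c
#include <stdio.h>

/* Read one integer from a procfs/sysctl file such as
 * /proc/sys/vm/max_map_count.  Returns -1 on failure. */
static long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Count VMAs in use: /proc/<pid>/maps has one line per VMA.
 * Returns -1 on failure. */
static long count_vmas(const char *maps_path)
{
    FILE *f = fopen(maps_path, "r");
    long n = 0;
    int c;

    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

/* Hypothetical policy: keep `reserve` VMAs for QEMU's own mmaps (shared
 * libraries, guest RAM, ...) and offer the rest to the DAX window. */
static long dax_mapping_budget(long max_map_count, long vmas_in_use,
                               long reserve)
{
    long budget = max_map_count - vmas_in_use - reserve;

    return budget > 0 ? budget : 0;
}
```

For example, read_sysctl_long("/proc/sys/vm/max_map_count") and
count_vmas("/proc/self/maps") feed dax_mapping_budget(); with the default
65530, 500 VMAs already in use and 1024 reserved, the budget is 64006.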

Stefan

* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
       [not found]             ` <20190717124258.GA13761@redhat.com>
@ 2019-07-23 13:32               ` Stefan Hajnoczi
       [not found]                 ` <20190723140855.GA11628@redhat.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-07-23 13:32 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, virtio-dev,
	Miklos Szeredi, Sage Weil, Steven Whitehouse, Paolo Bonzini

On Wed, Jul 17, 2019 at 08:42:58AM -0400, Vivek Goyal wrote:
> On Thu, Jun 27, 2019 at 10:09:16AM -0400, Michael S. Tsirkin wrote:
> 
> [..]
> > Well it allocates resources on the host. If you don't
> > contain qemu then even just allocating virtual memory
> > can make host swap, right? If you contain it then
> > qemu will get killed instead but then you need to tell
> > guest what not to do so as not to get qemu killed.
> 
> How is it different than running a malicious unprivileged process on host?
> 
> Denial of service should be mitigated by per process resource limits which
> will be true in this case as well.
> 
> I am not sure I understand the concern here.

There is a practical problem that the QEMU process may hit the mmap
limit and be unable to perform its own mmaps due to the DAX Window.  A
limit must be enforced on the host so that QEMU's internal mmaps
succeed.

Stefan

* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
       [not found]                 ` <20190723140855.GA11628@redhat.com>
@ 2019-07-23 14:52                   ` Stefan Hajnoczi
       [not found]                     ` <20190723155623.GA19189@redhat.com>
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-07-23 14:52 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, virtio-dev,
	Miklos Szeredi, Sage Weil, Steven Whitehouse, Paolo Bonzini

On Tue, Jul 23, 2019 at 10:08:55AM -0400, Vivek Goyal wrote:
> On Tue, Jul 23, 2019 at 02:32:27PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2019 at 08:42:58AM -0400, Vivek Goyal wrote:
> > > On Thu, Jun 27, 2019 at 10:09:16AM -0400, Michael S. Tsirkin wrote:
> > > 
> > > [..]
> > > > Well it allocates resources on the host. If you don't
> > > > contain qemu then even just allocating virtual memory
> > > > can make host swap, right? If you contain it then
> > > > qemu will get killed instead but then you need to tell
> > > > guest what not to do so as not to get qemu killed.
> > > 
> > > How is it different than running a malicious unprivileged process on host?
> > > 
> > > Denial of service should be mitigated by per process resource limits which
> > > will be true in this case as well.
> > > 
> > > I am not sure I understand the concern here.
> > 
> > There is a practical problem that the QEMU process may hit the mmap
> > limit and be unable to perform its own mmaps due to the DAX Window.  A
> > limit must be enforced on the host so that QEMU's internal mmaps
> > succeed.
> 
> But the number of mmaps is already limited by the DAX window size, which is
> controlled by virtiofsd. So all the user has to do is start with a smaller
> DAX window size if this ever becomes a concern.

Yes, the DAX Window size sets a hard limit on the number of mappings
(window_size / page_size).  I think we should still communicate a
maximum number of mappings to provide more control for cases where the
page size and DAX Window size don't produce a desirable number.

Consider that window_size = 8 GB and page_size = 4 KB already yields
2,097,152 maximum mappings.  Oops, that number is too large!
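
A sketch of how the driver could combine both limits, assuming a
hypothetical max_mappings config field where 0 means "no extra limit": the
effective cap is the smaller of window_size / page_size and the advertised
maximum.  The function name is made up for illustration.

```c
#include <stdint.h>

/* Effective number of DAX mappings the driver may create.  The window
 * already bounds it at window_size / page_size; a non-zero max_mappings
 * (hypothetical config field) lowers it further. */
static uint64_t effective_mapping_limit(uint64_t window_size,
                                        uint64_t page_size,
                                        uint64_t max_mappings)
{
    uint64_t by_window = window_size / page_size;

    if (max_mappings != 0 && max_mappings < by_window)
        return max_mappings;
    return by_window;
}
```

For the 8 GB / 4 KB case this yields 2,097,152 without a cap and 32,768
with max_mappings = 32768.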

Stefan

* Re: [virtio-dev] [PATCH v3 1/2] content: add virtio file system device
  2019-06-19  1:29   ` [virtio-dev] " Michael S. Tsirkin
@ 2019-07-23 15:58     ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-07-23 15:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio-dev, Miklos Szeredi, Sage Weil, Vivek Goyal,
	Steven Whitehouse, Dr. David Alan Gilbert, Paolo Bonzini

On Tue, Jun 18, 2019 at 09:29:13PM -0400, Michael S. Tsirkin wrote:
> On Wed, Feb 20, 2019 at 12:46:12PM +0000, Stefan Hajnoczi wrote:
> > +\begin{description}
> > +\item[\field{tag}] is the name associated with this file system.  The tag is
> > +    encoded in UTF-8 and padded with NUL bytes if shorter than the
> > +    available space.  This field is not NUL-terminated if the encoded bytes
> > +    take up the entire field.
> > +\item[\field{num_queues}] is the total number of request virtqueues exposed by
> > +    the device. The driver MAY use only one request queue,
> > +    or it can use more to achieve better performance.
> 
> Pls copy instances of MAY,SHOULD,MUST into conformance sections.
> Pls convert text outside of conformance sections to "can"
> or "is allowed to" etc.

Thanks for your feedback.  All of these comments will be addressed in
the next revision.
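
For reference, the tag encoding quoted above (NUL-padded, but not
NUL-terminated when the tag fills the field) means a driver has to bound
any copy by the field size.  A minimal C sketch, assuming a 36-byte field
(the size is not shown in the quoted hunk); fs_tag_copy is a made-up name
and the sketch does not validate the UTF-8.

```c
#include <stddef.h>
#include <string.h>

#define FS_TAG_SIZE 36  /* assumed tag field size */

/* Copy the tag into a NUL-terminated buffer.  The field is NUL-padded
 * when shorter than FS_TAG_SIZE but has no terminator when full, so the
 * copy stops at the first NUL or at the field size, whichever comes
 * first.  Returns the tag length in bytes. */
static size_t fs_tag_copy(const unsigned char field[FS_TAG_SIZE],
                          char out[FS_TAG_SIZE + 1])
{
    const unsigned char *nul = memchr(field, '\0', FS_TAG_SIZE);
    size_t len = nul ? (size_t)(nul - field) : FS_TAG_SIZE;

    memcpy(out, field, len);
    out[len] = '\0';
    return len;
}
```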

Stefan

* Re: [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window
       [not found]                     ` <20190723155623.GA19189@redhat.com>
@ 2019-07-24  8:33                       ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2019-07-24  8:33 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Michael S. Tsirkin, Dr. David Alan Gilbert, virtio-dev,
	Miklos Szeredi, Sage Weil, Steven Whitehouse, Paolo Bonzini

On Tue, Jul 23, 2019 at 11:56:23AM -0400, Vivek Goyal wrote:
> On Tue, Jul 23, 2019 at 03:52:50PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Jul 23, 2019 at 10:08:55AM -0400, Vivek Goyal wrote:
> > > On Tue, Jul 23, 2019 at 02:32:27PM +0100, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 17, 2019 at 08:42:58AM -0400, Vivek Goyal wrote:
> > > > > On Thu, Jun 27, 2019 at 10:09:16AM -0400, Michael S. Tsirkin wrote:
> > > > > 
> > > > > [..]
> > > > > > Well, it allocates resources on the host. If you don't
> > > > > > contain QEMU, then even just allocating virtual memory
> > > > > > can make the host swap, right? If you contain it, then
> > > > > > QEMU will get killed instead, but then you need to tell the
> > > > > > guest what not to do so that QEMU doesn't get killed.
> > > > > 
> > > > > How is it different from running a malicious unprivileged process on the host?
> > > > > 
> > > > > Denial of service should be mitigated by per-process resource limits, which
> > > > > apply in this case as well.
> > > > > 
> > > > > I am not sure I understand the concern here.
> > > > 
> > > > There is a practical problem that the QEMU process may hit the mmap
> > > > limit and be unable to perform its own mmaps due to the DAX Window.  A
> > > > limit must be enforced on the host so that QEMU's internal mmaps
> > > > succeed.
> > > 
> > > But the number of mmaps is already limited by the DAX window size, which is
> > > controlled by virtiofsd. So all the user has to do is start with a smaller
> > > DAX window size if this ever becomes a concern.
> > 
> > Yes, the DAX Window size sets a hard limit on the number of mappings
> > (window_size / page_size).  I think we should still communicate a
> > maximum number of mappings to provide more control for cases where the
> > page size and DAX Window size don't produce a desirable number.
> > 
> > Consider that window_size = 8 GB and page_size = 4 KB already yields
> > 2,097,152 maximum mappings.  Oops, that number is too large!
> 
> OK, so that's something virtiofsd will handle: the user can configure the
> maximum number of outstanding mappings, and if we cross that limit,
> FUSE_SETUPMAPPING will fail with some error?

Yes, virtiofsd will enforce the limit.

The guest driver will also be aware of the limit via VIRTIO
configuration space.
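A minimal sketch of the limit discussed above, assuming a configured maximum of 0 means "not set" (the thread does not define that). `dax_mapping_limit()`, `struct dax_mapping_count`, `dax_mapping_get()` and `dax_mapping_put()` are hypothetical names. Nothing here is actual virtiofsd or driver code. The hard cap is window_size / page_size; the configured value can only lower it.

```c
#include <stdint.h>

/*
 * Upper bound on simultaneous DAX mappings. window_size and page_size are
 * in bytes. configured_max is the user-chosen limit from the thread;
 * 0 means "not set" (an assumption made for this sketch).
 */
static uint64_t dax_mapping_limit(uint64_t window_size, uint64_t page_size,
                                  uint64_t configured_max)
{
    /* Each mapping covers at least one page of the window. */
    uint64_t hard_limit = window_size / page_size;

    if (configured_max != 0 && configured_max < hard_limit)
        return configured_max;
    return hard_limit;
}

/* Hypothetical bookkeeping for outstanding mappings. */
struct dax_mapping_count {
    uint64_t active;
    uint64_t limit;
};

/*
 * Admission check for a new FUSE_SETUPMAPPING request.
 * Returns 0 and counts the mapping if it fits, or -1 once the limit is
 * reached. The thread only says the request fails "with some error",
 * so the specific error code is left to the implementation.
 */
static int dax_mapping_get(struct dax_mapping_count *c)
{
    if (c->active >= c->limit)
        return -1;
    c->active++;
    return 0;
}

/* Called when a mapping is torn down, freeing a slot. */
static void dax_mapping_put(struct dax_mapping_count *c)
{
    if (c->active > 0)
        c->active--;
}
```

With `8ULL << 30` (8 GiB) and 4096-byte pages, `dax_mapping_limit()` returns 2,097,152 when no maximum is configured, matching the figure above. The same check could run in the guest using the limit read from configuration space, so the driver stops issuing FUSE_SETUPMAPPING before virtiofsd has to refuse it.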

Stefan

Thread overview: 21+ messages
2019-02-20 12:46 [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Stefan Hajnoczi
2019-02-20 12:46 ` [virtio-dev] [PATCH v3 1/2] content: " Stefan Hajnoczi
2019-02-22 14:31   ` Dr. David Alan Gilbert
2019-02-25 15:54     ` Stefan Hajnoczi
2019-02-25 16:11   ` [virtio-dev] " Dr. David Alan Gilbert
2019-02-27 16:19     ` Stefan Hajnoczi
2019-06-19  1:29   ` [virtio-dev] " Michael S. Tsirkin
2019-07-23 15:58     ` Stefan Hajnoczi
2019-02-20 12:46 ` [virtio-dev] [PATCH v3 2/2] virtio-fs: add DAX window Stefan Hajnoczi
2019-06-19  1:41   ` Michael S. Tsirkin
2019-06-24 13:58     ` Stefan Hajnoczi
2019-06-24 14:10       ` Michael S. Tsirkin
2019-06-25  9:55         ` Dr. David Alan Gilbert
2019-06-27 14:09           ` Michael S. Tsirkin
2019-07-17 10:48             ` Stefan Hajnoczi
     [not found]             ` <20190717124258.GA13761@redhat.com>
2019-07-23 13:32               ` Stefan Hajnoczi
     [not found]                 ` <20190723140855.GA11628@redhat.com>
2019-07-23 14:52                   ` Stefan Hajnoczi
     [not found]                     ` <20190723155623.GA19189@redhat.com>
2019-07-24  8:33                       ` Stefan Hajnoczi
2019-06-19  1:30 ` [virtio-dev] [PATCH v3 0/2] virtio-fs: add virtio file system device Michael S. Tsirkin
2019-06-24 12:23   ` Stefan Hajnoczi
2019-06-24 13:57     ` Michael S. Tsirkin