All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RESEND] virtio-pmem: PMEM device spec
@ 2021-07-28 15:04 Pankaj Gupta
  2021-07-30 11:53 ` [virtio-dev] " Cornelia Huck
  2021-08-04 11:07 ` [virtio-dev] " Stefan Hajnoczi
  0 siblings, 2 replies; 17+ messages in thread
From: Pankaj Gupta @ 2021-07-28 15:04 UTC (permalink / raw)
  To: virtio-dev
  Cc: dan.j.williams, david, mst, cohuck, tstark, pankaj.gupta, Pankaj Gupta

Posting virtio specification for virtio pmem device. Virtio pmem is a
paravirtualized device which allows the guest to bypass page cache.
Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu
device is merged in Qemu 4.1.

Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
---
Sorry, It took me long time to get back on this. There is
an enhancement to this spec by "Taylor Stark" CCed in the list.
Request for feedback and merging. 

RFC is posted here [1]
[1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html

 conformance.tex |  19 ++++++-
 content.tex     |   1 +
 virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 150 insertions(+), 2 deletions(-)
 create mode 100644 virtio-pmem.tex

diff --git a/conformance.tex b/conformance.tex
index 94d7a06..818ddda 100644
--- a/conformance.tex
+++ b/conformance.tex
@@ -31,7 +31,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \ref{sec:Conformance / Driver Conformance / Sound Driver Conformance},
 \ref{sec:Conformance / Driver Conformance / Memory Driver Conformance},
 \ref{sec:Conformance / Driver Conformance / I2C Adapter Driver Conformance} or
-\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance}.
+\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance},
+\ref{sec:Conformance / Driver Conformance / PMEM Driver Conformance}.
 
     \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
   \end{itemize}
@@ -55,7 +56,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \ref{sec:Conformance / Device Conformance / Sound Device Conformance},
 \ref{sec:Conformance / Device Conformance / Memory Device Conformance},
 \ref{sec:Conformance / Device Conformance / I2C Adapter Device Conformance} or
-\ref{sec:Conformance / Device Conformance / SCMI Device Conformance}.
+\ref{sec:Conformance / Device Conformance / SCMI Device Conformance},
+\ref{sec:Conformance / Device Conformance / PMEM Driver Conformance}.
 
     \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
   \end{itemize}
@@ -301,6 +303,19 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
 \item \ref{drivernormative:Device Types / SCMI Device / Device Operation / Setting Up eventq Buffers}
 \end{itemize}
 
+\conformance{\subsection}{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
+
+A PMEM driver MUST conform to the following normative statements:
+
+\begin{itemize}
+\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
+\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
+\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
+\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
+\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
+\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}
+\end{itemize}
+
 \conformance{\section}{Device Conformance}\label{sec:Conformance / Device Conformance}
 
 A device MUST conform to the following normative statements:
diff --git a/content.tex b/content.tex
index ceb2562..6acc785 100644
--- a/content.tex
+++ b/content.tex
@@ -6583,6 +6583,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
 \input{virtio-mem.tex}
 \input{virtio-i2c.tex}
 \input{virtio-scmi.tex}
+\input{virtio-pmem.tex}
 
 \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
 
diff --git a/virtio-pmem.tex b/virtio-pmem.tex
new file mode 100644
index 0000000..a2b888e
--- /dev/null
+++ b/virtio-pmem.tex
@@ -0,0 +1,132 @@
+\section{PMEM Device}\label{sec:Device Types / PMEM Device}
+
+The virtio pmem is a fake persistent memory (NVDIMM) device
+used to bypass the guest page cache and provide a virtio
+based asynchronous flush mechanism.This avoids the need
+of a separate page cache in guest and keeps page cache only
+in the host. Under memory pressure, the host makes use of
+effecient memory reclaim decisions for page cache pages
+of all the guests. This helps to reduce the memory footprint
+and fit more guests in the host system.
+
+\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
+  27
+
+\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
+\begin{description}
+\item[0] req_vq
+\end{description}
+
+\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
+
+There are currently no feature bits defined for this device.
+
+\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
+
+\begin{lstlisting}
+struct virtio_pmem_config {
+	le64 start;
+	le64 size;
+};
+\end{lstlisting}
+
+\begin{description}
+\item[\field{start}] contains the start address from the guest physical address range
+to be hotplugged into the guest address space using the pmem API.
+
+\item[\field{size}] contains the length of this address range.
+\end{description}
+
+\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
+
+Device hotplugs physical memory to guest address space. Persistent memory device
+is emulated with file backed memory at host side.
+
+\begin{enumerate}
+\item Guest vpmem start is read from \field{start}.
+\item Guest vpmem end is read from \field{size}.
+\end{enumerate}
+
+\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
+
+File backed memory MUST be memory mapped to guest address space with SHARED
+memory mapping.
+
+\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
+
+Driver hotplugs the physical memory and registers associated region with the pmem API.
+Also, configures a flush callback function with the corresponding region.
+
+\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
+
+Driver MUST enable filesystem direct access operations for read/write on the device.
+
+\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
+
+Driver MUST implement a virtio based flush callback.
+
+Driver MUST disable other FLUSH/SYNC mechanisms for the device when virtio flush is configured.
+
+\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
+\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
+
+Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.
+
+Driver SHOULD handle multiple fsync requests on files present on the device.
+
+\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
+
+\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
+
+Device SHOULD handle multiple flush requests simultaneously using host filesystem fsync or flush call.
+
+\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
+
+Device MUST return integer "0" for success and "-1" for failure.
+These errors are converted to corresponding error codes by guest
+as per architecture.
+
+\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
+
+There could be potential security implications depending on how
+memory mapped host backing file is used. By default device emulation
+is done with SHARED mapping. There is a contract between guest and host
+process to access same backing file for read/write operations.
+
+If a malicious guest or host userspace map the same backing file,
+attacking process can make use of known cache side channel attacks
+to predict the current state of shared page cache page. If both
+attacker and victim somehow execute same shared code after a
+flush/evict call, with difference in execution timing attacker
+could infer another guest local data or host data. Though this is
+not easy and same challenges exist as with bare metal host system
+when userspace share same backing file.
+
+\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
+
+\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
+
+If device backing backing file is shared with multiple guests or host
+processes, this may act as a metric for page cache side channel attack.
+As a counter measure every guest should have its own(not shared with
+another guest) SHARED backing file and gets populated a per host process
+page cache pages.
+
+\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
+There maybe be chances of side channels attack with PRIVATE
+memory mapping similar to SHARED with read-only shared mappings.
+PRIVATE is not used for virtio pmem making this usecase
+irrelevant.
+
+\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
+For SHARED mapping, if workload is single application inside
+guest and there is no risk with sharing of data between guests.
+Guest sharing same backing file with SHARED mapping can be
+used as a valid configuration.
+
+\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
+Don't allow cache evict from guest filesystem trim/discard command
+with virtio pmem. This rules out any possibility of evict-reload
+page cache side channel attacks if backing disk is shared(SHARED)
+with mutliple guests. Though if we use per device backing file with
+shared mapping this countermeasure is not required.
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [virtio-dev] Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-07-28 15:04 [PATCH RESEND] virtio-pmem: PMEM device spec Pankaj Gupta
@ 2021-07-30 11:53 ` Cornelia Huck
  2021-07-30 12:25   ` Pankaj Gupta
  2021-08-04 11:07 ` [virtio-dev] " Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Cornelia Huck @ 2021-07-30 11:53 UTC (permalink / raw)
  To: Pankaj Gupta, virtio-dev
  Cc: dan.j.williams, david, mst, tstark, pankaj.gupta, Pankaj Gupta

On Wed, Jul 28 2021, Pankaj Gupta <pankaj.gupta.linux@gmail.com> wrote:

> Posting virtio specification for virtio pmem device. Virtio pmem is a
> paravirtualized device which allows the guest to bypass page cache.
> Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu
> device is merged in Qemu 4.1.
>
> Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> ---
> Sorry, It took me long time to get back on this. There is
> an enhancement to this spec by "Taylor Stark" CCed in the list.
> Request for feedback and merging.

Thank you for following up on this.

>
> RFC is posted here [1]
> [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
>
>  conformance.tex |  19 ++++++-
>  content.tex     |   1 +
>  virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 150 insertions(+), 2 deletions(-)
>  create mode 100644 virtio-pmem.tex

(...)

> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..a2b888e
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,132 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +The virtio pmem is a fake persistent memory (NVDIMM) device

"The virtio pmem device"?

> +used to bypass the guest page cache and provide a virtio
> +based asynchronous flush mechanism.This avoids the need

missing space after '.'

> +of a separate page cache in guest and keeps page cache only

s/guest/the guest/
s/page cache/the page cache/

> +in the host. Under memory pressure, the host makes use of

"can make use", or maybe "is enabled to make use"?

> +effecient memory reclaim decisions for page cache pages
> +of all the guests. This helps to reduce the memory footprint
> +and fit more guests in the host system.
> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> +  27
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> +	le64 start;
> +	le64 size;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{start}] contains the start address from the guest physical address range
> +to be hotplugged into the guest address space using the pmem API.
> +
> +\item[\field{size}] contains the length of this address range.
> +\end{description}
> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hotplugs physical memory to guest address space. Persistent memory device

s/Device/The device/
s/Persistent memory device/The persistent memory device/

> +is emulated with file backed memory at host side.

"on the host side"?

> +
> +\begin{enumerate}
> +\item Guest vpmem start is read from \field{start}.
> +\item Guest vpmem end is read from \field{size}.
> +\end{enumerate}
> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +File backed memory MUST be memory mapped to guest address space with SHARED
> +memory mapping.

Is 'SHARED' generic enough? Probably yes.

(Similar for the other terms like 'page cache' -- I think we can assume
similar concepts for most operating systems?)

> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Driver hotplugs the physical memory and registers associated region with the pmem API.

s/Driver/The driver/
s/associated region/the associated region/ ?

> +Also, configures a flush callback function with the corresponding region.

Not sure if that is too specific already... maybe something like "Also,
it configures a notification for when the corresponding region is flushed."?

> +
> +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> +
> +Driver MUST enable filesystem direct access operations for read/write on the device.

s/Driver/The driver/

Not sure whether this is operating system agnostic enough... does anyone
else have a better idea?

> +
> +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +
> +Driver MUST implement a virtio based flush callback.
> +
> +Driver MUST disable other FLUSH/SYNC mechanisms for the device when virtio flush is configured.

s/Driver/The driver/ (x2)

See above for "flush callback". I'm mostly worrying about the wording
being generic enough (even though it's probably obvious enough for
non-Linux people as well.)

> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.

s/Driver/The driver/

I don't think we should refer to "guest userspace" in the spec; can we
reword this?

> +
> +Driver SHOULD handle multiple fsync requests on files present on the device.

s/Driver/The driver/

Again, a bit unsure on whether this is generic enough.

> +
> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using host filesystem fsync or flush call.

s/Device/The device/

> +
> +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> +
> +Device MUST return integer "0" for success and "-1" for failure.

s/Device/The device/

> +These errors are converted to corresponding error codes by guest
> +as per architecture.

I don't think you need to specify what the guest will actually do with
the errors, that's entirely driver-dependent.

> +
> +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> +
> +There could be potential security implications depending on how
> +memory mapped host backing file is used. By default device emulation
> +is done with SHARED mapping. There is a contract between guest and host
> +process to access same backing file for read/write operations.
> +
> +If a malicious guest or host userspace map the same backing file,
> +attacking process can make use of known cache side channel attacks
> +to predict the current state of shared page cache page. If both
> +attacker and victim somehow execute same shared code after a
> +flush/evict call, with difference in execution timing attacker
> +could infer another guest local data or host data. Though this is
> +not easy and same challenges exist as with bare metal host system
> +when userspace share same backing file.
> +
> +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> +
> +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
> +
> +If device backing backing file is shared with multiple guests or host
> +processes, this may act as a metric for page cache side channel attack.
> +As a counter measure every guest should have its own(not shared with
> +another guest) SHARED backing file and gets populated a per host process
> +page cache pages.
> +
> +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
> +There maybe be chances of side channels attack with PRIVATE
> +memory mapping similar to SHARED with read-only shared mappings.
> +PRIVATE is not used for virtio pmem making this usecase
> +irrelevant.
> +
> +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
> +For SHARED mapping, if workload is single application inside
> +guest and there is no risk with sharing of data between guests.
> +Guest sharing same backing file with SHARED mapping can be
> +used as a valid configuration.
> +
> +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
> +Don't allow cache evict from guest filesystem trim/discard command
> +with virtio pmem. This rules out any possibility of evict-reload
> +page cache side channel attacks if backing disk is shared(SHARED)
> +with mutliple guests. Though if we use per device backing file with
> +shared mapping this countermeasure is not required.

I'll leave review of these to others who are more familiar with this
area.


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-07-30 11:53 ` [virtio-dev] " Cornelia Huck
@ 2021-07-30 12:25   ` Pankaj Gupta
  2021-08-03  8:26     ` [virtio-dev] " Cornelia Huck
  0 siblings, 1 reply; 17+ messages in thread
From: Pankaj Gupta @ 2021-07-30 12:25 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: virtio-dev, Dan Williams, David Hildenbrand, Michael S . Tsirkin,
	Taylor Stark, Pankaj Gupta

Hi Cornelia,

> > Posting virtio specification for virtio pmem device. Virtio pmem is a
> > paravirtualized device which allows the guest to bypass page cache.
> > Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu
> > device is merged in Qemu 4.1.
> >
> > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > ---
> > Sorry, It took me long time to get back on this. There is
> > an enhancement to this spec by "Taylor Stark" CCed in the list.
> > Request for feedback and merging.
>
> Thank you for following up on this.

Thank you for for the review.
>
> >
> > RFC is posted here [1]
> > [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
> >
> >  conformance.tex |  19 ++++++-
> >  content.tex     |   1 +
> >  virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 150 insertions(+), 2 deletions(-)
> >  create mode 100644 virtio-pmem.tex
>
> (...)
>
> > diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> > new file mode 100644
> > index 0000000..a2b888e
> > --- /dev/null
> > +++ b/virtio-pmem.tex
> > @@ -0,0 +1,132 @@
> > +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> > +
> > +The virtio pmem is a fake persistent memory (NVDIMM) device
>
> "The virtio pmem device"?

Sure.

>
> > +used to bypass the guest page cache and provide a virtio
> > +based asynchronous flush mechanism.This avoids the need
>
> missing space after '.'

will fix.

>
> > +of a separate page cache in guest and keeps page cache only
>
> s/guest/the guest/
> s/page cache/the page cache/

will fix both.

>
> > +in the host. Under memory pressure, the host makes use of
>
> "can make use", or maybe "is enabled to make use"?

seems better.

>
> > +effecient memory reclaim decisions for page cache pages

s/effecient/efficient
> > +of all the guests. This helps to reduce the memory footprint
> > +and fit more guests in the host system.
> > +
> > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> > +  27
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> > +\begin{description}
> > +\item[0] req_vq
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> > +
> > +\begin{lstlisting}
> > +struct virtio_pmem_config {
> > +     le64 start;
> > +     le64 size;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{start}] contains the start address from the guest physical address range
> > +to be hotplugged into the guest address space using the pmem API.
> > +
> > +\item[\field{size}] contains the length of this address range.
> > +\end{description}
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> > +
> > +Device hotplugs physical memory to guest address space. Persistent memory device
>
> s/Device/The device/
> s/Persistent memory device/The persistent memory device/

will fix.

>
> > +is emulated with file backed memory at host side.
>
> "on the host side"?

Sure.

>
> > +
> > +\begin{enumerate}
> > +\item Guest vpmem start is read from \field{start}.
> > +\item Guest vpmem end is read from \field{size}.
> > +\end{enumerate}
> > +
> > +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> > +
> > +File backed memory MUST be memory mapped to guest address space with SHARED
> > +memory mapping.
>
> Is 'SHARED' generic enough? Probably yes.
>
> (Similar for the other terms like 'page cache' -- I think we can assume
> similar concepts for most operating systems?)

Yes, SHRED seems generic to me.

'page cache' can be changed to 'host cache', but not sure

>
> > +
> > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> > +
> > +Driver hotplugs the physical memory and registers associated region with the pmem API.
>
> s/Driver/The driver/
> s/associated region/the associated region/ ?

o.k

>
> > +Also, configures a flush callback function with the corresponding region.
>
> Not sure if that is too specific already... maybe something like "Also,
> it configures a notification for when the corresponding region is flushed."?

Maybe will remove this line altogether as it is implementation details?

>
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> > +
> > +Driver MUST enable filesystem direct access operations for read/write on the device.
>
> s/Driver/The driver/

o.k
>
> Not sure whether this is operating system agnostic enough... does anyone
> else have a better idea?
>
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +
> > +Driver MUST implement a virtio based flush callback.
> > +
> > +Driver MUST disable other FLUSH/SYNC mechanisms for the device when virtio flush is configured.
>
> s/Driver/The driver/ (x2)

o.k

>
> See above for "flush callback". I'm mostly worrying about the wording
> being generic enough (even though it's probably obvious enough for
> non-Linux people as well.)

yes, Something below is better?

The driver MUST not enable any explicit FLUSH on the file memory
mapped from the Virtio pmem device

>
> > +
> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> > +
> > +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.
>
> s/Driver/The driver/
>
> I don't think we should refer to "guest userspace" in the spec; can we
> reword this?

Sure

Driver MUST send VIRTIO_FLUSH command on request virtqueue, thus
allows asynchronous FLUSH operation on the files present in Virtio
pmem device.

>
> > +
> > +Driver SHOULD handle multiple fsync requests on files present on the device.
>
> s/Driver/The driver/

o.k

>
> Again, a bit unsure on whether this is generic enough.

Driver SHOULD handle multiple FLUSH requests on the files present on
the Virtio pmem device.

>
> > +
> > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> > +
> > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> > +
> > +Device SHOULD handle multiple flush requests simultaneously using host filesystem fsync or flush call.
>
> s/Device/The device/

o.k

>
> > +
> > +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> > +
> > +Device MUST return integer "0" for success and "-1" for failure.
>
> s/Device/The device/

o.k

>
> > +These errors are converted to corresponding error codes by guest
> > +as per architecture.
>
> I don't think you need to specify what the guest will actually do with
> the errors, that's entirely driver-dependent.

Sure, will remove it.

>
> > +
> > +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> > +
> > +There could be potential security implications depending on how
> > +memory mapped host backing file is used. By default device emulation
> > +is done with SHARED mapping. There is a contract between guest and host
> > +process to access same backing file for read/write operations.
> > +
> > +If a malicious guest or host userspace map the same backing file,
> > +attacking process can make use of known cache side channel attacks
> > +to predict the current state of shared page cache page. If both
> > +attacker and victim somehow execute same shared code after a
> > +flush/evict call, with difference in execution timing attacker
> > +could infer another guest local data or host data. Though this is
> > +not easy and same challenges exist as with bare metal host system
> > +when userspace share same backing file.
> > +
> > +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> > +
> > +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
> > +
> > +If device backing backing file is shared with multiple guests or host
> > +processes, this may act as a metric for page cache side channel attack.
> > +As a counter measure every guest should have its own(not shared with
> > +another guest) SHARED backing file and gets populated a per host process
> > +page cache pages.
> > +
> > +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
> > +There maybe be chances of side channels attack with PRIVATE
> > +memory mapping similar to SHARED with read-only shared mappings.
> > +PRIVATE is not used for virtio pmem making this usecase
> > +irrelevant.
> > +
> > +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
> > +For SHARED mapping, if workload is single application inside
> > +guest and there is no risk with sharing of data between guests.
> > +Guest sharing same backing file with SHARED mapping can be
> > +used as a valid configuration.
> > +
> > +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
> > +Don't allow cache evict from guest filesystem trim/discard command
> > +with virtio pmem. This rules out any possibility of evict-reload
> > +page cache side channel attacks if backing disk is shared(SHARED)
> > +with mutliple guests. Though if we use per device backing file with
> > +shared mapping this countermeasure is not required.
>
> I'll leave review of these to others who are more familiar with this
> area.

o.k.

Thanks you very much!

Best regards,
Pankaj
>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [virtio-dev] Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-07-30 12:25   ` Pankaj Gupta
@ 2021-08-03  8:26     ` Cornelia Huck
  2021-08-03  8:50       ` David Hildenbrand
  2021-08-03  9:16       ` Pankaj Gupta
  0 siblings, 2 replies; 17+ messages in thread
From: Cornelia Huck @ 2021-08-03  8:26 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: virtio-dev, Dan Williams, David Hildenbrand, Michael S . Tsirkin,
	Taylor Stark, Pankaj Gupta

On Fri, Jul 30 2021, Pankaj Gupta <pankaj.gupta.linux@gmail.com> wrote:

>> > +Also, configures a flush callback function with the corresponding region.
>>
>> Not sure if that is too specific already... maybe something like "Also,
>> it configures a notification for when the corresponding region is flushed."?
>
> Maybe will remove this line altogether as it is implementation
> details?

Maybe... I think the point is to configure _something_, not sure if we
can really generalize that. Other ideas welcome.

>> See above for "flush callback". I'm mostly worrying about the wording
>> being generic enough (even though it's probably obvious enough for
>> non-Linux people as well.)
>
> yes, Something below is better?
>
> The driver MUST not enable any explicit FLUSH on the file memory
> mapped from the Virtio pmem device

Hm, not sure. Would like to see feedback from others that had worked in
this area.

>
>>
>> > +
>> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
>> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
>> > +
>> > +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.
>>
>> s/Driver/The driver/
>>
>> I don't think we should refer to "guest userspace" in the spec; can we
>> reword this?
>
> Sure
>
> Driver MUST send VIRTIO_FLUSH command on request virtqueue, thus
> allows asynchronous FLUSH operation on the files present in Virtio
> pmem device.

s/Driver/The driver/
s/allows/allowing/

I'm not sure whether we should refer to 'files'.

Again, feedback from others welcome; this is not really one of my core topics.

>
>>
>> > +
>> > +Driver SHOULD handle multiple fsync requests on files present on the device.
>>
>> s/Driver/The driver/
>
> o.k
>
>>
>> Again, a bit unsure on whether this is generic enough.
>
> Driver SHOULD handle multiple FLUSH requests on the files present on
> the Virtio pmem device.

Same here. I'm afraid this is not easy :(


---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-03  8:26     ` [virtio-dev] " Cornelia Huck
@ 2021-08-03  8:50       ` David Hildenbrand
  2021-08-03  9:02         ` Pankaj Gupta
  2021-08-03  9:16       ` Pankaj Gupta
  1 sibling, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2021-08-03  8:50 UTC (permalink / raw)
  To: Cornelia Huck, Pankaj Gupta
  Cc: virtio-dev, Dan Williams, Michael S . Tsirkin, Taylor Stark,
	Pankaj Gupta

On 03.08.21 10:26, Cornelia Huck wrote:
> On Fri, Jul 30 2021, Pankaj Gupta <pankaj.gupta.linux@gmail.com> wrote:
> 
>>>> +Also, configures a flush callback function with the corresponding region.
>>>
>>> Not sure if that is too specific already... maybe something like "Also,
>>> it configures a notification for when the corresponding region is flushed."?
>>
>> Maybe will remove this line altogether as it is implementation
>> details?
> 
> Maybe... I think the point is to configure _something_, not sure if we
> can really generalize that. Other ideas welcome.
> 
>>> See above for "flush callback". I'm mostly worrying about the wording
>>> being generic enough (even though it's probably obvious enough for
>>> non-Linux people as well.)
>>
>> yes, Something below is better?
>>
>> The driver MUST not enable any explicit FLUSH on the file memory
>> mapped from the Virtio pmem device
> 
> Hm, not sure. Would like to see feedback from others that had worked in
> this area.


I think instead of describing detailed device handling in regard to 
e.g., fsync, we should document what the exact semantics are in POV of 
the driver when issuing a flush, and when it makes sense to issue a 
flush. The device is free to implement that however it likes (fsync, 
whatsoever).

Why do we care about "The driver MUST not enable any explicit FLUSH on 
the file memory mapped from the Virtio pmem device" and what exactly do 
we mean with "explicit FLUSH" here ?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-03  8:50       ` David Hildenbrand
@ 2021-08-03  9:02         ` Pankaj Gupta
  0 siblings, 0 replies; 17+ messages in thread
From: Pankaj Gupta @ 2021-08-03  9:02 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Cornelia Huck, virtio-dev, Dan Williams, Michael S . Tsirkin,
	Taylor Stark, Pankaj Gupta

> >>>> +Also, configures a flush callback function with the corresponding region.
> >>>
> >>> Not sure if that is too specific already... maybe something like "Also,
> >>> it configures a notification for when the corresponding region is flushed."?
> >>
> >> Maybe will remove this line altogether as it is implementation
> >> details?
> >
> > Maybe... I think the point is to configure _something_, not sure if we
> > can really generalize that. Other ideas welcome.
> >
> >>> See above for "flush callback". I'm mostly worrying about the wording
> >>> being generic enough (even though it's probably obvious enough for
> >>> non-Linux people as well.)
> >>
> >> yes, Something below is better?
> >>
> >> The driver MUST not enable any explicit FLUSH on the file memory
> >> mapped from the Virtio pmem device
> >
> > Hm, not sure. Would like to see feedback from others that had worked in
> > this area.
>
>
> I think instead of describing detailed device handling in regard to
> e.g., fsync, we should document what the exact semantics are in POV of
> the driver when issuing a flush, and when it makes sense to issue a
> flush. The device is free to implement that however it likes (fsync,
> whatsoever).
>
> Why do we care about "The driver MUST not enable any explicit FLUSH on
> the file memory mapped from the Virtio pmem device" and what exactly do
> we mean with "explicit FLUSH" here ?

Fair. Will remove this and any other hints towards implementation.
Thanks for having a look.

Best regards,
Pankaj


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-03  8:26     ` [virtio-dev] " Cornelia Huck
  2021-08-03  8:50       ` David Hildenbrand
@ 2021-08-03  9:16       ` Pankaj Gupta
  2021-08-03  9:17         ` David Hildenbrand
  1 sibling, 1 reply; 17+ messages in thread
From: Pankaj Gupta @ 2021-08-03  9:16 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: virtio-dev, Dan Williams, David Hildenbrand, Michael S . Tsirkin,
	Taylor Stark, Pankaj Gupta

> >> > +Also, configures a flush callback function with the corresponding region.
> >>
> >> Not sure if that is too specific already... maybe something like "Also,
> >> it configures a notification for when the corresponding region is flushed."?
> >
> > Maybe will remove this line altogether as it is implementation
> > details?
>
> Maybe... I think the point is to configure _something_, not sure if we
> can really generalize that. Other ideas welcome.

Agree.
>
> >> See above for "flush callback". I'm mostly worrying about the wording
> >> being generic enough (even though it's probably obvious enough for
> >> non-Linux people as well.)
> >
> > yes, Something below is better?
> >
> > The driver MUST not enable any explicit FLUSH on the file memory
> > mapped from the Virtio pmem device
>
> Hm, not sure. Would like to see feedback from others that had worked in
> this area.

As suggested by David, will remove this as well. As this seems to be
implementation
details.
>
> >
> >>
> >> > +
> >> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> >> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> >> > +
> >> > +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.
> >>
> >> s/Driver/The driver/
> >>
> >> I don't think we should refer to "guest userspace" in the spec; can we
> >> reword this?
> >
> > Sure
> >
> > Driver MUST send VIRTIO_FLUSH command on request virtqueue, thus
> > allows asynchronous FLUSH operation on the files present in Virtio
> > pmem device.
>
> s/Driver/The driver/
> s/allows/allowing/

o.k

>
> I'm not sure whether we should refer to 'files'.
>
> Again, feedback from others welcome; this is not really one of my core topics.

Sure. Will recheck.
>
> >
> >>
> >> > +
> >> > +Driver SHOULD handle multiple fsync requests on files present on the device.
> >>
> >> s/Driver/The driver/
> >
> > o.k
> >
> >>
> >> Again, a bit unsure on whether this is generic enough.
> >
> > Driver SHOULD handle multiple FLUSH requests on the files present on
> > the Virtio pmem device.
>
> Same here. I'm afraid this is not easy :(

hmm...

The driver SHOULD handle multiple FLUSH requests.

Thanks,
Pankaj

>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-03  9:16       ` Pankaj Gupta
@ 2021-08-03  9:17         ` David Hildenbrand
  0 siblings, 0 replies; 17+ messages in thread
From: David Hildenbrand @ 2021-08-03  9:17 UTC (permalink / raw)
  To: Pankaj Gupta, Cornelia Huck
  Cc: virtio-dev, Dan Williams, Michael S . Tsirkin, Taylor Stark,
	Pankaj Gupta

>>> Driver SHOULD handle multiple FLUSH requests on the files present on
>>> the Virtio pmem device.
>>
>> Same here. I'm afraid this is not easy :(
> 
> hmm...
> 
> The driver SHOULD handle multiple FLUSH requests.

Do you want to say

The driver MUST be able to handle concurrent FLUSH requests.

?

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-07-28 15:04 [PATCH RESEND] virtio-pmem: PMEM device spec Pankaj Gupta
  2021-07-30 11:53 ` [virtio-dev] " Cornelia Huck
@ 2021-08-04 11:07 ` Stefan Hajnoczi
  2021-08-04 11:11   ` David Hildenbrand
  2021-08-04 11:28   ` Pankaj Gupta
  1 sibling, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04 11:07 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: virtio-dev, dan.j.williams, david, mst, cohuck, tstark, pankaj.gupta

[-- Attachment #1: Type: text/plain, Size: 14596 bytes --]

On Wed, Jul 28, 2021 at 05:04:35PM +0200, Pankaj Gupta wrote:
> Posting virtio specification for virtio pmem device. Virtio pmem is a
> paravirtualized device which allows the guest to bypass page cache.
> Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu
> device is merged in Qemu 4.1.
> 
> Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> ---
> Sorry, It took me long time to get back on this. There is
> an enhancement to this spec by "Taylor Stark" CCed in the list.
> Request for feedback and merging. 
> 
> RFC is posted here [1]
> [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html

I skimmed through the review comments but pretty much reviewed this
patch from scratch. Feel free to ignore questions that others have
already raised.

> 
>  conformance.tex |  19 ++++++-
>  content.tex     |   1 +
>  virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 150 insertions(+), 2 deletions(-)
>  create mode 100644 virtio-pmem.tex
> 
> diff --git a/conformance.tex b/conformance.tex
> index 94d7a06..818ddda 100644
> --- a/conformance.tex
> +++ b/conformance.tex
> @@ -31,7 +31,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \ref{sec:Conformance / Driver Conformance / Sound Driver Conformance},
>  \ref{sec:Conformance / Driver Conformance / Memory Driver Conformance},
>  \ref{sec:Conformance / Driver Conformance / I2C Adapter Driver Conformance} or
> -\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance}.
> +\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance},
> +\ref{sec:Conformance / Driver Conformance / PMEM Driver Conformance}.
>  
>      \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
>    \end{itemize}
> @@ -55,7 +56,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \ref{sec:Conformance / Device Conformance / Sound Device Conformance},
>  \ref{sec:Conformance / Device Conformance / Memory Device Conformance},
>  \ref{sec:Conformance / Device Conformance / I2C Adapter Device Conformance} or
> -\ref{sec:Conformance / Device Conformance / SCMI Device Conformance}.
> +\ref{sec:Conformance / Device Conformance / SCMI Device Conformance},
> +\ref{sec:Conformance / Device Conformance / PMEM Driver Conformance}.
>  
>      \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
>    \end{itemize}
> @@ -301,6 +303,19 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>  \item \ref{drivernormative:Device Types / SCMI Device / Device Operation / Setting Up eventq Buffers}
>  \end{itemize}
>  
> +\conformance{\subsection}{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
> +
> +A PMEM driver MUST conform to the following normative statements:
> +
> +\begin{itemize}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}
> +\end{itemize}
> +
>  \conformance{\section}{Device Conformance}\label{sec:Conformance / Device Conformance}
>  
>  A device MUST conform to the following normative statements:
> diff --git a/content.tex b/content.tex
> index ceb2562..6acc785 100644
> --- a/content.tex
> +++ b/content.tex
> @@ -6583,6 +6583,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
>  \input{virtio-mem.tex}
>  \input{virtio-i2c.tex}
>  \input{virtio-scmi.tex}
> +\input{virtio-pmem.tex}
>  
>  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>  
> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> new file mode 100644
> index 0000000..a2b888e
> --- /dev/null
> +++ b/virtio-pmem.tex
> @@ -0,0 +1,132 @@
> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> +
> +The virtio pmem is a fake persistent memory (NVDIMM) device

s/fake/virtual/ or drop "fake"? If the device persists data correctly
then it's not fake.

> +used to bypass the guest page cache and provide a virtio
> +based asynchronous flush mechanism.This avoids the need
> +of a separate page cache in guest and keeps page cache only
> +in the host. Under memory pressure, the host makes use of
> +effecient memory reclaim decisions for page cache pages

s/effecient/efficient/

> +of all the guests. This helps to reduce the memory footprint
> +and fit more guests in the host system.

This explains the motivation for the device. It would also be nice to
explain the nature of the device:

  The virtio pmem device provides access to byte-addressable persistent
  memory. The persist memory is directly accessible as a Shared Memory
  Region. Data written to this memory is made persistent by separately
  sending a flush command. Writes that have been flushed are preserved
  across device reset and power failure.

> +
> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> +  27
> +
> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> +\begin{description}
> +\item[0] req_vq
> +\end{description}
> +
> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> +
> +There are currently no feature bits defined for this device.
> +
> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> +
> +\begin{lstlisting}
> +struct virtio_pmem_config {
> +	le64 start;
> +	le64 size;
> +};
> +\end{lstlisting}
> +
> +\begin{description}
> +\item[\field{start}] contains the start address from the guest physical address range
> +to be hotplugged into the guest address space using the pmem API.
> +
> +\item[\field{size}] contains the length of this address range.
> +\end{description}

Please use a Shared Memory Region instead. This is a relatively new
addition to the VIRTIO device model that seems like a good fit for this
device:
https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex

> +
> +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> +
> +Device hotplugs physical memory to guest address space. Persistent memory device
> +is emulated with file backed memory at host side.

This paragraph can be dropped. The device could be implemented in a
number of ways and it's beyond the scope of the specification.

> +
> +\begin{enumerate}
> +\item Guest vpmem start is read from \field{start}.
> +\item Guest vpmem end is read from \field{size}.
> +\end{enumerate}

Please avoid the terms "host" and "guest". VIRTIO devices can also be
implemented in hardware and used without guests. The spec usually uses
the terms "device" and "driver" instead.

> +
> +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> +
> +File backed memory MUST be memory mapped to guest address space with SHARED
> +memory mapping.

This is a device implementation detail that is beyond the scope of the
specification. The VIRTIO spec is concerned with the driver/device
interface, not with the implementation details of the device.

> +
> +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> +
> +Driver hotplugs the physical memory and registers associated region with the pmem API.

The pmem API is a Linux kernel implementation detail. Instead you could
say something like:

  The driver determines the start address and size of the persist memory
  region in preparation for reading or writing data.

> +Also, configures a flush callback function with the corresponding region.

"callback function" is a driver implementation detail that's beyond the
scope of the specification. Instead you could say something like:

  The driver initializes req_vq in preparation for making flush
  requests.

> +
> +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> +
> +Driver MUST enable filesystem direct access operations for read/write on the device.

This is beyond the scope of the VIRTIO specification because it's a
Linux guest kernel detail.

> +
> +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> +
> +Driver MUST implement a virtio based flush callback.

Driver implementation detail.

> +
> +Driver MUST disable other FLUSH/SYNC mechanisms for the device when virtio flush is configured.

What does this mean?

> +
> +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> +
> +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.

VIRTIO_FLUSH has not been defined.

"guest userspace process" is beyond the scope of the driver/device
interface and therefore not relevant to the VIRTIO specification. This
could be rephrased:

  The driver MUST send and wait for the successful completion of a
  VIRTIO_PMEM_FLUSH command on req_vq in order to ensure previously
  written data will persist across device reset and power failure.

> +
> +Driver SHOULD handle multiple fsync requests on files present on the device.

The concept of "files" is beyond the scope of the driver/device
interface. We only deal with the memory region in this specification, so
I think the scenario you're describing is when multiple writes have been
performed and there are several flush commands in flight.

> +
> +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> +
> +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> +
> +Device SHOULD handle multiple flush requests simultaneously using host filesystem fsync or flush call.

Same thing as above.

Missing:

  The device MUST ensure that all writes made before a flush request
  will persist across device reset and power failure before completing
  the flush request.

> +
> +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> +
> +Device MUST return integer "0" for success and "-1" for failure.
> +These errors are converted to corresponding error codes by guest
> +as per architecture.

This sentence about guest error codes is outside the scope of the VIRTIO
specification.

> +
> +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> +
> +There could be potential security implications depending on how
> +memory mapped host backing file is used. By default device emulation
> +is done with SHARED mapping. There is a contract between guest and host
> +process to access same backing file for read/write operations.
> +
> +If a malicious guest or host userspace map the same backing file,
> +attacking process can make use of known cache side channel attacks
> +to predict the current state of shared page cache page. If both
> +attacker and victim somehow execute same shared code after a
> +flush/evict call, with difference in execution timing attacker
> +could infer another guest local data or host data. Though this is
> +not easy and same challenges exist as with bare metal host system
> +when userspace share same backing file.

This is important information but needs to be phrased without referring
to "host"/"host" and only with respect to the driver/device interface
(not the host/guest kernel, applications, etc).

The same applies to the rest of the security points below.

> +
> +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> +
> +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
> +
> +If device backing backing file is shared with multiple guests or host
> +processes, this may act as a metric for page cache side channel attack.
> +As a counter measure every guest should have its own(not shared with
> +another guest) SHARED backing file and gets populated a per host process
> +page cache pages.
> +
> +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
> +There maybe be chances of side channels attack with PRIVATE
> +memory mapping similar to SHARED with read-only shared mappings.
> +PRIVATE is not used for virtio pmem making this usecase
> +irrelevant.
> +
> +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
> +For SHARED mapping, if workload is single application inside
> +guest and there is no risk with sharing of data between guests.
> +Guest sharing same backing file with SHARED mapping can be
> +used as a valid configuration.
> +
> +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
> +Don't allow cache evict from guest filesystem trim/discard command
> +with virtio pmem. This rules out any possibility of evict-reload
> +page cache side channel attacks if backing disk is shared(SHARED)
> +with mutliple guests. Though if we use per device backing file with
> +shared mapping this countermeasure is not required.
> -- 
> 2.25.1
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 11:07 ` [virtio-dev] " Stefan Hajnoczi
@ 2021-08-04 11:11   ` David Hildenbrand
  2021-08-04 12:33     ` Stefan Hajnoczi
  2021-08-04 11:28   ` Pankaj Gupta
  1 sibling, 1 reply; 17+ messages in thread
From: David Hildenbrand @ 2021-08-04 11:11 UTC (permalink / raw)
  To: Stefan Hajnoczi, Pankaj Gupta
  Cc: virtio-dev, dan.j.williams, mst, cohuck, tstark, pankaj.gupta

On 04.08.21 13:07, Stefan Hajnoczi wrote:
> On Wed, Jul 28, 2021 at 05:04:35PM +0200, Pankaj Gupta wrote:
>> Posting virtio specification for virtio pmem device. Virtio pmem is a
>> paravirtualized device which allows the guest to bypass page cache.
>> Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu
>> device is merged in Qemu 4.1.
>>
>> Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
>> ---
>> Sorry, It took me long time to get back on this. There is
>> an enhancement to this spec by "Taylor Stark" CCed in the list.
>> Request for feedback and merging.
>>
>> RFC is posted here [1]
>> [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
> 
> I skimmed through the review comments but pretty much reviewed this
> patch from scratch. Feel free to ignore questions that others have
> already raised.
> 
>>
>>   conformance.tex |  19 ++++++-
>>   content.tex     |   1 +
>>   virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 150 insertions(+), 2 deletions(-)
>>   create mode 100644 virtio-pmem.tex
>>
>> diff --git a/conformance.tex b/conformance.tex
>> index 94d7a06..818ddda 100644
>> --- a/conformance.tex
>> +++ b/conformance.tex
>> @@ -31,7 +31,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \ref{sec:Conformance / Driver Conformance / Sound Driver Conformance},
>>   \ref{sec:Conformance / Driver Conformance / Memory Driver Conformance},
>>   \ref{sec:Conformance / Driver Conformance / I2C Adapter Driver Conformance} or
>> -\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance}.
>> +\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance},
>> +\ref{sec:Conformance / Driver Conformance / PMEM Driver Conformance}.
>>   
>>       \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
>>     \end{itemize}
>> @@ -55,7 +56,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \ref{sec:Conformance / Device Conformance / Sound Device Conformance},
>>   \ref{sec:Conformance / Device Conformance / Memory Device Conformance},
>>   \ref{sec:Conformance / Device Conformance / I2C Adapter Device Conformance} or
>> -\ref{sec:Conformance / Device Conformance / SCMI Device Conformance}.
>> +\ref{sec:Conformance / Device Conformance / SCMI Device Conformance},
>> +\ref{sec:Conformance / Device Conformance / PMEM Driver Conformance}.
>>   
>>       \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
>>     \end{itemize}
>> @@ -301,6 +303,19 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
>>   \item \ref{drivernormative:Device Types / SCMI Device / Device Operation / Setting Up eventq Buffers}
>>   \end{itemize}
>>   
>> +\conformance{\subsection}{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
>> +
>> +A PMEM driver MUST conform to the following normative statements:
>> +
>> +\begin{itemize}
>> +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
>> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
>> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
>> +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
>> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
>> +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}
>> +\end{itemize}
>> +
>>   \conformance{\section}{Device Conformance}\label{sec:Conformance / Device Conformance}
>>   
>>   A device MUST conform to the following normative statements:
>> diff --git a/content.tex b/content.tex
>> index ceb2562..6acc785 100644
>> --- a/content.tex
>> +++ b/content.tex
>> @@ -6583,6 +6583,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
>>   \input{virtio-mem.tex}
>>   \input{virtio-i2c.tex}
>>   \input{virtio-scmi.tex}
>> +\input{virtio-pmem.tex}
>>   
>>   \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
>>   
>> diff --git a/virtio-pmem.tex b/virtio-pmem.tex
>> new file mode 100644
>> index 0000000..a2b888e
>> --- /dev/null
>> +++ b/virtio-pmem.tex
>> @@ -0,0 +1,132 @@
>> +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
>> +
>> +The virtio pmem is a fake persistent memory (NVDIMM) device
> 
> s/fake/virtual/ or drop "fake"? If the device persists data correctly
> then it's not fake.
> 
>> +used to bypass the guest page cache and provide a virtio
>> +based asynchronous flush mechanism.This avoids the need
>> +of a separate page cache in guest and keeps page cache only
>> +in the host. Under memory pressure, the host makes use of
>> +effecient memory reclaim decisions for page cache pages
> 
> s/effecient/efficient/
> 
>> +of all the guests. This helps to reduce the memory footprint
>> +and fit more guests in the host system.
> 
> This explains the motivation for the device. It would also be nice to
> explain the nature of the device:
> 
>    The virtio pmem device provides access to byte-addressable persistent
>    memory. The persist memory is directly accessible as a Shared Memory
>    Region. Data written to this memory is made persistent by separately
>    sending a flush command. Writes that have been flushed are preserved
>    across device reset and power failure.
> 
>> +
>> +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
>> +  27
>> +
>> +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
>> +\begin{description}
>> +\item[0] req_vq
>> +\end{description}
>> +
>> +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
>> +
>> +There are currently no feature bits defined for this device.
>> +
>> +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
>> +
>> +\begin{lstlisting}
>> +struct virtio_pmem_config {
>> +	le64 start;
>> +	le64 size;
>> +};
>> +\end{lstlisting}
>> +
>> +\begin{description}
>> +\item[\field{start}] contains the start address from the guest physical address range
>> +to be hotplugged into the guest address space using the pmem API.
>> +
>> +\item[\field{size}] contains the length of this address range.
>> +\end{description}
> 
> Please use a Shared Memory Region instead. This is a relatively new
> addition to the VIRTIO device model that seems like a good fit for this
> device:
> https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex

Note that virtio-pmem is already upstream in QEMU and in Linux using 
this model. There is a proposal to optionally use shared memory for 
exposing the area, to be unlocked with a feature bit.


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 11:07 ` [virtio-dev] " Stefan Hajnoczi
  2021-08-04 11:11   ` David Hildenbrand
@ 2021-08-04 11:28   ` Pankaj Gupta
  2021-08-04 12:36     ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Pankaj Gupta @ 2021-08-04 11:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Dan Williams, David Hildenbrand, Michael S . Tsirkin,
	Cornelia Huck, Taylor Stark, Pankaj Gupta

Hi Stefan,

Thank you for reviewing this.

> > Posting virtio specification for virtio pmem device. Virtio pmem is a
> > paravirtualized device which allows the guest to bypass page cache.
> > Virtio pmem kernel driver is merged in Upstream Kernel 5.3. Also, Qemu
> > device is merged in Qemu 4.1.
> >
> > Signed-off-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
> > ---
> > Sorry, It took me long time to get back on this. There is
> > an enhancement to this spec by "Taylor Stark" CCed in the list.
> > Request for feedback and merging.
> >
> > RFC is posted here [1]
> > [1] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
>
> I skimmed through the review comments but pretty much reviewed this
> patch from scratch. Feel free to ignore questions that others have
> already raised.
>
> >
> >  conformance.tex |  19 ++++++-
> >  content.tex     |   1 +
> >  virtio-pmem.tex | 132 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 150 insertions(+), 2 deletions(-)
> >  create mode 100644 virtio-pmem.tex
> >
> > diff --git a/conformance.tex b/conformance.tex
> > index 94d7a06..818ddda 100644
> > --- a/conformance.tex
> > +++ b/conformance.tex
> > @@ -31,7 +31,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \ref{sec:Conformance / Driver Conformance / Sound Driver Conformance},
> >  \ref{sec:Conformance / Driver Conformance / Memory Driver Conformance},
> >  \ref{sec:Conformance / Driver Conformance / I2C Adapter Driver Conformance} or
> > -\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance}.
> > +\ref{sec:Conformance / Driver Conformance / SCMI Driver Conformance},
> > +\ref{sec:Conformance / Driver Conformance / PMEM Driver Conformance}.
> >
> >      \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
> >    \end{itemize}
> > @@ -55,7 +56,8 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \ref{sec:Conformance / Device Conformance / Sound Device Conformance},
> >  \ref{sec:Conformance / Device Conformance / Memory Device Conformance},
> >  \ref{sec:Conformance / Device Conformance / I2C Adapter Device Conformance} or
> > -\ref{sec:Conformance / Device Conformance / SCMI Device Conformance}.
> > +\ref{sec:Conformance / Device Conformance / SCMI Device Conformance},
> > +\ref{sec:Conformance / Device Conformance / PMEM Driver Conformance}.
> >
> >      \item Clause \ref{sec:Conformance / Legacy Interface: Transitional Device and Transitional Driver Conformance}.
> >    \end{itemize}
> > @@ -301,6 +303,19 @@ \section{Conformance Targets}\label{sec:Conformance / Conformance Targets}
> >  \item \ref{drivernormative:Device Types / SCMI Device / Device Operation / Setting Up eventq Buffers}
> >  \end{itemize}
> >
> > +\conformance{\subsection}{PMEM Driver Conformance}\label{sec:Conformance / Driver Conformance / PMEM Driver Conformance}
> > +
> > +A PMEM driver MUST conform to the following normative statements:
> > +
> > +\begin{itemize}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Initialization}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Direct access}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +\item \ref{drivernormative:Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue flush}
> > +\item \ref{devicenormative:Device Types / PMEM Device / Device Operation / Virtqueue return}
> > +\end{itemize}
> > +
> >  \conformance{\section}{Device Conformance}\label{sec:Conformance / Device Conformance}
> >
> >  A device MUST conform to the following normative statements:
> > diff --git a/content.tex b/content.tex
> > index ceb2562..6acc785 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -6583,6 +6583,7 @@ \subsubsection{Legacy Interface: Framing Requirements}\label{sec:Device
> >  \input{virtio-mem.tex}
> >  \input{virtio-i2c.tex}
> >  \input{virtio-scmi.tex}
> > +\input{virtio-pmem.tex}
> >
> >  \chapter{Reserved Feature Bits}\label{sec:Reserved Feature Bits}
> >
> > diff --git a/virtio-pmem.tex b/virtio-pmem.tex
> > new file mode 100644
> > index 0000000..a2b888e
> > --- /dev/null
> > +++ b/virtio-pmem.tex
> > @@ -0,0 +1,132 @@
> > +\section{PMEM Device}\label{sec:Device Types / PMEM Device}
> > +
> > +The virtio pmem is a fake persistent memory (NVDIMM) device
>
> s/fake/virtual/ or drop "fake"? If the device persists data correctly
> then it's not fake.

Yes, Agree.

>
> > +used to bypass the guest page cache and provide a virtio
> > +based asynchronous flush mechanism.This avoids the need
> > +of a separate page cache in guest and keeps page cache only
> > +in the host. Under memory pressure, the host makes use of
> > +effecient memory reclaim decisions for page cache pages
>
> s/effecient/efficient/

noted.
>
> > +of all the guests. This helps to reduce the memory footprint
> > +and fit more guests in the host system.
>
> This explains the motivation for the device. It would also be nice to
> explain the nature of the device:
>
>   The virtio pmem device provides access to byte-addressable persistent
>   memory. The persist memory is directly accessible as a Shared Memory
>   Region. Data written to this memory is made persistent by separately
>   sending a flush command. Writes that have been flushed are preserved
>   across device reset and power failure.

I was not sure we need to add motivation as well in the spec. But yes, will add.
>
> > +
> > +\subsection{Device ID}\label{sec:Device Types / PMEM Device / Device ID}
> > +  27
> > +
> > +\subsection{Virtqueues}\label{sec:Device Types / PMEM Device / Virtqueues}
> > +\begin{description}
> > +\item[0] req_vq
> > +\end{description}
> > +
> > +\subsection{Feature bits}\label{sec:Device Types / PMEM Device / Feature bits}
> > +
> > +There are currently no feature bits defined for this device.
> > +
> > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> > +
> > +\begin{lstlisting}
> > +struct virtio_pmem_config {
> > +     le64 start;
> > +     le64 size;
> > +};
> > +\end{lstlisting}
> > +
> > +\begin{description}
> > +\item[\field{start}] contains the start address from the guest physical address range
> > +to be hotplugged into the guest address space using the pmem API.
> > +
> > +\item[\field{size}] contains the length of this address range.
> > +\end{description}
>
> Please use a Shared Memory Region instead. This is a relatively new
> addition to the VIRTIO device model that seems like a good fit for this
> device:
> https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex

David has answered this.
>
> > +
> > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> > +
> > +Device hotplugs physical memory to guest address space. Persistent memory device
> > +is emulated with file backed memory at host side.
>
> This paragraph can be dropped. The device could be implemented in a
> number of ways and it's beyond the scope of the specification.

But spec is for current implementation?  As we have more devices or ways, we
can amend this.

>
> > +
> > +\begin{enumerate}
> > +\item Guest vpmem start is read from \field{start}.
> > +\item Guest vpmem end is read from \field{size}.
> > +\end{enumerate}
>
> Please avoid the terms "host" and "guest". VIRTIO devices can also be
> implemented in hardware and used without guests. The spec usually uses
> the terms "device" and "driver" instead.

Agree.

>
> > +
> > +\devicenormative{\subsubsection}{Device Initialization}{Device Types / PMEM Device / Device Initialization}
> > +
> > +File backed memory MUST be memory mapped to guest address space with SHARED
> > +memory mapping.
>
> This is a device implementation detail that is beyond the scope of the
> specification. The VIRTIO spec is concerned with the driver/device
> interface, not with the implementation details of the device.

Already noted this.
>
> > +
> > +\subsection{Driver Initialization}\label{sec:Device Types / PMEM Driver / Driver Initialization}
> > +
> > +Driver hotplugs the physical memory and registers associated region with the pmem API.
>
> The pmem API is a Linux kernel implementation detail. Instead you could
> say something like:
>
>   The driver determines the start address and size of the persist memory
>   region in preparation for reading or writing data.

Sure.

>
> > +Also, configures a flush callback function with the corresponding region.
>
> "callback function" is a driver implementation detail that's beyond the
> scope of the specification. Instead you could say something like:
>
>   The driver initializes req_vq in preparation for making flush
>   requests.

Sure.

>
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Filesystem direct access}{Device Types / PMEM Driver / Driver Initialization / Direct access}
> > +
> > +Driver MUST enable filesystem direct access operations for read/write on the device.
>
> This is beyond the scope of the VIRTIO specification because it's a
> Linux guest kernel detail.

noted.
>
> > +
> > +\drivernormative{\subsubsection}{Driver Initialization: Virtio flush}{Device Types / PMEM Driver / Driver Initialization / Virtio flush}
> > +
> > +Driver MUST implement a virtio based flush callback.
>
> Driver implementation detail.
>
> > +
> > +Driver MUST disable other FLUSH/SYNC mechanisms for the device when virtio flush is configured.
>
> What does this mean?

This will be removed as per earlier feedback.
>
> > +
> > +\subsection{Driver Operations}\label{sec:Device Types / PMEM Driver / Driver Operation}
> > +\drivernormative{\subsubsection}{Driver Operation: Virtqueue command}{Device Types / PMEM Driver / Driver Operation / Virtqueue command}
> > +
> > +Driver MUST send VIRTIO_FLUSH command on request virtqueue, allows guest userspace process to perform IO operations asynchronously.
>
> VIRTIO_FLUSH has not been defined.

o.k

>
> "guest userspace process" is beyond the scope of the driver/device
> interface and therefore not relevant to the VIRTIO specification. This
> could be rephrased:
>
>   The driver MUST send and wait for the successful completion of a
>   VIRTIO_PMEM_FLUSH command on req_vq in order to ensure previously
>   written data will persist across device reset and power failure.
>
> > +
> > +Driver SHOULD handle multiple fsync requests on files present on the device.
>
> The concept of "files" is beyond the scope of the driver/device
> interface. We only deal with the memory region in this specification, so
> I think the scenario you're describing is when multiple writes have been
> performed and there are several flush commands in flight.

yes. noted this as per earlier feedback.
>
> > +
> > +\subsection{Device Operations}\label{sec:Device Types / PMEM Driver / Device Operation}
> > +
> > +\devicenormative{\subsubsection}{Device Operations}{Device Types / PMEM Device / Device Operation / Virtqueue flush}
> > +
> > +Device SHOULD handle multiple flush requests simultaneously using host filesystem fsync or flush call.
>
> Same thing as above.
>
> Missing:
>
>   The device MUST ensure that all writes made before a flush request
>   will persist across device reset and power failure before completing
>   the flush request.

Sure.

>
> > +
> > +\devicenormative{\subsubsection}{Device operations}{Device Types / PMEM Device / Device Operation / Virtqueue return}
> > +
> > +Device MUST return integer "0" for success and "-1" for failure.
> > +These errors are converted to corresponding error codes by guest
> > +as per architecture.
>
> This sentence about guest error codes is outside the scope of the VIRTIO
> specification.

Yes. w
>
> > +
> > +\subsection{Possible security implications}\label{sec:Device Types / PMEM Device / Possible Security Implications}
> > +
> > +There could be potential security implications depending on how
> > +memory mapped host backing file is used. By default device emulation
> > +is done with SHARED mapping. There is a contract between guest and host
> > +process to access same backing file for read/write operations.
> > +
> > +If a malicious guest or host userspace map the same backing file,
> > +attacking process can make use of known cache side channel attacks
> > +to predict the current state of shared page cache page. If both
> > +attacker and victim somehow execute same shared code after a
> > +flush/evict call, with difference in execution timing attacker
> > +could infer another guest local data or host data. Though this is
> > +not easy and same challenges exist as with bare metal host system
> > +when userspace share same backing file.
>
> This is important information but needs to be phrased without referring
> to "host"/"host" and only with respect to the driver/device interface
> (not the host/guest kernel, applications, etc).

Sure will do, Thank you!
>
> The same applies to the rest of the security points below.

Yes
>
> > +
> > +\subsection{Countermeasures}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures}
> > +
> > +\subsubsection{ With SHARED mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / SHARED}
> > +
> > +If device backing backing file is shared with multiple guests or host
> > +processes, this may act as a metric for page cache side channel attack.
> > +As a counter measure every guest should have its own(not shared with
> > +another guest) SHARED backing file and gets populated a per host process
> > +page cache pages.
> > +
> > +\subsubsection{ With PRIVATE mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / PRIVATE}
> > +There maybe be chances of side channels attack with PRIVATE
> > +memory mapping similar to SHARED with read-only shared mappings.
> > +PRIVATE is not used for virtio pmem making this usecase
> > +irrelevant.
> > +
> > +\subsubsection{ Workload specific mapping}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Workload}
> > +For SHARED mapping, if workload is single application inside
> > +guest and there is no risk with sharing of data between guests.
> > +Guest sharing same backing file with SHARED mapping can be
> > +used as a valid configuration.
> > +
> > +\subsubsection{ Prevent cache eviction}\label{sec:Device Types / PMEM Device / Possible Security Implications / Countermeasures / Cache eviction}
> > +Don't allow cache evict from guest filesystem trim/discard command
> > +with virtio pmem. This rules out any possibility of evict-reload
> > +page cache side channel attacks if backing disk is shared(SHARED)
> > +with mutliple guests. Though if we use per device backing file with
> > +shared mapping this countermeasure is not required.
> > --
> > 2.25.1
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> >


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 11:11   ` David Hildenbrand
@ 2021-08-04 12:33     ` Stefan Hajnoczi
  2021-08-04 12:42       ` Pankaj Gupta
  2021-08-04 22:58       ` Taylor Stark
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04 12:33 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Pankaj Gupta, virtio-dev, dan.j.williams, mst, cohuck, tstark,
	pankaj.gupta

[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]

On Wed, Aug 04, 2021 at 01:11:52PM +0200, David Hildenbrand wrote:
> On 04.08.21 13:07, Stefan Hajnoczi wrote:
> > On Wed, Jul 28, 2021 at 05:04:35PM +0200, Pankaj Gupta wrote:
> > > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> > > +
> > > +\begin{lstlisting}
> > > +struct virtio_pmem_config {
> > > +	le64 start;
> > > +	le64 size;
> > > +};
> > > +\end{lstlisting}
> > > +
> > > +\begin{description}
> > > +\item[\field{start}] contains the start address from the guest physical address range
> > > +to be hotplugged into the guest address space using the pmem API.
> > > +
> > > +\item[\field{size}] contains the length of this address range.
> > > +\end{description}
> > 
> > Please use a Shared Memory Region instead. This is a relatively new
> > addition to the VIRTIO device model that seems like a good fit for this
> > device:
> > https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex
> 
> Note that virtio-pmem is already upstream in QEMU and in Linux using this
> model. There is a proposal to optionally use shared memory for exposing the
> area, to be unlocked with a feature bit.

Please incorporate the proposal into this patch. A feature bit is a good
way of handling this case.

Although the difference between configuration space parameters and a
shared memory region might seem trivial, I think using the shared memory
region is worthwhile because it makes the memory region a first-class
concept in the VIRTIO device model that is supported by the driver and
device emulation infrastructure.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 11:28   ` Pankaj Gupta
@ 2021-08-04 12:36     ` Stefan Hajnoczi
  2021-08-04 12:40       ` Pankaj Gupta
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2021-08-04 12:36 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: virtio-dev, Dan Williams, David Hildenbrand, Michael S . Tsirkin,
	Cornelia Huck, Taylor Stark, Pankaj Gupta

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

On Wed, Aug 04, 2021 at 01:28:39PM +0200, Pankaj Gupta wrote:
> > > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> > > +
> > > +Device hotplugs physical memory to guest address space. Persistent memory device
> > > +is emulated with file backed memory at host side.
> >
> > This paragraph can be dropped. The device could be implemented in a
> > number of ways and it's beyond the scope of the specification.
> 
> But spec is for current implementation?  As we have more devices or ways, we
> can amend this.

The spec defines the driver/device interface, not the implementation. By
focussing on the interface rather than implementation, we leave room for
multiple implementations and avoid baking in assumptions about one
particular implementation (e.g. file-backed MAP_SHARED host memory).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 12:36     ` Stefan Hajnoczi
@ 2021-08-04 12:40       ` Pankaj Gupta
  0 siblings, 0 replies; 17+ messages in thread
From: Pankaj Gupta @ 2021-08-04 12:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: virtio-dev, Dan Williams, David Hildenbrand, Michael S . Tsirkin,
	Cornelia Huck, Taylor Stark, Pankaj Gupta

> > > > +\subsection{Device Initialization}\label{sec:Device Types / PMEM Device / Device Initialization}
> > > > +
> > > > +Device hotplugs physical memory to guest address space. Persistent memory device
> > > > +is emulated with file backed memory at host side.
> > >
> > > This paragraph can be dropped. The device could be implemented in a
> > > number of ways and it's beyond the scope of the specification.
> >
> > But spec is for current implementation?  As we have more devices or ways, we
> > can amend this.
>
> The spec defines the driver/device interface, not the implementation. By
> focussing on the interface rather than implementation, we leave room for
> multiple implementations and avoid baking in assumptions about one
> particular implementation (e.g. file-backed MAP_SHARED host memory).

Fair point. Will remove the paragraph.

Thanks,
Pankaj


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 12:33     ` Stefan Hajnoczi
@ 2021-08-04 12:42       ` Pankaj Gupta
  2021-08-04 22:58       ` Taylor Stark
  1 sibling, 0 replies; 17+ messages in thread
From: Pankaj Gupta @ 2021-08-04 12:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: David Hildenbrand, virtio-dev, Dan Williams, Michael S . Tsirkin,
	Cornelia Huck, Taylor Stark, Pankaj Gupta

> > On 04.08.21 13:07, Stefan Hajnoczi wrote:
> > > On Wed, Jul 28, 2021 at 05:04:35PM +0200, Pankaj Gupta wrote:
> > > > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct virtio_pmem_config {
> > > > + le64 start;
> > > > + le64 size;
> > > > +};
> > > > +\end{lstlisting}
> > > > +
> > > > +\begin{description}
> > > > +\item[\field{start}] contains the start address from the guest physical address range
> > > > +to be hotplugged into the guest address space using the pmem API.
> > > > +
> > > > +\item[\field{size}] contains the length of this address range.
> > > > +\end{description}
> > >
> > > Please use a Shared Memory Region instead. This is a relatively new
> > > addition to the VIRTIO device model that seems like a good fit for this
> > > device:
> > > https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex
> >
> > Note that virtio-pmem is already upstream in QEMU and in Linux using this
> > model. There is a proposal to optionally use shared memory for exposing the
> > area, to be unlocked with a feature bit.
>
> Please incorporate the proposal into this patch. A feature bit is a good
> way of handling this case.

You mean in the patch with parent spec?

>
> Although the difference between configuration space parameters and a
> shared memory region might seem trivial, I think using the shared memory
> region is worthwhile because it makes the memory region a first-class
> concept in the VIRTIO device model that is supported by the driver and
> device emulation infrastructure.
>
> Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 12:33     ` Stefan Hajnoczi
  2021-08-04 12:42       ` Pankaj Gupta
@ 2021-08-04 22:58       ` Taylor Stark
  2021-08-05 15:22         ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Taylor Stark @ 2021-08-04 22:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: David Hildenbrand, Pankaj Gupta, virtio-dev, dan.j.williams, mst,
	cohuck, pankaj.gupta, tstark

On Wed, Aug 04, 2021 at 01:33:05PM +0100, Stefan Hajnoczi wrote:
> On Wed, Aug 04, 2021 at 01:11:52PM +0200, David Hildenbrand wrote:
> > On 04.08.21 13:07, Stefan Hajnoczi wrote:
> > > On Wed, Jul 28, 2021 at 05:04:35PM +0200, Pankaj Gupta wrote:
> > > > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct virtio_pmem_config {
> > > > +	le64 start;
> > > > +	le64 size;
> > > > +};
> > > > +\end{lstlisting}
> > > > +
> > > > +\begin{description}
> > > > +\item[\field{start}] contains the start address from the guest physical address range
> > > > +to be hotplugged into the guest address space using the pmem API.
> > > > +
> > > > +\item[\field{size}] contains the length of this address range.
> > > > +\end{description}
> > > 
> > > Please use a Shared Memory Region instead. This is a relatively new
> > > addition to the VIRTIO device model that seems like a good fit for this
> > > device:
> > > https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex
> > 
> > Note that virtio-pmem is already upstream in QEMU and in Linux using this
> > model. There is a proposal to optionally use shared memory for exposing the
> > area, to be unlocked with a feature bit.
> 
> Please incorporate the proposal into this patch. A feature bit is a good
> way of handling this case.
> 
> Although the difference between configuration space parameters and a
> shared memory region might seem trivial, I think using the shared memory
> region is worthwhile because it makes the memory region a first-class
> concept in the VIRTIO device model that is supported by the driver and
> device emulation infrastructure.
> 
> Stefan

For what it's worth, here's the proposal to add support for shared memory regions:
https://lists.oasis-open.org/archives/virtio-comment/202107/msg00169.html

The original plan was to have the base spec go in first, and then consider the
proposal. Do you think it's better to include the shared memory region proposal
from the start?

Thanks,
Taylor


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [virtio-dev] [PATCH RESEND] virtio-pmem: PMEM device spec
  2021-08-04 22:58       ` Taylor Stark
@ 2021-08-05 15:22         ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2021-08-05 15:22 UTC (permalink / raw)
  To: Taylor Stark
  Cc: David Hildenbrand, Pankaj Gupta, virtio-dev, dan.j.williams, mst,
	cohuck, pankaj.gupta, tstark

[-- Attachment #1: Type: text/plain, Size: 2512 bytes --]

On Wed, Aug 04, 2021 at 03:58:49PM -0700, Taylor Stark wrote:
> On Wed, Aug 04, 2021 at 01:33:05PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Aug 04, 2021 at 01:11:52PM +0200, David Hildenbrand wrote:
> > > On 04.08.21 13:07, Stefan Hajnoczi wrote:
> > > > On Wed, Jul 28, 2021 at 05:04:35PM +0200, Pankaj Gupta wrote:
> > > > > +\subsection{Device configuration layout}\label{sec:Device Types / PMEM Device / Device configuration layout}
> > > > > +
> > > > > +\begin{lstlisting}
> > > > > +struct virtio_pmem_config {
> > > > > +	le64 start;
> > > > > +	le64 size;
> > > > > +};
> > > > > +\end{lstlisting}
> > > > > +
> > > > > +\begin{description}
> > > > > +\item[\field{start}] contains the start address from the guest physical address range
> > > > > +to be hotplugged into the guest address space using the pmem API.
> > > > > +
> > > > > +\item[\field{size}] contains the length of this address range.
> > > > > +\end{description}
> > > > 
> > > > Please use a Shared Memory Region instead. This is a relatively new
> > > > addition to the VIRTIO device model that seems like a good fit for this
> > > > device:
> > > > https://github.com/oasis-tcs/virtio-spec/blob/master/shared-mem.tex
> > > 
> > > Note that virtio-pmem is already upstream in QEMU and in Linux using this
> > > model. There is a proposal to optionally use shared memory for exposing the
> > > area, to be unlocked with a feature bit.
> > 
> > Please incorporate the proposal into this patch. A feature bit is a good
> > way of handling this case.
> > 
> > Although the difference between configuration space parameters and a
> > shared memory region might seem trivial, I think using the shared memory
> > region is worthwhile because it makes the memory region a first-class
> > concept in the VIRTIO device model that is supported by the driver and
> > device emulation infrastructure.
> > 
> > Stefan
> 
> For what it's worth, here's the proposal to add support for shared memory regions:
> https://lists.oasis-open.org/archives/virtio-comment/202107/msg00169.html
> 
> The original plan was to have the base spec go in first, and then consider the
> proposal. Do you think it's better to include the shared memory region proposal
> from the start?

Talyor and Pankaj,
I hadn't seen Taylor's patch. Since it's small and has already had some
review it can probably be merged quickly after the main virtio-pmem spec
change.

Feel free to keep it as a separate series!

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-08-05 15:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-28 15:04 [PATCH RESEND] virtio-pmem: PMEM device spec Pankaj Gupta
2021-07-30 11:53 ` [virtio-dev] " Cornelia Huck
2021-07-30 12:25   ` Pankaj Gupta
2021-08-03  8:26     ` [virtio-dev] " Cornelia Huck
2021-08-03  8:50       ` David Hildenbrand
2021-08-03  9:02         ` Pankaj Gupta
2021-08-03  9:16       ` Pankaj Gupta
2021-08-03  9:17         ` David Hildenbrand
2021-08-04 11:07 ` [virtio-dev] " Stefan Hajnoczi
2021-08-04 11:11   ` David Hildenbrand
2021-08-04 12:33     ` Stefan Hajnoczi
2021-08-04 12:42       ` Pankaj Gupta
2021-08-04 22:58       ` Taylor Stark
2021-08-05 15:22         ` Stefan Hajnoczi
2021-08-04 11:28   ` Pankaj Gupta
2021-08-04 12:36     ` Stefan Hajnoczi
2021-08-04 12:40       ` Pankaj Gupta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.