* [DOC v1] Xen transport for 9pfs
@ 2016-11-29 23:34 Stefano Stabellini
  2016-11-30 10:59 ` David Vrabel
  2016-12-01 10:56 ` Dario Faggioli
  0 siblings, 2 replies; 10+ messages in thread
From: Stefano Stabellini @ 2016-11-29 23:34 UTC (permalink / raw)
  To: xen-devel; +Cc: stefano, wei.liu2

# Xen transport for 9pfs version 1 

## Background

9pfs is a network filesystem protocol developed for Plan 9. 9pfs is very
simple and describes a series of commands and responses. It is
completely independent from the communication channels; in fact, many
clients and servers support multiple channels, usually called
"transports". For example, the Linux client supports tcp and unix
sockets, fds, virtio and rdma.


### 9pfs protocol

This document won't cover the full 9pfs specification. Please refer to
this [paper] and this [website] for a detailed description of it.
However it is useful to know that each 9pfs request and response has the
following header:

    struct header {
    	uint32_t size;
    	uint8_t id;
    	uint16_t tag;
    } __attribute__((packed));

    0         4  5    7
    +---------+--+----+
    |  size   |id|tag |
    +---------+--+----+

- *size*
The total size of the request or response in bytes, including this header.

- *id*
The 9pfs request or response operation.

- *tag*
Unique id that identifies a specific request/response pair. It is used
to multiplex operations on a single channel.

It is possible to have multiple requests in-flight at any given time.
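
For illustration only, the following C sketch fills in the header for a
Tversion request. The numeric message type (Tversion == 100) and the NOTAG
value follow the standard 9P2000 numbering; the constant names and the helper
itself are assumptions of this sketch, not part of the transport:

    #include <stdint.h>

    #define P9_TVERSION 100      /* 9P2000 message type number */
    #define P9_NOTAG    0xffff   /* Tversion conventionally uses NOTAG */

    struct header {
    	uint32_t size;
    	uint8_t id;
    	uint16_t tag;
    } __attribute__((packed));

    /* Illustrative: 9p is a little-endian wire protocol, so this assumes a
     * little-endian host. size is the total message length in bytes,
     * including this 7-byte header. */
    static void fill_tversion_header(struct header *h, uint32_t payload_len)
    {
    	h->size = sizeof(*h) + payload_len;
    	h->id = P9_TVERSION;
    	h->tag = P9_NOTAG;
    }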


## Rationale

This document describes a Xen-based transport for 9pfs, in the
traditional PV frontend and backend format. The PV frontend is used by
the client to send commands to the server. The PV backend is used by the
9pfs server to receive commands from clients and send back responses.

Each 9pfs connection corresponds to a single filesystem share.

This document does not cover the 9pfs client/server design or
implementation, only the transport for it.


## Xenstore

The frontend and the backend connect via xenstore to exchange
information. The toolstack creates front and back nodes with state
[XenbusStateInitialising]. The protocol node name is **9pfs**.

Multiple rings are supported for each frontend and backend connection.

### Frontend XenBus Nodes

    num-rings
         Values:         <uint32_t>
    
         Number of rings. It must be less than or equal to max-rings.
    
    port-<num> (port-0, port-1, etc)
         Values:         <uint32_t>
    
         The identifier of the Xen event channel used to signal activity
         in the ring buffer. One for each ring.
    
    ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
         Values:         <uint32_t>
    
         The Xen grant reference granting permission for the backend to
         map the page containing the information needed to set up a
         shared ring. One for each ring.
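
For illustration, a frontend advertising two rings might end up with the
following nodes (the `/local/domain/<domid>/device/9pfs/<devid>` prefix and
all concrete values are examples chosen for this sketch, not mandated by the
protocol):

    /local/domain/1/device/9pfs/0/num-rings = "2"
    /local/domain/1/device/9pfs/0/port-0 = "17"
    /local/domain/1/device/9pfs/0/port-1 = "18"
    /local/domain/1/device/9pfs/0/ring-ref-0 = "320"
    /local/domain/1/device/9pfs/0/ring-ref-1 = "321"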

### Backend XenBus Nodes

Backend specific properties, written by the backend, read by the
frontend:

    version
         Values:         <uint32_t>
    
         Protocol version supported by the backend. Currently the value is
         1.
    
    max-rings
         Values:         <uint32_t>
    
         The maximum supported number of rings.
    
    max-ring-page-order
         Values:         <uint32_t>
    
         The maximum supported ring size, expressed as log2 of the number
         of machine pages, e.g. 0 == 1 page, 1 == 2 pages, 2 == 4 pages,
         etc.

Backend configuration nodes, written by the toolstack, read by the
backend:

    path
         Values:         <string>
    
         Host filesystem path to share.
    
    tag
         Values:         <string>
    
         Alphanumeric tag that identifies the 9pfs share. The client needs
         to know the tag to mount it.
    
    security_model
         Values:         "none"
    
         *none*: files are stored on the host using the same credentials
                 with which they are created on the guest.
         Only "none" is supported in this version of the protocol.


### State Machine

Initialization:

    *Front*                               *Back*
    XenbusStateInitialising               XenbusStateInitialising
    - Query virtual device                - Query backend device
      properties.                           identification data.
    - Setup OS device instance.           - Publish backend features
    - Allocate and initialize the           and transport parameters
      request ring.                                      |
    - Publish transport parameters                       |
      that will be in effect during                      V
      this connection.                            XenbusStateInitWait
                 |
                 |
                 V
       XenbusStateInitialised

                                          - Query frontend transport parameters.
                                          - Connect to the request ring and
                                            event channel.
                                                         |
                                                         |
                                                         V
                                                 XenbusStateConnected

     - Query backend device properties.
     - Finalize OS virtual device
       instance.
                 |
                 |
                 V
        XenbusStateConnected

Once frontend and backend are connected, they share one page per ring,
used to set up that ring, and one event channel per ring, used to send
notifications.

Shutdown:

    *Front*                            *Back*
    XenbusStateConnected               XenbusStateConnected
                |
                |
                V
       XenbusStateClosing

                                       - Unmap grants
                                       - Unbind evtchns
                                                 |
                                                 |
                                                 V
                                         XenbusStateClosing

    - Unbind evtchns
    - Free rings
    - Free data structures
               |
               |
               V
       XenbusStateClosed

                                       - Free remaining data structures
                                                 |
                                                 |
                                                 V
                                         XenbusStateClosed


## Ring Setup

The shared page has the following layout:

    typedef uint32_t XEN_9PFS_RING_IDX;

    struct xen_9pfs_intf {
    	XEN_9PFS_RING_IDX in_cons, in_prod;
    	XEN_9PFS_RING_IDX out_cons, out_prod;
    
    	uint32_t ring_order;
    	grant_ref_t ref[];
    };

    /* not actually C compliant (ring_order changes from socket to socket) */
    struct ring_data {
        char in[((1 << ring_order) << PAGE_SHIFT) / 2];
        char out[((1 << ring_order) << PAGE_SHIFT) / 2];
    };

- **ring_order**
  It represents the order of the data ring. The following list of grant
  references has `(1 << ring_order)` elements. It cannot be greater than
  **max-ring-page-order**, as specified by the backend on XenBus.
- **ref[]**
  The list of grant references which will contain the actual data. They are
  mapped contiguously in virtual memory. The first half of the pages is the
  **in** array, the second half is the **out** array. The array must
  contain a power-of-two number of elements.
- **out** is an array used as a circular buffer.
  It contains client requests. The producer is the frontend, the
  consumer is the backend.
- **in** is an array used as a circular buffer.
  It contains server responses. The producer is the backend, the
  consumer is the frontend.
- **out_cons**, **out_prod**
  Consumer and producer indices for client requests. They keep track of
  how much data has been written by the frontend to **out** and how much
  data has already been consumed by the backend. **out_prod** is
  increased by the frontend, after writing data to **out**. **out_cons**
  is increased by the backend, after reading data from **out**.
- **in_cons** and **in_prod**
  Consumer and producer indices for responses. They keep track of how
  much data has been written by the backend to **in** and how much data
  has already been consumed by the frontend. **in_prod** is increased by
  the backend, after writing data to **in**. **in_cons** is increased by
  the frontend, after reading data from **in**.

The binary layout of `struct xen_9pfs_intf` follows:

    0         4         8         12        16        20
    +---------+---------+---------+---------+---------+
    | in_cons | in_prod |out_cons |out_prod |ring_orde|
    +---------+---------+---------+---------+---------+

    20        24        26      4092      4096
    +---------+---------+----//---+---------+
    |  ref[0] |  ref[1] |         |  ref[N] |
    +---------+---------+----//---+---------+

**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that N
needs to be a power of two, actually max N is 512.
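
A couple of worked examples of the relationship between ring_order, the
number of grant references and the resulting ring size (illustrative only):

    ring_order = 0  ->    1 data page,     1 entry in ref[],   ring size   4 KB
    ring_order = 2  ->    4 data pages,    4 entries in ref[], ring size  16 KB
    ring_order = 9  ->  512 data pages,  512 entries in ref[], ring size   2 MB

    One interface page holds at most (4096 - 20) / 4 = 1019 references; the
    largest power of two not exceeding 1019 is 512, so the largest ring_order
    that fits in a single interface page is 9, i.e. a 2 MB ring.
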
The binary layout of the ring buffers follows:

    0         ((1<<ring_order)<<PAGE_SHIFT)/2       ((1<<ring_order)<<PAGE_SHIFT)
    +------------//-------------+------------//-------------+
    |            in             |           out             |
    +------------//-------------+------------//-------------+
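
A minimal frontend-side sketch of initializing one interface page follows.
`grant_page_to_backend()` is a hypothetical helper standing in for the
platform-specific grant-table call, and error handling is omitted; this is
illustrative, not normative:

    /* Illustrative sketch: fill in one interface page describing a data ring
     * of (1 << ring_order) pages.  grant_page_to_backend() is a hypothetical
     * helper that grants one page to the backend and returns its grant ref. */
    static void xen_9pfs_init_intf(struct xen_9pfs_intf *intf,
    			char *data, uint32_t ring_order)
    {
    	uint32_t i;

    	intf->in_cons = intf->in_prod = 0;
    	intf->out_cons = intf->out_prod = 0;
    	intf->ring_order = ring_order;   /* must not exceed max-ring-page-order */

    	for (i = 0; i < (1u << ring_order); i++)
    		intf->ref[i] = grant_page_to_backend(data + i * PAGE_SIZE);

    	/* The grant reference of the page containing this struct is then
    	 * written to the corresponding ring-ref-<num> xenstore node. */
    }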


## Ring Usage

The **in** and **out** arrays are used as circular buffers:
    
    0                               sizeof(array) == ((1<<ring_order)<<PAGE_SHIFT)/2
    +-----------------------------------+
    |to consume|    free    |to consume |
    +-----------------------------------+
               ^            ^
               prod         cons

    0                               sizeof(array)
    +-----------------------------------+
    |  free    | to consume |   free    |
    +-----------------------------------+
               ^            ^
               cons         prod

The following functions are provided to read and write to an array:

    #define MASK_XEN_9PFS_IDX(idx) ((idx) & (XEN_9PFS_RING_SIZE - 1))

    static inline void xen_9pfs_read(char *buf,
    		XEN_9PFS_RING_IDX *masked_prod, XEN_9PFS_RING_IDX *masked_cons,
    		uint8_t *h, size_t len) {
    	/* Copy len bytes out of the ring at *masked_cons into h, wrapping
    	 * around the end of the ring when necessary. */
    	if (*masked_cons < *masked_prod) {
    		memcpy(h, buf + *masked_cons, len);
    	} else {
    		if (len > XEN_9PFS_RING_SIZE - *masked_cons) {
    			memcpy(h, buf + *masked_cons, XEN_9PFS_RING_SIZE - *masked_cons);
    			memcpy((char *)h + XEN_9PFS_RING_SIZE - *masked_cons, buf, len - (XEN_9PFS_RING_SIZE - *masked_cons));
    		} else {
    			memcpy(h, buf + *masked_cons, len);
    		}
    	}
    	*masked_cons = MASK_XEN_9PFS_IDX(*masked_cons + len);
    }
    
    static inline void xen_9pfs_write(char *buf,
    		XEN_9PFS_RING_IDX *masked_prod, XEN_9PFS_RING_IDX *masked_cons,
    		uint8_t *opaque, size_t len) {
    	/* Copy len bytes from opaque into the ring at *masked_prod, wrapping
    	 * around the end of the ring when necessary. */
    	if (*masked_prod < *masked_cons) {
    		memcpy(buf + *masked_prod, opaque, len);
    	} else {
    		if (len > XEN_9PFS_RING_SIZE - *masked_prod) {
    			memcpy(buf + *masked_prod, opaque, XEN_9PFS_RING_SIZE - *masked_prod);
    			memcpy(buf, opaque + (XEN_9PFS_RING_SIZE - *masked_prod), len - (XEN_9PFS_RING_SIZE - *masked_prod));
    		} else {
    			memcpy(buf + *masked_prod, opaque, len);
    		}
    	}
    	*masked_prod = MASK_XEN_9PFS_IDX(*masked_prod + len);
    }

The producer (the backend for **in**, the frontend for **out**) writes to the
array in the following way:

- read *cons*, *prod* from shared memory
- memory barrier
- write to array at position *prod* up to *cons*, wrapping around the circular
  buffer when necessary
- memory barrier
- increase *prod*
- notify the other end via evtchn

The consumer (the backend for **out**, the frontend for **in**) reads from the
array in the following way:

- read *prod*, *cons* from shared memory
- memory barrier
- read from array at position *cons* up to *prod*, wrapping around the circular
  buffer when necessary
- memory barrier
- increase *cons*
- notify the other end via evtchn

The producer takes care of writing only as many bytes as available in the buffer
up to *cons*. The consumer takes care of reading only as many bytes as available
in the buffer up to *prod*.
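
A minimal, illustrative producer sketch for the **out** array follows (the
consumer side is symmetric). It assumes Linux-style virt_mb()/virt_wmb()
barriers, uses a hypothetical notify_out_ring() wrapper for the event channel
kick, and includes a basic sanity check on the shared indices; it is a sketch,
not a normative implementation:

    /* Illustrative producer for the "out" ring (frontend side). */
    static int xen_9pfs_send(struct xen_9pfs_intf *intf, char *out,
    		uint8_t *req, size_t len)
    {
    	XEN_9PFS_RING_IDX cons, prod, masked_cons, masked_prod;

    	cons = intf->out_cons;
    	prod = intf->out_prod;
    	virt_mb();	/* order the index reads against the data writes below */

    	/* Sanity check: never trust the other end to keep the free-running
    	 * indices within the size of the ring. */
    	if (prod - cons > XEN_9PFS_RING_SIZE)
    		return -1;
    	if (XEN_9PFS_RING_SIZE - (prod - cons) < len)
    		return -1;	/* not enough free space, retry later */

    	masked_prod = MASK_XEN_9PFS_IDX(prod);
    	masked_cons = MASK_XEN_9PFS_IDX(cons);
    	xen_9pfs_write(out, &masked_prod, &masked_cons, req, len);

    	virt_wmb();	/* publish the data before the new producer index */
    	intf->out_prod = prod + len;

    	notify_out_ring();	/* hypothetical event channel notification */
    	return 0;
    }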


## Request/Response Workflow

The client chooses one of the available rings, then it sends a request
to the other end on the *out* array, following the producer workflow
described in [Ring Usage].

The server receives the notification and reads the request, following
the consumer workflow described in [Ring Usage]. The server knows how
much to read because it is specified in the *size* field of the 9pfs
header. The server processes the request and sends back a response on
the *in* array of the same ring, following the producer workflow as
usual.

The client receives a notification and reads the response from the *in*
array. The client knows how much data to read because it is specified in
the *size* field of the 9pfs header.
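
As an illustrative sketch of the server side (again assuming Linux-style
barriers and the helpers defined in [Ring Usage]; `req_buf` is simply a
caller-provided buffer large enough to hold one request):

    /* Illustrative consumer for the "out" ring (backend side): read one 9pfs
     * request.  The fixed 7-byte header is read first; its size field says
     * how many more bytes belong to this request. */
    static void xen_9pfs_recv_request(struct xen_9pfs_intf *intf, char *out,
    		uint8_t *req_buf)
    {
    	struct header h;
    	XEN_9PFS_RING_IDX cons, prod, masked_cons, masked_prod;

    	cons = intf->out_cons;
    	prod = intf->out_prod;
    	virt_rmb();	/* read the indices before reading the ring contents */

    	if (prod - cons < sizeof(h))
    		return;		/* no complete header available yet */

    	masked_prod = MASK_XEN_9PFS_IDX(prod);
    	masked_cons = MASK_XEN_9PFS_IDX(cons);
    	xen_9pfs_read(out, &masked_prod, &masked_cons, (uint8_t *)&h, sizeof(h));
    	/* A full implementation must validate h.size against prod - cons and
    	 * against the ring size before trusting it. */
    	xen_9pfs_read(out, &masked_prod, &masked_cons, req_buf, h.size - sizeof(h));

    	virt_mb();	/* finish reading before publishing the new consumer index */
    	intf->out_cons = cons + h.size;

    	/* notify the frontend via the ring's event channel (omitted) */
    }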


[paper]: https://www.usenix.org/legacy/event/usenix05/tech/freenix/full_papers/hensbergen/hensbergen.pdf
[website]: https://github.com/chaos/diod/blob/master/protocol.md
[XenbusStateInitialising]: http://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,io,xenbus.h.html


* Re: [DOC v1] Xen transport for 9pfs
  2016-11-29 23:34 [DOC v1] Xen transport for 9pfs Stefano Stabellini
@ 2016-11-30 10:59 ` David Vrabel
  2016-12-02  0:29   ` Stefano Stabellini
  2016-12-01 10:56 ` Dario Faggioli
  1 sibling, 1 reply; 10+ messages in thread
From: David Vrabel @ 2016-11-30 10:59 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: wei.liu2

On 29/11/16 23:34, Stefano Stabellini wrote:
> 
> The producer (the backend for **in**, the frontend for **out**) writes to the
> array in the following way:
> 
- read memory barrier
> - read *cons*, *prod* from shared memory
> - write to array at position *prod* up to *cons*, wrapping around the circular
>   buffer when necessary
- write memory barrier
> - increase *prod*
> - notify the other end via evtchn
> 
> The consumer (the backend for **out**, the frontend for **in**) reads from the
> array in the following way:

- read memory barrier
> - read *prod*, *cons* from shared memory
> - read from array at position *cons* up to *prod*, wrapping around the circular
>   buffer when necessary
> - memory barrier
> - increase *cons*
> - notify the other end via evtchn

Your barriers are wrong (see corrections above).

I think you should use a private copy of cons/prod in the
consumer/producer and use this to validate that the shared prod/cons is
within a sensible range.

You're missing a mechanism to omit unnecessary evtchn notifications
(like the standard ring protocol has).

This all looks like a generic "transfer byte stream" mechanism which
could be usefully made generic and not specific to 9pfs.

David


* Re: [DOC v1] Xen transport for 9pfs
  2016-11-29 23:34 [DOC v1] Xen transport for 9pfs Stefano Stabellini
  2016-11-30 10:59 ` David Vrabel
@ 2016-12-01 10:56 ` Dario Faggioli
  2016-12-01 23:14   ` Stefano Stabellini
  1 sibling, 1 reply; 10+ messages in thread
From: Dario Faggioli @ 2016-12-01 10:56 UTC (permalink / raw)
  To: Stefano Stabellini, xen-devel; +Cc: wei.liu2



On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> ### Frontend XenBus Nodes
> 
>     num-rings
>          Values:         <uint32_t>
>     
>          Number of rings. It needs to be lower or equal to max-rings.
>     
>     port-<num> (port-0, port-1, etc)
>          Values:         <uint32_t>
>     
>          The identifier of the Xen event channel used to signal
> activity
>          in the ring buffer. One for each ring.
>     
>     ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
>
blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash before
the index). Not a big deal, I guess, but I thought it could be nice to
be a bit more uniform.

>          Values:         <uint32_t>
>     
>          The Xen grant reference granting permission for the backend
> to
>          map a page with information to setup a share ring. One for
> each
>          ring.
> 
So, this means there can be multiple rings (up to max-rings advertised
by backend), _each_of_which_ has its own event channel, and
_each_of_which_ can be a multi-page (in case ring_order > 0) ring... Is
this correct?

If it is, what's the typical envisioned use of these multiple rings, if
I can ask?

(Maybe it's obvious, I'm still not so familiar with these protocols..
but I'm trying to improve, and that's why I'm here, so bear with me if
possible. :-D)

> ## Ring Setup
> 
> The shared page has the following layout:
> 
>     typedef uint32_t XEN_9PFS_RING_IDX;
> 
>     struct xen_9pfs_intf {
>     	XEN_9PFS_RING_IDX in_cons, in_prod;
>     	XEN_9PFS_RING_IDX out_cons, out_prod;
>     
>     	uint32_t ring_order;
>     	grant_ref_t ref[];
>     };
> 
>     /* not actually C compliant (ring_order changes from socket to
> socket) */
>     struct ring_data {
>         char in[((1 << ring_order) << PAGE_SHIFT) / 2];
>         char out[((1 << ring_order) << PAGE_SHIFT) / 2];
>     };
>
Sorry, what does "ring_order changes from socket to socket" mean?

> The binary layout of `struct xen_9pfs_intf` follows:
> 
>     0         4         8         12        16        20
>     +---------+---------+---------+---------+---------+
>     | in_cons | in_prod |out_cons |out_prod |ring_orde|
>     +---------+---------+---------+---------+---------+
> 
>     20        24        26      4092      4096
>     +---------+---------+----//---+---------+
>     |  ref[0] |  ref[1] |         |  ref[N] |
>     +---------+---------+----//---+---------+
> 
> **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
> N
> needs to be a power of two, actually max N is 512.
>
It may again be me being still too naive, but I'd quickly add at least
another example, with the value of N computed for a multiple pages
ring. Something like: "For 4 pages (i.e., ring_order=2), N is..."

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [DOC v1] Xen transport for 9pfs
  2016-12-01 10:56 ` Dario Faggioli
@ 2016-12-01 23:14   ` Stefano Stabellini
  2016-12-02  8:54     ` Dario Faggioli
  0 siblings, 1 reply; 10+ messages in thread
From: Stefano Stabellini @ 2016-12-01 23:14 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Stefano Stabellini, xen-devel, wei.liu2


On Thu, 1 Dec 2016, Dario Faggioli wrote:
> On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> > ### Frontend XenBus Nodes
> > 
> >     num-rings
> >          Values:         <uint32_t>
> >     
> >          Number of rings. It needs to be lower or equal to max-rings.
> >     
> >     port-<num> (port-0, port-1, etc)
> >          Values:         <uint32_t>
> >     
> >          The identifier of the Xen event channel used to signal
> > activity
> >          in the ring buffer. One for each ring.
> >     
> >     ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
> >
> blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash before
> the index). Not a big deal, I guess, but I thought it could be nice to
> be a bit more uniform.

Sure, but in this case each ring-ref-%u is used to map a different ring.
That said, I can make the change.


> >          Values:         <uint32_t>
> >     
> >          The Xen grant reference granting permission for the backend
> > to
> >          map a page with information to setup a share ring. One for
> > each
> >          ring.
> > 
> So, this means there can be multiple rings (up to max-rings advertised
> by backend), _each_of_which_ has its own event channel, and
> _each_of_which_ can be a multi-page (in case ring_order > 0) ring... Is
> this correct?

That's right, and the size of each ring can be up to max-ring-page-order.


> If it is, what's the typical envisioned use of these multiple rings, if
> I can ask?

They are used to handle multiple read/write requests in parallel. Let's
assume that we configure the rings to be 8K each. Given that the data is
transmitted over the ring, each ring can hold only one outstanding 4K
write request (there is a header for each write request).

With two 8K rings, we can have two outstanding 4K write requests, each
of them processed in parallel on a different vcpu.

The system is completely configurable in terms of number and size of
rings, so a user can configure it to only export one 4K ring for
example or go as far as several 2MB rings.


> (Maybe it's obvious, I'm still not so familiar with these protocols..
> but I'm trying to improve, and that's why I'm here, so bear with me if
> possible. :-D)
> 
> > ## Ring Setup
> > 
> > The shared page has the following layout:
> > 
> >     typedef uint32_t XEN_9PFS_RING_IDX;
> > 
> >     struct xen_9pfs_intf {
> >     	XEN_9PFS_RING_IDX in_cons, in_prod;
> >     	XEN_9PFS_RING_IDX out_cons, out_prod;
> >     
> >     	uint32_t ring_order;
> >     	grant_ref_t ref[];
> >     };
> > 
> >     /* not actually C compliant (ring_order changes from socket to
> > socket) */
> >     struct ring_data {
> >         char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> >         char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> >     };
> >
> Sorry, what does "ring_order changes from socket to socket" mean?

Woops, copy/paste error from PVCalls. I meant "ring_order changes from
ring to ring".


> > The binary layout of `struct xen_9pfs_intf` follows:
> > 
> >     0         4         8         12        16        20
> >     +---------+---------+---------+---------+---------+
> >     | in_cons | in_prod |out_cons |out_prod |ring_orde|
> >     +---------+---------+---------+---------+---------+
> > 
> >     20        24        26      4092      4096
> >     +---------+---------+----//---+---------+
> >     |  ref[0] |  ref[1] |         |  ref[N] |
> >     +---------+---------+----//---+---------+
> > 
> > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
> > N
> > needs to be a power of two, actually max N is 512.
> >
> It may again be me being still too naive, but I'd quickly add at least
> another example, with the value of N computed for a multiple pages
> ring. Something like: "For 4 pages (i.e., ring_orfer=2), N is..."

For 4 pages, N is 4. N is the number of pages that make up the ring.

Maybe there is a misunderstanding, let me try to explain it again: each
page shared via xenstore contains information to handle one new ring,
including grant references for the pages making up the multipage ring
itself. I'll repeat: pages shared via xenstore are not used as a ring,
they are used to setup the rings, each page has the info to setup a new
ring.

The structure of these "setup pages" is `struct xen_9pfs_intf`. Each
page is completely separate and independent from the others. Given that
one page is just 4096 bytes, it can contain max 512 grant refs (see
calculation above). So the max size of one multipage ring is 512 pages =
2MB.

Does it make sense?


* Re: [DOC v1] Xen transport for 9pfs
  2016-11-30 10:59 ` David Vrabel
@ 2016-12-02  0:29   ` Stefano Stabellini
  2016-12-02 10:45     ` David Vrabel
  2016-12-02 10:57     ` Andrew Cooper
  0 siblings, 2 replies; 10+ messages in thread
From: Stefano Stabellini @ 2016-12-02  0:29 UTC (permalink / raw)
  To: David Vrabel; +Cc: Stefano Stabellini, xen-devel, wei.liu2

On Wed, 30 Nov 2016, David Vrabel wrote:
> On 29/11/16 23:34, Stefano Stabellini wrote:
> > 
> > The producer (the backend for **in**, the frontend for **out**) writes to the
> > array in the following way:
> > 
> - read memory barrier
> > - read *cons*, *prod* from shared memory
> > - write to array at position *prod* up to *cons*, wrapping around the circular
> >   buffer when necessary
> - write memory barrier
> > - increase *prod*
> > - notify the other end via evtchn
> > 
> > The consumer (the backend for **out**, the frontend for **in**) reads from the
> > array in the following way:
> 
> - read memory barrier
> > - read *prod*, *cons* from shared memory
> > - read from array at position *cons* up to *prod*, wrapping around the circular
> >   buffer when necessary
> > - memory barrier
> > - increase *cons*
> > - notify the other end via evtchn
> 
> Your barriers are wrong (see corrections above).

Thanks for the review. Sorry for not specifying the type of memory
barriers. I agree with some of your suggestions, but I don't understand
why you moved the read memory barrier before reading prod and cons.

Shouldn't it be for a producer:
  - read *cons*, *prod* from shared memory
  - read memory barrier
  - write to array at position *prod* up to *cons*, wrapping around the circular
    buffer when necessary
  - write memory barrier
  - increase *prod*
  - notify the other end via evtchn
 
and for a consumer:
  - read *prod*, *cons* from shared memory
  - read memory barrier
  - read from array at position *cons* up to *prod*, wrapping around the circular
    buffer when necessary
  - general memory barrier
  - increase *cons*
  - notify the other end via evtchn

If this is wrong, could you please explain why?


> I think you should use a private copy of cons/prod in the
> consumer/producer and use this to validate that the shared prod/cons is
> within a sensible range.

With the masking macro provided (MASK_XEN_9PFS_IDX), indices cannot go
out of range:

#define MASK_XEN_9PFS_IDX(idx) ((idx) & (XEN_9PFS_RING_SIZE - 1))

That said, it could be a cheap additional validation.


> You're missing a mechanism to omit unnecessary evtchn notifications
> (like the standard ring protocol has).

Yes, indeed. I'll see what I can do.


> This all looks like a generic "transfer byte stream" mechanism which
> could be usefully made generic and not specific to 9pfs.


* Re: [DOC v1] Xen transport for 9pfs
  2016-12-01 23:14   ` Stefano Stabellini
@ 2016-12-02  8:54     ` Dario Faggioli
  2016-12-02 22:01       ` Stefano Stabellini
  0 siblings, 1 reply; 10+ messages in thread
From: Dario Faggioli @ 2016-12-02  8:54 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, wei.liu2



On Thu, 2016-12-01 at 15:14 -0800, Stefano Stabellini wrote:
> On Thu, 1 Dec 2016, Dario Faggioli wrote:
> > 
> > On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> > > 
> > >     ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
> > > 
> > blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash
> > before
> > the index). Not a big deal, I guess, but I thought it could be nice
> > to
> > be a bit more uniform.
> 
> Sure, but in this case each ring-ref-%u is used to map a different
> ring.
>
Yeah, right. So it may even be a good thing to differentiate, indeed...

> That said, I can make the change.
> 
I don't know. I, FWIW, thought it would be good, now I'm not so sure
any longer. Yours and maintainers' call, I guess. :-)

> > If it is, what's the typical envisioned use of these multiple
> > rings, if
> > I can ask?
> 
> They are used to handle multiple read/write requests in parallel.
> Let's
> assume that we configure the rings to be 8K each. Given that the data
> is
> transmitted over the ring, each ring can hold only one outstanding 4K
> write request (there is an header for each write request).
> 
Ok.

> With two 8K rings, we can have two outstanding 4K write requests,
> each
> of them processed in parallel on a different vcpu.
> 
> The system is completely configurable in terms of number and size of
> rings, so a user can configure it to only export one 4K ring for
> example or go as far as several 2MB rings.
> 
Right. So, it is indeed similar to blkif multiqueueing, with which it
also shares the idea/objective of exploiting parallelism at the (v)CPU
level, but without (quite obviously, in this case) any direct link to
hardware queues in disk controllers, and without the protocol itself
giving any direction or indication of how to actually use all this.

Got it. Nice.

FWIW, I think a few words --just a shorter version of what you just
said-- may be useful if present in this document.

> > >     /* not actually C compliant (ring_order changes from socket
> > > to
> > > socket) */
> > >     struct ring_data {
> > >         char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > >         char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > >     };
> > > 
> > Sorry, what does "ring_order changes from socket to socket" mean?
> 
> Woops, copy/paste error from PVCalls. I meant "ring_order changes
> from
> ring to ring".
> 
Ah, yes, now it makes sense. :-)

BTW, what's the reason for putting ring_order inside xen_9pfs_intf,
instead of having a ring-page-order (well, actually, a
ring-%u-page-order) xenstore key?

> > > The binary layout of `struct xen_9pfs_intf` follows:
> > > 
> > >     0         4         8         12        16        20
> > >     +---------+---------+---------+---------+---------+
> > >     | in_cons | in_prod |out_cons |out_prod |ring_orde|
> > >     +---------+---------+---------+---------+---------+
> > > 
> > >     20        24        26      4092      4096
> > >     +---------+---------+----//---+---------+
> > >     |  ref[0] |  ref[1] |         |  ref[N] |
> > >     +---------+---------+----//---+---------+
> > > 
> > > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given
> > > that
> > > N
> > > needs to be a power of two, actually max N is 512.
> > > 
> > It may again be me being still too naive, but I'd quickly add at
> > least
> > another example, with the value of N computed for a multiple pages
> > ring. Something like: "For 4 pages (i.e., ring_orfer=2), N is..."
> 
> For 4 pages, N is 4. N is the number of pages that make up the ring.
> 
> Maybe there is a misunderstanding, let me try to explain it again:
> each
> page shared via xenstore contains information to handle one new ring,
> including grant references for the pages making up the multipage ring
> itself. I'll repeat: pages shared via xenstore are not used as a
> ring,
> they are used to setup the rings, each page has the info to setup a
> new
> ring.
> 
Right, I got this. And indeed I expressed myself very badly above.

So, the descriptor of 1 ring is one page. Such page contains, in single
page rings, the reference to another page, which is the actual ring. If
the ring is multi-page, the descriptor page contains an array of page
references which, together, are the actual ring.

Such array --of which N is, in the diagram above, the last index-- can
be, as you say, up to 1019 elements big (the available space in a ring
descriptor page). Therefore, the math I was asking about is really the
relationship between N and max-ring-page-order. That is, a ring can
have at most 2^max-ring-page-order pages, and N can be at most 1019
(well, I think it's 1018 if, as in diagram above, you count from 0, but
that does not matter much); so:

 2^max-ring-page-order <= N
 lb(2^max-ring-page-order) <= lb(N)  //lb(): base 2 logarithm
 max-ring-page-order <= lb(N)

and, considering that max-ring-page-order must be a natural number:

 max-ring-page-order <= floor(lb(N))
 max-ring-page-order <= floor(lb(1018))
 max-ring-page-order <= floor(9.9915)
 max-ring-page-order <= 9

so a ring can be at most 2^9 pages big, which indeed matches with your
own calculations, and brings us to the fact that the maximum size of a
ring is 512*4KB=2MB

So, to recap (sorry for being so long!), I think that saying:

"**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
N needs to be a power of two, actually max N is 512."

is indeed correct, and probably makes it enough clear that the maximum
ring size is 2MB. It's not equally easy, IMO, to map this back to the
fact that this also mean max-ring-page-order must be at most 9, and
that is not spelled out anywhere else, AFAICT.

Therefore, an example of how things look with a couple of different
values of ring_order, or some shorter and less boring version of this
reasoning and calculations may help with that. That's what I'm trying
to say. :-)

> The structure of these "setup pages" is `struct xen_9pfs_intf`. Each
> page is completely separate and independent from the others. Given
> that
> one page is just 4096 bytes, it can contain max 512 grant refs (see
> calculation above). So the max size of one multipage ring is 512
> pages =
> 2MB.
> 
> Does it make sense?
>
It does, and this is probably me being a mix of, not too used to this,
and too picky... If that's the case, sorry for the noise. :-D

Thanks and Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


* Re: [DOC v1] Xen transport for 9pfs
  2016-12-02  0:29   ` Stefano Stabellini
@ 2016-12-02 10:45     ` David Vrabel
  2016-12-02 21:45       ` Stefano Stabellini
  2016-12-02 10:57     ` Andrew Cooper
  1 sibling, 1 reply; 10+ messages in thread
From: David Vrabel @ 2016-12-02 10:45 UTC (permalink / raw)
  To: Stefano Stabellini; +Cc: xen-devel, wei.liu2

On 02/12/16 00:29, Stefano Stabellini wrote:
> On Wed, 30 Nov 2016, David Vrabel wrote:
>> On 29/11/16 23:34, Stefano Stabellini wrote:
>>>
>>> The producer (the backend for **in**, the frontend for **out**) writes to the
>>> array in the following way:
>>>
>> - read memory barrier
>>> - read *cons*, *prod* from shared memory
>>> - write to array at position *prod* up to *cons*, wrapping around the circular
>>>   buffer when necessary
>> - write memory barrier
>>> - increase *prod*
>>> - notify the other end via evtchn
>>>
>>> The consumer (the backend for **out**, the frontend for **in**) reads from the
>>> array in the following way:
>>
>> - read memory barrier
>>> - read *prod*, *cons* from shared memory
>>> - read from array at position *cons* up to *prod*, wrapping around the circular
>>>   buffer when necessary
>>> - memory barrier
>>> - increase *cons*
>>> - notify the other end via evtchn
>>
>> Your barriers are wrong (see corrections above).
> 
> Thanks for the review. Sorry for not specifying the type of memory
> barriers. I agree with some of your suggestions, but I don't understand
> why you moved the read memory barrier before reading prod and cons.
> 
> Shouldn't it be for a producer:
>   - read *cons*, *prod* from shared memory
>   - read memory barrier

Barriers must be paired to enforce the required ordering. So where you
have a barrier before writing cons, you need another barrier before
reading cons on the other side.

This barrier here does nothing because the following access is a write
and there's a data dependency here anyway.

Without a barrier before reading cons, P can see the value after C has
written cons but before C has actually read the requests.

The Linux barrier documentation explains this better than I can.

>> I think you should use a private copy of cons/prod in the
>> consumer/producer and use this to validate that the shared prod/cons is
>> within a sensible range.
> 
> With the masking macro provided (MASK_XEN_9PFS_IDX), indices cannot go
> out of range:
> 
> #define MASK_XEN_9PFS_IDX(idx) ((idx) & (XEN_9PFS_RING_SIZE - 1))

For example, the producer writes cons = 0 and prod = 100000000 so
although you will not index off the array, the consumer will end up
processing 1000s of bogus entries.

There have been XSAs where backends did not validate prod in this way.

David


* Re: [DOC v1] Xen transport for 9pfs
  2016-12-02  0:29   ` Stefano Stabellini
  2016-12-02 10:45     ` David Vrabel
@ 2016-12-02 10:57     ` Andrew Cooper
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Cooper @ 2016-12-02 10:57 UTC (permalink / raw)
  To: Stefano Stabellini, David Vrabel; +Cc: xen-devel, wei.liu2

On 02/12/16 00:29, Stefano Stabellini wrote:
> On Wed, 30 Nov 2016, David Vrabel wrote:
>> On 29/11/16 23:34, Stefano Stabellini wrote:
>>> The producer (the backend for **in**, the frontend for **out**) writes to the
>>> array in the following way:
>>>
>> - read memory barrier
>>> - read *cons*, *prod* from shared memory
>>> - write to array at position *prod* up to *cons*, wrapping around the circular
>>>   buffer when necessary
>> - write memory barrier
>>> - increase *prod*
>>> - notify the other end via evtchn
>>>
>>> The consumer (the backend for **out**, the frontend for **in**) reads from the
>>> array in the following way:
>> - read memory barrier
>>> - read *prod*, *cons* from shared memory
>>> - read from array at position *cons* up to *prod*, wrapping around the circular
>>>   buffer when necessary
>>> - memory barrier
>>> - increase *cons*
>>> - notify the other end via evtchn
>> Your barriers are wrong (see corrections above).
> Thanks for the review. Sorry for not specifying the type of memory
> barriers. I agree with some of your suggestions, but I don't understand
> why you moved the read memory barrier before reading prod and cons.
>
> Shouldn't it be for a producer:
>   - read *cons*, *prod* from shared memory
>   - read memory barrier
>   - write to array at position *prod* up to *cons*, wrapping around the circular
>     buffer when necessary
>   - write memory barrier
>   - increase *prod*
>   - notify the other end via evtchn
>  
> and for a consumer:
>   - read *prod*, *cons* from shared memory
>   - read memory barrier
>   - read from array at position *cons* up to *prod*, wrapping around the circular
>     buffer when necessary
>   - general memory barrier
>   - increase *cons*
>   - notify the other end via evtchn
>
> If this is wrong, could you please explain why?

It is wrong.  A read barrier is used to separate two reads which must
happen in a certain order.  A write barrier separates two writes.  A
full memory barrier is required to separate a read from a write.

e.g.

foo = a;
smp_rmb();
bar = b;

and

a = foo';
smp_wmb();
b = bar';

Read and write barriers must come in matched pairs excluding the same
fields.  If your pattern looks different to that, then you are using
them wrong.

If instead, you need to exclude reads from writes, then you must use
something looking like:

foo = a;
smp_mb();
b = bar;

on both sides.

~Andrew


* Re: [DOC v1] Xen transport for 9pfs
  2016-12-02 10:45     ` David Vrabel
@ 2016-12-02 21:45       ` Stefano Stabellini
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Stabellini @ 2016-12-02 21:45 UTC (permalink / raw)
  To: David Vrabel; +Cc: Stefano Stabellini, xen-devel, wei.liu2

On Fri, 2 Dec 2016, David Vrabel wrote:
> On 02/12/16 00:29, Stefano Stabellini wrote:
> > On Wed, 30 Nov 2016, David Vrabel wrote:
> >> On 29/11/16 23:34, Stefano Stabellini wrote:
> >>>
> >>> The producer (the backend for **in**, the frontend for **out**) writes to the
> >>> array in the following way:
> >>>
> >> - read memory barrier
> >>> - read *cons*, *prod* from shared memory
> >>> - write to array at position *prod* up to *cons*, wrapping around the circular
> >>>   buffer when necessary
> >> - write memory barrier
> >>> - increase *prod*
> >>> - notify the other end via evtchn
> >>>
> >>> The consumer (the backend for **out**, the frontend for **in**) reads from the
> >>> array in the following way:
> >>
> >> - read memory barrier
> >>> - read *prod*, *cons* from shared memory
> >>> - read from array at position *cons* up to *prod*, wrapping around the circular
> >>>   buffer when necessary
> >>> - memory barrier
> >>> - increase *cons*
> >>> - notify the other end via evtchn
> >>
> >> Your barriers are wrong (see corrections above).
> > 
> > Thanks for the review. Sorry for not specifying the type of memory
> > barriers. I agree with some of your suggestions, but I don't understand
> > why you moved the read memory barrier before reading prod and cons.
> > 
> > Shouldn't it be for a producer:
> >   - read *cons*, *prod* from shared memory
> >   - read memory barrier
> 
> Barriers must be paired to enforce the required ordering. So where you
> have a barrier before writing cons, you need another barrier before
> reading cons on the other side.
> 
> This barrier here does nothing because the following access is a write
> and there's a data dependency here anyway.

You are right that this should be a general memory barrier as the
following is a write. You are also right that it looks superfluous
because there is a data dependency, but it is necessary to match the
barrier on the consumer side, see below.


> Without a barrier P before reading cons it can see the value after C as
> written cons but before C has actually read the requests.
> 
> The Linux barrier documentation explains this better than I can.

This is an example from Documentation/memory-barriers.txt:

	CPU 1                               CPU 2
	===================                 ===================
	WRITE_ONCE(a, 1);    }----   --->{  v = READ_ONCE(c);
	WRITE_ONCE(b, 2);    }    \ /    {  w = READ_ONCE(d);
	<write barrier>            \        <read barrier>
	WRITE_ONCE(c, 3);    }    / \    {  x = READ_ONCE(a);
	WRITE_ONCE(d, 4);    }----   --->{  y = READ_ONCE(b);

I'll adapt this example to our case:

    CPU1 == Producer                    CPU2 == Consumer
    ===================                 ===================
    WRITE DATA TO RING                  READ PROD
    <write barrier>                     <read barrier>
    WRITE PROD                          READ DATA FROM RING

It is exactly the same. If you agree that this is correct so far, I'll
add the other half of it:

    CPU1 == Producer                    CPU2 == Consumer
    ===================                 ===================
    READ CONS                           READ DATA FROM RING
    <general barrier>                   <general barrier>
    WRITE DATA TO RING                  WRITE CONS

There isn't exactly a matching example in
Documentation/memory-barriers.txt, but I think it is correct.
All together:

    CPU1 == Producer                    CPU2 == Consumer
    ===================                 ===================
    READ CONS                           READ PROD
    <general barrier 1>                 <read barrier 3>
    WRITE DATA TO RING                  READ DATA FROM RING
    <write barrier 2>                   <general barrier 4>
    WRITE PROD                          WRITE CONS

1 is paired with 4
2 is paired with 3

If there is a problem with this, I don't see it, please explain.


> >> I think you should use a private copy of cons/prod in the
> >> consumer/producer and use this to validate that the shared prod/cons is
> >> within a sensible range.
> > 
> > With the masking macro provided (MASK_XEN_9PFS_IDX), indices cannot go
> > out of range:
> > 
> > #define MASK_XEN_9PFS_IDX(idx) ((idx) & (XEN_9PFS_RING_SIZE - 1))
> 
> For example, the producer writes cons = 0 and prod = 100000000 so
> although you will not index off the array, the consumer will end up
> processing 1000s of bogus entries.

I see, I'll write something about it.


> There have been XSAs where backends did not validate prod in this way.

Really? I thought that the XSAs were about reading the same field twice.
So for example:

cons = ring->cons;
prod = ring->prod;
<some barrier>
do_stuff1();
do_stuff2(ring->prod);

Maybe I am thinking of a different XSA?


* Re: [DOC v1] Xen transport for 9pfs
  2016-12-02  8:54     ` Dario Faggioli
@ 2016-12-02 22:01       ` Stefano Stabellini
  0 siblings, 0 replies; 10+ messages in thread
From: Stefano Stabellini @ 2016-12-02 22:01 UTC (permalink / raw)
  To: Dario Faggioli; +Cc: Stefano Stabellini, xen-devel, wei.liu2


On Fri, 2 Dec 2016, Dario Faggioli wrote:
> On Thu, 2016-12-01 at 15:14 -0800, Stefano Stabellini wrote:
> > On Thu, 1 Dec 2016, Dario Faggioli wrote:
> > > 
> > > On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote:
> > > > 
> > > >     ring-ref-<num> (ring-ref-0, ring-ref-1, etc)
> > > > 
> > > blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash
> > > before
> > > the index). Not a big deal, I guess, but I thought it could be nice
> > > to
> > > be a bit more uniform.
> > 
> > Sure, but in this case each ring-ref-%u is used to map a different
> > ring.
> >
> Yeah, right. So it may even be a good thing to differentiate, indeed...
> 
> > That said, I can make the change.
> > 
> I don't know. I, FWIW, thought it would be good, now I'm not so sure
> any longer. Yours and maintainers' call, I guess. :-)
> 
> > > If it is, what's the typical envisioned use of these multiple
> > > rings, if
> > > I can ask?
> > 
> > They are used to handle multiple read/write requests in parallel.
> > Let's
> > assume that we configure the rings to be 8K each. Given that the data
> > is
> > transmitted over the ring, each ring can hold only one outstanding 4K
> > write request (there is an header for each write request).
> > 
> Ok.
> 
> > With two 8K rings, we can have two outstanding 4K write requests,
> > each
> > of them processed in parallel on a different vcpu.
> > 
> > The system is completely configurable in terms of number and size of
> > rings, so a user can configure it to only export one 4K ring for
> > example or go as far as several 2MB rings.
> > 
> Right. So, it is indeed similar to blkif multiqueueing, with which it
> also shares the idea/objective of exploiting parallelism at the (v)CPU
> level, but without (quite obviously, in this case) any direct link to
> hardware queues in disk controllers, and without the protocol itself
> giving any direction or indication of how to actually use all this.
> 
> Got it. Nice.
> 
> FWIW, I think a few words --just a shorter version of what you just
> said-- may be useful if present in this document.

I'll do.


> > > >     /* not actually C compliant (ring_order changes from socket
> > > > to
> > > > socket) */
> > > >     struct ring_data {
> > > >         char in[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > >         char out[((1 << ring_order) << PAGE_SHIFT) / 2];
> > > >     };
> > > > 
> > > Sorry, what does "ring_order changes from socket to socket" mean?
> > 
> > Woops, copy/paste error from PVCalls. I meant "ring_order changes
> > from
> > ring to ring".
> > 
> Ah, yes, now it makes sense. :-)
> 
> BTW, what's the reason for putting ring_order inside xen_9pfs_intf,
> instead of having a ring-page-order (well, actually, a
> ring-%u-page-order) xenstore key?

So that each ring can have a different size.


> > > > The binary layout of `struct xen_9pfs_intf` follows:
> > > > 
> > > >     0         4         8         12        16        20
> > > >     +---------+---------+---------+---------+---------+
> > > >     | in_cons | in_prod |out_cons |out_prod |ring_orde|
> > > >     +---------+---------+---------+---------+---------+
> > > > 
> > > >     20        24        26      4092      4096
> > > >     +---------+---------+----//---+---------+
> > > >     |  ref[0] |  ref[1] |         |  ref[N] |
> > > >     +---------+---------+----//---+---------+
> > > > 
> > > > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given
> > > > that
> > > > N
> > > > needs to be a power of two, actually max N is 512.
> > > > 
> > > It may again be me being still too naive, but I'd quickly add at
> > > least
> > > another example, with the value of N computed for a multiple pages
> > > ring. Something like: "For 4 pages (i.e., ring_orfer=2), N is..."
> > 
> > For 4 pages, N is 4. N is the number of pages that make up the ring.
> > 
> > Maybe there is a misunderstanding, let me try to explain it again:
> > each
> > page shared via xenstore contains information to handle one new ring,
> > including grant references for the pages making up the multipage ring
> > itself. I'll repeat: pages shared via xenstore are not used as a
> > ring,
> > they are used to setup the rings, each page has the info to setup a
> > new
> > ring.
> > 
> Right, I got this. And indeed I expressed myself very badly above.
> 
> So, the descriptor of 1 ring is one page. Such page contains, in signle
> page rings, the reference to another page, which is the actual ring. If
> the ring is multi-page, the descriptor page contains an array of page
> references which, together, are the actual ring.

Right


> Such array --of which N is, in the diagram above, the last index-- can
> be, as you say, up to 1019 elements big (the available space in a ring
> descriptor page). Therefore, the math I was asking about is really the
> relationship between N and max-ring-page-order. That is, a ring can
> have at most 2^max-ring-page-order pages, and N can be at most 1019
> (well, I think it's 1018 if, as in diagram above, you count from 0, but
> that does not matter much); so:
> 
>  2^max-ring-page-order <= N
>  lb(2^max-ring-page-order) <= lb(N)  //lb(): base 2 logarithm
>  max-ring-page-order <= lb(N)
> 
> and, considering that max-ring-page-order must be a natural number:
> 
>  max-ring-page-order <= floor(lb(N))
>  max-ring-page-order <= floor(lb(1018))
>  max-ring-page-order <= floor(9.9915)
>  max-ring-page-order <= 9
> 
> so a ring can be at most 2^9 pages big, which indeed matches with your
> own calculations, and bring us to the fact that the maximum size of a
> ring is 512*4Kb=2Mb
> 
> So, to recap (sorry for being so long!), I think that saying:
> 
> "**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that
> N needs to be a power of two, actually max N is 512."
> 
> is indeed correct, and probably makes it enough clear that the maximum
> ring size is 2MB. It's not equally easy, IMO, to map this back to the
> fact that this also mean max-ring-page-order must be at most 9, and
> that is not spelled out anywhere else, AFAICT.
> 
> Therefore, an example of how things look with a couple of different
> values of ring_order, or some shorter and less boring version of this
> reasoning and calculations may help with that. That's what I'm trying
> to say. :-)

I'll expand the N.B. to cover this.


> > The structure of these "setup pages" is `struct xen_9pfs_intf`. Each
> > page is completely separate and independent from the others. Given
> > that
> > one page is just 4096 bytes, it can contain max 512 grant refs (see
> > calculation above). So the max size of one multipage ring is 512
> > pages =
> > 2MB.
> > 
> > Does it make sense?
> >
> It does, and this is probably me being a mix of, not too used to this,
> and too picky... If that's the case, sorry for the noise. :-D

No worries, thanks for the review!


Thread overview: 10+ messages
2016-11-29 23:34 [DOC v1] Xen transport for 9pfs Stefano Stabellini
2016-11-30 10:59 ` David Vrabel
2016-12-02  0:29   ` Stefano Stabellini
2016-12-02 10:45     ` David Vrabel
2016-12-02 21:45       ` Stefano Stabellini
2016-12-02 10:57     ` Andrew Cooper
2016-12-01 10:56 ` Dario Faggioli
2016-12-01 23:14   ` Stefano Stabellini
2016-12-02  8:54     ` Dario Faggioli
2016-12-02 22:01       ` Stefano Stabellini
