On Fri, 2 Dec 2016, Dario Faggioli wrote: > On Thu, 2016-12-01 at 15:14 -0800, Stefano Stabellini wrote: > > On Thu, 1 Dec 2016, Dario Faggioli wrote: > > > > > > On Tue, 2016-11-29 at 15:34 -0800, Stefano Stabellini wrote: > > > >  > > > >     ring-ref- (ring-ref-0, ring-ref-1, etc) > > > > > > > blkif uses ring-ref%u, rather than ring-ref-%u (i.e., no dash > > > before > > > the index). Not a big deal, I guess, but I thought it could be nice > > > to > > > be a bit more uniform. > > > > Sure, but in this case each ring-ref-%u is used to map a different > > ring. > > > Yeah, right. So it may even be a good thing to differentiate, indeed... > > > That said, I can make the change. > > > I don't know. I, FWIW, thought it would be good, now I'm not so sure > any longer. Yours and maintainers' call, I guess. :-) > > > > If it is, what's the typical envisioned use of these multiple > > > rings, if > > > I can ask? > > > > They are used to handle multiple read/write requests in parallel. > > Let's > > assume that we configure the rings to be 8K each. Given that the data > > is > > transmitted over the ring, each ring can hold only one outstanding 4K > > write request (there is an header for each write request). > > > Ok. > > > With two 8K rings, we can have two outstanding 4K write requests, > > each > > of them processed in parallel on a different vcpu. > > > > The system is completely configurable in terms of number and size of > > rings, so a user can configure it to only export one 4K ring for > > example or go as far as several 2MB rings. > > > Right. So, it is indeed similar to blkif multiqueueing, with which it > also shares the idea/objective of exploiting parallelism at the (v)CPU > level, but without (quite obviously, in this case) any direct link to > hardware queues in disk controllers, and without the protocol itself > giving any direction or indication of how to actually use all this. > > Got it. Nice. > > FWIW, I think a few words --just a shorter version of what you just > said-- may be useful if present in this document. I'll do. > > > >     /* not actually C compliant (ring_order changes from socket > > > > to > > > > socket) */ > > > >     struct ring_data { > > > >         char in[((1 << ring_order) << PAGE_SHIFT) / 2]; > > > >         char out[((1 << ring_order) << PAGE_SHIFT) / 2]; > > > >     }; > > > > > > > Sorry, what does "ring_order changes from socket to socket" mean? > > > > Woops, copy/paste error from PVCalls. I meant "ring_order changes > > from > > ring to ring". > > > Ah, yes, now it makes sense. :-) > > BTW, what's the reason for putting ring_order inside xen_9pfs_intf, > instead of having a ring-page-order (well, actually, a > ring-%u-page-order) xenstore key? So that each ring can have a different size. > > > > The binary layout of `struct xen_9pfs_intf` follows: > > > > > > > >     0         4         8         12        16        20 > > > >     +---------+---------+---------+---------+---------+ > > > >     | in_cons | in_prod |out_cons |out_prod |ring_orde| > > > >     +---------+---------+---------+---------+---------+ > > > > > > > >     20        24        26      4092      4096 > > > >     +---------+---------+----//---+---------+ > > > >     |  ref[0] |  ref[1] |         |  ref[N] | > > > >     +---------+---------+----//---+---------+ > > > > > > > > **N.B** For one page, N is maximum 1019 ((4096-20)/4), but given > > > > that > > > > N > > > > needs to be a power of two, actually max N is 512. > > > > > > > It may again be me being still too naive, but I'd quickly add at > > > least > > > another example, with the value of N computed for a multiple pages > > > ring. Something like: "For 4 pages (i.e., ring_orfer=2), N is..." > > > > For 4 pages, N is 4. N is the number of pages that make up the ring. > > > > Maybe there is a misunderstanding, let me try to explain it again: > > each > > page shared via xenstore contains information to handle one new ring, > > including grant references for the pages making up the multipage ring > > itself. I'll repeat: pages shared via xenstore are not used as a > > ring, > > they are used to setup the rings, each page has the info to setup a > > new > > ring. > > > Right, I got this. And indeed I expressed myself very badly above. > > So, the descriptor of 1 ring is one page. Such page contains, in signle > page rings, the reference to another page, which is the actual ring. If > the ring is multi-page, the descriptor page contains an array of page > references which, together, are the actual ring. Right > Such array --of which N is, in the diagram above, the last index-- can > be, as you say, up to 1019 elements big (the available space in a ring > descriptor page). Therefore, the math I was asking about is really the > relationship between N and max-ring-page-order. That is, a ring can > have at most 2^max-ring-page-order pages, and N can be at most 1019 > (well, I think it's 1018 if, as in diagram above, you count from 0, but > that does not matter much); so: > >  2^max-ring-page-order <= N >  lb(2^max-ring-page-order) <= lb(N)  //lb(): base 2 logarithm >  max-ring-page-order <= lb(N) > > and, considering that max-ring-page-order must be a natural number: > >  max-ring-page-order <= floor(lb(N)) >  max-ring-page-order <= floor(lb(1018)) >  max-ring-page-order <= floor(9.9915) >  max-ring-page-order <= 9 > > so a ring can be at most 2^9 pages big, which indeed matches with your > own calculations, and bring us to the fact that the maximum size of a > ring is 512*4Kb=2Mb > > So, to recap (sorry for being so long!), I think that saying: > > "**N.B** For one page, N is maximum 1019 ((4096-20)/4), but given that > N needs to be a power of two, actually max N is 512." > > is indeed correct, and probably makes it enough clear that the maximum > ring size is 2MB. It's not equally easy, IMO, to map this back to the > fact that this also mean max-ring-page-order must be at most 9, and > that is not spelled out anywhere else, AFAICT. > > Therefore, an example of how things look with a couple of different > values of ring_order, or some shorter and less boring version of this > reasoning and calculations may help with that. That's what I'm trying > to say. :-) I'll expand the N.B. to cover this. > > The structure of these "setup pages" is `struct xen_9pfs_intf`. Each > > page is completely separate and independent from the others. Given > > that > > one page is just 4096 bytes, it can contain max 512 grant refs (see > > calculation above). So the max size of one multipage ring is 512 > > pages = > > 2MB. > > > > Does it make sense? > > > It does, and this is probably me being a mix of, not too used to this, > and too picky... If that's the case, sorry for the noise. :-D No worries, thanks for the review!