All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jan Beulich" <JBeulich@suse.com>
To: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	PaulDurrant <paul.durrant@citrix.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.
Date: Thu, 23 Mar 2017 02:35:45 -0600	[thread overview]
Message-ID: <58D396F10200007800146A01@prv-mh.provo.novell.com> (raw)
In-Reply-To: <58D2D6BE.7000705@citrix.com>

>>> On 22.03.17 at 20:55, <Jennifer.Herbert@citrix.com> wrote:
> On 22/03/17 10:33, Jan Beulich wrote:
>>>>> On 21.03.17 at 14:59, <jennifer.herbert@citrix.com> wrote:
>>> This version of this patch removes the need for the 'offset' parameter,
>>> by instead reducing nr_extents, and working backwards from the end of
>>> the array.
>>>
>>> This patch also removes the need to ever write back the passed array of
>>> extents to the guest, but instead putting its partial progress in a
>>> 'pfns_processed' parameter, which replaces the old 'offset' parameter.
>>> As with the 'offest' parameter, 'pfns_processed' should be set to 0 on
>>> calling.
>> So how is the need for 'pfns_processed' better than the prior need
>> for 'offset'? I continue to not see why you'd need any extra field, as
>> you can write back to 'first_pfn' and 'nr' just like the old code did. (But
>> note that I'm not outright objecting to this solution, I'd first like to
>> understand why it was chosen.)
>>
> 
> With v1 of the patch, on continuation, two buffers get written back to 
> guest, one
> to update 'offset', and one to update first_pfn (in the array). Although 
> I can't think of an example
> where this would be a problem, I think that semi-randomly altering items 
> in the passed
> array may not be expected, and if that data was re-used for something, 
> could cause a bug.
> 
> There is precedence for changing the op buffer,  and using 
> pfns_processed means we never have
> to change this array, which I think is much cleaner, intuitive, and 
> halving the copy backs.
> 
> I considered your suggestion of modifying the handle array, but I think 
> this would make the code much harder to follow,  and still require data 
> written back to the guest - just in a different place. I thought just 
> reducing the nr_extents a cleaner solution.

This backwards processing was a neat idea actually, since here
we definitely don't have any ordering requirement.

> I can't see the problem with having an argument for continuation - the 
> xen_dm_op will remain the same size, if we use the argument space or 
> not.  If its just about how the API looks, I find this highly subjective 
> - its no secret that continuations happen, and data gets stashed in 
> these parameters - and so i see no reason why having a dedicated 
> parameter for this is a problem.
> 
> If we want to hide this continuation information, we could define:
> 
> struct xen_dm_op_modified_memory {
>      /* IN - number of extents. */
>      uint32_t nr_extents;
> }
> 
> struct xen_dm_op_modified_memory modified_memory_expanded {
>      struct xen_dm_op_modified_memory modified_memory;
>      uint32_t  pfns_processed
> }
> 
> and include both structures in the xen_dm_op union.
> The caller would use xen_dm_op_modified_memory modified_memory, and due 
> to the memset(&op, 0, sizeof(op));, the value for pfns_processed would 
> end up as zero.  Xen could use expanded structure, and make use of 
> pfns_processed.

I don't think this would be helpful.

> Or, we could re-purpose the 'pad' member of struct xen_dm_op, call it 
> continuation_data, and have this as a general purpose continuation 
> variable, that the caller wouldn't pay any attention to.
> 
> What do you think?

No, that's not really an option here imo, as in the general case we'd
need this to be a 64-bit value. It just so happens that you can get
away with a 32-bit one because you limit the input value to 32 bits.

>>> --- a/xen/arch/x86/hvm/dm.c
>>> +++ b/xen/arch/x86/hvm/dm.c
>>> @@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq,
>>>   }
>>>   
>>>   static int modified_memory(struct domain *d,
>>> -                           struct xen_dm_op_modified_memory *data)
>>> +                           struct xen_dm_op_modified_memory *header,
>>> +                           xen_dm_op_buf_t* buf)
>>>   {
>>> -    xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
>>> -    unsigned int iter = 0;
>>> +    /* Process maximum of 256 pfns before checking for continuation */
>>> +    const unsigned int cont_check_interval = 0xff;
>> Iirc the question was asked on v1 already: 256 (as the comment
>> says) or 255 (as the value enforces)?
> 
> aw shucks!  I had it as 255, and Paul said "255 or 256?" and I changed 
> it to 256, without thinking, assuming an off by one error, and that he 
> was right.  But, with a seconds thought, it should be 255, and Paul  was 
> actually referring to a comment later in the code, which was 256.

Yet batches of 256 would seem a little more "usual".

>>>       int rc = 0;
>>>       if ( copy_from_guest_offset(&extent, buf->h, rem_extents - 1, 1) )
>>> +            return -EFAULT;
>>> +
>>> +        pfn = extent.first_pfn + header->pfns_processed;
>>> +        batch_nr = extent.nr - header->pfns_processed;
>> Even if this looks to be harmless to the hypervisor, I dislike the
>> overflows you allow for here.
> 
> I'll add a check for (header->pfns_processed > extent.nr), returning 
> -EINVAL.

>= actually, I think.

>>> @@ -441,13 +474,8 @@ static int dm_op(domid_t domid,
>>>           struct xen_dm_op_modified_memory *data =
>>>               &op.u.modified_memory;
>>>   
>>> -        const_op = false;
>>> -
>>> -        rc = -EINVAL;
>>> -        if ( data->pad )
>>> -            break;
>> Where did this check go?
> 
> data->pad is undefined here.  But I could add a check for extent.pad in 
> the modified_memory function.

That's what I'm asking for. Talking of which - is there a way for the
caller to know on which extent an error (if any) has occurred? This
certainly needs to be possible.

>>> --- a/xen/include/public/hvm/dm_op.h
>>> +++ b/xen/include/public/hvm/dm_op.h
>>> @@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route {
>>>    * XEN_DMOP_modified_memory: Notify that a set of pages were modified by
>>>    *                           an emulator.
>>>    *
>>> - * NOTE: In the event of a continuation, the @first_pfn is set to the
>>> - *       value of the pfn of the remaining set of pages and @nr reduced
>>> - *       to the size of the remaining set.
>>> + * DMOP buf 1 contains an array of xen_dm_op_modified_memory_extent with
>>> + * @nr_extent entries.  The value of  @nr_extent will be reduced on
>>> + *  continuation.
>>> + *
>>> + * @pfns_processed is used for continuation, and not intended to be usefull
>>> + * to the caller.  It gives the number of pfns already processed (and can
>>> + * be skipped) in the last extent. This should always be set to zero.
>>>    */
>>>   #define XEN_DMOP_modified_memory 11
>>>   
>>>   struct xen_dm_op_modified_memory {
>>> +    /* IN - number of extents. */
>>> +    uint32_t nr_extents;
>>> +    /* IN - should be set to 0, and is updated on continuation. */
>>> +    uint32_t pfns_processed;
>> I'd prefer if the field (if to be kept) wasn't named after its current
>> purpose, nor should we state anything beyond it needing to be
>> zero when first invoking the call. These are implementation details
>> which callers should not rely on.
> 
> Assuming we keep it, how about calling it "reserved", with a comment in 
> dm.c explaining what its actually for?

Elsewhere we use "opaque", but "reserved" is probably fine too.
However, we may want to name it as an OUT value, for the
error-on-extent indication mentioned above (of course another
option would be to make nr_extent IN/OUT). As an OUT, we're
free to use it for whatever intermediate value, just so long as
upon final return to the caller it has the intended value. (It's
debatable whether it should be IN/OUT, due to the need for it
to be set to zero.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

  reply	other threads:[~2017-03-23  8:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-21 13:59 [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk Jennifer Herbert
2017-03-22 10:33 ` Jan Beulich
2017-03-22 19:55   ` Jennifer Herbert
2017-03-23  8:35     ` Jan Beulich [this message]
2017-03-23 14:54       ` Jennifer Herbert
2017-03-23 15:58         ` Jan Beulich
2017-03-23 18:44           ` Jennifer Herbert
2017-03-24  7:21             ` Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=58D396F10200007800146A01@prv-mh.provo.novell.com \
    --to=jbeulich@suse.com \
    --cc=Jennifer.Herbert@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=ian.jackson@eu.citrix.com \
    --cc=paul.durrant@citrix.com \
    --cc=wei.liu2@citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.