Re: [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.

From: "Jan Beulich" <JBeulich@suse.com>
To: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>,
	PaulDurrant <paul.durrant@citrix.com>,
	Wei Liu <wei.liu2@citrix.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH v2] dm_op: Add xendevicemodel_modified_memory_bulk.
Date: Thu, 23 Mar 2017 02:35:45 -0600	[thread overview]
Message-ID: <58D396F10200007800146A01@prv-mh.provo.novell.com> (raw)
In-Reply-To: <58D2D6BE.7000705@citrix.com>

>>> On 22.03.17 at 20:55, <Jennifer.Herbert@citrix.com> wrote:
> On 22/03/17 10:33, Jan Beulich wrote:
>>>>> On 21.03.17 at 14:59, <jennifer.herbert@citrix.com> wrote:
>>> This version of this patch removes the need for the 'offset' parameter,
>>> by instead reducing nr_extents, and working backwards from the end of
>>> the array.
>>>
>>> This patch also removes the need to ever write back the passed array of
>>> extents to the guest, but instead putting its partial progress in a
>>> 'pfns_processed' parameter, which replaces the old 'offset' parameter.
>>> As with the 'offest' parameter, 'pfns_processed' should be set to 0 on
>>> calling.
>> So how is the need for 'pfns_processed' better than the prior need
>> for 'offset'? I continue to not see why you'd need any extra field, as
>> you can write back to 'first_pfn' and 'nr' just like the old code did. (But
>> note that I'm not outright objecting to this solution, I'd first like to
>> understand why it was chosen.)
>>
> 
> With v1 of the patch, on continuation, two buffers get written back to 
> guest, one
> to update 'offset', and one to update first_pfn (in the array). Although 
> I can't think of an example
> where this would be a problem, I think that semi-randomly altering items 
> in the passed
> array may not be expected, and if that data was re-used for something, 
> could cause a bug.
> 
> There is precedence for changing the op buffer,  and using 
> pfns_processed means we never have
> to change this array, which I think is much cleaner, intuitive, and 
> halving the copy backs.
> 
> I considered your suggestion of modifying the handle array, but I think 
> this would make the code much harder to follow,  and still require data 
> written back to the guest - just in a different place. I thought just 
> reducing the nr_extents a cleaner solution.

This backwards processing was a neat idea actually, since here
we definitely don't have any ordering requirement.

> I can't see the problem with having an argument for continuation - the 
> xen_dm_op will remain the same size, if we use the argument space or 
> not.  If its just about how the API looks, I find this highly subjective 
> - its no secret that continuations happen, and data gets stashed in 
> these parameters - and so i see no reason why having a dedicated 
> parameter for this is a problem.
> 
> If we want to hide this continuation information, we could define:
> 
> struct xen_dm_op_modified_memory {
>      /* IN - number of extents. */
>      uint32_t nr_extents;
> }
> 
> struct xen_dm_op_modified_memory modified_memory_expanded {
>      struct xen_dm_op_modified_memory modified_memory;
>      uint32_t  pfns_processed
> }
> 
> and include both structures in the xen_dm_op union.
> The caller would use xen_dm_op_modified_memory modified_memory, and due 
> to the memset(&op, 0, sizeof(op));, the value for pfns_processed would 
> end up as zero.  Xen could use expanded structure, and make use of 
> pfns_processed.

I don't think this would be helpful.

> Or, we could re-purpose the 'pad' member of struct xen_dm_op, call it 
> continuation_data, and have this as a general purpose continuation 
> variable, that the caller wouldn't pay any attention to.
> 
> What do you think?

No, that's not really an option here imo, as in the general case we'd
need this to be a 64-bit value. It just so happens that you can get
away with a 32-bit one because you limit the input value to 32 bits.

>>> --- a/xen/arch/x86/hvm/dm.c
>>> +++ b/xen/arch/x86/hvm/dm.c
>>> @@ -119,56 +119,89 @@ static int set_isa_irq_level(struct domain *d, uint8_t isa_irq,
>>>   }
>>>   
>>>   static int modified_memory(struct domain *d,
>>> -                           struct xen_dm_op_modified_memory *data)
>>> +                           struct xen_dm_op_modified_memory *header,
>>> +                           xen_dm_op_buf_t* buf)
>>>   {
>>> -    xen_pfn_t last_pfn = data->first_pfn + data->nr - 1;
>>> -    unsigned int iter = 0;
>>> +    /* Process maximum of 256 pfns before checking for continuation */
>>> +    const unsigned int cont_check_interval = 0xff;
>> Iirc the question was asked on v1 already: 256 (as the comment
>> says) or 255 (as the value enforces)?
> 
> aw shucks!  I had it as 255, and Paul said "255 or 256?" and I changed 
> it to 256, without thinking, assuming an off by one error, and that he 
> was right.  But, with a seconds thought, it should be 255, and Paul  was 
> actually referring to a comment later in the code, which was 256.

Yet batches of 256 would seem a little more "usual".

>>>       int rc = 0;
>>>       if ( copy_from_guest_offset(&extent, buf->h, rem_extents - 1, 1) )
>>> +            return -EFAULT;
>>> +
>>> +        pfn = extent.first_pfn + header->pfns_processed;
>>> +        batch_nr = extent.nr - header->pfns_processed;
>> Even if this looks to be harmless to the hypervisor, I dislike the
>> overflows you allow for here.
> 
> I'll add a check for (header->pfns_processed > extent.nr), returning 
> -EINVAL.

>= actually, I think.

>>> @@ -441,13 +474,8 @@ static int dm_op(domid_t domid,
>>>           struct xen_dm_op_modified_memory *data =
>>>               &op.u.modified_memory;
>>>   
>>> -        const_op = false;
>>> -
>>> -        rc = -EINVAL;
>>> -        if ( data->pad )
>>> -            break;
>> Where did this check go?
> 
> data->pad is undefined here.  But I could add a check for extent.pad in 
> the modified_memory function.

That's what I'm asking for. Talking of which - is there a way for the
caller to know on which extent an error (if any) has occurred? This
certainly needs to be possible.

>>> --- a/xen/include/public/hvm/dm_op.h
>>> +++ b/xen/include/public/hvm/dm_op.h
>>> @@ -237,13 +237,24 @@ struct xen_dm_op_set_pci_link_route {
>>>    * XEN_DMOP_modified_memory: Notify that a set of pages were modified by
>>>    *                           an emulator.
>>>    *
>>> - * NOTE: In the event of a continuation, the @first_pfn is set to the
>>> - *       value of the pfn of the remaining set of pages and @nr reduced
>>> - *       to the size of the remaining set.
>>> + * DMOP buf 1 contains an array of xen_dm_op_modified_memory_extent with
>>> + * @nr_extent entries.  The value of  @nr_extent will be reduced on
>>> + *  continuation.
>>> + *
>>> + * @pfns_processed is used for continuation, and not intended to be usefull
>>> + * to the caller.  It gives the number of pfns already processed (and can
>>> + * be skipped) in the last extent. This should always be set to zero.
>>>    */
>>>   #define XEN_DMOP_modified_memory 11
>>>   
>>>   struct xen_dm_op_modified_memory {
>>> +    /* IN - number of extents. */
>>> +    uint32_t nr_extents;
>>> +    /* IN - should be set to 0, and is updated on continuation. */
>>> +    uint32_t pfns_processed;
>> I'd prefer if the field (if to be kept) wasn't named after its current
>> purpose, nor should we state anything beyond it needing to be
>> zero when first invoking the call. These are implementation details
>> which callers should not rely on.
> 
> Assuming we keep it, how about calling it "reserved", with a comment in 
> dm.c explaining what its actually for?

Elsewhere we use "opaque", but "reserved" is probably fine too.
However, we may want to name it as an OUT value, for the
error-on-extent indication mentioned above (of course another
option would be to make nr_extent IN/OUT). As an OUT, we're
free to use it for whatever intermediate value, just so long as
upon final return to the caller it has the intended value. (It's
debatable whether it should be IN/OUT, due to the need for it
to be set to zero.)

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel