xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* abstract model of  IOMMU unmaping/mapping failures
@ 2016-03-31  9:06 Xu, Quan
  2016-03-31  9:12 ` Xu, Quan
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Xu, Quan @ 2016-03-31  9:06 UTC (permalink / raw)
  To: xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, Nakajima, Jun, Andrew Cooper, Keir Fraser

All,

Here is a summary of my investigation of the abstract model:

Below policies are adopted when deciding whether to rollback a callchain:

1. Domain will be crashed immediately within iommu_{,un}map_page, treated as a fatal error (with the exception of the hardware one). Whether to rollback depends on the need of hardware domain;

2. For hardware domain, roll back on a best effort basis. When rollback is not feasible (in early initialization phase or trade-off of complexity), at least unmap upon maps error and then throw out error message;

Below are a detail analysis of all existing callers on IOMMU interfaces (8-11 needs more discussions):

----
1. memory_add(): rollback (no change)

(Existing code)
   >>...
   if ( ret )
        goto destroy_m2p;

    if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
    {
        for ( i = spfn; i < epfn; i++ )
            if ( iommu_map_page(hardware_domain, i, i, IOMMUF_readable|IOMMUF_writable) )
                break;
        if ( i != epfn )
        {
            while (i-- > old_max)
                iommu_unmap_page(hardware_domain, i);
            goto destroy_m2p;
        }
    } 
   <<...
Existing code already rolls back through destroy_m2p cleanly.

----
2. vtd_set_hwdom_mapping(): no rollback (no change).

It is in early initialization phase. A quick thought looks a bit tricky to fully rollback this path, so propose to leave as it is.

----
3. __gnttab_map_grant_ref():rollback (no change)


(Existing code)
        >>...
        if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
             !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
        {
            if ( !(kind & MAPKIND_WRITE) )
                err = iommu_map_page(ld, frame, frame,
                                     IOMMUF_readable|IOMMUF_writable);
        }
        else if ( act_pin && !old_pin )
        {
            if ( !kind )
                err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
        }
        if ( err )
        {
            double_gt_unlock(lgt, rgt);
            rc = GNTST_general_error;
            goto undo_out;
        }
        <<...

rollback is already cleanly handled.

----
4. __gnttab_unmap_common():rollback (no change)

(Existing code)
       >>...
             if ( !kind )
            err = iommu_unmap_page(ld, op->frame);
        else if ( !(kind & MAPKIND_WRITE) )
            err = iommu_map_page(ld, op->frame, op->frame, IOMMUF_readable);

        double_gt_unlock(lgt, rgt);

        if ( err )
            rc = GNTST_general_error;
       <<...

Similarly no change required, as error has been correctly handled.

----
5. guest_physmap_add_entry():rollback (no change)

(Existing code)
   >>...
    rc = iommu_map_page(
        d, mfn + i, mfn + i, IOMMUF_readable|IOMMUF_writable);
    if ( rc != 0 )
    {
        while ( i-- > 0 )
            iommu_unmap_page(d, mfn + i);
        return rc;
    }
   <<...

Above is in the start of the function, so no additional state to be cleared thus it is in good shape too.

----
6. clear_identity_p2m_entry():rollback (no change).

(Existing code)
  >>...
    if ( !paging_mode_translate(d) )
    {
        if ( !need_iommu(d) )
            return 0;
        return iommu_unmap_page(d, gfn);
    }
  <<...

in the start, and error already rolled back to caller.

----
7. set_identity_p2m_entry():rollback (no change).

(Existing code)
  >>...
    if ( !paging_mode_translate(p2m->domain) )
    {
        if ( !need_iommu(d) )
            return 0;
        return iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
    }
  <<...
	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
        ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
   >> ...

error handling and rollback already in-place.

----
8. p2m_remove_page():rollback one level(change required).

(Existing code)
  >>...
    if ( !paging_mode_translate(p2m->domain) )
    {
        if ( need_iommu(p2m->domain) )
            for ( i = 0; i < (1 << page_order); i++ )
                iommu_unmap_page(p2m->domain, mfn + i);
        return 0;
    }
  <<...

There is no error checking of iommu_unmap_page. We need to add.
However, there are several callers of this function which don't handle error at all. I'm not sure whether hardware domain will use this function.
Since it is in core p2m logic (which I don't have much knowledge), I hope we can print a warning and simply return error here (given the fact that non-hardware domains are all crashed immediately within unmap call)

----
9. p2m_pt_set_entry(): open (propose no-rollback).

(Existing code)
  >>...
           for ( i = 0; i < (1UL << page_order); i++ )
                iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
                               iommu_pte_flags);
        else
            for ( i = 0; i < (1UL << page_order); i++ )
                iommu_unmap_page(p2m->domain, gfn + i);
  <<...

Above is in the end of the func.

I'm not sure whether it will be called for hardware domain ( PVH mode is one potential concern. as we know, PVH has been introduced on Xen 4.4 as a DomU, and on Xen 4.5 as a Dom0.).

If not, we can leave it as today. Otherwise, I want to hear your suggestion whether we can use no-rollback (just erring) for hardware domain here since this code path is very complex.

  10. ept_set_entry(): open (propose no-rollback).

(Existing code)
  >>...
            if ( iommu_flags )
                for ( i = 0; i < (1 << order); i++ )
                    iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
            else
                for ( i = 0; i < (1 << order); i++ )
                    iommu_unmap_page(d, gfn + i);
  <<...

Above is in the end of the func.

Same open as 9) whether we can use no-rollback here.

----
11. __get_page_type(): open (propose no-rollback).
The following is in detail:
    >>...
      if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
        {
            if ( (x & PGT_type_mask) == PGT_writable_page )
                iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
            else if ( type == PGT_writable_page )
                iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
                               page_to_mfn(page),
                               IOMMUF_readable|IOMMUF_writable);
        }
    <<...

It's almost in the end of the func. 
As the limited set of cases where __get_page_type() calls iommu_{,un}map_page(), and a get_page_type() invocation that
previously was known to never fail (or at least so we hope, based on the existing code) [I got it from Jan's reply.],
we can classify it as normal PV domain and be treated as a fatal error. 
So for I am inclined to leave it as today. 


I welcome your comments and feedback!!

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-03-31  9:06 abstract model of IOMMU unmaping/mapping failures Xu, Quan
@ 2016-03-31  9:12 ` Xu, Quan
  2016-04-01  0:25 ` Tian, Kevin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Xu, Quan @ 2016-03-31  9:12 UTC (permalink / raw)
  To: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, Nakajima, Jun, Andrew Cooper, Keir Fraser

+to 2 key maintainers, 
 
On March 31, 2016 5:06pm, Xu, Quan <quan.xu@intel.com> wrote:
> All,
> 
> Here is a summary of my investigation of the abstract model:
> 
> Below policies are adopted when deciding whether to rollback a callchain:
> 
> 1. Domain will be crashed immediately within iommu_{,un}map_page, treated
> as a fatal error (with the exception of the hardware one). Whether to rollback
> depends on the need of hardware domain;
> 
> 2. For hardware domain, roll back on a best effort basis. When rollback is not
> feasible (in early initialization phase or trade-off of complexity), at least unmap
> upon maps error and then throw out error message;
> 
> Below are a detail analysis of all existing callers on IOMMU interfaces (8-11
> needs more discussions):
> 
> ----
> 1. memory_add(): rollback (no change)
> 
> (Existing code)
>    >>...
>    if ( ret )
>         goto destroy_m2p;
> 
>     if ( iommu_enabled && !iommu_passthrough
> && !need_iommu(hardware_domain) )
>     {
>         for ( i = spfn; i < epfn; i++ )
>             if ( iommu_map_page(hardware_domain, i, i,
> IOMMUF_readable|IOMMUF_writable) )
>                 break;
>         if ( i != epfn )
>         {
>             while (i-- > old_max)
>                 iommu_unmap_page(hardware_domain, i);
>             goto destroy_m2p;
>         }
>     }
>    <<...
> Existing code already rolls back through destroy_m2p cleanly.
> 
> ----
> 2. vtd_set_hwdom_mapping(): no rollback (no change).
> 
> It is in early initialization phase. A quick thought looks a bit tricky to fully
> rollback this path, so propose to leave as it is.
> 
> ----
> 3. __gnttab_map_grant_ref():rollback (no change)
> 
> 
> (Existing code)
>         >>...
>         if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
>              !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
>         {
>             if ( !(kind & MAPKIND_WRITE) )
>                 err = iommu_map_page(ld, frame, frame,
> 
> IOMMUF_readable|IOMMUF_writable);
>         }
>         else if ( act_pin && !old_pin )
>         {
>             if ( !kind )
>                 err = iommu_map_page(ld, frame, frame,
> IOMMUF_readable);
>         }
>         if ( err )
>         {
>             double_gt_unlock(lgt, rgt);
>             rc = GNTST_general_error;
>             goto undo_out;
>         }
>         <<...
> 
> rollback is already cleanly handled.
> 
> ----
> 4. __gnttab_unmap_common():rollback (no change)
> 
> (Existing code)
>        >>...
>              if ( !kind )
>             err = iommu_unmap_page(ld, op->frame);
>         else if ( !(kind & MAPKIND_WRITE) )
>             err = iommu_map_page(ld, op->frame, op->frame,
> IOMMUF_readable);
> 
>         double_gt_unlock(lgt, rgt);
> 
>         if ( err )
>             rc = GNTST_general_error;
>        <<...
> 
> Similarly no change required, as error has been correctly handled.
> 
> ----
> 5. guest_physmap_add_entry():rollback (no change)
> 
> (Existing code)
>    >>...
>     rc = iommu_map_page(
>         d, mfn + i, mfn + i, IOMMUF_readable|IOMMUF_writable);
>     if ( rc != 0 )
>     {
>         while ( i-- > 0 )
>             iommu_unmap_page(d, mfn + i);
>         return rc;
>     }
>    <<...
> 
> Above is in the start of the function, so no additional state to be cleared thus it
> is in good shape too.
> 
> ----
> 6. clear_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(d) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_unmap_page(d, gfn);
>     }
>   <<...
> 
> in the start, and error already rolled back to caller.
> 
> ----
> 7. set_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
>     }
>   <<...
> 	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
>         ret = iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
>    >> ...
> 
> error handling and rollback already in-place.
> 
> ----
> 8. p2m_remove_page():rollback one level(change required).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( need_iommu(p2m->domain) )
>             for ( i = 0; i < (1 << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, mfn + i);
>         return 0;
>     }
>   <<...
> 
> There is no error checking of iommu_unmap_page. We need to add.
> However, there are several callers of this function which don't handle error at
> all. I'm not sure whether hardware domain will use this function.
> Since it is in core p2m logic (which I don't have much knowledge), I hope we can
> print a warning and simply return error here (given the fact that non-hardware
> domains are all crashed immediately within unmap call)
> 
> ----
> 9. p2m_pt_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>            for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
>                                iommu_pte_flags);
>         else
>             for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> I'm not sure whether it will be called for hardware domain ( PVH mode is one
> potential concern. as we know, PVH has been introduced on Xen 4.4 as a DomU,
> and on Xen 4.5 as a Dom0.).
> 
> If not, we can leave it as today. Otherwise, I want to hear your suggestion
> whether we can use no-rollback (just erring) for hardware domain here since
> this code path is very complex.
> 
>   10. ept_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>             if ( iommu_flags )
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
>             else
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_unmap_page(d, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> Same open as 9) whether we can use no-rollback here.
> 
> ----
> 11. __get_page_type(): open (propose no-rollback).
> The following is in detail:
>     >>...
>       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>         {
>             if ( (x & PGT_type_mask) == PGT_writable_page )
>                 iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
>             else if ( type == PGT_writable_page )
>                 iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>                                page_to_mfn(page),
>                                IOMMUF_readable|IOMMUF_writable);
>         }
>     <<...
> 
> It's almost in the end of the func.
> As the limited set of cases where __get_page_type() calls
> iommu_{,un}map_page(), and a get_page_type() invocation that previously was
> known to never fail (or at least so we hope, based on the existing code) [I got it
> from Jan's reply.], we can classify it as normal PV domain and be treated as a
> fatal error.
> So for I am inclined to leave it as today.
> 
> 
> I welcome your comments and feedback!!
> 
> Quan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-03-31  9:06 abstract model of IOMMU unmaping/mapping failures Xu, Quan
  2016-03-31  9:12 ` Xu, Quan
@ 2016-04-01  0:25 ` Tian, Kevin
  2016-04-01 11:57 ` Jan Beulich
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Tian, Kevin @ 2016-04-01  0:25 UTC (permalink / raw)
  To: Xu, Quan, xen-devel
  Cc: Wu, Feng, George Dunlap, Liu Jinsong, Dario Faggioli, Nakajima,
	Jun, Andrew Cooper, Keir Fraser

Jan and others, I've reviewed below with Quan internally. Basically I'm OK with the
proposal, but there does be several opens (8-11) which we'd like to know your
opinions. Once they are cleared, next version of the patch should be made quickly...

> From: Xu, Quan
> Sent: Thursday, March 31, 2016 5:06 PM
> 
> All,
> 
> Here is a summary of my investigation of the abstract model:
> 
> Below policies are adopted when deciding whether to rollback a callchain:
> 
> 1. Domain will be crashed immediately within iommu_{,un}map_page, treated as a fatal
> error (with the exception of the hardware one). Whether to rollback depends on the need
> of hardware domain;
> 
> 2. For hardware domain, roll back on a best effort basis. When rollback is not feasible (in
> early initialization phase or trade-off of complexity), at least unmap upon maps error and
> then throw out error message;
> 
> Below are a detail analysis of all existing callers on IOMMU interfaces (8-11 needs more
> discussions):
> 
> ----
> 1. memory_add(): rollback (no change)
> 
> (Existing code)
>    >>...
>    if ( ret )
>         goto destroy_m2p;
> 
>     if ( iommu_enabled && !iommu_passthrough && !need_iommu(hardware_domain) )
>     {
>         for ( i = spfn; i < epfn; i++ )
>             if ( iommu_map_page(hardware_domain, i, i,
> IOMMUF_readable|IOMMUF_writable) )
>                 break;
>         if ( i != epfn )
>         {
>             while (i-- > old_max)
>                 iommu_unmap_page(hardware_domain, i);
>             goto destroy_m2p;
>         }
>     }
>    <<...
> Existing code already rolls back through destroy_m2p cleanly.
> 
> ----
> 2. vtd_set_hwdom_mapping(): no rollback (no change).
> 
> It is in early initialization phase. A quick thought looks a bit tricky to fully rollback this path,
> so propose to leave as it is.
> 
> ----
> 3. __gnttab_map_grant_ref():rollback (no change)
> 
> 
> (Existing code)
>         >>...
>         if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
>              !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
>         {
>             if ( !(kind & MAPKIND_WRITE) )
>                 err = iommu_map_page(ld, frame, frame,
>                                      IOMMUF_readable|IOMMUF_writable);
>         }
>         else if ( act_pin && !old_pin )
>         {
>             if ( !kind )
>                 err = iommu_map_page(ld, frame, frame, IOMMUF_readable);
>         }
>         if ( err )
>         {
>             double_gt_unlock(lgt, rgt);
>             rc = GNTST_general_error;
>             goto undo_out;
>         }
>         <<...
> 
> rollback is already cleanly handled.
> 
> ----
> 4. __gnttab_unmap_common():rollback (no change)
> 
> (Existing code)
>        >>...
>              if ( !kind )
>             err = iommu_unmap_page(ld, op->frame);
>         else if ( !(kind & MAPKIND_WRITE) )
>             err = iommu_map_page(ld, op->frame, op->frame, IOMMUF_readable);
> 
>         double_gt_unlock(lgt, rgt);
> 
>         if ( err )
>             rc = GNTST_general_error;
>        <<...
> 
> Similarly no change required, as error has been correctly handled.
> 
> ----
> 5. guest_physmap_add_entry():rollback (no change)
> 
> (Existing code)
>    >>...
>     rc = iommu_map_page(
>         d, mfn + i, mfn + i, IOMMUF_readable|IOMMUF_writable);
>     if ( rc != 0 )
>     {
>         while ( i-- > 0 )
>             iommu_unmap_page(d, mfn + i);
>         return rc;
>     }
>    <<...
> 
> Above is in the start of the function, so no additional state to be cleared thus it is in good
> shape too.
> 
> ----
> 6. clear_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(d) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_unmap_page(d, gfn);
>     }
>   <<...
> 
> in the start, and error already rolled back to caller.
> 
> ----
> 7. set_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>     }
>   <<...
> 	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
>         ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>    >> ...
> 
> error handling and rollback already in-place.
> 
> ----
> 8. p2m_remove_page():rollback one level(change required).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( need_iommu(p2m->domain) )
>             for ( i = 0; i < (1 << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, mfn + i);
>         return 0;
>     }
>   <<...
> 
> There is no error checking of iommu_unmap_page. We need to add.
> However, there are several callers of this function which don't handle error at all. I'm not
> sure whether hardware domain will use this function.
> Since it is in core p2m logic (which I don't have much knowledge), I hope we can print a
> warning and simply return error here (given the fact that non-hardware domains are all
> crashed immediately within unmap call)
> 
> ----
> 9. p2m_pt_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>            for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
>                                iommu_pte_flags);
>         else
>             for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> I'm not sure whether it will be called for hardware domain ( PVH mode is one potential
> concern. as we know, PVH has been introduced on Xen 4.4 as a DomU, and on Xen 4.5 as
> a Dom0.).
> 
> If not, we can leave it as today. Otherwise, I want to hear your suggestion whether we can
> use no-rollback (just erring) for hardware domain here since this code path is very
> complex.
> 
>   10. ept_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>             if ( iommu_flags )
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_map_page(d, gfn + i, mfn_x(mfn) + i, iommu_flags);
>             else
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_unmap_page(d, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> Same open as 9) whether we can use no-rollback here.
> 
> ----
> 11. __get_page_type(): open (propose no-rollback).
> The following is in detail:
>     >>...
>       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>         {
>             if ( (x & PGT_type_mask) == PGT_writable_page )
>                 iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>             else if ( type == PGT_writable_page )
>                 iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>                                page_to_mfn(page),
>                                IOMMUF_readable|IOMMUF_writable);
>         }
>     <<...
> 
> It's almost in the end of the func.
> As the limited set of cases where __get_page_type() calls iommu_{,un}map_page(), and
> a get_page_type() invocation that
> previously was known to never fail (or at least so we hope, based on the existing code)
> [I got it from Jan's reply.],
> we can classify it as normal PV domain and be treated as a fatal error.
> So for I am inclined to leave it as today.
> 
> 
> I welcome your comments and feedback!!
> 
> Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-03-31  9:06 abstract model of IOMMU unmaping/mapping failures Xu, Quan
  2016-03-31  9:12 ` Xu, Quan
  2016-04-01  0:25 ` Tian, Kevin
@ 2016-04-01 11:57 ` Jan Beulich
  2016-04-06  7:38   ` Xu, Quan
  2016-04-12  3:30 ` Xu, Quan
  2016-04-15 14:02 ` George Dunlap
  4 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-04-01 11:57 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 31.03.16 at 11:06, <quan.xu@intel.com> wrote:
> 4. __gnttab_unmap_common():rollback (no change)
> 
> (Existing code)
>        >>...
>              if ( !kind )
>             err = iommu_unmap_page(ld, op->frame);
>         else if ( !(kind & MAPKIND_WRITE) )
>             err = iommu_map_page(ld, op->frame, op->frame, IOMMUF_readable);
> 
>         double_gt_unlock(lgt, rgt);
> 
>         if ( err )
>             rc = GNTST_general_error;
>        <<...
> 
> Similarly no change required, as error has been correctly handled.

I wouldn't call this "correctly handled", but for the hardware domain
it should be good enough, and by crashing DomU-s simply reporting
the error up the call tree is sufficient. One question though is
whether the loops higher up the call tree in grant_table.c shouldn't
be exited when noticing the domain has crashed, both to avoid
unnecessary work and to reduce the risk of secondary problems.

> 7. set_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>     }
>   <<...
> 	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
>         ret = iommu_map_page(d, gfn, gfn, IOMMUF_readable|IOMMUF_writable);
>    >> ...
> 
> error handling and rollback already in-place.

For the first portion of the quoted code you mean. I don't see any
rollback in the paging_mode_translate() case.

> 8. p2m_remove_page():rollback one level(change required).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( need_iommu(p2m->domain) )
>             for ( i = 0; i < (1 << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, mfn + i);
>         return 0;
>     }
>   <<...
> 
> There is no error checking of iommu_unmap_page. We need to add.
> However, there are several callers of this function which don't handle error 
> at all. I'm not sure whether hardware domain will use this function.
> Since it is in core p2m logic (which I don't have much knowledge), I hope we 
> can print a warning and simply return error here (given the fact that 
> non-hardware domains are all crashed immediately within unmap call)

Yes, at least error propagation needs to be added here. I don't
think more is required. (Be careful, btw, with adding warnings -
you must not spam the log with such.)

> ----
> 9. p2m_pt_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>            for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
>                                iommu_pte_flags);
>         else
>             for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> I'm not sure whether it will be called for hardware domain ( PVH mode is one 
> potential concern. as we know, PVH has been introduced on Xen 4.4 as a DomU, 
> and on Xen 4.5 as a Dom0.).

Indeed you can get here for Dom0 only when it's non-PV.

> If not, we can leave it as today. Otherwise, I want to hear your suggestion 
> whether we can use no-rollback (just erring) for hardware domain here since 
> this code path is very complex.

Of course (and there's really little point in repeating the same
over and over again) - as long as no-one involved considers the
general model (set forth at the top) problematic, it can be done
the same way everywhere. Which still means (as said before)
that errors need to be propagated.

> 11. __get_page_type(): open (propose no-rollback).
> The following is in detail:
>     >>...
>       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>         {
>             if ( (x & PGT_type_mask) == PGT_writable_page )
>                 iommu_unmap_page(d, mfn_to_gmfn(d, page_to_mfn(page)));
>             else if ( type == PGT_writable_page )
>                 iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>                                page_to_mfn(page),
>                                IOMMUF_readable|IOMMUF_writable);
>         }
>     <<...
> 
> It's almost in the end of the func. 
> As the limited set of cases where __get_page_type() calls 
> iommu_{,un}map_page(), and a get_page_type() invocation that
> previously was known to never fail (or at least so we hope, based on the 
> existing code) [I got it from Jan's reply.],

Please don't mis-quote me: I said this for a specific caller of the
function, not about the function itself.

> we can classify it as normal PV domain and be treated as a fatal error. 
> So for I am inclined to leave it as today. 

And again - at least errors need to be propagated.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-04-01 11:57 ` Jan Beulich
@ 2016-04-06  7:38   ` Xu, Quan
  2016-04-07 22:44     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Xu, Quan @ 2016-04-06  7:38 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, xen-devel, Nakajima, Jun, Andrew Cooper,
	Keir Fraser

On April 01, 2016 7:57pm, <JBeulich@suse.com> wrote:
> >>> On 31.03.16 at 11:06, <quan.xu@intel.com> wrote:
> > 4. __gnttab_unmap_common():rollback (no change)
> >
> > (Existing code)
> >        >>...
> >              if ( !kind )
> >             err = iommu_unmap_page(ld, op->frame);
> >         else if ( !(kind & MAPKIND_WRITE) )
> >             err = iommu_map_page(ld, op->frame, op->frame,
> > IOMMUF_readable);
> >
> >         double_gt_unlock(lgt, rgt);
> >
> >         if ( err )
> >             rc = GNTST_general_error;
> >        <<...
> >
> > Similarly no change required, as error has been correctly handled.
> 
> I wouldn't call this "correctly handled", but for the hardware domain it should
> be good enough, and by crashing DomU-s simply reporting the error up the call
> tree is sufficient. One question though is whether the loops higher up the call
> tree in grant_table.c shouldn't be exited when noticing the domain has crashed,
> both to avoid unnecessary work and to reduce the risk of secondary problems.
> 

For this point, I suppose that the domain structure is not destructed (we are safe to call domain's element, e.g. ld->grant_table) in do_grant_table_op hypercall cycle,
even the domain is crashed down. I am not quite sure, whether it is correct or not, could you explain more?
Then, it is unnecessary to flush tlb. The code is:
    ...
fault:
    gnttab_flush_tlb(current->domain);
    ...

If the domain is crashed down, the current->domain->is_shut_down is ture. So we can modify it as:
...
fault:
    if ( current->domain->is_shut_down )
       gnttab_flush_tlb(current->domain);
...

The other code of the call trees is good to me.


> > 7. set_identity_p2m_entry():rollback (no change).
> >
> > (Existing code)
> >   >>...
> >     if ( !paging_mode_translate(p2m->domain) )
> >     {
> >         if ( !need_iommu(d) )
> >             return 0;
> >         return iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
> >     }
> >   <<...
> > 	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
> >         ret = iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
> >    >> ...
> >
> > error handling and rollback already in-place.
> 
> For the first portion of the quoted code you mean. I don't see any rollback in
> the paging_mode_translate() case.
> 

Does it refer to as the below call trees:
 set_identity_p2m_entry()--rmrr_identity_mapping()--intel_iommu_add_device() --...
 set_identity_p2m_entry()--rmrr_identity_mapping()-- intel_iommu_remove_device()--... 

my description is not accurate, but the current code is right, and I am no need to modify it. right?


> > 8. p2m_remove_page():rollback one level(change required).
> >
> > (Existing code)
> >   >>...
> >     if ( !paging_mode_translate(p2m->domain) )
> >     {
> >         if ( need_iommu(p2m->domain) )
> >             for ( i = 0; i < (1 << page_order); i++ )
> >                 iommu_unmap_page(p2m->domain, mfn + i);
> >         return 0;
> >     }
> >   <<...
> >
> > There is no error checking of iommu_unmap_page. We need to add.
> > However, there are several callers of this function which don't handle
> > error at all. I'm not sure whether hardware domain will use this function.
> > Since it is in core p2m logic (which I don't have much knowledge), I
> > hope we can print a warning and simply return error here (given the
> > fact that non-hardware domains are all crashed immediately within
> > unmap call)
> 
> Yes, at least error propagation needs to be added here. I don't think more is
> required. (Be careful, btw, with adding warnings - you must not spam the log
> with such.)
> 


 agreed. A quick question about error propagation:
...
for ( i = 0; i < (1UL << page_order); i++ )
    iommu_unmap_page(p2m->domain, gfn + i);
...

As you mentioned, as a special case, unmapping should perhaps continue despite an error, in an attempt to do best effort cleanup.
Then i could modify the code as:

...
for ( i = 0; i < (1UL << page_order); i++ )
{
    rc = iommu_unmap_page(p2m->domain, gfn + i);
    if ( rc )
      ret = rc;
}
..
   return ret;
...

It looks cumbersome to me, any suggestion?

> > ----
> > 9. p2m_pt_set_entry(): open (propose no-rollback).
> >
> > (Existing code)
> >   >>...
> >            for ( i = 0; i < (1UL << page_order); i++ )
> >                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
> >                                iommu_pte_flags);
> >         else
> >             for ( i = 0; i < (1UL << page_order); i++ )
> >                 iommu_unmap_page(p2m->domain, gfn + i);
> >   <<...
> >
> > Above is in the end of the func.
> >
> > I'm not sure whether it will be called for hardware domain ( PVH mode
> > is one potential concern. as we know, PVH has been introduced on Xen
> > 4.4 as a DomU, and on Xen 4.5 as a Dom0.).
> 
> Indeed you can get here for Dom0 only when it's non-PV.
> 

A quick question, is non-PV the same as PVH for hardware domain?


> > If not, we can leave it as today. Otherwise, I want to hear your
> > suggestion whether we can use no-rollback (just erring) for hardware
> > domain here since this code path is very complex.
> 
> Of course (and there's really little point in repeating the same over and over
> again) - as long as no-one involved considers the general model (set forth at the
> top) problematic, it can be done the same way everywhere. Which still means
> (as said before) that errors need to be propagated.
> 
Agreed.

> > 11. __get_page_type(): open (propose no-rollback).
> > The following is in detail:
> >     >>...
> >       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >         {
> >             if ( (x & PGT_type_mask) == PGT_writable_page )
> >                 iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
> >             else if ( type == PGT_writable_page )
> >                 iommu_map_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)),
> >                                page_to_mfn(page),
> >
> IOMMUF_readable|IOMMUF_writable);
> >         }
> >     <<...
> >
> > It's almost in the end of the func.
> > As the limited set of cases where __get_page_type() calls
> > iommu_{,un}map_page(), and a get_page_type() invocation that
> > previously was known to never fail (or at least so we hope, based on
> > the existing code) [I got it from Jan's reply.],
> 
> Please don't mis-quote me: I said this for a specific caller of the function, not
> about the function itself.
> 

Sorry for that.

> > we can classify it as normal PV domain and be treated as a fatal error.
> > So for I am inclined to leave it as today.
> 

I think one level propagation, only propagated in __get_page_type(), is enough.

> And again - at least errors need to be propagated.
Sure.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-04-06  7:38   ` Xu, Quan
@ 2016-04-07 22:44     ` Jan Beulich
  2016-04-08  8:54       ` Xu, Quan
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-04-07 22:44 UTC (permalink / raw)
  To: Quan Xu
  Cc: Kevin Tian, Feng Wu, George Dunlap, Liu Jinsong, Dario Faggioli,
	xen-devel, Jun Nakajima, Andrew Cooper, Keir Fraser

>>> On 06.04.16 at 09:38, <quan.xu@intel.com> wrote:
> On April 01, 2016 7:57pm, <JBeulich@suse.com> wrote:
>> >>> On 31.03.16 at 11:06, <quan.xu@intel.com> wrote:
>> > 4. __gnttab_unmap_common():rollback (no change)
>> >
>> > (Existing code)
>> >        >>...
>> >              if ( !kind )
>> >             err = iommu_unmap_page(ld, op->frame);
>> >         else if ( !(kind & MAPKIND_WRITE) )
>> >             err = iommu_map_page(ld, op->frame, op->frame,
>> > IOMMUF_readable);
>> >
>> >         double_gt_unlock(lgt, rgt);
>> >
>> >         if ( err )
>> >             rc = GNTST_general_error;
>> >        <<...
>> >
>> > Similarly no change required, as error has been correctly handled.
>> 
>> I wouldn't call this "correctly handled", but for the hardware domain it should
>> be good enough, and by crashing DomU-s simply reporting the error up the call
>> tree is sufficient. One question though is whether the loops higher up the call
>> tree in grant_table.c shouldn't be exited when noticing the domain has crashed,
>> both to avoid unnecessary work and to reduce the risk of secondary problems.
>> 
> 
> For this point, I suppose that the domain structure is not destructed (we 
> are safe to call domain's element, e.g. ld->grant_table) in do_grant_table_op 
> hypercall cycle,
> even the domain is crashed down. I am not quite sure, whether it is correct 
> or not, could you explain more?

Explain what more? Sure, struct domain stays around until the
domain gets actually cleaned up, so accesses to its grant table
(and alike) remain valid while execution didn't leave the context
of that guest (vCPU) yet.

>> > 7. set_identity_p2m_entry():rollback (no change).
>> >
>> > (Existing code)
>> >   >>...
>> >     if ( !paging_mode_translate(p2m->domain) )
>> >     {
>> >         if ( !need_iommu(d) )
>> >             return 0;
>> >         return iommu_map_page(d, gfn, gfn,
>> IOMMUF_readable|IOMMUF_writable);
>> >     }
>> >   <<...
>> > 	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
>> >         ret = iommu_map_page(d, gfn, gfn,
>> IOMMUF_readable|IOMMUF_writable);
>> >    >> ...
>> >
>> > error handling and rollback already in-place.
>> 
>> For the first portion of the quoted code you mean. I don't see any rollback in
>> the paging_mode_translate() case.
>> 
> 
> Does it refer to as the below call trees:
>  set_identity_p2m_entry()--rmrr_identity_mapping()--intel_iommu_add_device() --...
>  set_identity_p2m_entry()--rmrr_identity_mapping()--intel_iommu_remove_device()--... 

No, my comment solely referred to set_identity_p2m_entry(),
but it looks like I mis-read the code and there's indeed nothing
to roll back inside that function.

>> > 8. p2m_remove_page():rollback one level(change required).
>> >
>> > (Existing code)
>> >   >>...
>> >     if ( !paging_mode_translate(p2m->domain) )
>> >     {
>> >         if ( need_iommu(p2m->domain) )
>> >             for ( i = 0; i < (1 << page_order); i++ )
>> >                 iommu_unmap_page(p2m->domain, mfn + i);
>> >         return 0;
>> >     }
>> >   <<...
>> >
>> > There is no error checking of iommu_unmap_page. We need to add.
>> > However, there are several callers of this function which don't handle
>> > error at all. I'm not sure whether hardware domain will use this function.
>> > Since it is in core p2m logic (which I don't have much knowledge), I
>> > hope we can print a warning and simply return error here (given the
>> > fact that non-hardware domains are all crashed immediately within
>> > unmap call)
>> 
>> Yes, at least error propagation needs to be added here. I don't think more is
>> required. (Be careful, btw, with adding warnings - you must not spam the log
>> with such.)
>> 
> 
> 
>  agreed. A quick question about error propagation:
> ...
> for ( i = 0; i < (1UL << page_order); i++ )
>     iommu_unmap_page(p2m->domain, gfn + i);
> ...
> 
> As you mentioned, as a special case, unmapping should perhaps continue 
> despite an error, in an attempt to do best effort cleanup.
> Then i could modify the code as:
> 
> ...
> for ( i = 0; i < (1UL << page_order); i++ )
> {
>     rc = iommu_unmap_page(p2m->domain, gfn + i);
>     if ( rc )
>       ret = rc;
> }
> ..
>    return ret;
> ...
> 
> It looks cumbersome to me, any suggestion?

What's cumbersome here?

>> > 9. p2m_pt_set_entry(): open (propose no-rollback).
>> >
>> > (Existing code)
>> >   >>...
>> >            for ( i = 0; i < (1UL << page_order); i++ )
>> >                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
>> >                                iommu_pte_flags);
>> >         else
>> >             for ( i = 0; i < (1UL << page_order); i++ )
>> >                 iommu_unmap_page(p2m->domain, gfn + i);
>> >   <<...
>> >
>> > Above is in the end of the func.
>> >
>> > I'm not sure whether it will be called for hardware domain ( PVH mode
>> > is one potential concern. as we know, PVH has been introduced on Xen
>> > 4.4 as a DomU, and on Xen 4.5 as a Dom0.).
>> 
>> Indeed you can get here for Dom0 only when it's non-PV.
> 
> A quick question, is non-PV the same as PVH for hardware domain?

Right now, yes. Once PVHv2 becomes usable for Dom0, that
would be another such variant.

>> > 11. __get_page_type(): open (propose no-rollback).
>> > The following is in detail:
>> >     >>...
>> >       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>> >         {
>> >             if ( (x & PGT_type_mask) == PGT_writable_page )
>> >                 iommu_unmap_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)));
>> >             else if ( type == PGT_writable_page )
>> >                 iommu_map_page(d, mfn_to_gmfn(d,
>> page_to_mfn(page)),
>> >                                page_to_mfn(page),
>> >
>> IOMMUF_readable|IOMMUF_writable);
>> >         }
>> >     <<...
>> >
>> > It's almost in the end of the func.
>> > As the limited set of cases where __get_page_type() calls
>> > iommu_{,un}map_page(), and a get_page_type() invocation that
>> > previously was known to never fail (or at least so we hope, based on
>> > the existing code) [I got it from Jan's reply.],
>> 
>> Please don't mis-quote me: I said this for a specific caller of the function, not
>> about the function itself.
>> 
> 
> Sorry for that.
> 
>> > we can classify it as normal PV domain and be treated as a fatal error.
>> > So for I am inclined to leave it as today.
>> 
> 
> I think one level propagation, only propagated in __get_page_type(), is 
> enough.

Not sure what you mean with "one level propagation" here.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-04-07 22:44     ` Jan Beulich
@ 2016-04-08  8:54       ` Xu, Quan
  0 siblings, 0 replies; 11+ messages in thread
From: Xu, Quan @ 2016-04-08  8:54 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, xen-devel, Nakajima, Jun, Andrew Cooper,
	Keir Fraser

On April 08, 2016 6:44am, <JBeulich@suse.com> wrote:
> >>> On 06.04.16 at 09:38, <quan.xu@intel.com> wrote:
> > On April 01, 2016 7:57pm, <JBeulich@suse.com> wrote:
> >> >>> On 31.03.16 at 11:06, <quan.xu@intel.com> wrote:
> >> > 4. __gnttab_unmap_common():rollback (no change)
> >> >
> >> > (Existing code)
> >> >        >>...
> >> >              if ( !kind )
> >> >             err = iommu_unmap_page(ld, op->frame);
> >> >         else if ( !(kind & MAPKIND_WRITE) )
> >> >             err = iommu_map_page(ld, op->frame, op->frame,
> >> > IOMMUF_readable);
> >> >
> >> >         double_gt_unlock(lgt, rgt);
> >> >
> >> >         if ( err )
> >> >             rc = GNTST_general_error;
> >> >        <<...
> >> >
> >> > Similarly no change required, as error has been correctly handled.
> >>
> >> I wouldn't call this "correctly handled", but for the hardware domain
> >> it should be good enough, and by crashing DomU-s simply reporting the
> >> error up the call tree is sufficient. One question though is whether
> >> the loops higher up the call tree in grant_table.c shouldn't be
> >> exited when noticing the domain has crashed, both to avoid unnecessary
> work and to reduce the risk of secondary problems.
> >>
> >
> > For this point, I suppose that the domain structure is not destructed
> > (we are safe to call domain's element, e.g. ld->grant_table) in
> > do_grant_table_op hypercall cycle, even the domain is crashed down. I
> > am not quite sure, whether it is correct or not, could you explain
> > more?
> 
> Explain what more? 
> Sure, struct domain stays around until the domain gets
> actually cleaned up, 

Explain why it is sure. Now i have got it.

> so accesses to its grant table (and alike) remain valid while
> execution didn't leave the context of that guest (vCPU) yet.
> 

Thanks for your reminding.

> >> > 8. p2m_remove_page():rollback one level(change required).
> >> >
> >> > (Existing code)
> >> >   >>...
> >> >     if ( !paging_mode_translate(p2m->domain) )
> >> >     {
> >> >         if ( need_iommu(p2m->domain) )
> >> >             for ( i = 0; i < (1 << page_order); i++ )
> >> >                 iommu_unmap_page(p2m->domain, mfn + i);
> >> >         return 0;
> >> >     }
> >> >   <<...
> >> >
> >> > There is no error checking of iommu_unmap_page. We need to add.
> >> > However, there are several callers of this function which don't
> >> > handle error at all. I'm not sure whether hardware domain will use this
> function.
> >> > Since it is in core p2m logic (which I don't have much knowledge),
> >> > I hope we can print a warning and simply return error here (given
> >> > the fact that non-hardware domains are all crashed immediately
> >> > within unmap call)
> >>
> >> Yes, at least error propagation needs to be added here. I don't think
> >> more is required. (Be careful, btw, with adding warnings - you must
> >> not spam the log with such.)
> >>
> >
> >
> >  agreed. A quick question about error propagation:
> > ...
> > for ( i = 0; i < (1UL << page_order); i++ )
> >     iommu_unmap_page(p2m->domain, gfn + i); ...
> >
> > As you mentioned, as a special case, unmapping should perhaps continue
> > despite an error, in an attempt to do best effort cleanup.
> > Then i could modify the code as:
> >
> > ...
> > for ( i = 0; i < (1UL << page_order); i++ ) {
> >     rc = iommu_unmap_page(p2m->domain, gfn + i);
> >     if ( rc )
> >       ret = rc;
> > }
> > ..
> >    return ret;
> > ...
> >
> > It looks cumbersome to me, any suggestion?
> 
> What's cumbersome here?
> 

Sorry, I don't like to introduce 'rc'/'ret' in the same function. Ignore me, if it is working.

> >> > 11. __get_page_type(): open (propose no-rollback).
> >> > The following is in detail:
> >> >     >>...
> >> >       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
> >> >         {
> >> >             if ( (x & PGT_type_mask) == PGT_writable_page )
> >> >                 iommu_unmap_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)));
> >> >             else if ( type == PGT_writable_page )
> >> >                 iommu_map_page(d, mfn_to_gmfn(d,
> >> page_to_mfn(page)),
> >> >                                page_to_mfn(page),
> >> >
> >> IOMMUF_readable|IOMMUF_writable);
> >> >         }
> >> >     <<...
> >> >
> >> > It's almost in the end of the func.
> >> > As the limited set of cases where __get_page_type() calls
> >> > iommu_{,un}map_page(), and a get_page_type() invocation that
> >> > previously was known to never fail (or at least so we hope, based
> >> > on the existing code) [I got it from Jan's reply.],
> >>
> >> Please don't mis-quote me: I said this for a specific caller of the
> >> function, not about the function itself.
> >>
> >
> > Sorry for that.
> >
> >> > we can classify it as normal PV domain and be treated as a fatal error.
> >> > So for I am inclined to leave it as today.
> >>
> >
> > I think one level propagation, only propagated in __get_page_type(),
> > is enough.
> 
> Not sure what you mean with "one level propagation" here.
> 

one level propagation is propagating error only in __get_page_type(), and leaving the callers (one or more levels up) as they are.

e.g. the call tree, 
__get_page_type() -- get_page_type() -- do_mmu_update()--..
propagate error only in __get_page_type(), and leaving the get_page_type() / do_mmu_update() as they are.


Jan, thanks for your review. I'd try my best on the release window.
Quan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of  IOMMU unmaping/mapping failures
  2016-03-31  9:06 abstract model of IOMMU unmaping/mapping failures Xu, Quan
                   ` (2 preceding siblings ...)
  2016-04-01 11:57 ` Jan Beulich
@ 2016-04-12  3:30 ` Xu, Quan
  2016-04-15 13:32   ` George Dunlap
  2016-04-15 14:02 ` George Dunlap
  4 siblings, 1 reply; 11+ messages in thread
From: Xu, Quan @ 2016-04-12  3:30 UTC (permalink / raw)
  To: George Dunlap
  Cc: Tian, Kevin, Wu, Feng, Jan Beulich, Liu Jinsong, Dario Faggioli,
	xen-devel, Nakajima, Jun, Andrew Cooper, Keir Fraser

George,

In this discussion, most of them are memory-related problems. Your comments are valuable.
If you have read this thread, could you give me some feedback? I really appreciate your precious advice.

Quan



On March 31, 2016 5:06pm, Quan, Xu wrote:
> All,
> 
> Here is a summary of my investigation of the abstract model:
> 
> Below policies are adopted when deciding whether to rollback a callchain:
> 
> 1. Domain will be crashed immediately within iommu_{,un}map_page, treated
> as a fatal error (with the exception of the hardware one). Whether to rollback
> depends on the need of hardware domain;
> 
> 2. For hardware domain, roll back on a best effort basis. When rollback is not
> feasible (in early initialization phase or trade-off of complexity), at least unmap
> upon maps error and then throw out error message;
> 
> Below are a detail analysis of all existing callers on IOMMU interfaces (8-11
> needs more discussions):
> 
> ----
> 1. memory_add(): rollback (no change)
> 
> (Existing code)
>    >>...
>    if ( ret )
>         goto destroy_m2p;
> 
>     if ( iommu_enabled && !iommu_passthrough
> && !need_iommu(hardware_domain) )
>     {
>         for ( i = spfn; i < epfn; i++ )
>             if ( iommu_map_page(hardware_domain, i, i,
> IOMMUF_readable|IOMMUF_writable) )
>                 break;
>         if ( i != epfn )
>         {
>             while (i-- > old_max)
>                 iommu_unmap_page(hardware_domain, i);
>             goto destroy_m2p;
>         }
>     }
>    <<...
> Existing code already rolls back through destroy_m2p cleanly.
> 
> ----
> 2. vtd_set_hwdom_mapping(): no rollback (no change).
> 
> It is in early initialization phase. A quick thought looks a bit tricky to fully
> rollback this path, so propose to leave as it is.
> 
> ----
> 3. __gnttab_map_grant_ref():rollback (no change)
> 
> 
> (Existing code)
>         >>...
>         if ( (act_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) &&
>              !(old_pin & (GNTPIN_hstw_mask|GNTPIN_devw_mask)) )
>         {
>             if ( !(kind & MAPKIND_WRITE) )
>                 err = iommu_map_page(ld, frame, frame,
> 
> IOMMUF_readable|IOMMUF_writable);
>         }
>         else if ( act_pin && !old_pin )
>         {
>             if ( !kind )
>                 err = iommu_map_page(ld, frame, frame,
> IOMMUF_readable);
>         }
>         if ( err )
>         {
>             double_gt_unlock(lgt, rgt);
>             rc = GNTST_general_error;
>             goto undo_out;
>         }
>         <<...
> 
> rollback is already cleanly handled.
> 
> ----
> 4. __gnttab_unmap_common():rollback (no change)
> 
> (Existing code)
>        >>...
>              if ( !kind )
>             err = iommu_unmap_page(ld, op->frame);
>         else if ( !(kind & MAPKIND_WRITE) )
>             err = iommu_map_page(ld, op->frame, op->frame,
> IOMMUF_readable);
> 
>         double_gt_unlock(lgt, rgt);
> 
>         if ( err )
>             rc = GNTST_general_error;
>        <<...
> 
> Similarly no change required, as error has been correctly handled.
> 
> ----
> 5. guest_physmap_add_entry():rollback (no change)
> 
> (Existing code)
>    >>...
>     rc = iommu_map_page(
>         d, mfn + i, mfn + i, IOMMUF_readable|IOMMUF_writable);
>     if ( rc != 0 )
>     {
>         while ( i-- > 0 )
>             iommu_unmap_page(d, mfn + i);
>         return rc;
>     }
>    <<...
> 
> Above is in the start of the function, so no additional state to be cleared thus it
> is in good shape too.
> 
> ----
> 6. clear_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(d) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_unmap_page(d, gfn);
>     }
>   <<...
> 
> in the start, and error already rolled back to caller.
> 
> ----
> 7. set_identity_p2m_entry():rollback (no change).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( !need_iommu(d) )
>             return 0;
>         return iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
>     }
>   <<...
> 	 if ( is_hardware_domain(d) && !iommu_use_hap_pt(d) )
>         ret = iommu_map_page(d, gfn, gfn,
> IOMMUF_readable|IOMMUF_writable);
>    >> ...
> 
> error handling and rollback already in-place.
> 
> ----
> 8. p2m_remove_page():rollback one level(change required).
> 
> (Existing code)
>   >>...
>     if ( !paging_mode_translate(p2m->domain) )
>     {
>         if ( need_iommu(p2m->domain) )
>             for ( i = 0; i < (1 << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, mfn + i);
>         return 0;
>     }
>   <<...
> 
> There is no error checking of iommu_unmap_page. We need to add.
> However, there are several callers of this function which don't handle error at
> all. I'm not sure whether hardware domain will use this function.
> Since it is in core p2m logic (which I don't have much knowledge), I hope we can
> print a warning and simply return error here (given the fact that non-hardware
> domains are all crashed immediately within unmap call)
> 
> ----
> 9. p2m_pt_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>            for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_map_page(p2m->domain, gfn + i, mfn_x(mfn) + i,
>                                iommu_pte_flags);
>         else
>             for ( i = 0; i < (1UL << page_order); i++ )
>                 iommu_unmap_page(p2m->domain, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> I'm not sure whether it will be called for hardware domain ( PVH mode is one
> potential concern. as we know, PVH has been introduced on Xen 4.4 as a DomU,
> and on Xen 4.5 as a Dom0.).
> 
> If not, we can leave it as today. Otherwise, I want to hear your suggestion
> whether we can use no-rollback (just erring) for hardware domain here since
> this code path is very complex.
> 
>   10. ept_set_entry(): open (propose no-rollback).
> 
> (Existing code)
>   >>...
>             if ( iommu_flags )
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_map_page(d, gfn + i, mfn_x(mfn) + i,
> iommu_flags);
>             else
>                 for ( i = 0; i < (1 << order); i++ )
>                     iommu_unmap_page(d, gfn + i);
>   <<...
> 
> Above is in the end of the func.
> 
> Same open as 9) whether we can use no-rollback here.
> 
> ----
> 11. __get_page_type(): open (propose no-rollback).
> The following is in detail:
>     >>...
>       if ( d && is_pv_domain(d) && unlikely(need_iommu(d)) )
>         {
>             if ( (x & PGT_type_mask) == PGT_writable_page )
>                 iommu_unmap_page(d, mfn_to_gmfn(d,
> page_to_mfn(page)));
>             else if ( type == PGT_writable_page )
>                 iommu_map_page(d, mfn_to_gmfn(d, page_to_mfn(page)),
>                                page_to_mfn(page),
>                                IOMMUF_readable|IOMMUF_writable);
>         }
>     <<...
> 
> It's almost in the end of the func.
> As the limited set of cases where __get_page_type() calls
> iommu_{,un}map_page(), and a get_page_type() invocation that previously was
> known to never fail (or at least so we hope, based on the existing code) [I got it
> from Jan's reply.], we can classify it as normal PV domain and be treated as a
> fatal error.
> So for I am inclined to leave it as today.
> 
> 
> I welcome your comments and feedback!!
> 
> Quan
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of IOMMU unmaping/mapping failures
  2016-04-12  3:30 ` Xu, Quan
@ 2016-04-15 13:32   ` George Dunlap
  0 siblings, 0 replies; 11+ messages in thread
From: George Dunlap @ 2016-04-15 13:32 UTC (permalink / raw)
  To: Xu, Quan, George Dunlap
  Cc: Tian, Kevin, Wu, Feng, Jan Beulich, Liu Jinsong, Dario Faggioli,
	xen-devel, Nakajima, Jun, Andrew Cooper, Keir Fraser

On 12/04/16 04:30, Xu, Quan wrote:
> George,
> 
> In this discussion, most of them are memory-related problems. Your comments are valuable.
> If you have read this thread, could you give me some feedback? I really appreciate your precious advice.

Sorry -- I was trying to get all the stuff for the hard freeze out of
the way last week.  I'm taking a look at this now.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of IOMMU unmaping/mapping failures
  2016-03-31  9:06 abstract model of IOMMU unmaping/mapping failures Xu, Quan
                   ` (3 preceding siblings ...)
  2016-04-12  3:30 ` Xu, Quan
@ 2016-04-15 14:02 ` George Dunlap
  2016-04-18  1:20   ` Xu, Quan
  4 siblings, 1 reply; 11+ messages in thread
From: George Dunlap @ 2016-04-15 14:02 UTC (permalink / raw)
  To: Xu, Quan, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, Nakajima, Jun, Andrew Cooper, Keir Fraser

On 31/03/16 10:06, Xu, Quan wrote:
> All,
> 
> Here is a summary of my investigation of the abstract model:
> 
> Below policies are adopted when deciding whether to rollback a callchain:
> 
> 1. Domain will be crashed immediately within iommu_{,un}map_page, treated as a fatal error (with the exception of the hardware one). Whether to rollback depends on the need of hardware domain;
> 
> 2. For hardware domain, roll back on a best effort basis. When rollback is not feasible (in early initialization phase or trade-off of complexity), at least unmap upon maps error and then throw out error message;
> 
> Below are a detail analysis of all existing callers on IOMMU interfaces (8-11 needs more discussions):

Hey Quan,

I only reviewed the p2m-related ones (5-10), but I agree with your and
Jan's conclusions.  Thanks for doing this legwork.

 -George


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: abstract model of IOMMU unmaping/mapping failures
  2016-04-15 14:02 ` George Dunlap
@ 2016-04-18  1:20   ` Xu, Quan
  0 siblings, 0 replies; 11+ messages in thread
From: Xu, Quan @ 2016-04-18  1:20 UTC (permalink / raw)
  To: George Dunlap, xen-devel
  Cc: Tian, Kevin, Wu, Feng, George Dunlap, Liu Jinsong,
	Dario Faggioli, Nakajima, Jun, Andrew Cooper, Keir Fraser

On April 15, 2016 10:03pm, <george.dunlap@citrix.com> wrote:
> On 31/03/16 10:06, Xu, Quan wrote:
> > All,
> >
> > Here is a summary of my investigation of the abstract model:
> >
> > Below policies are adopted when deciding whether to rollback a callchain:
> >
> > 1. Domain will be crashed immediately within iommu_{,un}map_page,
> > treated as a fatal error (with the exception of the hardware one).
> > Whether to rollback depends on the need of hardware domain;
> >
> > 2. For hardware domain, roll back on a best effort basis. When
> > rollback is not feasible (in early initialization phase or trade-off
> > of complexity), at least unmap upon maps error and then throw out
> > error message;
> >
> > Below are a detail analysis of all existing callers on IOMMU interfaces (8-11
> needs more discussions):
> 
> Hey Quan,
> 
> I only reviewed the p2m-related ones (5-10), but I agree with your and Jan's
> conclusions.  Thanks for doing this legwork.
> 
George,
That's good. thanks for your review.
I will send out this patch set soon.

Quan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-04-18  1:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-31  9:06 abstract model of IOMMU unmaping/mapping failures Xu, Quan
2016-03-31  9:12 ` Xu, Quan
2016-04-01  0:25 ` Tian, Kevin
2016-04-01 11:57 ` Jan Beulich
2016-04-06  7:38   ` Xu, Quan
2016-04-07 22:44     ` Jan Beulich
2016-04-08  8:54       ` Xu, Quan
2016-04-12  3:30 ` Xu, Quan
2016-04-15 13:32   ` George Dunlap
2016-04-15 14:02 ` George Dunlap
2016-04-18  1:20   ` Xu, Quan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).