From: MaoXiaoyun
Subject: RE: PV resume failed after self migration failed
Date: Fri, 24 Jun 2011 18:32:50 +0800
To: james.harper@bendigoit.com.au, xen devel
List-Id: xen-devel@lists.xenproject.org

Hi James:
 
In addition, I think the if statement in XenVbd_HwScsiResetBus might need to
check suspend_resume_state_fdo, not suspend_resume_state_pdo.
suspend_resume_state_pdo may already have been changed to SR_STATE_SUSPENDING
while there are still unfinished IO requests. If the check used the fdo state,
those IOs could be completed when the reset happens.
 
What do you think?
Thanks.
 
 
static BOOLEAN
XenVbd_HwScsiResetBus(PVOID DeviceExtension, ULONG PathId)
{
  PXENVBD_DEVICE_DATA xvdd = DeviceExtension;
  srb_list_entry_t *srb_entry;
  PSCSI_REQUEST_BLOCK srb;
  int i;
  UNREFERENCED_PARAMETER(DeviceExtension);
  UNREFERENCED_PARAMETER(PathId);
  FUNCTION_ENTER();
  KdPrint((__DRIVER_NAME "     IRQL = %d\n", KeGetCurrentIrql()));
  if (xvdd->ring_detect_state == RING_DETECT_STATE_COMPLETE && xvdd->device_state->suspend_resume_state_pdo == SR_STATE_RUNNING) /* <-- this line */
  {
    while((srb_entry = (srb_list_entry_t *)RemoveHeadList(&xvdd->srb_list)) != (srb_list_entry_t *)&xvdd->srb_list)
    {
      srb = srb_entry->srb;
      srb->SrbStatus = SRB_STATUS_BUS_RESET;
      KdPrint((__DRIVER_NAME "     completing queued SRB %p with status SRB_STATUS_BUS_RESET\n", srb));
      ScsiPortNotification(RequestComplete, xvdd, srb);
    }
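
To make the difference between the two checks concrete, here is a minimal standalone sketch. It is not driver code: the enum, the struct, and reset_completes() are hypothetical stand-ins, and it assumes the pdo field holds the state requested by xenpci while the fdo field holds the state xenvbd has acknowledged so far.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's state values (not the real definitions). */
enum sr_state { SR_STATE_RUNNING, SR_STATE_SUSPENDING, SR_STATE_RESUMING };

struct device_state {
    enum sr_state suspend_resume_state_pdo; /* assumed: state requested by xenpci */
    enum sr_state suspend_resume_state_fdo; /* assumed: state acknowledged by xenvbd */
};

/* Returns how many queued requests a bus reset would complete.
 * use_fdo == 0 models the current check, use_fdo != 0 the proposed one. */
static int reset_completes(const struct device_state *ds, int queued, int use_fdo)
{
    enum sr_state s = use_fdo ? ds->suspend_resume_state_fdo
                              : ds->suspend_resume_state_pdo;
    assert(queued >= 0);
    return (s == SR_STATE_RUNNING) ? queued : 0;
}
```

With the pdo state already SR_STATE_SUSPENDING and the fdo state still SR_STATE_RUNNING, the pdo check completes nothing on reset, while the fdo check completes every queued request.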
 
 
>> Subject: RE: PV resume failed after self migration failed
>> Date: Wed, 22 Jun 2011 14:06:18 +1000
>> From: james.harper@bendigoit.com.au
>> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>>
>> > >
>> > > The xenvbd driver doesn't do any timeout, windows does the timeout and
>> > > tells xenvbd to reset. I haven't tested the scenario you describe very
>> > > recently, and xenvbd is now two different drivers, one for scsiport (<=
>> > > 2003) and one for storport (>= Vista), so there could be bugs in either.
>> > >
>> >
>> > The bug can be reproduced in 2003 32bit system. We are using scsi driver.
>> > I put some log in XenVbd_HwScsiResetBus to see if there are not completed
>> > srb(Like below)
>> > but I didn't see the log when XenVbd_HwScsiResetBus called. So No IO is in
>> > queue.
>>
>> Just to confirm, is this the issue that only happens when the migration
>> fails in xen and is cancelled?
>>
>>Exactly.
>>I've noticed some difference in log.
>
>In a normal resume, the log shows event ports assigned like this:
>pdo_event_channel = 5 (Notifying event channel 5)
>suspend event channel = 6
>XEN_INIT_TYPE_EVENT_CHANNEL - event-channel = 7  (for VBD)
>XEN_INIT_TYPE_EVENT_CHANNEL - event-channel = 8  (VIF)
>
>>When the guest resumes locally from suspend (that is, migration failed in xen
>>after the guest had already suspended, so it needs to resume):
>
>>pdo_event_channel = 7 (Notifying event channel 7)
>>suspend event channel = 8
>>XEN_INIT_TYPE_EVENT_CHANNEL - event-channel = 9 (vif)
>
>>The VBD port is not allocated, since the pdo is waiting for the fdo to change state.
>
>>It looks like ports 5 and 6 are still occupied, or is pdo_event_channel bound twice?
>
>It works when I unbind pdo_event_channel & suspend_evtchn.
>===================================================================
>--- xenpci_fdo.c (revision 4304)
>+++ xenpci_fdo.c (working copy)
>@@ -656,6 +656,12 @@
>     }
>     WdfChildListEndIteration(child_list, &child_iterator);
>
>+    EvtChn_Unbind(xpdd, xpdd->pdo_event_channel);
>+    EvtChn_Close(xpdd, xpdd->pdo_event_channel);
>+
>+    EvtChn_Unbind(xpdd, xpdd->suspend_evtchn);
>+    EvtChn_Close(xpdd, xpdd->suspend_evtchn);
>+   
>     XenBus_Suspend(xpdd);
>     EvtChn_Suspend(xpdd);
>     XenPci_HighSync(XenPci_Suspend0, XenPci_SuspendN, xpdd);
>
>
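
The port numbers in the two logs are consistent with old ports being leaked across the cancelled suspend. Below is a toy model of that, not Xen or xenpci code: port_table, port_table_init(), port_alloc() and port_close() are hypothetical names, and the lowest-free-port policy is an assumption based on the numbers in the logs.

```c
#include <assert.h>

#define MAX_PORTS 16 /* arbitrary size for this model */

/* Toy event-channel table: used[p] is nonzero while port p is bound. */
struct port_table { int used[MAX_PORTS]; };

/* Marks ports 0..reserved-1 as taken by other users of the domain. */
static void port_table_init(struct port_table *t, int reserved)
{
    assert(reserved >= 0 && reserved <= MAX_PORTS);
    for (int p = 0; p < MAX_PORTS; p++)
        t->used[p] = (p < reserved);
}

/* Assumed policy: hand out the lowest free port; -1 if the table is full. */
static int port_alloc(struct port_table *t)
{
    for (int p = 0; p < MAX_PORTS; p++) {
        if (!t->used[p]) {
            t->used[p] = 1;
            return p;
        }
    }
    return -1;
}

/* Frees a port so that a later allocation can reuse its number. */
static void port_close(struct port_table *t, int p)
{
    assert(p >= 0 && p < MAX_PORTS);
    t->used[p] = 0;
}
```

In this model, if ports 5 and 6 are never closed, the next allocation returns 7, which matches pdo_event_channel = 7 in the failed case. Closing them first lets the rebind get 5 and 6 again, which is what the patch above aims for.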
>BTW, is there a missing "break" in XenVbd_HwScsiInterrupt,  xenvbd_scsiport.c:928
>before default? Well, it is harmless.
>
>924 case SR_STATE_RUNNING:
>925 KdPrint((__DRIVER_NAME " New pdo state %d\n", suspend_resume_state_pdo));
>926 xvdd->device_state->suspend_resume_state_fdo = suspend_resume_state_pdo;
>927 xvdd->vectors.EvtChn_Notify(xvdd->vectors.context, xvdd->device_state->pdo_event_channel);
>928 ScsiPortNotification(NextRequest, DeviceExtension);
>929 default:
>930 KdPrint((__DRIVER_NAME " New pdo state %d\n", suspend_resume_state_pdo));
>931 xvdd->device_state->suspend_resume_state_fdo = suspend_resume_state_pdo;
>932 xvdd->vectors.EvtChn_Notify(xvdd->vectors.context, xvdd->device_state->pdo_event_channel);
>933 break;
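
For reference, the effect of the missing break can be shown with a small standalone sketch. It is a hypothetical model, not the driver code: notify_count() only counts how often the notify step would run, and the with_break flag switches between the current and the fixed behaviour.

```c
#include <assert.h>

/* Hypothetical stand-ins for the driver's state values. */
enum sr_state { SR_STATE_RUNNING, SR_STATE_SUSPENDING };

/* Counts how many times the notify step would run for a given state.
 * with_break == 0 models the code as quoted, with_break != 0 the fixed code. */
static int notify_count(enum sr_state s, int with_break)
{
    int notifies = 0;

    switch (s) {
    case SR_STATE_RUNNING:
        notifies++; /* stands in for the EvtChn_Notify call at line 927 */
        if (with_break)
            break;
        /* fall through: this is what happens without the break */
    default:
        notifies++; /* stands in for the EvtChn_Notify call at line 932 */
        break;
    }
    return notifies;
}
```

Without the break, the SR_STATE_RUNNING case falls into default, so the state is copied and the channel notified twice. The second pass repeats the same assignment, which fits the observation that it is harmless.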
>
>Thanks.
>>> James
>>>
>