From: MaoXiaoyun
Subject: RE: PV resume failed after self migration failed
Date: Tue, 21 Jun 2011 20:13:52 +0800
To: james.harper@bendigoit.com.au, xen devel

> Subject: RE: PV resume failed after self migration failed
> Date: Mon, 20 Jun 2011 09:11:59 +1000
> From: james.harper@bendigoit.com.au
> To: tinnycloud@hotmail.com; xen-devel@lists.xensource.com
>
> > >
> > > Windows will invoke a scsi reset if a request takes too long to complete
> > > (5 seconds I think). It will also issue a reset when a crash dump
> > > starts, just to make sure all previous requests are flushed etc.
> > >
> > Thanks for the help, and sorry for the late response; I was away for a
> > while last weekend.
> >
> > If the VBD is already suspended, any further IO that is issued will find
> > that the vbd state is not SR_STATE_RUNNING,
> > and thus calls ScsiPortNotification to notify RequestComplete, right?
> >
> > If so, I have an assumption.
> > At time t the VBD is suspended and an IO is about to be issued, but before
> > it calls ScsiPortNotification the whole VM is paused (VCPUs paused, the
> > last step). If the VM resumes 10 or more seconds later, will the driver
> > find that the IO mentioned above has already timed out and trigger
> > XenVbd_HwScsiResetBus?
> >
>
> The xenvbd driver doesn't do any timeout, windows does the timeout and
> tells xenvbd to reset. I haven't tested the scenario you describe very
> recently, and xenvbd is now two different drivers, one for scsiport (<=
> 2003) and one for storport (>= Vista), so there could be bugs in either.
>
The bug can be reproduced on a 32-bit Windows 2003 system. We are using the scsiport driver.
I added some logging to XenVbd_HwScsiResetBus to see whether any SRBs are still uncompleted (see below),
but the log line did not appear when XenVbd_HwScsiResetBus was called, so no IO is in the queue.
 
    for (i = 0; i < MAX_SHADOW_ENTRIES; i++)
    {
      if (xvdd->shadows[i].srb)
      {
        KdPrint((__DRIVER_NAME "    in-flight srb %p with status SRB_STATUS_BUS_RESET\n", xvdd->shadows[i].srb));
      }
    }
 
 
Right now I don't think it is related to the bus reset. From the log, it looks like an event is not being acked.
The log shows that PV resume is waiting for xppdd->device_state.suspend_resume_state_fdo to change, but the change never happens.
 
That is: XenPci_Pdo_Resume -> XenPci_Pdo_ChangeSuspendState(device, SR_STATE_RESUMING)
-> KeWaitForSingleObject(&xpdd->pdo_suspend_event, Executive, KernelMode, FALSE, NULL);
I assume the change should happen in XenVbd_HwScsiInterrupt,
but for some reason the if statement in XenVbd_HwScsiInterrupt (xenvbd_scsiport.c:920) returns FALSE.
 
    /* in dump mode I think we get called on a timer, not by an actual IRQ */
    if (!dump_mode && !xvdd->vectors.EvtChn_AckEvent(xvdd->vectors.context, xvdd->event_channel, &last_interrupt))
      return FALSE; /* interrupt was not for us */
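
To make the suspected failure path concrete, here is a toy model of that early return. It is only a sketch under assumptions: the struct, its fields and the toy_* functions are hypothetical and are not the driver's code, and it assumes the ack is a per-port test-and-clear and that the suspend state change happens only after the check passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy state; every field here is an assumption made for illustration. */
struct toy_vbd {
    uint32_t pending_ports;   /* bit n set = port n raised, not yet acked */
    uint32_t event_channel;   /* port this device listens on (must be < 32) */
    bool dump_mode;
    bool resume_signalled;    /* stands in for the suspend state change */
};

/* Test-and-clear the pending bit for one port; true if it was set. */
static bool toy_ack_event(struct toy_vbd *d, uint32_t port)
{
    uint32_t mask;
    bool was_pending;

    assert(port < 32);        /* shifting by 32 or more would be undefined */
    mask = UINT32_C(1) << port;
    was_pending = (d->pending_ports & mask) != 0;
    d->pending_ports &= ~mask;
    return was_pending;
}

/* Same shape as the check quoted above: bail out before any state work. */
static bool toy_hw_interrupt(struct toy_vbd *d)
{
    if (!d->dump_mode && !toy_ack_event(d, d->event_channel))
        return false;           /* interrupt was not for us */

    d->resume_signalled = true; /* only reached when the ack succeeded */
    return true;
}
```

In this model, if the pending bit belongs to a different port than the one the device is listening on, toy_hw_interrupt returns false, the bit stays set, and resume_signalled never becomes true, which is the kind of hang described above.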
 
Since the event is not acked, EvtChn_EvtInterruptIsr prints a log line like "Unacknowledged event word = 0, val = 00000200":
 
12952670574140: XenPCI --> XenPci_BalloonEnableHandler
12952670574140: XenPCI     Unacknowledged event word = 0, val = 00000200 
12952670574140: XenPCI  receive balloon enable = (1308226300.21:0)
12952670574156: XenPCI     Balloon enable change to 0
12952670574156: XenPCI  successfull got BalloonEnableChangedEvent
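
For reading the "Unacknowledged event" line, the arithmetic can be sketched as follows. This assumes that "word" is an index into the pending bitmap, that "val" holds the bits still set in that word, and that each word is 32 bits wide on this 32-bit guest; decode_pending_ports is a hypothetical helper, not a driver function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32 pending bits per word, as on a 32-bit guest. */
#define BITS_PER_WORD 32u

/*
 * Hypothetical helper: expand one "word = W, val = V" log entry into the
 * event channel port numbers whose bits are set in V.
 * Returns the number of ports written to out (at most out_len).
 */
static size_t decode_pending_ports(uint32_t word, uint32_t val,
                                   uint32_t *out, size_t out_len)
{
    size_t n = 0;
    uint32_t bit;

    assert(out != NULL);
    for (bit = 0; bit < BITS_PER_WORD && n < out_len; bit++) {
        if (val & (UINT32_C(1) << bit))
            out[n++] = word * BITS_PER_WORD + bit;
    }
    return n;
}
```

Under those assumptions, word = 0 with val = 00000200 decodes to the single port 9, which can then be compared with xvdd->event_channel to see whether the unacknowledged event is the one xenvbd is waiting for.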
 
I will take a closer look at EvtChn_EvtInterruptIsr to understand this better. Thanks.

> James