From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rajendra Nayak
Subject: Re: Boot hang regression 3.10.0-rc4 -> 3.10.0
Date: Tue, 9 Jul 2013 11:03:54 +0530
Message-ID: <51DBA0C2.6030003@ti.com>
References: <51D59146.3070002@newflow.co.uk> <51D59C0E.8080003@newflow.co.uk> <20130705115959.GQ5523@atomide.com> <20130708112553.GU5523@atomide.com> <51DAB394.3050104@ti.com> <20130708131033.GA5523@atomide.com> <51DABC81.3080409@ti.com> <20130708133512.GD31221@arwen.pp.htv.fi>
Mime-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
Return-path:
Received: from devils.ext.ti.com ([198.47.26.153]:44554 "EHLO devils.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751185Ab3GIFeY (ORCPT ); Tue, 9 Jul 2013 01:34:24 -0400
In-Reply-To: <20130708133512.GD31221@arwen.pp.htv.fi>
Sender: linux-omap-owner@vger.kernel.org
List-Id: linux-omap@vger.kernel.org
To: balbi@ti.com
Cc: Tony Lindgren , "Bedia, Vaibhav" , "linux-omap@vger.kernel.org" , "linux-arm-kernel@lists.infradead.org" , Mark Jackson , Sourav Poddar , Paul Walmsley

On Monday 08 July 2013 07:05 PM, Felipe Balbi wrote:
> Hi,
>
> On Mon, Jul 08, 2013 at 06:50:01PM +0530, Rajendra Nayak wrote:
>>>>>>>> I wonder if this is because the timeouts get now initialized to 0 instead
>>>>>>>> of -1 for the serial driver?
>>>>>>>>
>>>>>>>
>>>>>>> You meant initialized to -1, right? There's an additional check for timeout being 0. Unless i
>>>>>>> am missing something DT-boot will start off with timeout set to 0 and then get forced to -1.
>>>>>
>>>>> OK
>>>>
>>>> Issue 2: Causing boot to stop when serial driver is initialized.
>>>> (After Issue 1 is fixed)
>>>>
>>>> I could narrow this down to the change done to return -EINVAL
>>>> instead of 0 in serial_omap_get_context_loss_count() as part of
>>>> commit 'a630fbfbb1beeffc5bbe542a7986bf2068874633' "serial: omap:
>>>> Fix device tree based PM runtime"
>>>>
>>>> What this change in turn seems to do is cause a
>>>> serial_omap_restore_context() to get called as part of
>>>> serial_omap_runtime_resume() which was not the case when
>>>> serial_omap_get_context_loss_count() returned 0
>>>>
>>>> from serial_omap_runtime_resume():
>>>> -----
>>>> int loss_cnt = serial_omap_get_context_loss_count(up);
>>>>
>>>> if (loss_cnt < 0) {
>>>>         dev_dbg(dev, "serial_omap_get_context_loss_count failed : %d\n",
>>>>                 loss_cnt);
>>>>         serial_omap_restore_context(up);
>>>> } else if (up->context_loss_cnt != loss_cnt) {
>>>>         serial_omap_restore_context(up);
>>>> }
>>>> -----
>>>>
>>>> I am still working on why a serial_omap_restore_context() could
>>>> have caused console to die. I will work with Sourav on this and
>>>> post the fixes for both issue 1 and issue2 once its clear on whats
>>>> really causing issue 2.
>>>
>>> That's because we don't have the omap specific pdata callbacks for
>>> context loss any longer. We may be able to detect when the context
>>> was really lost in the serial driver, and only then call the
>>> serial_omap_restore_context().
>>
>> Right, but calling serial_omap_restore_context() even when the context
>> is not lost, should not ideally cause an issue.
>
> it does in one condition. If context hasn't been saved before. And that
> can happen in the case of wrong pm runtime status for that device.
>
> Imagine the device is marked as suspended even though it's fully enabled
> (it hasn't been suspended by hwmod due to NO_IDLE flag).
> In that case your context structure is all zeroes (context has never
> been saved before) then when you call pm_runtime_get_sync() on probe()
> your ->runtime_resume() will get called, which will restore context,
> essentially undoing anything which was configured by u-boot.

This could be a problem for drivers which save context in
->runtime_suspend(), but from what I see with omap serial, no context
save is done as part of ->runtime_suspend().

>
> Am I missing something ?
>
>>>> Let me know if the fix I listed for Issue 1: makes sense.
>>>
>>> Yes makes sense as a fix, but IMHO we should not need any workarounds
>>> like that. Is the hwmod code idling the uarts early? If so, then
>>> it should only do that in a late_initcall if no drivers are registered.
>>
>> hwmod as part of its setup (early) enables/resets and idles all modules.
>> These flags are used to tell hwmod to avoid a reset and idle and leave the
>> module enabled (in this case console uart)
>
> then it needs to call pm_runtime_set_active() for those devices which
> have that flag set, right ?
>
> (completely untested, didn't even try to compile, just to illustrate)
>
> diff --git a/arch/arm/mach-omap2/omap_hwmod.c b/arch/arm/mach-omap2/omap_hwmod.c
> index 7341eff..d8dca68 100644
> --- a/arch/arm/mach-omap2/omap_hwmod.c
> +++ b/arch/arm/mach-omap2/omap_hwmod.c
> @@ -2559,6 +2559,12 @@ static void __init _setup_postsetup(struct omap_hwmod *oh)
>              (postsetup_state == _HWMOD_STATE_IDLE)) {
>                  oh->_int_flags |= _HWMOD_SKIP_ENABLE;
>                  postsetup_state = _HWMOD_STATE_ENABLED;
> +
> +                /* tell pm_runtime this device is already active */
> +                pm_runtime_set_active(&oh->od->pdev->dev);
> +        } else {
> +                /* tell pm_runtime this device is trully suspended */
> +                pm_runtime_set_suspended(&oh->od->pdev->dev);
>          }
>
>          if (postsetup_state == _HWMOD_STATE_IDLE)
>
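
To make the failure mode discussed in this thread concrete, here is a
minimal user-space sketch in C. It is not the omap-serial driver:
struct fake_uart_regs, struct fake_uart_port, fake_restore_context() and
fake_runtime_resume() are hypothetical stand-ins, and the register values
are made up. It only mirrors the branch structure quoted from
serial_omap_runtime_resume(), and illustrates how a negative loss count
(such as -EINVAL) would force a restore that copies a never-filled,
all-zero cached context over whatever the bootloader had programmed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a few UART registers set up by the bootloader. */
struct fake_uart_regs {
	unsigned int dll;
	unsigned int dlh;
	unsigned int lcr;
};

/* Hypothetical stand-in for the driver's per-port state. */
struct fake_uart_port {
	struct fake_uart_regs hw;     /* "live" register state */
	struct fake_uart_regs cached; /* driver's copy; all zero until filled */
	int context_loss_cnt;         /* last loss count the driver recorded */
};

/* Overwrite the live registers with the cached copy, valid or not. */
static void fake_restore_context(struct fake_uart_port *up)
{
	assert(up != NULL);
	up->hw = up->cached;
}

/*
 * Same decision as the quoted resume snippet: restore when the loss count
 * is an error (< 0) or differs from the recorded one. Returns true if a
 * restore was performed.
 */
static bool fake_runtime_resume(struct fake_uart_port *up, int loss_cnt)
{
	if (loss_cnt < 0 || up->context_loss_cnt != loss_cnt) {
		fake_restore_context(up);
		return true;
	}
	return false;
}
```

With a loss count of 0 and a matching context_loss_cnt the restore is
skipped and the bootloader values survive; with -EINVAL the restore runs
and hw ends up all zeroes. Where the real driver fills its cached values
is deliberately not modelled here.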