From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alex Deucher Subject: Re: [Regression, post-2.6.34] Hibernation broken on machines with radeon/KMS and r300 Date: Thu, 17 Jun 2010 12:40:57 -0400 Message-ID: References: <201006141653.03879.rjw@sisk.pl> <201006162216.49756.rjw@sisk.pl> <201006171821.37060.rjw@sisk.pl> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <201006171821.37060.rjw@sisk.pl> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-pm-bounces@lists.linux-foundation.org Errors-To: linux-pm-bounces@lists.linux-foundation.org To: "Rafael J. Wysocki" Cc: Ondrej Zary , linux-kernel@vger.kernel.org, dri-devel , Dave Airlie , Andrew Morton , Linus Torvalds , linux-pm@lists.linux-foundation.org List-Id: linux-pm@vger.kernel.org 2010/6/17 Rafael J. Wysocki : > On Wednesday, June 16, 2010, Alex Deucher wrote: >> On Wed, Jun 16, 2010 at 4:16 PM, Rafael J. Wysocki wrote: >> > On Wednesday, June 16, 2010, Ondrej Zary wrote: >> >> On Wednesday 16 June 2010, Rafael J. Wysocki wrote: >> >> > On Tuesday, June 15, 2010, Rafael J. Wysocki wrote: >> >> > > On Monday, June 14, 2010, Alex Deucher wrote: >> >> > > > On Mon, Jun 14, 2010 at 3:03 PM, Rafael J. Wysocki wrote: >> >> > > > > On Monday, June 14, 2010, Alex Deucher wrote: >> >> > > > >> On Mon, Jun 14, 2010 at 10:53 AM, Rafael J. Wysocki >> >> wrote: >> >> > > > >> > Alex, Dave, >> >> > > > >> > >> >> > > > >> > I'm afraid hibernation is broken on all machines using rad= eon/KMS >> >> > > > >> > with r300 after commit ce8f53709bf440100cb9d31b1303291551c= f517f >> >> > > > >> > (drm/radeon/kms/pm: rework power management). =A0At least,= I'm able >> >> > > > >> > to reproduce the symptom, which is that the machine hangs = hard >> >> > > > >> > around the point where an image is created (probably durin= g the >> >> > > > >> > device thaw phase), on two different boxes with r300 (the = output >> >> > > > >> > of lspci from one of them is attached for reference, the o= ther one >> >> > > > >> > is HP nx6325). >> >> > > > >> > >> >> > > > >> > Suspend to RAM appears to work fine at least on one of the >> >> > > > >> > affected boxes. >> >> > > > >> > >> >> > > > >> > Unfortunately, the commit above changes a lot of code and = it's not >> >> > > > >> > too easy to figure out what's wrong with it and I didn't h= ave the >> >> > > > >> > time to look more into details of this failure. =A0However= , it looks >> >> > > > >> > like you use .suspend() and .resume() callbacks as .freeze= () and >> >> > > > >> > .thaw() which may not be 100% correct (in fact it looks li= ke the >> >> > > > >> > "legacy" PCI suspend/resume is used, which is not recommen= ded any >> >> > > > >> > more). >> >> > > > >> >> >> > > > >> Does it work any better after Dave's last drm pull request? >> >> > > > > >> >> > > > > Nope. =A0The symptom is slightly different, though, because n= ow it >> >> > > > > hangs after turning off the screen. >> >> > > > > >> >> > > > >> With the latest changes, pm should not be a factor unless it= 's >> >> > > > >> explicitly enabled via sysfs. >> >> > > > > >> >> > > > > Well, I guess the first pm patch changed more than just pm, t= hen. >> >> > > > >> >> > > > Does this patch help? >> >> > > > http://lists.freedesktop.org/archives/dri-devel/2010-June/00131= 4.html >> >> > > >> >> > > No, it doesn't. =A0I try to hibernate, everything works to the po= int where >> >> > > the screen goes off and the box hangs (solid). =A0Normally, it wo= uld turn >> >> > > the screen back on and continue with saving the image. >> >> > > >> >> > > But, since that happens with the patch above applied, I think it = doesn't >> >> > > really pass the suspend phase (IOW, it probably hangs somewhere i= n the >> >> > > radeon's suspend routine). >> >> > >> >> > I've just verified that in fact hibernation works on HP nx6325 with >> >> > 2.6.35-rc3, but it takes about 55 sec. to suspend the graphics adap= ter in >> >> > the "freeze" phase. =A0Surprisingly enough, during suspend to RAM i= t works >> >> > normally (as well as in the "poweroff" phase of hibernation). >> >> >> >> It takes 2 minutes on RV530: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=3D586522 >> > >> > Well, my second affected box appears to hang somewhere in the radeon's= suspend >> > routine. >> >> Does the attached patch help? > > It helps, but from what I can see in the code, it still has a few problem= s. > > First, the mutex around cancel_delayed_work() in radeon_pm_suspend() > doesn't really serve any purpose, because rdev->pm.pm_method cannot change > at this point and cancel_delayed_work() only tries to delete the work's t= imer. > Moreover, it doesn't prevent the work handler from running, in which case= the > handler can do some wrong things and will rearm itself to do some more wr= ong > things going forward. =A0So, I think it's better to wait for the handler = to run in case it's > already been queued up and it should also be prevented from rearming itse= lf in > that case. > > Second, in radeon_set_pm_method() the cancel_delayed_work() is not suffic= ient > to prevent the work handler from running and queing up itself for the nex= t run > (the failure scenario is that cancel_delayed_work_sync() returns 0, so the > handler is run, it waits on the mutex and then rearms itself after the mu= tex > has been released), so it looks like cancel_delayed_work_sync() > should be used to make sure it's not going to run again, but calling > that cancel_delayed_work_sync() from under the mutex is not a good idea. > > Finally, there's a potential deadlock in radeon_pm_fini(), where > cancel_delayed_work_sync() is called under rdev->pm.mutex, but the > work handler tries to acquire the same mutex (if it wins the race). > > So, I think something like the appended patch is needed. > Looks reasonable. Does it fix the suspend issue? Alex > Rafael > > > Signed-off-by: Rafael J. Wysocki > --- > =A0drivers/gpu/drm/radeon/radeon.h =A0 =A0| =A0 =A03 +- > =A0drivers/gpu/drm/radeon/radeon_pm.c | =A0 41 ++++++++++++++++++++++++++= ++++------- > =A02 files changed, 36 insertions(+), 8 deletions(-) > > Index: linux-2.6/drivers/gpu/drm/radeon/radeon_pm.c > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- linux-2.6.orig/drivers/gpu/drm/radeon/radeon_pm.c > +++ linux-2.6/drivers/gpu/drm/radeon/radeon_pm.c > @@ -397,13 +397,20 @@ static ssize_t radeon_set_pm_method(stru > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rdev->pm.dynpm_planned_action =3D DYNPM_AC= TION_DEFAULT; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mutex_unlock(&rdev->pm.mutex); > =A0 =A0 =A0 =A0} else if (strncmp("profile", buf, strlen("profile")) =3D= =3D 0) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 bool flush_wq =3D false; > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mutex_lock(&rdev->pm.mutex); > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 rdev->pm.pm_method =3D PM_METHOD_PROFILE; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (rdev->pm.pm_method =3D=3D PM_METHOD_DYN= PM) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cancel_delayed_work(&rdev->= pm.dynpm_idle_work); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 flush_wq =3D true; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 } > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* disable dynpm */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rdev->pm.dynpm_state =3D DYNPM_STATE_DISAB= LED; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rdev->pm.dynpm_planned_action =3D DYNPM_AC= TION_NONE; > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 cancel_delayed_work(&rdev->pm.dynpm_idle_wo= rk); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 rdev->pm.pm_method =3D PM_METHOD_PROFILE; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mutex_unlock(&rdev->pm.mutex); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (flush_wq) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 flush_workqueue(rdev->wq); > =A0 =A0 =A0 =A0} else { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0DRM_ERROR("invalid power method!\n"); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0goto fail; > @@ -418,9 +425,18 @@ static DEVICE_ATTR(power_method, S_IRUGO > > =A0void radeon_pm_suspend(struct radeon_device *rdev) > =A0{ > + =A0 =A0 =A0 bool flush_wq =3D false; > + > =A0 =A0 =A0 =A0mutex_lock(&rdev->pm.mutex); > - =A0 =A0 =A0 cancel_delayed_work(&rdev->pm.dynpm_idle_work); > + =A0 =A0 =A0 if (rdev->pm.pm_method =3D=3D PM_METHOD_DYNPM) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 cancel_delayed_work(&rdev->pm.dynpm_idle_wo= rk); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (rdev->pm.dynpm_state =3D=3D DYNPM_STATE= _ACTIVE) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 rdev->pm.dynpm_state =3D DY= NPM_STATE_SUSPENDED; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 flush_wq =3D true; > + =A0 =A0 =A0 } > =A0 =A0 =A0 =A0mutex_unlock(&rdev->pm.mutex); > + =A0 =A0 =A0 if (flush_wq) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 flush_workqueue(rdev->wq); > =A0} > > =A0void radeon_pm_resume(struct radeon_device *rdev) > @@ -432,6 +448,12 @@ void radeon_pm_resume(struct radeon_devi > =A0 =A0 =A0 =A0rdev->pm.current_sclk =3D rdev->clock.default_sclk; > =A0 =A0 =A0 =A0rdev->pm.current_mclk =3D rdev->clock.default_mclk; > =A0 =A0 =A0 =A0rdev->pm.current_vddc =3D rdev->pm.power_state[rdev->pm.de= fault_power_state_index].clock_info[0].voltage.voltage; > + =A0 =A0 =A0 if (rdev->pm.pm_method =3D=3D PM_METHOD_DYNPM > + =A0 =A0 =A0 =A0 =A0 && rdev->pm.dynpm_state =3D=3D DYNPM_STATE_SUSPENDE= D) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 rdev->pm.dynpm_state =3D DYNPM_STATE_ACTIVE; > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 queue_delayed_work(rdev->wq, &rdev->pm.dynp= m_idle_work, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 msecs_to_jiffies(RADEON_IDLE_LOOP_MS)); > + =A0 =A0 =A0 } > =A0 =A0 =A0 =A0mutex_unlock(&rdev->pm.mutex); > =A0 =A0 =A0 =A0radeon_pm_compute_clocks(rdev); > =A0} > @@ -486,6 +508,8 @@ int radeon_pm_init(struct radeon_device > =A0void radeon_pm_fini(struct radeon_device *rdev) > =A0{ > =A0 =A0 =A0 =A0if (rdev->pm.num_power_states > 1) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 bool flush_wq =3D false; > + > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mutex_lock(&rdev->pm.mutex); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0if (rdev->pm.pm_method =3D=3D PM_METHOD_PR= OFILE) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rdev->pm.profile =3D PM_PR= OFILE_DEFAULT; > @@ -493,13 +517,16 @@ void radeon_pm_fini(struct radeon_device > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0radeon_pm_set_clocks(rdev); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} else if (rdev->pm.pm_method =3D=3D PM_ME= THOD_DYNPM) { > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* cancel work */ > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cancel_delayed_work_sync(&r= dev->pm.dynpm_idle_work); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cancel_delayed_work(&rdev->= pm.dynpm_idle_work); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 flush_wq =3D true; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0/* reset default clocks */ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rdev->pm.dynpm_state =3D D= YNPM_STATE_DISABLED; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0rdev->pm.dynpm_planned_act= ion =3D DYNPM_ACTION_DEFAULT; > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0radeon_pm_set_clocks(rdev); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mutex_unlock(&rdev->pm.mutex); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (flush_wq) > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 flush_workqueue(rdev->wq); > > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0device_remove_file(rdev->dev, &dev_attr_po= wer_profile); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0device_remove_file(rdev->dev, &dev_attr_po= wer_method); > @@ -720,12 +747,12 @@ static void radeon_dynpm_idle_work_handl > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0radeon_pm_get_dynpm_state(= rdev); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0radeon_pm_set_clocks(rdev); > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0} > + > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 queue_delayed_work(rdev->wq, &rdev->pm.dynp= m_idle_work, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 msecs_to_jiffies(RADEON_IDLE_LOOP_MS)); > =A0 =A0 =A0 =A0} > =A0 =A0 =A0 =A0mutex_unlock(&rdev->pm.mutex); > =A0 =A0 =A0 =A0ttm_bo_unlock_delayed_workqueue(&rdev->mman.bdev, resched); > - > - =A0 =A0 =A0 queue_delayed_work(rdev->wq, &rdev->pm.dynpm_idle_work, > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 msecs_to_jiffies(RADEON_IDLE_LOOP_MS)); > =A0} > > =A0/* > Index: linux-2.6/drivers/gpu/drm/radeon/radeon.h > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > --- linux-2.6.orig/drivers/gpu/drm/radeon/radeon.h > +++ linux-2.6/drivers/gpu/drm/radeon/radeon.h > @@ -619,7 +619,8 @@ enum radeon_dynpm_state { > =A0 =A0 =A0 =A0DYNPM_STATE_DISABLED, > =A0 =A0 =A0 =A0DYNPM_STATE_MINIMUM, > =A0 =A0 =A0 =A0DYNPM_STATE_PAUSED, > - =A0 =A0 =A0 DYNPM_STATE_ACTIVE > + =A0 =A0 =A0 DYNPM_STATE_ACTIVE, > + =A0 =A0 =A0 DYNPM_STATE_SUSPENDED, > =A0}; > =A0enum radeon_dynpm_action { > =A0 =A0 =A0 =A0DYNPM_ACTION_NONE, >