* [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism
From: Lukasz Luba @ 2019-02-11 15:30 UTC
To: linux-kernel, linux-pm
Cc: b.zolnierkie, myungjoo.ham, cw00.choi, kyungmin.park, m.szyprowski,
	s.nawrocki, tkjos, joel, chris.diamand, mka, Lukasz Luba

This patch set changes workqueue related features in the devfreq framework.
The first patch switches to the regular system workqueue and deletes the
custom 'devfreq_wq' workqueue. The second switches from deferred work to
delayed work.

Using deferred work in this context might harm system performance.
When the CPU enters idle, deferred work is not fired. The devfreq device's
utilization does not have to be connected with a particular CPU.
The drivers for GPUs, Network on Chip and L3 cache rely on the devfreq
governor. They all miss the opportunity to check the HW state and react
when the deferred work is not fired.
A corner test case, in which the Dynamic Memory Controller is utilized by
CPUs running at full speed, might show up to x5 worse performance if the
crucial CPU is idle.

Changes:
v2:
- single patch split into two
- added cover letter

link for the previous version and discussion:
https://marc.info/?l=linux-pm&m=154904631226997&w=2

Regards,
Lukasz Luba

Lukasz Luba (2):
  drivers: devfreq: change devfreq workqueue mechanism
  drivers: devfreq: change deferred work into delayed

 drivers/devfreq/devfreq.c | 27 +++++++--------------------
 1 file changed, 7 insertions(+), 20 deletions(-)

--
2.7.4
* [PATCH v2 1/2] drivers: devfreq: change devfreq workqueue mechanism
From: Lukasz Luba @ 2019-02-11 15:30 UTC
To: linux-kernel, linux-pm
Cc: b.zolnierkie, myungjoo.ham, cw00.choi, kyungmin.park, m.szyprowski,
	s.nawrocki, tkjos, joel, chris.diamand, mka, Lukasz Luba

There is no need for creating another workqueue in the system,
the existing one should meet the requirements.
This patch removes devfreq's custom workqueue and uses system one.
It switches from queue_delayed_work() to schedule_delayed_work().
It also does not wake up the system when it enters suspend (this
functionality stays the same).

Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
---
 drivers/devfreq/devfreq.c | 25 ++++++-------------------
 1 file changed, 6 insertions(+), 19 deletions(-)

diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 0ae3de7..882e717 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -31,13 +31,6 @@
 
 static struct class *devfreq_class;
 
-/*
- * devfreq core provides delayed work based load monitoring helper
- * functions. Governors can use these or can implement their own
- * monitoring mechanism.
- */
-static struct workqueue_struct *devfreq_wq;
-
 /* The list of all device-devfreq governors */
 static LIST_HEAD(devfreq_governor_list);
 /* The list of all device-devfreq */
@@ -391,8 +384,8 @@ static void devfreq_monitor(struct work_struct *work)
 	if (err)
 		dev_err(&devfreq->dev, "dvfs failed with (%d) error\n", err);
 
-	queue_delayed_work(devfreq_wq, &devfreq->work,
-				msecs_to_jiffies(devfreq->profile->polling_ms));
+	schedule_delayed_work(&devfreq->work,
+			      msecs_to_jiffies(devfreq->profile->polling_ms));
 	mutex_unlock(&devfreq->lock);
 }
 
@@ -409,7 +402,7 @@ void devfreq_monitor_start(struct devfreq *devfreq)
 {
 	INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor);
 	if (devfreq->profile->polling_ms)
-		queue_delayed_work(devfreq_wq, &devfreq->work,
+		schedule_delayed_work(&devfreq->work,
 			msecs_to_jiffies(devfreq->profile->polling_ms));
 }
 EXPORT_SYMBOL(devfreq_monitor_start);
@@ -473,7 +466,7 @@ void devfreq_monitor_resume(struct devfreq *devfreq)
 
 	if (!delayed_work_pending(&devfreq->work) &&
 			devfreq->profile->polling_ms)
-		queue_delayed_work(devfreq_wq, &devfreq->work,
+		schedule_delayed_work(&devfreq->work,
 			msecs_to_jiffies(devfreq->profile->polling_ms));
 
 	devfreq->last_stat_updated = jiffies;
@@ -516,7 +509,7 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay)
 
 	/* if current delay is zero, start polling with new delay */
 	if (!cur_delay) {
-		queue_delayed_work(devfreq_wq, &devfreq->work,
+		schedule_delayed_work(&devfreq->work,
 			msecs_to_jiffies(devfreq->profile->polling_ms));
 		goto out;
 	}
@@ -527,7 +520,7 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay)
 		cancel_delayed_work_sync(&devfreq->work);
 		mutex_lock(&devfreq->lock);
 		if (!devfreq->stop_polling)
-			queue_delayed_work(devfreq_wq, &devfreq->work,
+			schedule_delayed_work(&devfreq->work,
 				msecs_to_jiffies(devfreq->profile->polling_ms));
 	}
 out:
@@ -1430,12 +1423,6 @@ static int __init devfreq_init(void)
 		return PTR_ERR(devfreq_class);
 	}
 
-	devfreq_wq = create_freezable_workqueue("devfreq_wq");
-	if (!devfreq_wq) {
-		class_destroy(devfreq_class);
-		pr_err("%s: couldn't create workqueue\n", __FILE__);
-		return -ENOMEM;
-	}
 	devfreq_class->dev_groups = devfreq_groups;
 
 	return 0;
--
2.7.4
* Re: [PATCH v2 1/2] drivers: devfreq: change devfreq workqueue mechanism
From: Matthias Kaehlcke @ 2019-02-11 21:42 UTC
To: Lukasz Luba
Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi,
	kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand

Hi Lukasz,

On Mon, Feb 11, 2019 at 04:30:04PM +0100, Lukasz Luba wrote:
> There is no need for creating another workqueue in the system,
> the existing one should meet the requirements.
> This patch removes devfreq's custom workqueue and uses system one.
> It switches from queue_delayed_work() to schedule_delayed_work().
> It also does not wake up the system when it enters suspend (this
> functionality stays the same).
>
> Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com>
> ---
>  drivers/devfreq/devfreq.c | 25 ++++++-------------------
>  1 file changed, 6 insertions(+), 19 deletions(-)

[...]

> @@ -1430,12 +1423,6 @@ static int __init devfreq_init(void)
>  		return PTR_ERR(devfreq_class);
>  	}
>
> -	devfreq_wq = create_freezable_workqueue("devfreq_wq");
> -	if (!devfreq_wq) {
> -		class_destroy(devfreq_class);
> -		pr_err("%s: couldn't create workqueue\n", __FILE__);
> -		return -ENOMEM;
> -	}
>  	devfreq_class->dev_groups = devfreq_groups;
>
>  	return 0;

As commented on v1, the change from a custom to a system workqueue
seems reasonable to me. However this patch also changes from a
freezable workqueue to a non-freezable one. C&P of my comments on v1:

  ``WQ_FREEZABLE``
    A freezable wq participates in the freeze phase of the system
    suspend operations. Work items on the wq are drained and no
    new work item starts execution until thawed.

  I'm not entirely sure what the impact of this is.

I imagine suspend is potentially quicker because the wq isn't drained,
but could works that execute during the suspend phase be a problem?

Cheers

Matthias
* Re: [PATCH v2 1/2] drivers: devfreq: change devfreq workqueue mechanism
From: Lukasz Luba @ 2019-02-12 11:20 UTC
To: Matthias Kaehlcke
Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi,
	kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand

Hi Matthias,

On 2/11/19 10:42 PM, Matthias Kaehlcke wrote:
> Hi Lukasz,
>
> On Mon, Feb 11, 2019 at 04:30:04PM +0100, Lukasz Luba wrote:
>> There is no need for creating another workqueue in the system,
>> the existing one should meet the requirements.
>> This patch removes devfreq's custom workqueue and uses system one.
>> It switches from queue_delayed_work() to schedule_delayed_work().
>> It also does not wake up the system when it enters suspend (this
>> functionality stays the same).

[...]

> As commented on v1, the change from a custom to a system workqueue
> seems reasonable to me. However this patch also changes from a
> freezable workqueue to a non-freezable one. C&P of my comments on v1:
>
>   ``WQ_FREEZABLE``
>     A freezable wq participates in the freeze phase of the system
>     suspend operations. Work items on the wq are drained and no
>     new work item starts execution until thawed.
>
>   I'm not entirely sure what the impact of this is.
>
> I imagine suspend is potentially quicker because the wq isn't drained,
> but could works that execute during the suspend phase be a problem?

The devfreq supports suspend from v4.20-rc6, which picks OPP for a
device based on its DT 'opp-suspend'. For the devices which do not
choose the suspend OPP it is possible to enter that state with any
frequency. Queuing work for calling governor during suspend which
calculates the device's frequency for the next period is IMO not needed.
The 'next period' is actually suspend and is not related to
'predicted' load by the governor.

You are right, the suspend will be faster.

Regards,
Lukasz

>
> Cheers
>
> Matthias
* Re: [PATCH v2 1/2] drivers: devfreq: change devfreq workqueue mechanism
From: Matthias Kaehlcke @ 2019-02-12 20:12 UTC
To: Lukasz Luba
Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi,
	kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand

On Tue, Feb 12, 2019 at 12:20:42PM +0100, Lukasz Luba wrote:
> Hi Matthias,
>
> On 2/11/19 10:42 PM, Matthias Kaehlcke wrote:

[...]

> > As commented on v1, the change from a custom to a system workqueue
> > seems reasonable to me. However this patch also changes from a
> > freezable workqueue to a non-freezable one. C&P of my comments on v1:
> >
> >   ``WQ_FREEZABLE``
> >     A freezable wq participates in the freeze phase of the system
> >     suspend operations. Work items on the wq are drained and no
> >     new work item starts execution until thawed.
> >
> >   I'm not entirely sure what the impact of this is.
> >
> > I imagine suspend is potentially quicker because the wq isn't drained,
> > but could works that execute during the suspend phase be a problem?
> The devfreq supports suspend from v4.20-rc6, which picks OPP for a
> device based on its DT 'opp-suspend'. For the devices which do not
> choose the suspend OPP it is possible to enter that state with any
> frequency. Queuing work for calling governor during suspend which
> calculates the device's frequency for the next period is IMO not needed.
> The 'next period' is actually suspend and is not related to
> 'predicted' load by the governor.

If I am not mistaken the monitor can still be running after a device
was suspended:

devfreq_suspend
  list_for_each_entry(devfreq, &devfreq_list, node)
    devfreq_suspend_device
      devfreq->governor->event_handler(devfreq,
                                DEVFREQ_GOV_SUSPEND, NULL);

According to the comment of devfreq_monitor_suspend() the function is
supposed to be called by the governor in response to
DEVFREQ_GOV_SUSPEND, however this doesn't seem to be universally the case:

git grep devfreq_monitor_suspend
drivers/devfreq/governor_simpleondemand.c:              devfreq_monitor_suspend(devfreq);
drivers/devfreq/tegra-devfreq.c:                devfreq_monitor_suspend(devfreq);

i.e. the other governors don't seem to call devfreq_monitor_suspend().

Am I missing something?

Thanks

Matthias
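The contract discussed in the message above, where a governor's event
handler is expected to stop the monitor on DEVFREQ_GOV_SUSPEND and restart
it on resume, can be sketched as a small user-space model. This is only an
illustration under stated assumptions: the mock_* and GOV_* names are
hypothetical stand-ins rather than the kernel's types or event values, and
the real helpers (devfreq_monitor_suspend() and friends) cancel and requeue
delayed work instead of flipping flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DEVFREQ_GOV_* events; values are illustrative. */
enum gov_event {
	GOV_START,
	GOV_STOP,
	GOV_SUSPEND,
	GOV_RESUME,
};

/* Minimal stand-in for struct devfreq: only the polling state is modelled. */
struct mock_devfreq {
	bool stop_polling;	/* true while the monitor must not be re-armed */
	bool work_pending;	/* true while the monitor work is queued */
};

/* Models devfreq_monitor_suspend(): stop polling and drop the queued work. */
static void mock_monitor_suspend(struct mock_devfreq *df)
{
	df->stop_polling = true;
	df->work_pending = false;
}

/* Models devfreq_monitor_resume(): allow polling again and requeue the work. */
static void mock_monitor_resume(struct mock_devfreq *df)
{
	df->stop_polling = false;
	df->work_pending = true;
}

/*
 * Models a governor event handler that honours suspend and resume.
 * A governor without the GOV_SUSPEND case would leave work_pending set,
 * which is the situation pointed out in the thread.
 */
static int mock_governor_handler(struct mock_devfreq *df, enum gov_event event)
{
	assert(df != NULL);

	switch (event) {
	case GOV_START:
		df->stop_polling = false;
		df->work_pending = true;
		break;
	case GOV_STOP:
		df->work_pending = false;
		break;
	case GOV_SUSPEND:
		mock_monitor_suspend(df);
		break;
	case GOV_RESUME:
		mock_monitor_resume(df);
		break;
	}

	return 0;
}
```

In this model, a handler that ignores GOV_SUSPEND leaves work_pending set
after the device has been suspended, i.e. the monitor keeps being scheduled.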
* Re: [PATCH v2 1/2] drivers: devfreq: change devfreq workqueue mechanism
From: Lukasz Luba @ 2019-02-12 21:37 UTC
To: Matthias Kaehlcke
Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi,
	kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand

Hi Matthias,

On 2/12/19 9:12 PM, Matthias Kaehlcke wrote:
> On Tue, Feb 12, 2019 at 12:20:42PM +0100, Lukasz Luba wrote:

[...]

>> The devfreq supports suspend from v4.20-rc6, which picks OPP for a
>> device based on its DT 'opp-suspend'. For the devices which do not
>> choose the suspend OPP it is possible to enter that state with any
>> frequency. Queuing work for calling governor during suspend which
>> calculates the device's frequency for the next period is IMO not needed.
>> The 'next period' is actually suspend and is not related to
>> 'predicted' load by the governor.
>
> If I am not mistaken the monitor can still be running after a device
> was suspended:
>
> devfreq_suspend
>   list_for_each_entry(devfreq, &devfreq_list, node)
>     devfreq_suspend_device
>       devfreq->governor->event_handler(devfreq,
>                                 DEVFREQ_GOV_SUSPEND, NULL);
>
> According to the comment of devfreq_monitor_suspend() the function is
> supposed to be called by the governor in response to
> DEVFREQ_GOV_SUSPEND, however this doesn't seem to be universally the case:
>
> git grep devfreq_monitor_suspend
> drivers/devfreq/governor_simpleondemand.c:              devfreq_monitor_suspend(devfreq);
> drivers/devfreq/tegra-devfreq.c:                devfreq_monitor_suspend(devfreq);
>
> i.e. the other governors don't seem to call devfreq_monitor_suspend().
>
> Am I missing something?

Probably not.
Good catch, these governors should support case DEVFREQ_GOV_SUSPEND.
The system suspend which calls 'devfreq_suspend' does it when the
workqueues are frozen and sets the desired OPP for later resume.
The other use cases (like pm_suspend) might assume that these
governors are ready for DEVFREQ_GOV_SUSPEND...

Would you like to write a patch for them (I can test it) or should I do it?

Regards,
Lukasz

>
> Thanks
>
> Matthias
* Re: [PATCH v2 1/2] drivers: devfreq: change devfreq workqueue mechanism 2019-02-12 21:37 ` Lukasz Luba @ 2019-02-13 0:48 ` Matthias Kaehlcke 0 siblings, 0 replies; 20+ messages in thread From: Matthias Kaehlcke @ 2019-02-13 0:48 UTC (permalink / raw) To: Lukasz Luba Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi, kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand Hi Lukasz, On Tue, Feb 12, 2019 at 10:37:20PM +0100, Lukasz Luba wrote: > Hi Matthias, > > On 2/12/19 9:12 PM, Matthias Kaehlcke wrote: > > On Tue, Feb 12, 2019 at 12:20:42PM +0100, Lukasz Luba wrote: > >> Hi Matthias, > >> > >> On 2/11/19 10:42 PM, Matthias Kaehlcke wrote: > >>> Hi Lukasz, > >>> > >>> On Mon, Feb 11, 2019 at 04:30:04PM +0100, Lukasz Luba wrote: > >>>> There is no need for creating another workqueue in the system, > >>>> the existing one should meet the requirements. > >>>> This patch removes devfreq's custom workqueue and uses system one. > >>>> It switches from queue_delayed_work() to schedule_delayed_work(). > >>>> It also does not wake up the system when it enters suspend (this > >>>> functionality stays the same). > >>>> > >>>> Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com> > >>>> --- > >>>> drivers/devfreq/devfreq.c | 25 ++++++------------------- > >>>> 1 file changed, 6 insertions(+), 19 deletions(-) > >>>> > >>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > >>>> index 0ae3de7..882e717 100644 > >>>> --- a/drivers/devfreq/devfreq.c > >>>> +++ b/drivers/devfreq/devfreq.c > >>>> @@ -31,13 +31,6 @@ > >>>> > >>>> static struct class *devfreq_class; > >>>> > >>>> -/* > >>>> - * devfreq core provides delayed work based load monitoring helper > >>>> - * functions. Governors can use these or can implement their own > >>>> - * monitoring mechanism. 
> >>>> - */ > >>>> -static struct workqueue_struct *devfreq_wq; > >>>> - > >>>> /* The list of all device-devfreq governors */ > >>>> static LIST_HEAD(devfreq_governor_list); > >>>> /* The list of all device-devfreq */ > >>>> @@ -391,8 +384,8 @@ static void devfreq_monitor(struct work_struct *work) > >>>> if (err) > >>>> dev_err(&devfreq->dev, "dvfs failed with (%d) error\n", err); > >>>> > >>>> - queue_delayed_work(devfreq_wq, &devfreq->work, > >>>> - msecs_to_jiffies(devfreq->profile->polling_ms)); > >>>> + schedule_delayed_work(&devfreq->work, > >>>> + msecs_to_jiffies(devfreq->profile->polling_ms)); > >>>> mutex_unlock(&devfreq->lock); > >>>> } > >>>> > >>>> @@ -409,7 +402,7 @@ void devfreq_monitor_start(struct devfreq *devfreq) > >>>> { > >>>> INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); > >>>> if (devfreq->profile->polling_ms) > >>>> - queue_delayed_work(devfreq_wq, &devfreq->work, > >>>> + schedule_delayed_work(&devfreq->work, > >>>> msecs_to_jiffies(devfreq->profile->polling_ms)); > >>>> } > >>>> EXPORT_SYMBOL(devfreq_monitor_start); > >>>> @@ -473,7 +466,7 @@ void devfreq_monitor_resume(struct devfreq *devfreq) > >>>> > >>>> if (!delayed_work_pending(&devfreq->work) && > >>>> devfreq->profile->polling_ms) > >>>> - queue_delayed_work(devfreq_wq, &devfreq->work, > >>>> + schedule_delayed_work(&devfreq->work, > >>>> msecs_to_jiffies(devfreq->profile->polling_ms)); > >>>> > >>>> devfreq->last_stat_updated = jiffies; > >>>> @@ -516,7 +509,7 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay) > >>>> > >>>> /* if current delay is zero, start polling with new delay */ > >>>> if (!cur_delay) { > >>>> - queue_delayed_work(devfreq_wq, &devfreq->work, > >>>> + schedule_delayed_work(&devfreq->work, > >>>> msecs_to_jiffies(devfreq->profile->polling_ms)); > >>>> goto out; > >>>> } > >>>> @@ -527,7 +520,7 @@ void devfreq_interval_update(struct devfreq *devfreq, unsigned int *delay) > >>>> cancel_delayed_work_sync(&devfreq->work); > >>>> 
mutex_lock(&devfreq->lock); > >>>> if (!devfreq->stop_polling) > >>>> - queue_delayed_work(devfreq_wq, &devfreq->work, > >>>> + schedule_delayed_work(&devfreq->work, > >>>> msecs_to_jiffies(devfreq->profile->polling_ms)); > >>>> } > >>>> out: > >>>> @@ -1430,12 +1423,6 @@ static int __init devfreq_init(void) > >>>> return PTR_ERR(devfreq_class); > >>>> } > >>>> > >>>> - devfreq_wq = create_freezable_workqueue("devfreq_wq"); > >>>> - if (!devfreq_wq) { > >>>> - class_destroy(devfreq_class); > >>>> - pr_err("%s: couldn't create workqueue\n", __FILE__); > >>>> - return -ENOMEM; > >>>> - } > >>>> devfreq_class->dev_groups = devfreq_groups; > >>>> > >>>> return 0; > >>> > >>> As commented on v1, the change from a custom to a system workqueue > >>> seems reasonable to me. However this patch also changes from a > >>> freezable workqueue to a non-freezable one. C&P of my comments on v1: > >>> > >>> ``WQ_FREEZABLE`` > >>> A freezable wq participates in the freeze phase of the system > >>> suspend operations. Work items on the wq are drained and no > >>> new work item starts execution until thawed. > >>> > >>> I'm not entirely sure what the impact of this is. > >>> > >>> I imagine suspend is potentially quicker because the wq isn't drained, > >>> but could works that execute during the suspend phase be a problem? > >> The devfreq supports suspend from v4.20-rc6, which picks OPP for a > >> device based on its DT 'opp-suspend'. For the devices which do not > >> choose the suspend OPP it is possible to enter that state with any > >> frequency. Queuing work for calling governor during suspend which > >> calculates the device's frequency for the next period is IMO not needed, > >> The 'next period' is actually suspend and is not related to > >> 'predicted' load by the governor. 
> > > > If I am not mistaken the monitor can still be running after a device > > was suspended: > > > > devfreq_suspend > > list_for_each_entry(devfreq, &devfreq_list, node) > > devfreq_suspend_device > > devfreq->governor->event_handler(devfreq, > > DEVFREQ_GOV_SUSPEND, NULL); > > > > According to the comment of devfreq_monitor_suspend() the function is > > supposed to be called by the governor in response to > > DEVFREQ_GOV_SUSPEND, however this doesn't seem to be universally the case: > > > > git grep devfreq_monitor_suspend > > drivers/devfreq/governor_simpleondemand.c: devfreq_monitor_suspend(devfreq); > > drivers/devfreq/tegra-devfreq.c: devfreq_monitor_suspend(devfreq); > > > > i.e. the other governors don't seem to call devfreq_monitor_suspend(). > > > > Am I missing something? > Probably not. > Good catch, these governors should support the DEVFREQ_GOV_SUSPEND case. > The system suspend which calls 'devfreq_suspend' does it when the > workqueues are frozen and sets the desired OPP for later resume. > The other use cases (like pm_suspend) might assume that these > governors are ready for DEVFREQ_GOV_SUSPEND... Thanks for the confirmation! > Would you like to write a patch for them (I can test it) or should I do it? I can send a patch, testing will be appreciated :) Thanks Matthias ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <CGME20190211153037eucas1p20b80e44795e3599dabd9c2fc0291c063@eucas1p2.samsung.com>]
* [PATCH v2 2/2] drivers: devfreq: change deferred work into delayed [not found] ` <CGME20190211153037eucas1p20b80e44795e3599dabd9c2fc0291c063@eucas1p2.samsung.com> @ 2019-02-11 15:30 ` Lukasz Luba 2019-02-11 21:36 ` Matthias Kaehlcke 0 siblings, 1 reply; 20+ messages in thread From: Lukasz Luba @ 2019-02-11 15:30 UTC (permalink / raw) To: linux-kernel, linux-pm Cc: b.zolnierkie, myungjoo.ham, cw00.choi, kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand, mka, Lukasz Luba This patch changes deferrable work to delayed work, so the work is no longer missed when its timer is placed on a CPU that has entered the idle state. Previously, in that situation the devfreq governor was not called, so the device's frequency was not changed. Benchmarks stressing the Dynamic Memory Controller show a x2 (in edge cases even x5) performance boost with this patch when 'simpleondemand_governor' is responsible for monitoring the device load and frequency changes. With this patch, the delayed work runs regardless of whether the CPUs are idle. All devfreq drivers that rely on periodic, guaranteed wakeup intervals should benefit from it. Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com> --- drivers/devfreq/devfreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c index 882e717..c200b3c 100644 --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -400,7 +400,7 @@ static void devfreq_monitor(struct work_struct *work) */ void devfreq_monitor_start(struct devfreq *devfreq) { - INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); + INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor); if (devfreq->profile->polling_ms) schedule_delayed_work(&devfreq->work, msecs_to_jiffies(devfreq->profile->polling_ms)); -- 2.7.4 ^ permalink raw reply related [flat|nested] 20+ messages in thread
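The effect described in the commit message above can be illustrated outside the kernel. What follows is a minimal user-space sketch in C, not kernel code. It models a polling callback that re-arms itself every period_ms, and it assumes that a deferrable timer which expires while its CPU is idle simply stays pending until that CPU is busy again. The names idle_window, cpu_is_idle and count_polls are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One stretch of time during which the (hypothetical) CPU owning the timer is idle. */
struct idle_window {
	unsigned int start_ms;	/* inclusive */
	unsigned int end_ms;	/* exclusive */
};

/* Return true if time t falls inside any of the n idle windows. */
static bool cpu_is_idle(const struct idle_window *w, size_t n, unsigned int t)
{
	for (size_t i = 0; i < n; i++)
		if (t >= w[i].start_ms && t < w[i].end_ms)
			return true;
	return false;
}

/*
 * Count how many times a self re-arming polling callback runs in
 * [0, total_ms). With deferrable == true, an expired timer does not wake
 * an idle CPU: the callback only runs once the CPU is busy again, and the
 * next period is counted from that moment (assumption of this model).
 */
static unsigned int count_polls(unsigned int total_ms, unsigned int period_ms,
				const struct idle_window *w, size_t n,
				bool deferrable)
{
	unsigned int polls = 0;
	unsigned int expires = period_ms;

	assert(period_ms > 0);

	for (unsigned int now = 0; now < total_ms; now++) {
		if (now < expires)
			continue;	/* timer has not expired yet */
		if (deferrable && cpu_is_idle(w, n, now))
			continue;	/* expired, but left pending while idle */
		polls++;		/* callback runs ... */
		expires = now + period_ms;	/* ... and re-arms itself */
	}
	return polls;
}
```

With a 50 ms period over 1000 ms and a single idle window from 100 ms to 900 ms, count_polls() returns 19 for the delayed variant and 3 for the deferrable one in this model, which mirrors the missed governor invocations the commit message refers to.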
* Re: [PATCH v2 2/2] drivers: devfreq: change deferred work into delayed 2019-02-11 15:30 ` [PATCH v2 2/2] drivers: devfreq: change deferred work into delayed Lukasz Luba @ 2019-02-11 21:36 ` Matthias Kaehlcke 2019-02-12 11:03 ` Lukasz Luba 0 siblings, 1 reply; 20+ messages in thread From: Matthias Kaehlcke @ 2019-02-11 21:36 UTC (permalink / raw) To: Lukasz Luba Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi, kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand Hi Lukasz, On Mon, Feb 11, 2019 at 04:30:05PM +0100, Lukasz Luba wrote: > This patch changes deferred work to delayed work, which is now not missed > when timer is put on CPU that entered idle state. > The devfreq framework governor was not called, thus changing the device's > frequency did not happen. > Benchmarks for stressing Dynamic Memory Controller show x2 (in edge cases > even x5) performance boost with this patch when 'simpleondemand_governor' > is responsible for monitoring the device load and frequency changes. > > With this patch, the delayed work is done no mater CPUs' idle. > All of the drivers in devfreq which rely on periodic, guaranteed wakeup > intervals should benefit from it. > > Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com> > --- > drivers/devfreq/devfreq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > index 882e717..c200b3c 100644 > --- a/drivers/devfreq/devfreq.c > +++ b/drivers/devfreq/devfreq.c > @@ -400,7 +400,7 @@ static void devfreq_monitor(struct work_struct *work) > */ > void devfreq_monitor_start(struct devfreq *devfreq) > { > - INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); > + INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor); > if (devfreq->profile->polling_ms) > schedule_delayed_work(&devfreq->work, > msecs_to_jiffies(devfreq->profile->polling_ms)); I'd suggest to swap the order of the patches in this series. 
Why, you may ask, if the end product is the same? This patch ([2/2]) fixes an actual problem, while IIUC [1/2] is just an improvement, the fix doesn't really depend on it. If -stable wants to integrate the fix, they also need to pick the improvement (or resolve a conflict), which might not be desired. Otherwise this looks sane to me: Reviewed-by: Matthias Kaehlcke <mka@chromium.org> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 2/2] drivers: devfreq: change deferred work into delayed 2019-02-11 21:36 ` Matthias Kaehlcke @ 2019-02-12 11:03 ` Lukasz Luba 0 siblings, 0 replies; 20+ messages in thread From: Lukasz Luba @ 2019-02-12 11:03 UTC (permalink / raw) To: Matthias Kaehlcke Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, cw00.choi, kyungmin.park, m.szyprowski, s.nawrocki, tkjos, joel, chris.diamand Hi Matthias, On 2/11/19 10:36 PM, Matthias Kaehlcke wrote: > Hi Lukasz, > > On Mon, Feb 11, 2019 at 04:30:05PM +0100, Lukasz Luba wrote: >> This patch changes deferred work to delayed work, which is now not missed >> when timer is put on CPU that entered idle state. >> The devfreq framework governor was not called, thus changing the device's >> frequency did not happen. >> Benchmarks for stressing Dynamic Memory Controller show x2 (in edge cases >> even x5) performance boost with this patch when 'simpleondemand_governor' >> is responsible for monitoring the device load and frequency changes. >> >> With this patch, the delayed work is done no mater CPUs' idle. >> All of the drivers in devfreq which rely on periodic, guaranteed wakeup >> intervals should benefit from it. >> >> Signed-off-by: Lukasz Luba <l.luba@partner.samsung.com> >> --- >> drivers/devfreq/devfreq.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c >> index 882e717..c200b3c 100644 >> --- a/drivers/devfreq/devfreq.c >> +++ b/drivers/devfreq/devfreq.c >> @@ -400,7 +400,7 @@ static void devfreq_monitor(struct work_struct *work) >> */ >> void devfreq_monitor_start(struct devfreq *devfreq) >> { >> - INIT_DEFERRABLE_WORK(&devfreq->work, devfreq_monitor); >> + INIT_DELAYED_WORK(&devfreq->work, devfreq_monitor); >> if (devfreq->profile->polling_ms) >> schedule_delayed_work(&devfreq->work, >> msecs_to_jiffies(devfreq->profile->polling_ms)); > > I'd suggest to swap the order of the patches in this series. 
> > Why, you may ask, if the end product is the same? This patch ([2/2]) > fixes an actual problem, while IIUC [1/2] is just an improvement, the > fix doesn't really depend on it. If -stable wants to integrate the > fix, they also need to pick the improvement (or resolve a conflict), > which might not be desired. Good point, I will reorder them. > > Otherwise this looks sane to me: > > Reviewed-by: Matthias Kaehlcke <mka@chromium.org> > > Thank you for the review. Regards, Lukasz ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-11 15:30 ` [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism Lukasz Luba [not found] ` <CGME20190211153035eucas1p12ecdd3289a20ce9fb28588ba20869c60@eucas1p1.samsung.com> [not found] ` <CGME20190211153037eucas1p20b80e44795e3599dabd9c2fc0291c063@eucas1p2.samsung.com> @ 2019-02-12 5:46 ` Chanwoo Choi 2019-02-12 12:05 ` Lukasz Luba 2019-02-12 19:32 ` Matthias Kaehlcke 2 siblings, 2 replies; 20+ messages in thread From: Chanwoo Choi @ 2019-02-12 5:46 UTC (permalink / raw) To: Lukasz Luba, linux-kernel, linux-pm Cc: b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, mka Hi Lukasz, On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: > This patch set changes workqueue related features in devfreq framework. > First patch switches to delayed work instead of deferred. > The second switches to regular system work and deletes custom 'devfreq'. > > Using deferred work in this context might harm the system performance. > When the CPU enters idle, deferred work is not fired. The devfreq device's > utilization does not have to be connected with a particular CPU. > The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. > They all are missing opportunity to check the HW state and react when > the deferred work is not fired. > A corner test case, when Dynamic Memory Controller is utilized by CPUs running > on full speed, might show x5 worse performance if the crucial CPU is in idle. The devfreq framework keeps the balancing between performance and power-consumption. It is wrong to focus on only either performance or power. This cover-letter focus on the only performance without any power-consumption disadvantages. It is easy to raise the performance with short sampling rate with polling modes. To get the performance, it is good as short as possible of period. Sometimes, when cpu is idle, the device might require the busy state. 
It is very difficult to always catch the right timing between them. Also, this patch cannot prevent unneeded wakeups from the idle state. Apparently, it focuses only on performance without considering the power-consumption disadvantage. In embedded devices, power consumption is a very important point. We cannot ignore the side effect. I always hope to improve the devfreq framework further. But, frankly, it is difficult to agree because this series considers only the performance without considering the side effect. The power management framework always has to consider the power-consumption issue. This point is always true. > > Changes: > v2: > - single patch split into two > - added cover letter > > link for the previous version and discussion: > https://marc.info/?l=linux-pm&m=154904631226997&w=2 > > Regards, > Lukasz Luba > > Lukasz Luba (2): > drivers: devfreq: change devfreq workqueue mechanism > drivers: devfreq: change deferred work into delayed > > drivers/devfreq/devfreq.c | 27 +++++++-------------------- > 1 file changed, 7 insertions(+), 20 deletions(-) > -- Best Regards, Chanwoo Choi Samsung Electronics ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-12 5:46 ` [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism Chanwoo Choi @ 2019-02-12 12:05 ` Lukasz Luba 2019-02-13 1:09 ` Chanwoo Choi 2019-02-12 19:32 ` Matthias Kaehlcke 1 sibling, 1 reply; 20+ messages in thread From: Lukasz Luba @ 2019-02-12 12:05 UTC (permalink / raw) To: Chanwoo Choi, linux-kernel, linux-pm Cc: b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, mka Hi Chanwoo On 2/12/19 6:46 AM, Chanwoo Choi wrote: > Hi Lukasz, > > On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: >> This patch set changes workqueue related features in devfreq framework. >> First patch switches to delayed work instead of deferred. >> The second switches to regular system work and deletes custom 'devfreq'. >> >> Using deferred work in this context might harm the system performance. >> When the CPU enters idle, deferred work is not fired. The devfreq device's >> utilization does not have to be connected with a particular CPU. >> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. >> They all are missing opportunity to check the HW state and react when >> the deferred work is not fired. >> A corner test case, when Dynamic Memory Controller is utilized by CPUs running >> on full speed, might show x5 worse performance if the crucial CPU is in idle. > > The devfreq framework keeps the balancing between performance > and power-consumption. It is wrong to focus on only either > performance or power. IMO it just does not work, please see my explanation below. > > This cover-letter focus on the only performance without any power-consumption > disadvantages. It is easy to raise the performance with short sampling rate > with polling modes. To get the performance, it is good as short as possible > of period. The cover-letter mentioned about missing functionality. 
The interface has 'polling_ms' field, which driver developer would assume works. I have test cases where it would not be called for seconds or even never. In your driver drivers/devfreq/exynos-bus.c polling_ms = 50 The driver is controlling many devices including Network-on-Chip (NOC). It is using 'simple_ondemand' governor. When it is missing opportunity to change the frequency, it can either harm the performance or power consumption, depending of the frequency the device stuck on. > > Sometimes, when cpu is idle, the device might require the busy state. > It is very difficult to catch the always right timing between them. I will try to address them in the next patch set. > > Also, this patch cannot prevent the unneeded wakeup from idle state. > Apparently, it only focuses on performance without considering > the power-consumption disadvantage. In the embedded device, > the power-consumption is very important point. We can not ignore > the side effect. Power consumption is important, but we cannot rely on randomness when we develop core features in a framework. > > Always, I hope to improve the devfreq framwork more that older. > But, frankly, it is difficult to agree because it only consider > the performance without considering the side-effect. > > The power management framework always have to consider > the power-consumption issue. This point is always true. I do agree that the power vs. performance trade-off must be considered in the devfreq framework. I have developed 2 additional patches and I am going to post them today (you can now check them on Tizen gerrit, change 198160). We cannot simply pin the *device* load with *CPU* load or idle state. It is not an implication. The device like GPU, NoC or Dynamic Memory Controller can have completely different utilization (i.e in Exynos the GPU is connected to DDR memory through NoC and DMC). Some developers who use OpenCL on GPU might be interested in this improvement. 
Thank you for participating in the discussion on this issue. It will need more development and iterations. In my opinion currently there is one bug in the devfreq and one missing feature to solve. Regards, Lukasz > >> >> Changes: >> v2: >> - single patch split into two >> - added cover letter >> >> link for the previous version and discussion: >> https://marc.info/?l=linux-pm&m=154904631226997&w=2 >> >> Regards, >> Lukasz Luba >> >> Lukasz Luba (2): >> drivers: devfreq: change devfreq workqueue mechanism >> drivers: devfreq: change deferred work into delayed >> >> drivers/devfreq/devfreq.c | 27 +++++++-------------------- >> 1 file changed, 7 insertions(+), 20 deletions(-) >> > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-12 12:05 ` Lukasz Luba @ 2019-02-13 1:09 ` Chanwoo Choi 2019-02-13 10:47 ` Lukasz Luba 0 siblings, 1 reply; 20+ messages in thread From: Chanwoo Choi @ 2019-02-13 1:09 UTC (permalink / raw) To: Lukasz Luba, linux-kernel, linux-pm Cc: b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, mka Hi Lukasz, On 19. 2. 12. 오후 9:05, Lukasz Luba wrote: > Hi Chanwoo > > On 2/12/19 6:46 AM, Chanwoo Choi wrote: >> Hi Lukasz, >> >> On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: >>> This patch set changes workqueue related features in devfreq framework. >>> First patch switches to delayed work instead of deferred. >>> The second switches to regular system work and deletes custom 'devfreq'. >>> >>> Using deferred work in this context might harm the system performance. >>> When the CPU enters idle, deferred work is not fired. The devfreq device's >>> utilization does not have to be connected with a particular CPU. >>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. >>> They all are missing opportunity to check the HW state and react when >>> the deferred work is not fired. >>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running >>> on full speed, might show x5 worse performance if the crucial CPU is in idle. >> >> The devfreq framework keeps the balancing between performance >> and power-consumption. It is wrong to focus on only either >> performance or power. > IMO it just does not work, please see my explanation below. >> >> This cover-letter focus on the only performance without any power-consumption >> disadvantages. It is easy to raise the performance with short sampling rate >> with polling modes. To get the performance, it is good as short as possible >> of period. > The cover-letter mentioned about missing functionality. The interface > has 'polling_ms' field, which driver developer would assume works. 
> I have test cases where it would not be called for seconds or even > never. > In your driver drivers/devfreq/exynos-bus.c polling_ms = 50 > The driver is controlling many devices including Network-on-Chip (NOC). > It is using 'simple_ondemand' governor. When it is missing opportunity > to change the frequency, it can either harm the performance or power > consumption, depending of the frequency the device stuck on. Almost everyone knew that DVFS governor is never perfect in the linux kernel. I don't want to discuss it with this too generic opinion which doesn't include the real measured data. > >> >> Sometimes, when cpu is idle, the device might require the busy state. >> It is very difficult to catch the always right timing between them. > I will try to address them in the next patch set. >> >> Also, this patch cannot prevent the unneeded wakeup from idle state. Please answer this question. When release the real mobile product like galaxy phone, it is very important issue to remove the unneeded wakeup on idle state. >> Apparently, it only focuses on performance without considering >> the power-consumption disadvantage. In the embedded device, >> the power-consumption is very important point. We can not ignore >> the side effect. > Power consumption is important, but we cannot rely on randomness > when we develop core features in a framework. Sure, I agree that as I commented, the devfreq framework keep the balancing between performance and power-consumption. Instead, this patch only focus on the performance without considering the power-consumption side-effect. >> >> Always, I hope to improve the devfreq framwork more that older. >> But, frankly, it is difficult to agree because it only consider >> the performance without considering the side-effect. >> >> The power management framework always have to consider >> the power-consumption issue. This point is always true. > I do agree that the power vs. 
performance trade-off must be considered > in the devfreq framework. I have developed 2 additional patches and You should only mention patches that have been posted on the mailing list. > I am going to post them today (you can now check them on Tizen gerrit, > change 198160). It is not good to refer to a specific gerrit. I only review patches on the mailing list. First of all, please answer the question above. > > We cannot simply pin the *device* load with *CPU* load or idle state. > It is not an implication. > The device like GPU, NoC or Dynamic Memory Controller can have > completely different utilization (i.e in Exynos the GPU is connected > to DDR memory through NoC and DMC). To get high performance, the performance of the GPU depends on the CPU. The h/w blocks are tightly coupled and depend on each other. So, it is not easy to show a simple relationship between them. We need comprehensive measured data for both performance and power consumption in all cases, not just the corner cases. > Some developers who use OpenCL on GPU might be interested in this > improvement. > > Thank you for participating in the discussion on this issue. > It will need more development and iterations. > In my opinion currently there is one bug in the devfreq and one missing > feature to solve. > > Regards, > Lukasz > >> >>> >>> Changes: >>> v2: >>> - single patch split into two >>> - added cover letter >>> >>> link for the previous version and discussion: >>> https://marc.info/?l=linux-pm&m=154904631226997&w=2 >>> >>> Regards, >>> Lukasz Luba >>> >>> Lukasz Luba (2): >>> drivers: devfreq: change devfreq workqueue mechanism >>> drivers: devfreq: change deferred work into delayed >>> >>> drivers/devfreq/devfreq.c | 27 +++++++-------------------- >>> 1 file changed, 7 insertions(+), 20 deletions(-) >>> >> >> > > -- Best Regards, Chanwoo Choi Samsung Electronics ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-13 1:09 ` Chanwoo Choi @ 2019-02-13 10:47 ` Lukasz Luba 2019-02-14 4:00 ` Chanwoo Choi 0 siblings, 1 reply; 20+ messages in thread From: Lukasz Luba @ 2019-02-13 10:47 UTC (permalink / raw) To: Chanwoo Choi, linux-kernel, linux-pm Cc: b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, mka Hi Chanwoo, On 2/13/19 2:09 AM, Chanwoo Choi wrote: > Hi Lukasz, > > On 19. 2. 12. 오후 9:05, Lukasz Luba wrote: >> Hi Chanwoo >> >> On 2/12/19 6:46 AM, Chanwoo Choi wrote: >>> Hi Lukasz, >>> >>> On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: >>>> This patch set changes workqueue related features in devfreq framework. >>>> First patch switches to delayed work instead of deferred. >>>> The second switches to regular system work and deletes custom 'devfreq'. >>>> >>>> Using deferred work in this context might harm the system performance. >>>> When the CPU enters idle, deferred work is not fired. The devfreq device's >>>> utilization does not have to be connected with a particular CPU. >>>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. >>>> They all are missing opportunity to check the HW state and react when >>>> the deferred work is not fired. >>>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running >>>> on full speed, might show x5 worse performance if the crucial CPU is in idle. >>> >>> The devfreq framework keeps the balancing between performance >>> and power-consumption. It is wrong to focus on only either >>> performance or power. >> IMO it just does not work, please see my explanation below. >>> >>> This cover-letter focus on the only performance without any power-consumption >>> disadvantages. It is easy to raise the performance with short sampling rate >>> with polling modes. To get the performance, it is good as short as possible >>> of period. >> The cover-letter mentioned about missing functionality. 
The interface >> has 'polling_ms' field, which driver developer would assume works. >> I have test cases where it would not be called for seconds or even >> never. >> In your driver drivers/devfreq/exynos-bus.c polling_ms = 50 >> The driver is controlling many devices including Network-on-Chip (NOC). >> It is using 'simple_ondemand' governor. When it is missing opportunity >> to change the frequency, it can either harm the performance or power >> consumption, depending of the frequency the device stuck on. > > Almost everyone knew that DVFS governor is never perfect in the linux kernel. > I don't want to discuss it with this too generic opinion which doesn't > include the real measured data. > >> >>> >>> Sometimes, when cpu is idle, the device might require the busy state. >>> It is very difficult to catch the always right timing between them. >> I will try to address them in the next patch set. >>> >>> Also, this patch cannot prevent the unneeded wakeup from idle state. > > Please answer this question. > > When release the real mobile product like galaxy phone, > it is very important issue to remove the unneeded wakeup on idle state. I would say that these devfreq wake-ups are important and people thought that they are periodic and rely on it. Since the devfreq does not have trace events no one knew what is actually happening inside. Profiling the whole devfreq framework just for one product is not fair. The devfreq clients are not only mobiles, there are other type of embedded devices. There are embedded devices (based on TI, iMX, etc) which are not powered from battery and are used i.e. for streaming video from camera or image recognition. > > >>> Apparently, it only focuses on performance without considering >>> the power-consumption disadvantage. In the embedded device, >>> the power-consumption is very important point. We can not ignore >>> the side effect. 
>> Power consumption is important, but we cannot rely on randomness >> when we develop core features in a framework. > > Sure, I agree that as I commented, the devfreq framework keep > the balancing between performance and power-consumption. > > Instead, this patch only focus on the performance without considering > the power-consumption side-effect. Please refer to patch set v3, which tries to address battery-powered devices. > >>> >>> Always, I hope to improve the devfreq framwork more that older. >>> But, frankly, it is difficult to agree because it only consider >>> the performance without considering the side-effect. >>> >>> The power management framework always have to consider >>> the power-consumption issue. This point is always true. >> I do agree that the power vs. performance trade-off must be considered >> in the devfreq framework. I have developed 2 additional patches and > > You should only mention the posted patches on mailing list. The patches are now posted on LKML as v3 (after ~7h). Frankly, I do not understand your behavior. You were explicitly added to the review of these patches on the Tizen kernel (from 21 Jan), even before the discussion on LKML happened. There were a few iterations and a good review. I just wanted to say that it was verified and that questions about power usage also came up. Secondly, people are referring to different patches in the Android kernel, the ARM EAS kernel or, like Matthias, to LineageOS. They are even referring to some research papers or trace analyses. I mentioned these patches and said that they would be posted on LKML the same day (which actually happened) because they were ready. > >> I am going to post them today (you can now check them on Tizen gerrit, >> change 198160). > > It is not good to mention the some specific gerrit. I just only review > the patches on mailing list. First of all, please answer the question > on above I have already replied: devfreq is broken, drivers for GPUs and buses cannot rely on it.
The cost of the fix, in the corner case, is waking up a CPU a few times per second. The result is a reliable periodic callback for drivers. The way it is implemented in v3 provides a tunable for driver developers, 'polling_idle_ms', which saves some power when the device is less utilized. The thermal framework also has two polling intervals: a longer one when the temperature is below the threshold (e.g. 1s) and a shorter one when the temperature crosses the threshold (e.g. 100ms). Matthias's suggestion that we could use the power-efficient wq would involve config changes and verification on probably a lot of ARM platforms. > >> >> We cannot simply pin the *device* load with *CPU* load or idle state. >> It is not an implication. >> The device like GPU, NoC or Dynamic Memory Controller can have >> completely different utilization (i.e in Exynos the GPU is connected >> to DDR memory through NoC and DMC). > > In order to get the high performance, the performance of GPU depends on CPU. > h/w have depended on them tightly coupled. So, it is not easy to show > the just relationship between them. We need the comprehensive measured data > for both performance and power-consumption on all cases without the corner cases. Are you sure that a fully loaded GPU implies that none of the CPUs are idle? What about tasks pinned to CPU cgroups? I will try to create a small OpenCL kernel for Odroid XU4 and verify it. The current devfreq implementation is missing trace events. I have posted basic support in v3. It would be a good starting point for measurements and analysis. Regards, Lukasz > >> Some developers who use OpenCL on GPU might be interested in this >> improvement. >> >> Thank you for participating in the discussion on this issue. >> It will need more development and iterations. >> In my opinion currently there is one bug in the devfreq and one missing >> feature to solve.
>> >> Regards, >> Lukasz >> >>> >>>> >>>> Changes: >>>> v2: >>>> - single patch split into two >>>> - added cover letter >>>> >>>> link for the previous version and discussion: >>>> https://marc.info/?l=linux-pm&m=154904631226997&w=2 >>>> >>>> Regards, >>>> Lukasz Luba >>>> >>>> Lukasz Luba (2): >>>> drivers: devfreq: change devfreq workqueue mechanism >>>> drivers: devfreq: change deferred work into delayed >>>> >>>> drivers/devfreq/devfreq.c | 27 +++++++-------------------- >>>> 1 file changed, 7 insertions(+), 20 deletions(-) >>>> >>> >>> >> >> > > ^ permalink raw reply [flat|nested] 20+ messages in thread
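The two-interval idea mentioned in the message above (a short polling period while the device is busy, a longer 'polling_idle_ms' while it is lightly used, similar to the thermal framework's two intervals) can be sketched as follows. This is only an illustration under stated assumptions: the thread does not show how v3 actually chooses between the intervals, so the load threshold, the struct poll_cfg type and the next_poll_ms() helper are all hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical per-device polling configuration. Only 'polling_ms' and the
 * name 'polling_idle_ms' come from the thread; the threshold is assumed.
 */
struct poll_cfg {
	unsigned int polling_ms;		/* interval while the device is busy */
	unsigned int polling_idle_ms;		/* longer interval while lightly used */
	unsigned int busy_threshold_pct;	/* assumed switch-over load, in percent */
};

/* Pick the delay until the next governor invocation from the current load. */
static unsigned int next_poll_ms(const struct poll_cfg *cfg,
				 unsigned int load_pct)
{
	/* The idle interval is expected to be at least as long as the busy one. */
	assert(cfg->polling_ms > 0 && cfg->polling_idle_ms >= cfg->polling_ms);

	return load_pct >= cfg->busy_threshold_pct ? cfg->polling_ms
						    : cfg->polling_idle_ms;
}
```

For example, with polling_ms = 50, polling_idle_ms = 500 and an assumed threshold of 20 percent, a device at 5 percent load would be polled every 500 ms, while one at 90 percent load would be polled every 50 ms.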
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-13 10:47 ` Lukasz Luba @ 2019-02-14 4:00 ` Chanwoo Choi 0 siblings, 0 replies; 20+ messages in thread From: Chanwoo Choi @ 2019-02-14 4:00 UTC (permalink / raw) To: Lukasz Luba, linux-kernel, linux-pm Cc: b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, mka Hi Lukasz, On 19. 2. 13. 오후 7:47, Lukasz Luba wrote: > Hi Chanwoo, > > On 2/13/19 2:09 AM, Chanwoo Choi wrote: >> Hi Lukasz, >> >> On 19. 2. 12. 오후 9:05, Lukasz Luba wrote: >>> Hi Chanwoo >>> >>> On 2/12/19 6:46 AM, Chanwoo Choi wrote: >>>> Hi Lukasz, >>>> >>>> On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: >>>>> This patch set changes workqueue related features in devfreq framework. >>>>> First patch switches to delayed work instead of deferred. >>>>> The second switches to regular system work and deletes custom 'devfreq'. >>>>> >>>>> Using deferred work in this context might harm the system performance. >>>>> When the CPU enters idle, deferred work is not fired. The devfreq device's >>>>> utilization does not have to be connected with a particular CPU. >>>>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. >>>>> They all are missing opportunity to check the HW state and react when >>>>> the deferred work is not fired. >>>>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running >>>>> on full speed, might show x5 worse performance if the crucial CPU is in idle. >>>> >>>> The devfreq framework keeps the balancing between performance >>>> and power-consumption. It is wrong to focus on only either >>>> performance or power. >>> IMO it just does not work, please see my explanation below. >>>> >>>> This cover-letter focus on the only performance without any power-consumption >>>> disadvantages. It is easy to raise the performance with short sampling rate >>>> with polling modes. To get the performance, it is good as short as possible >>>> of period. 
>>> The cover-letter mentioned about missing functionality. The interface
>>> has 'polling_ms' field, which driver developer would assume works.
>>> I have test cases where it would not be called for seconds or even
>>> never.
>>> In your driver drivers/devfreq/exynos-bus.c polling_ms = 50
>>> The driver is controlling many devices including Network-on-Chip (NOC).
>>> It is using 'simple_ondemand' governor. When it is missing opportunity
>>> to change the frequency, it can either harm the performance or power
>>> consumption, depending of the frequency the device stuck on.
>>
>> Almost everyone knew that DVFS governor is never perfect in the linux kernel.
>> I don't want to discuss it with this too generic opinion which doesn't
>> include the real measured data.
>>
>>>
>>>>
>>>> Sometimes, when cpu is idle, the device might require the busy state.
>>>> It is very difficult to catch the always right timing between them.
>>> I will try to address them in the next patch set.
>>>>
>>>> Also, this patch cannot prevent the unneeded wakeup from idle state.
>>
>> Please answer this question.
>>
>> When release the real mobile product like galaxy phone,
>> it is very important issue to remove the unneeded wakeup on idle state.
> I would say that these devfreq wake-ups are important and people thought
> that they are periodic and rely on it. Since the devfreq does not have
> trace events no one knew what is actually happening inside.

I agree with adding trace events for devfreq. But I was only pointing out
the 'unneeded wakeup from idle state'. As I already replied on the patch,
the v2 patchset doesn't provide any tunable interface for the kind of
work. The v2 patches just change the work from deferrable to delayed.
Some users want to keep the deferrable work, and others want periodic
work regardless of idle. But this series just removes the deferrable work
without leaving any choice. That is my point in discussing your patch.

> Profiling the whole devfreq framework just for one product is not fair.
> The devfreq clients are not only mobiles, there are other type of
> embedded devices. There are embedded devices (based on TI, iMX, etc)
> which are not powered from battery and are used i.e. for streaming video
> from camera or image recognition.

I agree absolutely. There are many kinds of embedded devices.
For one embedded device, I don't want to remove the existing benefit.

To remove any confusion, I state my opinion once again.
It is not proper to just remove the deferrable work without any
tunable interface. If instead the devfreq framework supports both
deferrable and delayed work, selectable through a tunable interface,
I agree.

>>
>>
>>>> Apparently, it only focuses on performance without considering
>>>> the power-consumption disadvantage. In the embedded device,
>>>> the power-consumption is very important point. We can not ignore
>>>> the side effect.
>>> Power consumption is important, but we cannot rely on randomness
>>> when we develop core features in a framework.
>>
>> Sure, I agree that as I commented, the devfreq framework keep
>> the balancing between performance and power-consumption.
>>
>> Instead, this patch only focus on the performance without considering
>> the power-consumption side-effect.
> Please refer to patch set v3 which tries to address battery power
> devices.

I didn't object to sending the next patches. But you sent them without
any agreement or enough discussion on this series. That is not the
proper review sequence.

>
>>
>>>>
>>>> Always, I hope to improve the devfreq framwork more that older.
>>>> But, frankly, it is difficult to agree because it only consider
>>>> the performance without considering the side-effect.
>>>>
>>>> The power management framework always have to consider
>>>> the power-consumption issue. This point is always true.
>>> I do agree that the power vs. performance trade-off must be considered
>>> in the devfreq framework.
>>> I have developed 2 additional patches and
>>
>> You should only mention the posted patches on mailing list.
> The patches are now posted on LKLM as v3 (after ~7h).
> Frankly, I do not understand your behavior.
> You were explicitly added on the review on Tizen kernel
> on these patches (from 21 Jan) before even discussion on LKLM

As I mentioned, I only want to discuss things on the mailing list.

> happen. There was a few iteration and good review.
> I just wanted to say that it was verified and questions about
> power usage also appeared.
>
> Secondly, people are referring to different patches in Android
> kernel, ARM EAS kernel or like Matthias to LineageOS.
> They are even referring to some research papers or trace analyses.
> I have mentioned these patches and said that the same day they will
> be posted on LKLM (which actually happen) because they were ready.

You seem to misunderstand. I only said that I don't agree with removing
the deferrable work without considering any side effect and without any
tunable interface that lets users keep the deferrable work.

>>
>>> I am going to post them today (you can now check them on Tizen gerrit,
>>> change 198160).
>>
>> It is not good to mention the some specific gerrit. I just only review
>> the patches on mailing list. First of all, please answer the question
>> on above
> I have already replayed: devfreq is broken, drivers for GPUs, buses
> cannot rely on it. Cost of a fix is in corner case: waking up CPU
> a few times per second. Result: reliable periodic callback for drivers.
> The way how it is implemented in v3 provides a tunable for driver

As I commented already, I don't want to just remove the deferrable work
and change the timer style from deferrable to delayed work.
If instead the devfreq framework supports both deferrable work and
delayed work that ignores the idle state, selectable through a tunable
interface, I agree.

> developer which saves some power when the device is less utilized:
> 'polling_idle_ms'.
> Thermal framework also has two polling intervals: longer when the
> temperature is lower than threshold (i.e. 1s) and shorter when the
> temperature crosses threshold (i.e. 100ms).

Actually, there is a critical difference between the thermal framework
and DVFS frameworks like cpufreq and devfreq. If the thermal framework
doesn't check the temperature in the idle state, it may cause a critical
problem for the embedded board or SoC, because external factors can
raise the temperature of the board or SoC even when the system is idle.

> Suggestion from Matthias that we could use power efficient wq
> would have to involve changes in configs and verifications on
> probably a lot of ARM platforms.
>>
>>>
>>> We cannot simply pin the *device* load with *CPU* load or idle state.
>>> It is not an implication.
>>> The device like GPU, NoC or Dynamic Memory Controller can have
>>> completely different utilization (i.e in Exynos the GPU is connected
>>> to DDR memory through NoC and DMC).
>>
>> In order to get the high performance, the performance of GPU depends on CPU.
>> h/w have depended on them tightly coupled. So, it is not easy to show
>> the just relationship between them. We need the comprehensive measured data
>> for both performance and power-consumption on all cases without the corner cases.
> Are you sure that the fully loaded GPU implies that all CPUs are not in
> idle? What about tasks pinned to CPU cgroups?
> I will try create some small OpenCL kernel for Odroid XU4 and verify it.
>
> The current devfreq implementation is missing trace events.
> I have posted in v3 basic support. It would be a good starting point
> for measurements and analysis.
>
> Regards,
> Lukasz
>>
>>> Some developers who use OpenCL on GPU might be interested in this
>>> improvement.
>>>
>>> Thank you for participating in the discussion on this issue.
>>> It will need more development and iterations.
>>> In my opinion currently there is one bug in the devfreq and one missing
>>> feature to solve.
>>> >>> Regards, >>> Lukasz >>> >>>> >>>>> >>>>> Changes: >>>>> v2: >>>>> - single patch split into two >>>>> - added cover letter >>>>> >>>>> link for the previous version and discussion: >>>>> https://marc.info/?l=linux-pm&m=154904631226997&w=2 >>>>> >>>>> Regards, >>>>> Lukasz Luba >>>>> >>>>> Lukasz Luba (2): >>>>> drivers: devfreq: change devfreq workqueue mechanism >>>>> drivers: devfreq: change deferred work into delayed >>>>> >>>>> drivers/devfreq/devfreq.c | 27 +++++++-------------------- >>>>> 1 file changed, 7 insertions(+), 20 deletions(-) >>>>> >>>> >>>> >>> >>> >> >> > > -- Best Regards, Chanwoo Choi Samsung Electronics ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-12 5:46 ` [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism Chanwoo Choi 2019-02-12 12:05 ` Lukasz Luba @ 2019-02-12 19:32 ` Matthias Kaehlcke 2019-02-12 21:20 ` Lukasz Luba 1 sibling, 1 reply; 20+ messages in thread From: Matthias Kaehlcke @ 2019-02-12 19:32 UTC (permalink / raw) To: Chanwoo Choi Cc: Lukasz Luba, linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, Viresh Kumar Hi, On Tue, Feb 12, 2019 at 02:46:24PM +0900, Chanwoo Choi wrote: > Hi Lukasz, > > On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: > > This patch set changes workqueue related features in devfreq framework. > > First patch switches to delayed work instead of deferred. > > The second switches to regular system work and deletes custom 'devfreq'. > > > > Using deferred work in this context might harm the system performance. > > When the CPU enters idle, deferred work is not fired. The devfreq device's > > utilization does not have to be connected with a particular CPU. > > The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. > > They all are missing opportunity to check the HW state and react when > > the deferred work is not fired. > > A corner test case, when Dynamic Memory Controller is utilized by CPUs running > > on full speed, might show x5 worse performance if the crucial CPU is in idle. > > The devfreq framework keeps the balancing between performance > and power-consumption. It is wrong to focus on only either > performance or power. > > This cover-letter focus on the only performance without any power-consumption > disadvantages. It is easy to raise the performance with short sampling rate > with polling modes. To get the performance, it is good as short as possible > of period. > > Sometimes, when cpu is idle, the device might require the busy state. 
> It is very difficult to catch the always right timing between them. > > Also, this patch cannot prevent the unneeded wakeup from idle state. > Apparently, it only focuses on performance without considering > the power-consumption disadvantage. In the embedded device, > the power-consumption is very important point. We can not ignore > the side effect. > > Always, I hope to improve the devfreq framwork more that older. > But, frankly, it is difficult to agree because it only consider > the performance without considering the side-effect. > > The power management framework always have to consider > the power-consumption issue. This point is always true. I missed the impact of forcing a CPU out of an idle state and/or not allowing it to enter a more power efficient state. I agree that this should be avoided. I wonder if using a power-efficient workqueue could help here: Instead of running work on the local CPU, the workqueue core asks the scheduler to provide the target CPU for the work queued on unbound workqueues (which includes those marked as power-efficient). So they will not get pinned on a single CPU as can happen with regular workqueues. https://lwn.net/Articles/731052/ Since this series also changes from a custom to system workqueue it seems worth to mention that there are power-efficient system workqueues: system_power_efficient_wq system_freezable_power_efficient_wq In case a power-efficient workqueue is suitable in principle there would still be a problem though: the feature is currently disabled by default, hence devfreq couldn't really rely on it. It is enabled in the arm64 defconfig though, so at least devices on this architecture would benefit from it. Also power-efficient workqueues might be enabled by default in the future as the scheduler becomes more energy aware. Cheers Matthias ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-12 19:32 ` Matthias Kaehlcke @ 2019-02-12 21:20 ` Lukasz Luba 2019-02-13 0:30 ` Matthias Kaehlcke 0 siblings, 1 reply; 20+ messages in thread From: Lukasz Luba @ 2019-02-12 21:20 UTC (permalink / raw) To: Matthias Kaehlcke, Chanwoo Choi Cc: linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, Viresh Kumar Hi Matthias, On 2/12/19 8:32 PM, Matthias Kaehlcke wrote: > Hi, > > On Tue, Feb 12, 2019 at 02:46:24PM +0900, Chanwoo Choi wrote: >> Hi Lukasz, >> >> On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: >>> This patch set changes workqueue related features in devfreq framework. >>> First patch switches to delayed work instead of deferred. >>> The second switches to regular system work and deletes custom 'devfreq'. >>> >>> Using deferred work in this context might harm the system performance. >>> When the CPU enters idle, deferred work is not fired. The devfreq device's >>> utilization does not have to be connected with a particular CPU. >>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. >>> They all are missing opportunity to check the HW state and react when >>> the deferred work is not fired. >>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running >>> on full speed, might show x5 worse performance if the crucial CPU is in idle. >> >> The devfreq framework keeps the balancing between performance >> and power-consumption. It is wrong to focus on only either >> performance or power. >> >> This cover-letter focus on the only performance without any power-consumption >> disadvantages. It is easy to raise the performance with short sampling rate >> with polling modes. To get the performance, it is good as short as possible >> of period. >> >> Sometimes, when cpu is idle, the device might require the busy state. >> It is very difficult to catch the always right timing between them. 
>> >> Also, this patch cannot prevent the unneeded wakeup from idle state. >> Apparently, it only focuses on performance without considering >> the power-consumption disadvantage. In the embedded device, >> the power-consumption is very important point. We can not ignore >> the side effect. >> >> Always, I hope to improve the devfreq framwork more that older. >> But, frankly, it is difficult to agree because it only consider >> the performance without considering the side-effect. >> >> The power management framework always have to consider >> the power-consumption issue. This point is always true. > > I missed the impact of forcing a CPU out of an idle state and/or not > allowing it to enter a more power efficient state. I agree that this > should be avoided. It would be good to have some real world scenarios for comparison: w/ and w/o this change, i.e. it is 5% or 50% more power used. I have patches that tries to mitigate wake-ups when there is small utilization. Let's make it tunable and involve driver developers. They will decide how much impact on the system power usage they introduce. > > I wonder if using a power-efficient workqueue could help here: > > Instead of running work on the local CPU, the workqueue core asks the > scheduler to provide the target CPU for the work queued on unbound > workqueues (which includes those marked as power-efficient). So they > will not get pinned on a single CPU as can happen with regular > workqueues. > > https://lwn.net/Articles/731052/ > > > Since this series also changes from a custom to system workqueue it > seems worth to mention that there are power-efficient system workqueues: > > system_power_efficient_wq > system_freezable_power_efficient_wq > > > In case a power-efficient workqueue is suitable in principle there > would still be a problem though: the feature is currently disabled by > default, hence devfreq couldn't really rely on it. 
> It is enabled in
> the arm64 defconfig though, so at least devices on this architecture
> would benefit from it. Also power-efficient workqueues might be
> enabled by default in the future as the scheduler becomes more energy
> aware.

Regarding the worries about CPU idle cost:
IIRC the new energy model does not even consider idle costs of the CPU.
It would be good to have measurements, e.g. for a worst-case scenario:
waking up 1 (of 4 or 8) CPUs from idle 30 times per second for, let's
say, 100 us each. That is 3 ms / 1000 ms * running power, e.g. 250mW.
Thus, 0.75mW.
In my opinion it is not a big cost. In most cases the system is still
doing some other work. It is worth mentioning here that on mobiles,
when the power button is hit, a full suspend is called which freezes
all tasks and devices, and power consumption is ~15mW. Thus, system
suspend is out of scope here.

As I replied to Chanwoo for the same email: in my opinion the current
devfreq is broken.
It was probably developed in times when there was 1 CPU (maybe 2)
and the idle state of the CPU would be a good hint not to check devfreq
devices.

Regards,
Lukasz

>
> Cheers
>
> Matthias
>
>

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-12 21:20 ` Lukasz Luba @ 2019-02-13 0:30 ` Matthias Kaehlcke 2019-02-13 13:00 ` Lukasz Luba 0 siblings, 1 reply; 20+ messages in thread From: Matthias Kaehlcke @ 2019-02-13 0:30 UTC (permalink / raw) To: Lukasz Luba Cc: Chanwoo Choi, linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, Viresh Kumar Hi Lukasz, On Tue, Feb 12, 2019 at 10:20:07PM +0100, Lukasz Luba wrote: > Hi Matthias, > > On 2/12/19 8:32 PM, Matthias Kaehlcke wrote: > > Hi, > > > > On Tue, Feb 12, 2019 at 02:46:24PM +0900, Chanwoo Choi wrote: > >> Hi Lukasz, > >> > >> On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: > >>> This patch set changes workqueue related features in devfreq framework. > >>> First patch switches to delayed work instead of deferred. > >>> The second switches to regular system work and deletes custom 'devfreq'. > >>> > >>> Using deferred work in this context might harm the system performance. > >>> When the CPU enters idle, deferred work is not fired. The devfreq device's > >>> utilization does not have to be connected with a particular CPU. > >>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. > >>> They all are missing opportunity to check the HW state and react when > >>> the deferred work is not fired. > >>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running > >>> on full speed, might show x5 worse performance if the crucial CPU is in idle. > >> > >> The devfreq framework keeps the balancing between performance > >> and power-consumption. It is wrong to focus on only either > >> performance or power. > >> > >> This cover-letter focus on the only performance without any power-consumption > >> disadvantages. It is easy to raise the performance with short sampling rate > >> with polling modes. To get the performance, it is good as short as possible > >> of period. 
> >> > >> Sometimes, when cpu is idle, the device might require the busy state. > >> It is very difficult to catch the always right timing between them. > >> > >> Also, this patch cannot prevent the unneeded wakeup from idle state. > >> Apparently, it only focuses on performance without considering > >> the power-consumption disadvantage. In the embedded device, > >> the power-consumption is very important point. We can not ignore > >> the side effect. > >> > >> Always, I hope to improve the devfreq framwork more that older. > >> But, frankly, it is difficult to agree because it only consider > >> the performance without considering the side-effect. > >> > >> The power management framework always have to consider > >> the power-consumption issue. This point is always true. > > > > I missed the impact of forcing a CPU out of an idle state and/or not > > allowing it to enter a more power efficient state. I agree that this > > should be avoided. > It would be good to have some real world scenarios for comparison: > w/ and w/o this change, i.e. it is 5% or 50% more power used. If you have data please share :) Though I also imagine there will be quite some variation between different systems/platforms. > I have patches that tries to mitigate wake-ups when there is small > utilization. Let's make it tunable and involve driver developers. > They will decide how much impact on the system power usage they > introduce. Great! > > I wonder if using a power-efficient workqueue could help here: > > > > Instead of running work on the local CPU, the workqueue core asks the > > scheduler to provide the target CPU for the work queued on unbound > > workqueues (which includes those marked as power-efficient). So they > > will not get pinned on a single CPU as can happen with regular > > workqueues. 
> >
> > https://lwn.net/Articles/731052/
> >
> >
> > Since this series also changes from a custom to system workqueue it
> > seems worth to mention that there are power-efficient system workqueues:
> >
> >     system_power_efficient_wq
> >     system_freezable_power_efficient_wq
> >
> >
> > In case a power-efficient workqueue is suitable in principle there
> > would still be a problem though: the feature is currently disabled by
> > default, hence devfreq couldn't really rely on it. It is enabled in
> > the arm64 defconfig though, so at least devices on this architecture
> > would benefit from it. Also power-efficient workqueues might be
> > enabled by default in the future as the scheduler becomes more energy
> > aware.
> Regarding this CPU idle cost worries.
> IIRC the new energy model does not even consider idle costs of the CPU.
> It would be good to know the measurements, i.e. worst case scenario:
> waking up 1 (of 4 or 8) CPU from idle 30 times per second for let's
> say 100 us. It is 3 ms / 1000 ms * running energy cost i.e. 250mW.
> Thus, 0.75mW.

I'm not an expert in this area, but your example seems too optimistic.
You are just accounting for the pure runtime, not for the cost of
entering and exiting an idle state. Let's take an SDM845 idle state as
an example:

C0_CPU_PD: c0-power-down {
    ...
    entry-latency-us = <350>;
    exit-latency-us = <461>;
    min-residency-us = <1890>;
    ...
};

https://patchwork.kernel.org/patch/10661319/

That's 811us for entering and exiting the idle state. At an
intermediate OPP (1.8 GHz) the power consumption is 505mW, according
to the Energy Model. I'm ignoring the actual execution time, since I
tend to agree with you that the monitoring should be done, unless it
has a really unreasonable cost. That leaves us with 30 * 811us =
24.3ms and 24.3ms / 1000 ms * 505mW = 12.3mW.

> In my opinion it is not a big cost. In most cases the system is still
> doing some other work.
> It is worth to mention here that on mobiles
> when the power button is hit the full suspend is called which freezes
> all tasks, devices and power consumption is ~15mW. Thus, the system
> suspend is out of scope here.

I agree that system suspend is out of scope.

> As I replayed to Chanwoon for the same email: in my opinion current
> devfreq is broken.
> It was probably developed in times where there was 1 CPU (maybe 2)
> and idle state of CPU would be a good hint to not to check devfreq
> devices.

IIUC the use of a power-efficient workqueue would address the problem
of waking up a CPU in idle state; however, as mentioned earlier, this
feature is disabled by default (except for arm64). How about
switching to system_power_efficient_wq and using INIT_DELAYED_WORK if
CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y (or checking whether the workqueue
in question has WQ_UNBOUND set?) and INIT_DEFERRABLE_WORK otherwise?
It's not ideal, but a possible improvement.

Cheers

Matthias

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism 2019-02-13 0:30 ` Matthias Kaehlcke @ 2019-02-13 13:00 ` Lukasz Luba 2019-02-14 20:40 ` Matthias Kaehlcke 0 siblings, 1 reply; 20+ messages in thread From: Lukasz Luba @ 2019-02-13 13:00 UTC (permalink / raw) To: Matthias Kaehlcke Cc: Chanwoo Choi, linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham, kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand, Viresh Kumar Hi Matthias, On 2/13/19 1:30 AM, Matthias Kaehlcke wrote: > Hi Lukasz, > > On Tue, Feb 12, 2019 at 10:20:07PM +0100, Lukasz Luba wrote: >> Hi Matthias, >> >> On 2/12/19 8:32 PM, Matthias Kaehlcke wrote: >>> Hi, >>> >>> On Tue, Feb 12, 2019 at 02:46:24PM +0900, Chanwoo Choi wrote: >>>> Hi Lukasz, >>>> >>>> On 19. 2. 12. 오전 12:30, Lukasz Luba wrote: >>>>> This patch set changes workqueue related features in devfreq framework. >>>>> First patch switches to delayed work instead of deferred. >>>>> The second switches to regular system work and deletes custom 'devfreq'. >>>>> >>>>> Using deferred work in this context might harm the system performance. >>>>> When the CPU enters idle, deferred work is not fired. The devfreq device's >>>>> utilization does not have to be connected with a particular CPU. >>>>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor. >>>>> They all are missing opportunity to check the HW state and react when >>>>> the deferred work is not fired. >>>>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running >>>>> on full speed, might show x5 worse performance if the crucial CPU is in idle. >>>> >>>> The devfreq framework keeps the balancing between performance >>>> and power-consumption. It is wrong to focus on only either >>>> performance or power. >>>> >>>> This cover-letter focus on the only performance without any power-consumption >>>> disadvantages. It is easy to raise the performance with short sampling rate >>>> with polling modes. 
>>>> To get the performance, it is good as short as possible
>>>> of period.
>>>>
>>>> Sometimes, when cpu is idle, the device might require the busy state.
>>>> It is very difficult to catch the always right timing between them.
>>>>
>>>> Also, this patch cannot prevent the unneeded wakeup from idle state.
>>>> Apparently, it only focuses on performance without considering
>>>> the power-consumption disadvantage. In the embedded device,
>>>> the power-consumption is very important point. We can not ignore
>>>> the side effect.
>>>>
>>>> Always, I hope to improve the devfreq framwork more that older.
>>>> But, frankly, it is difficult to agree because it only consider
>>>> the performance without considering the side-effect.
>>>>
>>>> The power management framework always have to consider
>>>> the power-consumption issue. This point is always true.
>>>
>>> I missed the impact of forcing a CPU out of an idle state and/or not
>>> allowing it to enter a more power efficient state. I agree that this
>>> should be avoided.
>> It would be good to have some real world scenarios for comparison:
>> w/ and w/o this change, i.e. it is 5% or 50% more power used.
>
> If you have data please share :)

I will try to measure it.
I have some data which refer to CPU hotplug and generic data
regarding ARM big.LITTLE. It comes from a mobile on my desk.
When one CPU of the ARM big cluster is sent offline, power drops ~12mW
compared to WFI idle, which was the previous state. The same for
LITTLE: ~12mW.
When the last CPU in the cluster is sent offline, the whole cluster is
switched off and power drops ~50mW.
The LITTLE core can consume ~250mW at max speed.
The Energy Aware Scheduler is now merged IIRC, so if it has to choose
which core to wake up from idle, it will take LITTLE, not big.
For older platforms which have Cortex-A9, 500mW is also a better
estimate.

>
> Though I also imagine there will be quite some variation between
> different systems/platforms.

True.
> >> I have patches that tries to mitigate wake-ups when there is small >> utilization. Let's make it tunable and involve driver developers. >> They will decide how much impact on the system power usage they >> introduce. > > Great! > >>> I wonder if using a power-efficient workqueue could help here: >>> >>> Instead of running work on the local CPU, the workqueue core asks the >>> scheduler to provide the target CPU for the work queued on unbound >>> workqueues (which includes those marked as power-efficient). So they >>> will not get pinned on a single CPU as can happen with regular >>> workqueues. >>> >>> https://lwn.net/Articles/731052/ >>> >>> >>> Since this series also changes from a custom to system workqueue it >>> seems worth to mention that there are power-efficient system workqueues: >>> >>> system_power_efficient_wq >>> system_freezable_power_efficient_wq >>> >>> >>> In case a power-efficient workqueue is suitable in principle there >>> would still be a problem though: the feature is currently disabled by >>> default, hence devfreq couldn't really rely on it. It is enabled in >>> the arm64 defconfig though, so at least devices on this architecture >>> would benefit from it. Also power-efficient workqueues might be >>> enabled by default in the future as the scheduler becomes more energy >>> aware. >> Regarding this CPU idle cost worries. >> IIRC the new energy model does not even consider idle costs of the CPU. >> It would be good to know the measurements, i.e. worst case scenario: >> waking up 1 (of 4 or 8) CPU from idle 30 times per second for let's >> say 100 us. It is 3 ms / 1000 ms * running energy cost i.e. 250mW. >> Thus, 0.75mW. > > I'm not an expert in this area, but your example seems too optimistic > You are just accounting for the pure runtime, not for the cost of > entering and exiting an idle state. Let's take a SDM845 idle state as > example: > > C0_CPU_PD: c0-power-down { > ... 
>     entry-latency-us = <350>;
>     exit-latency-us = <461>;
>     min-residency-us = <1890>;
>     ...
> };
>
> https://patchwork.kernel.org/patch/10661319/
>
> That's 811us for entering and exiting the idle state. At an
> intermediate OPP (1.8 GHz) the power consumption is 505mW, according
> to the Energy Model. I'm ignoring the actual execution time, since I
> tend to agree with you that the monitoring should be done, unless it
> has a really unreasonable cost. That leaves us with 30 * 811us =
> 24.3ms and 24.3ms / 1000 ms * 505mW = 12.3mW.

You are probably taking an ARM 'big' core wake-up from an idle state
deeper than WFI. I was referring to ARM LITTLE at 250mW.
It is also not 100% certain that the scheduled work will wake up a CPU
which is currently in the deepest idle state.
A short table would create a better picture of the use cases.
The question is also the probability of occurrence for each of these
cases. For the first two CPU states the cost would be the power lost
during the additional rescheduling to/from the workqueue task, which
takes e.g. 2 * 5 us * 30 times.

CPU state ->| running | idle     | idle clock | idle, pwr  |
------------| (C0)    | WFI (C1) | gated (C2) | gated (C3) |
architecture|         |          |            |            |
------V-----------------------------------------------------
ARM big     | <1mW    | <1mW     | ~12mW      | ~12mW      |
ARM LITTLE  | <1mW    | <1mW     | ~6mW       | ~6mW       |
MIPS
PowerPC

>
>> In my opinion it is not a big cost. In most cases the system is still
>> doing some other work. It is worth to mention here that on mobiles
>> when the power button is hit the full suspend is called which freezes
>> all tasks, devices and power consumption is ~15mW. Thus, the system
>> suspend is out of scope here.
>
> I agree that system suspend is out of scope.
>
>> As I replayed to Chanwoon for the same email: in my opinion current
>> devfreq is broken.
>> It was probably developed in times where there was 1 CPU (maybe 2)
>> and idle state of CPU would be a good hint to not to check devfreq
>> devices.
> > IIUC the use of a power-efficient workqueues would address the problem > of waking up a CPU in idle state, however as mentioned earlier by > default this feature is disabled (except for arm64). How about > switching to system_power_efficient_wq and use INIT_DELAYED_WORK if > CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y (or check if the workqueue in > question has WQ_UNBOUND set?) and INIT_DEFERRABLE_WORK otherwise? It's > not ideal, but a possible improvement. I think it would be to complicated to maintain because different platforms might use different mechanisms. I would suggests that we could just follow mechanism in thermal framework. I have never faced any issue with delayed work there, while working on IPA. They use 'system_freezable_power_efficient_wq' and INIT_DELAYED_WORK(). https://elixir.bootlin.com/linux/v5.0-rc6/source/drivers/thermal/thermal_core.c#L293 https://elixir.bootlin.com/linux/v5.0-rc6/source/drivers/thermal/thermal_core.c#L1281 They have these two polling intervals, though. Regards, Lukasz > > Cheers > > Matthias > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v2 0/2] drivers: devfreq: fix and optimize workqueue mechanism
  2019-02-13 13:00 ` Lukasz Luba
@ 2019-02-14 20:40 ` Matthias Kaehlcke
  0 siblings, 0 replies; 20+ messages in thread
From: Matthias Kaehlcke @ 2019-02-14 20:40 UTC (permalink / raw)
  To: Lukasz Luba
  Cc: Chanwoo Choi, linux-kernel, linux-pm, b.zolnierkie, myungjoo.ham,
	kyungmin.park, m.szyprowski, s.nawrocki, joel, chris.diamand,
	Viresh Kumar

Hi Lukasz,

On Wed, Feb 13, 2019 at 02:00:26PM +0100, Lukasz Luba wrote:
> Hi Matthias,
>
> On 2/13/19 1:30 AM, Matthias Kaehlcke wrote:
> > Hi Lukasz,
> >
> > On Tue, Feb 12, 2019 at 10:20:07PM +0100, Lukasz Luba wrote:
> >> Hi Matthias,
> >>
> >> On 2/12/19 8:32 PM, Matthias Kaehlcke wrote:
> >>> Hi,
> >>>
> >>> On Tue, Feb 12, 2019 at 02:46:24PM +0900, Chanwoo Choi wrote:
> >>>> Hi Lukasz,
> >>>>
> >>>> On 19. 2. 12. 12:30 AM, Lukasz Luba wrote:
> >>>>> This patch set changes workqueue related features in devfreq framework.
> >>>>> First patch switches to delayed work instead of deferred.
> >>>>> The second switches to regular system work and deletes custom 'devfreq'.
> >>>>>
> >>>>> Using deferred work in this context might harm the system performance.
> >>>>> When the CPU enters idle, deferred work is not fired. The devfreq device's
> >>>>> utilization does not have to be connected with a particular CPU.
> >>>>> The drivers for GPUs, Network on Chip, cache L3 rely on devfreq governor.
> >>>>> They all are missing opportunity to check the HW state and react when
> >>>>> the deferred work is not fired.
> >>>>> A corner test case, when Dynamic Memory Controller is utilized by CPUs running
> >>>>> on full speed, might show x5 worse performance if the crucial CPU is in idle.
> >>>>
> >>>> The devfreq framework keeps the balance between performance
> >>>> and power consumption. It is wrong to focus on only one of
> >>>> performance or power.
> >>>>
> >>>> This cover letter focuses only on performance, without addressing
> >>>> any power-consumption disadvantages. It is easy to raise performance
> >>>> with a short sampling period in polling mode. For performance, the
> >>>> shorter the period, the better.
> >>>>
> >>>> Sometimes, when the CPU is idle, the device might need to be in the
> >>>> busy state. It is very difficult to always catch the right timing
> >>>> between them.
> >>>>
> >>>> Also, this patch cannot prevent unneeded wakeups from the idle state.
> >>>> Apparently, it only focuses on performance without considering
> >>>> the power-consumption disadvantage. In embedded devices,
> >>>> power consumption is a very important point. We cannot ignore
> >>>> the side effect.
> >>>>
> >>>> I always hope to improve the devfreq framework over the older version.
> >>>> But, frankly, it is difficult to agree because this only considers
> >>>> performance without considering the side effect.
> >>>>
> >>>> The power management framework always has to consider
> >>>> the power-consumption issue. This point is always true.
> >>>
> >>> I missed the impact of forcing a CPU out of an idle state and/or not
> >>> allowing it to enter a more power efficient state. I agree that this
> >>> should be avoided.
> >> It would be good to have some real world scenarios for comparison:
> >> w/ and w/o this change, i.e. whether it is 5% or 50% more power used.
> >
> > If you have data please share :)
> I will try to measure it. I have some data which refer to CPU hotplug
> and generic data regarding ARM big.LITTLE.
> It is a mobile on my desk.
> When one CPU of ARM big is sent offline power drops ~12mW compared
> to WFI idle, which was the previous state. The same for LITTLE ~12mW.
> When the last CPU in the cluster is sent offline, the whole cluster
> is switched off and power drops ~50mW.
> The LITTLE core can consume ~250mW at max speed.
> Energy Aware Scheduler is now merged IIRC, so if it has to choose
> which core to wake up from idle, it will take LITTLE not big.

I'm not sure that EAS will make a difference in this case: "We queue
the work to the CPU on which it was submitted, but if the CPU dies it
can be processed by another CPU." (queue_work() comment).

> For older platforms which have a Cortex-A9, 500mW is also a better estimate.
> >
> > Though I also imagine there will be quite some variation between
> > different systems/platforms.
> True.
> >
> >> I have patches that try to mitigate wake-ups when utilization is
> >> low. Let's make it tunable and involve driver developers.
> >> They will decide how much impact on the system power usage they
> >> introduce.
> >
> > Great!
> >
> >>> I wonder if using a power-efficient workqueue could help here:
> >>>
> >>>   Instead of running work on the local CPU, the workqueue core asks the
> >>>   scheduler to provide the target CPU for the work queued on unbound
> >>>   workqueues (which includes those marked as power-efficient). So they
> >>>   will not get pinned on a single CPU as can happen with regular
> >>>   workqueues.
> >>>
> >>>   https://lwn.net/Articles/731052/
> >>>
> >>>
> >>> Since this series also changes from a custom to system workqueue it
> >>> seems worth mentioning that there are power-efficient system workqueues:
> >>>
> >>>   system_power_efficient_wq
> >>>   system_freezable_power_efficient_wq
> >>>
> >>>
> >>> In case a power-efficient workqueue is suitable in principle there
> >>> would still be a problem though: the feature is currently disabled by
> >>> default, hence devfreq couldn't really rely on it. It is enabled in
> >>> the arm64 defconfig though, so at least devices on this architecture
> >>> would benefit from it. Also power-efficient workqueues might be
> >>> enabled by default in the future as the scheduler becomes more energy
> >>> aware.
> >> Regarding these CPU idle cost worries.
> >> IIRC the new energy model does not even consider idle costs of the CPU.
> >> It would be good to know the measurements, i.e. worst case scenario:
> >> waking up 1 (of 4 or 8) CPU from idle 30 times per second for let's
> >> say 100 us. It is 3 ms / 1000 ms * running energy cost i.e. 250mW.
> >> Thus, 0.75mW.
> >
> > I'm not an expert in this area, but your example seems too optimistic.
> > You are just accounting for the pure runtime, not for the cost of
> > entering and exiting an idle state. Let's take a SDM845 idle state as
> > example:
> >
> >   C0_CPU_PD: c0-power-down {
> >     ...
> >     entry-latency-us = <350>;
> >     exit-latency-us = <461>;
> >     min-residency-us = <1890>;
> >     ...
> >   };
> >
> >   https://patchwork.kernel.org/patch/10661319/
> >
> > That's 811us for entering and exiting the idle state. At an
> > intermediate OPP (1.8 GHz) the power consumption is 505mW, according
> > to the Energy Model. I'm ignoring the actual execution time, since I
> > tend to agree with you that the monitoring should be done, unless it
> > has a really unreasonable cost. That leaves us with 30 * 811us =
> > 24.3ms and 24.3ms / 1000 ms * 505mW = 12.3mW.
> You are probably taking an ARM 'big' core wake-up from an idle state
> deeper than WFI. I was referring to ARM LITTLE 250mW.

Yes, it's a big core in a deep state.

> It is also not 100% certain that the scheduled work will wake up a CPU
> which is currently in the deepest idle state.

true :)

> A short table would create a better picture of the use cases.
> The question is also the probability of occurrence for each of these cases.
> For the first two CPU states it would be the power cost of the additional
> rescheduling to/from the workqueue task, which takes e.g. 2*5 us * 30 times.
>
> CPU state ->| running | idle     | idle clock | idle, pwr  |
> ------------| (C0)    | WFI (C1) | gated (C2) | gated (C3) |
> architecture|         |          |            |            |
> ------V------------------------------------------------------
> ARM big     | <1mW    | <1mW     | ~12mW      | ~12mW      |
> ARM LITTLE  | <1mW    | <1mW     | ~6mW       | ~6mW       |
> MIPS
> PowerPC
>
> >
> >> In my opinion it is not a big cost. In most cases the system is still
> >> doing some other work. It is worth mentioning here that on mobiles
> >> when the power button is hit the full suspend is called, which freezes
> >> all tasks and devices, and power consumption is ~15mW. Thus, the system
> >> suspend is out of scope here.
> >
> > I agree that system suspend is out of scope.
> >
> >> As I replied to Chanwoo for the same email: in my opinion current
> >> devfreq is broken.
> >> It was probably developed in times when there was 1 CPU (maybe 2)
> >> and the idle state of the CPU would be a good hint not to check devfreq
> >> devices.
> >
> > IIUC the use of a power-efficient workqueue would address the problem
> > of waking up a CPU in idle state, however as mentioned earlier by
> > default this feature is disabled (except for arm64). How about
> > switching to system_power_efficient_wq and use INIT_DELAYED_WORK if
> > CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y (or check if the workqueue in
> > question has WQ_UNBOUND set?) and INIT_DEFERRABLE_WORK otherwise? It's
> > not ideal, but a possible improvement.
> I think it would be too complicated to maintain because different
> platforms might use different mechanisms.
> I would suggest that we could just follow the mechanism in the thermal
> framework. I have never faced any issue with delayed work there,
> while working on IPA.
> They use 'system_freezable_power_efficient_wq' and INIT_DELAYED_WORK().
> https://elixir.bootlin.com/linux/v5.0-rc6/source/drivers/thermal/thermal_core.c#L293
> https://elixir.bootlin.com/linux/v5.0-rc6/source/drivers/thermal/thermal_core.c#L1281
> They have these two polling intervals, though.

I think system_power_efficient_wq would be suitable if load monitoring
is stopped on suspend.

In any case it seems Chanwoo wants you to keep providing at least the
option of using a deferrable work. If you end up doing that it
probably would make sense to always use a delayed work for
CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y.

Cheers

Matthias

^ permalink raw reply	[flat|nested] 20+ messages in thread