From: Grygorii Strashko <grygorii.strashko@ti.com>
To: Keerthy <j-keerthy@ti.com>,
	Eduardo Valentin <edubezval@gmail.com>,
	Zhang Rui <rui.zhang@intel.com>
Cc: <linux-pm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-omap@vger.kernel.org>, <nm@ti.com>, <t-kristo@ti.com>
Subject: Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism
Date: Wed, 12 Apr 2017 12:08:06 -0500
Message-ID: <798128ac-1d0b-7eb8-2ea3-8bc0bd0b9d6f@ti.com>
In-Reply-To: <b0830fa9-9a3b-6862-6cab-274c17810227@ti.com>



On 04/12/2017 11:44 AM, Keerthy wrote:
> 
> 
> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote:
>>
>>
>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote:
>>> Hello,
>>>
>> ...
>>
>>>
>>> I agree. But there is nothing that says it is not reentrant. If you
>>> saw something along these lines, can you please share?
>>>
>>>>>> will you generate a patch to do this?
>>>>> Sure. I will generate a patch to take care of 1) To make sure that
>>>>> orderly_poweroff is called only once right away. I have already
>>>>> tested.
>>>>>
>>>>> for 2) Cancel all the scheduled work queues to monitor the
>>>>> temperature.
>>>>> I will take some more time to make it and test.
>>>>>
>>>>> Is that okay? Or you want me to send both together?
>>>>>
>>>> I think you can send patch for step 1 first.
>>>
>>> I am happy to see that Keerthy found the problem with his setup and a
>>> possible solution. But I have a few concerns here.
>>>
>>> 1. If the regular shutdown process takes 10 seconds, that is a ballpark that
>>> thermal should never wait. orderly_poweroff() calls run_cmd() with the wait
>>> flag set. That means, if a regular userland shutdown takes 10s, we are
>>> waiting for it. Obviously this is not acceptable, especially if you set up
>>> the critical trip to be 125C. Now, if you properly size the critical trip to
>>> fire before the hotspot really reaches 125C, for 10s (or the time it takes to
>>> shut down), then fine. But based on what was described in this thread,
>>> his system is waiting 10s on a regular shutdown, and his silicon is at an
>>> out-of-spec temperature for 10s, which is wrong.
>>>
>>> 2. The above scenario is not acceptable in the long run, especially from a
>>> reliability perspective. If orderly_poweroff() has a possibility to
>>> simply never return (or take too long), I would say the thermal
>>> subsystem is using the wrong API.

^ this question just repeats everything that was already discussed in
previous versions of this patch: orderly_poweroff() is not good for a critical
shutdown/poweroff, but what should be used instead?
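
For reference, going by the subject of this patch, the idea on the table is a
backup: keep orderly_poweroff() for the normal path, but arm a deadline after
which power off is forced. Below is a minimal userspace sketch of just the
deadline bookkeeping. The struct, the function names and the millisecond
clock are my assumptions for illustration; this is not the patch and not
kernel API.

```c
#include <stdbool.h>

/* Hypothetical model of a backup poweroff deadline, times in milliseconds. */
struct backup_poweroff {
	bool armed;                /* true once the first critical trip was seen */
	unsigned long deadline_ms; /* absolute time at which to force power off */
};

/* Arm the backup once; later calls must not push the deadline out. */
static void backup_arm(struct backup_poweroff *b, unsigned long now_ms,
		       unsigned long timeout_ms)
{
	if (b->armed)
		return; /* keep the earliest deadline */
	b->armed = true;
	b->deadline_ms = now_ms + timeout_ms;
}

/* True when the orderly path has used up its time budget. */
static bool backup_expired(const struct backup_poweroff *b,
			   unsigned long now_ms)
{
	return b->armed && now_ms >= b->deadline_ms;
}
```

In a real implementation the expiry check would be replaced by whatever
deferred-work mechanism is chosen, and the expiry action would be the forced
power off; the timeout value itself still has to be sized per platform.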


>>>
>>
>>
>> Hmm, I do not see that orderly_poweroff() will wait for anything now:
>> void orderly_poweroff(bool force)
>> {
>> 	if (force) /* do not override the pending "true" */
>> 		poweroff_force = true;
>> 	schedule_work(&poweroff_work);
>> ^^^^^^^ async call; even here there can be a pretty big delay if the system is under pressure
>> }
>>
>>
>> static int __orderly_poweroff(bool force)
>> {
>> 	int ret;
>>
>> 	ret = run_cmd(poweroff_cmd);
> 
> When I tried with multiple orderly_poweroff() calls, ret was always 0.
> So every 250 ms I see ret = 0.
> 
>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC
>>
>> 	if (ret && force) {
> 
> So it never entered this path. ret = 0, so the if body is not executed.

Correct, because exec can find the poweroff tool and start it, so you most
probably have a bunch of instances of this tool running in parallel (some of
them can fail or block).
Issue 1, the one you've sent a fix for, is real :).

Again, thermal has no control over the power-off process once run_cmd() has
returned, and it does not know what the userspace (US) poweroff binary is doing
or how much time it can take (which includes disk maintenance, a loooong delay).
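
To illustrate issue 1 only, here is a minimal userspace sketch of a "request
poweroff once" guard, assuming a C11 atomic flag. The names
thermal_request_poweroff_once() and fake_orderly_poweroff() are made up for
this example; it is not the patch you sent and not kernel code, where a mutex
or an atomic bit operation would more likely be used.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model, not kernel code. */
static atomic_flag poweroff_requested = ATOMIC_FLAG_INIT;
static int poweroff_calls; /* how many times the stand-in was reached */

/* Stand-in for orderly_poweroff(); it only records that it was called. */
static void fake_orderly_poweroff(bool force)
{
	(void)force;
	poweroff_calls++;
}

/*
 * Returns true only for the first caller. Every later critical-trip
 * notification sees the flag already set and returns without starting
 * another instance of the poweroff helper.
 */
static bool thermal_request_poweroff_once(void)
{
	if (atomic_flag_test_and_set(&poweroff_requested))
		return false;
	fake_orderly_poweroff(true);
	return true;
}
```

With such a guard, the polling that fires every 250 ms would start the
poweroff helper a single time instead of once per poll. It does nothing about
how long that helper then takes.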

> 
>> 		pr_warn("Failed to start orderly shutdown: forcing the issue\n");
>>
>> 		/*
>> 		 * I guess this should try to kick off some daemon to sync and
>> 		 * poweroff asap.  Or not even bother syncing if we're doing an
>> 		 * emergency shutdown?
>> 		 */
>> 		emergency_sync();
>> 		kernel_power_off();
>> ^^^ force power off, but only if run_cmd() failed - for example if /sbin/poweroff doesn't exist
>> 	}
>>
>> 	return ret;
>> }
>>
>> static bool poweroff_force;
>>
>> static void poweroff_work_func(struct work_struct *work)
>> {
>> 	__orderly_poweroff(poweroff_force);
>> }
>>
>> As a result, thermal has no control over power off any more after calling orderly_poweroff() and cannot get the result
>> of the US poweroff binary execution.
>>
>>>
>>> If you are going to implement the above two patches, keep in mind:
>>> i. At least within the thermal subsystem, you need to take care of all
>>> zones that could trigger a shutdown.
>>> ii. Serializing the calls to orderly_poweroff() seems to be more
>>> concerning than cancelling all monitoring.
>>>
>>>
>>

-- 
regards,
-grygorii
