All of lore.kernel.org
 help / color / mirror / Atom feed
From: Juha-Matti Tilli <juha-matti.tilli-X3B1VOXEql0@public.gmane.org>
To: "edubezval-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org"
	<edubezval-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Mikko Perttunen
	<mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Cc: Zhang Rui <rui.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Stephen Warren <swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>,
	Thierry Reding
	<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	"linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org"
	<linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org>,
	Juha-Matti Tilli <jtilli-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Subject: Re: [PATCH v4 4/4] thermal: Add Tegra SOCTHERM thermal management driver
Date: Tue, 19 Aug 2014 21:25:33 +0300	[thread overview]
Message-ID: <20140819182533.GA51539@sandfort.unics.fi> (raw)
In-Reply-To: <CAC-25o-bLeDYpTgPy5UOvBGJ7k7HJkBM3YcCRPSZHBkyxugULA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

> > Originally, I thought the fuse offsets were incorrect but as it turns
> > out, the official Linux kernel starts counting the fuses at a different
> > location than NVIDIA's internal kernel.
> 
> This is a major concern. Juha-Matti and Mikko, can you please cross
> check if this driver is accessing to the correct fuse register
> location?

Mikko told me that the official Linux kernel and NVIDIA's internal
kernel have an offset of 0x100 between fuse registers. I believe it is
accessing correct fuse register locations, as I tested the code and it
works. But, if I port the code to a codebase that is using NVIDIA's
internal kernel functions, it fails (all fuses read 0x0) unless I offset
all fuses by 0x100. All fuse register locations in this code indeed have
the same offset of 0x100 between NVIDIA's internal docs & code and the
new driver code.

I'm not entirely sure why the official Linux kernel and NVIDIA's
internal code starts counting the fuses at a different location, but the
fuses read do have data in them and the temperature readings look sane,
so I assume they are indeed the correct fuses. The calculated
calibration values also seem to be similar to (but not identical with)
the ones calculated by NVIDIA's internal driver, but more on this later.

drivers/soc/tegra/fuse/fuse-tegra30.c in upstream has "#define
FUSE_BEGIN 0x100" and the read is "readl_relaxed(fuse_base + FUSE_BEGIN
+ offset);" so there's the explanation for the 0x100 offset.

I believe Mikko's explanation about the 0x100 offset, so the code in my
opinion seems correct as the offset of 0x100 is consistent, and the code
does in fact work. Reading incorrect fuses would mean the fuses read 0x0
and the temperatures are so strange that they are obviously incorrect.

Mikko said he did the same mistake earlier and fixed it. Well, I too did
the same mistake. I guess everyone working with fuses on both NVIDIA's
internal code and the upstream kernel will make the same mistake once
but not again.

> > The temperature readings look sane and do vary with load, but I think
> > they're a bit different than what NVIDIA's internal soc_therm driver
> 
> hmm..  Based on a single sample?

Yes, the comparison between NVIDIA's internal driver and this driver was
based only on a single sample. I don't now recall what the difference
was, but I should probably test it multiple times tomorrow so I can
report how large the difference is. Might be within the error margin, as
the temperatures are not stable to a 0.5 degrees C precision. A problem
here is that my test was basically loading code adopted from this new
driver dynamically on a system already using the NVIDIA's existing
driver as a static driver, so a new sample requires a reboot and there's
always a time delay between the commands used to manually read the
temperature, load the dynamic module and re-read the temperature. I
could tomorrow write a userspace test program to overwrite the registers
with the old and new values and read the temperature sensor register
immediately before and after the value switch to see how much the
readings differ. That way I could get multiple accurate data points with
no reboots required.

The test that the temperatures do vary with load was based on multiple
samples of running four shell infinite loops in the background. I can
easily reach 80 degrees C with that on my development board with passive
cooling. Killing the four loops results in the temperature slowly
trickling to the idle temperature. The idle temp seems to be a bit over
30 degrees C. My Jetson TK1 has active cooling, so the peak temperatures
are a lot lower than for the other development board. This loads only
the CPU, though. Loading additionally the GPU would cause higher
temperatures.

Mikko already noticed that NVIDIA's internal driver uses
div64_s64_precise and Mikko opted for a simpler implementation, so that
might be indeed the cause for the different calibration values. I'll
need to investigate this more. Yet another difference could be that
Mikko posted a link to an older version of the downstream kernel sources
(https://developer.nvidia.com/sites/default/files/akamai/mobile/files/L4T/kernel_src.tbz2)
-- if Mikko has based his code on that old code, it's possible that the
calibration value calculation has been improved afterwards. We'll know
for sure after I have had time to investigate this more.

WARNING: multiple messages have this Message-ID (diff)
From: Juha-Matti Tilli <juha-matti.tilli@iki.fi>
To: "edubezval@gmail.com" <edubezval@gmail.com>,
	Mikko Perttunen <mperttunen@nvidia.com>
Cc: Zhang Rui <rui.zhang@intel.com>,
	Stephen Warren <swarren@wwwdotorg.org>,
	Thierry Reding <thierry.reding@gmail.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	Juha-Matti Tilli <jtilli@nvidia.com>
Subject: Re: [PATCH v4 4/4] thermal: Add Tegra SOCTHERM thermal management driver
Date: Tue, 19 Aug 2014 21:25:33 +0300	[thread overview]
Message-ID: <20140819182533.GA51539@sandfort.unics.fi> (raw)
In-Reply-To: <CAC-25o-bLeDYpTgPy5UOvBGJ7k7HJkBM3YcCRPSZHBkyxugULA@mail.gmail.com>

> > Originally, I thought the fuse offsets were incorrect but as it turns
> > out, the official Linux kernel starts counting the fuses at a different
> > location than NVIDIA's internal kernel.
> 
> This is a major concern. Juha-Matti and Mikko, can you please cross
> check if this driver is accessing to the correct fuse register
> location?

Mikko told me that the official Linux kernel and NVIDIA's internal
kernel have an offset of 0x100 between fuse registers. I believe it is
accessing correct fuse register locations, as I tested the code and it
works. But, if I port the code to a codebase that is using NVIDIA's
internal kernel functions, it fails (all fuses read 0x0) unless I offset
all fuses by 0x100. All fuse register locations in this code indeed have
the same offset of 0x100 between NVIDIA's internal docs & code and the
new driver code.

I'm not entirely sure why the official Linux kernel and NVIDIA's
internal code starts counting the fuses at a different location, but the
fuses read do have data in them and the temperature readings look sane,
so I assume they are indeed the correct fuses. The calculated
calibration values also seem to be similar to (but not identical with)
the ones calculated by NVIDIA's internal driver, but more on this later.

drivers/soc/tegra/fuse/fuse-tegra30.c in upstream has "#define
FUSE_BEGIN 0x100" and the read is "readl_relaxed(fuse_base + FUSE_BEGIN
+ offset);" so there's the explanation for the 0x100 offset.

I believe Mikko's explanation about the 0x100 offset, so the code in my
opinion seems correct as the offset of 0x100 is consistent, and the code
does in fact work. Reading incorrect fuses would mean the fuses read 0x0
and the temperatures are so strange that they are obviously incorrect.

Mikko said he did the same mistake earlier and fixed it. Well, I too did
the same mistake. I guess everyone working with fuses on both NVIDIA's
internal code and the upstream kernel will make the same mistake once
but not again.

> > The temperature readings look sane and do vary with load, but I think
> > they're a bit different than what NVIDIA's internal soc_therm driver
> 
> hmm..  Based on a single sample?

Yes, the comparison between NVIDIA's internal driver and this driver was
based only on a single sample. I don't now recall what the difference
was, but I should probably test it multiple times tomorrow so I can
report how large the difference is. Might be within the error margin, as
the temperatures are not stable to a 0.5 degrees C precision. A problem
here is that my test was basically loading code adopted from this new
driver dynamically on a system already using the NVIDIA's existing
driver as a static driver, so a new sample requires a reboot and there's
always a time delay between the commands used to manually read the
temperature, load the dynamic module and re-read the temperature. I
could tomorrow write a userspace test program to overwrite the registers
with the old and new values and read the temperature sensor register
immediately before and after the value switch to see how much the
readings differ. That way I could get multiple accurate data points with
no reboots required.

The test that the temperatures do vary with load was based on multiple
samples of running four shell infinite loops in the background. I can
easily reach 80 degrees C with that on my development board with passive
cooling. Killing the four loops results in the temperature slowly
trickling to the idle temperature. The idle temp seems to be a bit over
30 degrees C. My Jetson TK1 has active cooling, so the peak temperatures
are a lot lower than for the other development board. This loads only
the CPU, though. Loading additionally the GPU would cause higher
temperatures.

Mikko already noticed that NVIDIA's internal driver uses
div64_s64_precise and Mikko opted for a simpler implementation, so that
might be indeed the cause for the different calibration values. I'll
need to investigate this more. Yet another difference could be that
Mikko posted a link to an older version of the downstream kernel sources
(https://developer.nvidia.com/sites/default/files/akamai/mobile/files/L4T/kernel_src.tbz2)
-- if Mikko has based his code on that old code, it's possible that the
calibration value calculation has been improved afterwards. We'll know
for sure after I have had time to investigate this more.

WARNING: multiple messages have this Message-ID (diff)
From: juha-matti.tilli@iki.fi (Juha-Matti Tilli)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v4 4/4] thermal: Add Tegra SOCTHERM thermal management driver
Date: Tue, 19 Aug 2014 21:25:33 +0300	[thread overview]
Message-ID: <20140819182533.GA51539@sandfort.unics.fi> (raw)
In-Reply-To: <CAC-25o-bLeDYpTgPy5UOvBGJ7k7HJkBM3YcCRPSZHBkyxugULA@mail.gmail.com>

> > Originally, I thought the fuse offsets were incorrect but as it turns
> > out, the official Linux kernel starts counting the fuses at a different
> > location than NVIDIA's internal kernel.
> 
> This is a major concern. Juha-Matti and Mikko, can you please cross
> check if this driver is accessing to the correct fuse register
> location?

Mikko told me that the official Linux kernel and NVIDIA's internal
kernel have an offset of 0x100 between fuse registers. I believe it is
accessing correct fuse register locations, as I tested the code and it
works. But, if I port the code to a codebase that is using NVIDIA's
internal kernel functions, it fails (all fuses read 0x0) unless I offset
all fuses by 0x100. All fuse register locations in this code indeed have
the same offset of 0x100 between NVIDIA's internal docs & code and the
new driver code.

I'm not entirely sure why the official Linux kernel and NVIDIA's
internal code starts counting the fuses at a different location, but the
fuses read do have data in them and the temperature readings look sane,
so I assume they are indeed the correct fuses. The calculated
calibration values also seem to be similar to (but not identical with)
the ones calculated by NVIDIA's internal driver, but more on this later.

drivers/soc/tegra/fuse/fuse-tegra30.c in upstream has "#define
FUSE_BEGIN 0x100" and the read is "readl_relaxed(fuse_base + FUSE_BEGIN
+ offset);" so there's the explanation for the 0x100 offset.

I believe Mikko's explanation about the 0x100 offset, so the code in my
opinion seems correct as the offset of 0x100 is consistent, and the code
does in fact work. Reading incorrect fuses would mean the fuses read 0x0
and the temperatures are so strange that they are obviously incorrect.

Mikko said he did the same mistake earlier and fixed it. Well, I too did
the same mistake. I guess everyone working with fuses on both NVIDIA's
internal code and the upstream kernel will make the same mistake once
but not again.

> > The temperature readings look sane and do vary with load, but I think
> > they're a bit different than what NVIDIA's internal soc_therm driver
> 
> hmm..  Based on a single sample?

Yes, the comparison between NVIDIA's internal driver and this driver was
based only on a single sample. I don't now recall what the difference
was, but I should probably test it multiple times tomorrow so I can
report how large the difference is. Might be within the error margin, as
the temperatures are not stable to a 0.5 degrees C precision. A problem
here is that my test was basically loading code adopted from this new
driver dynamically on a system already using the NVIDIA's existing
driver as a static driver, so a new sample requires a reboot and there's
always a time delay between the commands used to manually read the
temperature, load the dynamic module and re-read the temperature. I
could tomorrow write a userspace test program to overwrite the registers
with the old and new values and read the temperature sensor register
immediately before and after the value switch to see how much the
readings differ. That way I could get multiple accurate data points with
no reboots required.

The test that the temperatures do vary with load was based on multiple
samples of running four shell infinite loops in the background. I can
easily reach 80 degrees C with that on my development board with passive
cooling. Killing the four loops results in the temperature slowly
trickling to the idle temperature. The idle temp seems to be a bit over
30 degrees C. My Jetson TK1 has active cooling, so the peak temperatures
are a lot lower than for the other development board. This loads only
the CPU, though. Loading additionally the GPU would cause higher
temperatures.

Mikko already noticed that NVIDIA's internal driver uses
div64_s64_precise and Mikko opted for a simpler implementation, so that
might be indeed the cause for the different calibration values. I'll
need to investigate this more. Yet another difference could be that
Mikko posted a link to an older version of the downstream kernel sources
(https://developer.nvidia.com/sites/default/files/akamai/mobile/files/L4T/kernel_src.tbz2)
-- if Mikko has based his code on that old code, it's possible that the
calibration value calculation has been improved afterwards. We'll know
for sure after I have had time to investigate this more.

  parent reply	other threads:[~2014-08-19 18:25 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-06 10:25 [PATCH v3 0/4] Tegra124 soctherm driver Mikko Perttunen
2014-08-06 10:25 ` Mikko Perttunen
2014-08-06 10:25 ` Mikko Perttunen
2014-08-06 10:25 ` [PATCH v3 1/4] of: Add bindings for nvidia,tegra124-soctherm Mikko Perttunen
2014-08-06 10:25   ` Mikko Perttunen
2014-08-06 10:25   ` Mikko Perttunen
     [not found]   ` <1407320706-17440-2-git-send-email-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2014-08-20 19:37     ` Stephen Warren
2014-08-20 19:37       ` Stephen Warren
2014-08-20 19:37       ` Stephen Warren
2014-08-06 10:25 ` [PATCH v3 2/4] ARM: tegra: Add soctherm and thermal zones to Tegra124 device tree Mikko Perttunen
2014-08-06 10:25   ` Mikko Perttunen
2014-08-06 10:25   ` Mikko Perttunen
     [not found]   ` <1407320706-17440-3-git-send-email-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2014-08-20 19:42     ` Stephen Warren
2014-08-20 19:42       ` Stephen Warren
2014-08-20 19:42       ` Stephen Warren
     [not found] ` <1407320706-17440-1-git-send-email-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2014-08-06 10:25   ` [PATCH v3 3/4] ARM: tegra: Add thermal trip points for Jetson TK1 Mikko Perttunen
2014-08-06 10:25     ` Mikko Perttunen
2014-08-06 10:25     ` Mikko Perttunen
2014-08-20 19:44     ` Stephen Warren
2014-08-20 19:44       ` Stephen Warren
2014-08-21  7:37       ` Mikko Perttunen
2014-08-21  7:37         ` Mikko Perttunen
2014-08-21  7:37         ` Mikko Perttunen
2014-08-21 16:04       ` Eduardo Valentin
2014-08-21 16:04         ` Eduardo Valentin
2014-08-21 16:35         ` Stephen Warren
2014-08-21 16:35           ` Stephen Warren
2014-08-06 10:25 ` [PATCH v3 4/4] thermal: Add Tegra SOCTHERM thermal management driver Mikko Perttunen
2014-08-06 10:25   ` Mikko Perttunen
2014-08-06 10:25   ` Mikko Perttunen
2014-08-11 13:32   ` Eduardo Valentin
2014-08-11 13:32     ` Eduardo Valentin
2014-08-12  9:51     ` Mikko Perttunen
2014-08-12  9:51       ` Mikko Perttunen
2014-08-12  9:51       ` Mikko Perttunen
     [not found]   ` <1407320706-17440-5-git-send-email-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2014-08-15 11:00     ` [PATCH v4 " Mikko Perttunen
2014-08-15 11:00       ` Mikko Perttunen
2014-08-15 11:00       ` Mikko Perttunen
2014-08-19 14:09       ` Juha-Matti Tilli
2014-08-19 14:09         ` Juha-Matti Tilli
2014-08-19 14:09         ` Juha-Matti Tilli
2014-08-19 14:31         ` Mikko Perttunen
2014-08-19 14:31           ` Mikko Perttunen
2014-08-19 14:31           ` Mikko Perttunen
     [not found]         ` <20140819140955.GA50475-Xv4IVSKz7SWQi9Q/X01l6/UpdFzICT1y@public.gmane.org>
2014-08-19 14:33           ` edubezval-Re5JQEeQqe8AvxtiuMwx3w
2014-08-19 14:33             ` edubezval at gmail.com
2014-08-19 14:33             ` edubezval
     [not found]             ` <CAC-25o-bLeDYpTgPy5UOvBGJ7k7HJkBM3YcCRPSZHBkyxugULA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2014-08-19 15:27               ` Mikko Perttunen
2014-08-19 15:27                 ` Mikko Perttunen
2014-08-19 15:27                 ` Mikko Perttunen
2014-08-19 18:25                 ` Juha-Matti Tilli
2014-08-19 18:25                   ` Juha-Matti Tilli
2014-08-19 18:25                   ` Juha-Matti Tilli
2014-08-19 18:25               ` Juha-Matti Tilli [this message]
2014-08-19 18:25                 ` Juha-Matti Tilli
2014-08-19 18:25                 ` Juha-Matti Tilli
2014-08-20 12:05             ` Juha-Matti Tilli
2014-08-20 12:05               ` Juha-Matti Tilli
2014-08-20 12:05               ` Juha-Matti Tilli
2014-08-20 19:50   ` [PATCH v3 " Stephen Warren
2014-08-20 19:50     ` Stephen Warren
     [not found]     ` <53F4FC18.90804-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org>
2014-08-21  7:39       ` Mikko Perttunen
2014-08-21  7:39         ` Mikko Perttunen
2014-08-21  7:39         ` Mikko Perttunen
2014-08-21 16:08     ` Eduardo Valentin
2014-08-21 16:08       ` Eduardo Valentin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140819182533.GA51539@sandfort.unics.fi \
    --to=juha-matti.tilli-x3b1voxeql0@public.gmane.org \
    --cc=edubezval-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    --cc=jtilli-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-pm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
    --cc=mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org \
    --cc=rui.zhang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org \
    --cc=swarren-3lzwWm7+Weoh9ZMKESR00Q@public.gmane.org \
    --cc=thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.