* CPU hyperthreading turned on after soft power-cycle
       [not found] <20111030110543.5872.61279.reportbug@supermicro.uochb.cas.cz>
@ 2011-10-30 15:25 ` Ben Hutchings
  2011-10-31 13:06   ` Clarinet
  0 siblings, 1 reply; 16+ messages in thread
From: Ben Hutchings @ 2011-10-30 15:25 UTC (permalink / raw)
  To: Jiri Polach, 647095; +Cc: LKML

[-- Attachment #1: Type: text/plain, Size: 4259 bytes --]

On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
> Package: linux-2.6
> Version: 2.6.39-3~bpo60+1
> Severity: normal
> 
> 
> When the computer is turned off using "shutdown -h" or "halt" command,
> the hyperthreading BIOS setting is changed - even if hyperthreading is
> disabled in BIOS, the kernel detects twice as many "processors" on
> next boot as if hyperthreading was enabled. Please see details below.
> 
> I have observed the problem on several Supermicro platforms with
> various Intel Xeon processors. The particular case I report was
> observed on Supermicro X8DTT-F mainboard with two Intel Xeon E5645
> processors (6core). The problem can be reproduced the following way:

By my understanding of how hyperthreading is controlled, this has to be
a BIOS bug, as you seem to have suspected.  But if the BIOS behaviour is
kernel version-dependent, then presumably there is something the kernel
can do to work around it.

> 1. Turn on the computer, go to BIOS setup and turn "Simultaneous
> multithreading" to "Disabled". Boot Debian.
> 
> 2. Check with "cat /proc/cpuinfo" that the system reports 12 CPUs (2 x
> six-core processor).
> 
> 3. (optionally) Reboot the system (shutdown -r) and check that there
> are still 12 CPUs detected and reported.
> 
> 4. Halt the system using "shutdown -h" or "halt", turn it on again,
> and boot Debian.

I assume from this that shutdown -h is configured to turn the system
off.

> 5. Check the number of CPUs reported - it will show you that there are
> 24 CPUs as if hyperthreading was enabled.
> 
> 6. Reboot and go to BIOS setup - it still shows that "Simultaneous
> multithreading" is set to "Disabled". Do not change anything, just
> select "Save and Exit". Boot Debian and check the number of CPUs - it
> now shows 12 CPUs again.
> 
> I have tested several kernel versions and it seems that this behavior
> appeared for the first time somewhere between 2.6.35.7 and 2.6.38.6
> versions (ok = does not show the described behavior, not ok = does
> show):
> 
> * linux-image-2.6.32-5-amd64 official Debian - ok
> * linux-image-2.6.39-bpo.2-amd64 official Debian from backports - not
> ok
> 
> * linux 2.6.35.7 - custom compiled from source - ok
> * linux 2.6.38.6 - custom compiled from source - not ok
> * linux 2.6.39.4 - custom compiled from source - not ok
> * linux 3.0.4 - custom compiled from source - not ok

That might be too large a range for developers to consider.  Can you
test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?

Ben.

> I have exchanged many e-mails with Supermicro distributor who
> apparently is in direct contact with Supermicro technicians. They more
> or less deny any responsibility for this problem and repeatedly point
> to the fact that some (older) kernels do not exhibit this behavior so
> it must be a kernel problem. Their representative writes:
> 
> "I discussed this with supermicro and they informed me that the Kernel
> itself is causing the issue, that it may be sending the hyperthreading
> command code to the BIOS."
> 
> Although I do not completely agree with their arguments, my knowledge
> is not deep enough to recognize where exactly the core of the problem
> is so I report this as a bug in a hope that someone will know what
> happens when a kernel turns a computer off and what has changed in
> kernel somewhere between the versions I mention above. I have asked
> Supermicro distributor for more information on what they think happens
> there and what exactly they mean by "hyperthreading command code" and I
> am waiting for their response.
> 
> -- Package-specific info:
> ** Version:
> Linux version 2.6.39-bpo.2-amd64 (Debian 2.6.39-3~bpo60+1) (norbert@tretkowski.de) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Tue Jul 26 10:35:23 UTC 2011
[...]
> ** Model information
> sys_vendor: Supermicro
> product_name: X8DTT
> product_version: 1234567890
> chassis_vendor: Supermicro
> chassis_version: 1234567890
> bios_vendor: American Megatrends Inc.
> bios_version: 080016 
> board_vendor: Supermicro
> board_name: X8DTT
> board_version: 2.0       
[...]

-- 
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: CPU hyperthreading turned on after soft power-cycle
  2011-10-30 15:25 ` CPU hyperthreading turned on after soft power-cycle Ben Hutchings
@ 2011-10-31 13:06   ` Clarinet
  2011-11-08 12:33     ` Jiri Polach
  0 siblings, 1 reply; 16+ messages in thread
From: Clarinet @ 2011-10-31 13:06 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: 647095, LKML

On 10/30/2011 4:25 PM, Ben Hutchings wrote:
> On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
>> Package: linux-2.6
>> Version: 2.6.39-3~bpo60+1
>> Severity: normal
>>
>>
>> When the computer is turned off using "shutdown -h" or "halt" command,
>> the hyperthreading BIOS setting is changed - even if hyperthreading is
>> disabled in BIOS, the kernel detects twice as many "processors" on
>> next boot as if hyperthreading was enabled. Please see details below.
>>
>> I have observed the problem on several Supermicro platforms with
>> various Intel Xeon processors. The particular case I report was
>> observed on Supermicro X8DTT-F mainboard with two Intel Xeon E5645
>> processors (6core). The problem can be reproduced the following way:
>
> By my understanding of how hyperthreading is controlled, this has to be
> a BIOS bug, as you seem to have suspected.  But if the BIOS behaviour is
> kernel version-dependent, then presumably there is something the kernel
> can do to work around it.

Yes, there are reasons that support my suspicion that BIOS is not doing 
its work properly. But I cannot prove it until it is clear what has been 
changed in the kernel.

>> 1. Turn on the computer, go to BIOS setup and turn "Simultaneous
>> multithreading" to "Disabled". Boot Debian.
>>
>> 2. Check with "cat /proc/cpuinfo" that the system reports 12 CPUs (2 x
>> six-core processor).
>>
>> 3. (optionally) Reboot the system (shutdown -r) and check that there
>> are still 12 CPUs detected and reported.
>>
>> 4. Halt the system using "shutdown -h" or "halt", turn it on again,
>> and boot Debian.
>
> I assume from this that shutdown -h is configured to turn the system
> off.

I do not know. I have been using mostly "halt" to shut down the system 
and turn the server off and I tried "shutdown -h" only several times to 
see if there is any difference. Both commands have turned the computer 
off, but I did not do any special "shutdown -h" configuration.

>> 5. Check the number of CPUs reported - it will show you that there are
>> 24 CPUs as if hyperthreading was enabled.
>>
>> 6. Reboot and go to BIOS setup - it still shows that "Simultaneous
>> multithreading" is set to "Disabled". Do not change anything, just
>> select "Save and Exit". Boot Debian and check the number of CPUs - it
>> now shows 12 CPUs again.
>>
>> I have tested several kernel versions and it seems that this behavior
>> appeared for the first time somewhere between 2.6.35.7 and 2.6.38.6
>> versions (ok = does not show the described behavior, not ok = does
>> show):
>>
>> * linux-image-2.6.32-5-amd64 official Debian - ok
>> * linux-image-2.6.39-bpo.2-amd64 official Debian from backports - not
>> ok
>>
>> * linux 2.6.35.7 - custom compiled from source - ok
>> * linux 2.6.38.6 - custom compiled from source - not ok
>> * linux 2.6.39.4 - custom compiled from source - not ok
>> * linux 3.0.4 - custom compiled from source - not ok
>
> That might be too large a range for developers to consider.  Can you
> test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?

OK, after another day of testing it seems that the problem appears in 
2.6.38.1, because

* linux 2.6.37.6 - custom compiled from source - ok
* linux 2.6.38.1 - custom compiled from source - not ok

Best regards,

Jiri Polach

> Ben.
>
>> I have exchanged many e-mails with Supermicro distributor who
>> apparently is in direct contact with Supermicro technicians. They more
>> or less deny any responsibility for this problem and repeatedly point
>> to the fact that some (older) kernels do not exhibit this behavior so
>> it must be a kernel problem. Their representative writes:
>>
>> "I discussed this with supermicro and they informed me that the Kernel
>> itself is causing the issue, that it may be sending the hyperthreading
>> command code to the BIOS."
>>
>> Although I do not completely agree with their arguments, my knowledge
>> is not deep enough to recognize where exactly the core of the problem
>> is so I report this as a bug in a hope that someone will know what
>> happens when a kernel turns a computer off and what has changed in
>> kernel somewhere between the versions I mention above. I have asked
>> Supermicro distributor for more information on what they think happens
>> there and what exactly they mean by "hyperthreading command code" and I
>> am waiting for their response.
>>
>> -- Package-specific info:
>> ** Version:
>> Linux version 2.6.39-bpo.2-amd64 (Debian 2.6.39-3~bpo60+1) (norbert@tretkowski.de) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Tue Jul 26 10:35:23 UTC 2011
> [...]
>> ** Model information
>> sys_vendor: Supermicro
>> product_name: X8DTT
>> product_version: 1234567890
>> chassis_vendor: Supermicro
>> chassis_version: 1234567890
>> bios_vendor: American Megatrends Inc.
>> bios_version: 080016
>> board_vendor: Supermicro
>> board_name: X8DTT
>> board_version: 2.0
> [...]



* Re: CPU hyperthreading turned on after soft power-cycle
  2011-10-31 13:06   ` Clarinet
@ 2011-11-08 12:33     ` Jiri Polach
  2011-11-10  1:52       ` Jonathan Nieder
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Polach @ 2011-11-08 12:33 UTC (permalink / raw)
  To: 647095; +Cc: Ben Hutchings, LKML

On 10/31/2011 2:06 PM, Clarinet wrote:
> On 10/30/2011 4:25 PM, Ben Hutchings wrote:
>> On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
>>> Package: linux-2.6
>>> Version: 2.6.39-3~bpo60+1
>>> Severity: normal
>>
>> ...
>>
>> That might be too large a range for developers to consider. Can you
>> test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?
>
> OK, after another day of testing it seems that the problem appears in
> 2.6.38.1, because
>
> * linux 2.6.37.6 - custom compiled from source - ok
> * linux 2.6.38.1 - custom compiled from source - not ok

On Ben's advice I am trying to locate the commit that causes the problem
to appear more precisely using 'git bisect'. However, too many of the
generated revisions are unbootable, so I have to use 'bisect skip'
frequently. I started with 4059 revisions to test (roughly 12 steps) and
after 15 steps I still have 2902 revisions to test (roughly 12 steps).
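
The "roughly 12 steps" figures are what plain halving would predict for both counts. A minimal sketch of that arithmetic, assuming every tested revision halves the remaining range (git's own estimate may be computed differently, and a skipped revision does not narrow the range at all); the function name `ideal_bisect_steps` is made up for illustration:

```python
def ideal_bisect_steps(revisions):
    """Smallest number of good/bad tests that can isolate one commit
    among `revisions` candidates, assuming each test halves the range.

    Equivalent to ceil(log2(revisions)), computed with integers only.
    """
    if revisions < 1:
        raise ValueError("need at least one candidate revision")
    return (revisions - 1).bit_length()


if __name__ == "__main__":
    # Figures quoted in the message above.
    for remaining in (4059, 2902):
        print(remaining, "revisions ->", ideal_bisect_steps(remaining), "steps")
```

Both 4059 and 2902 lie between 2048 and 4096, which is why the estimate has not moved even though the range shrank by more than a thousand revisions.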

Is there any way to speed this process up? I tried to do bisection 
manually but I do not understand git enough to be able to do this 
efficiently.

My current bisect log is below.

Jiri Polach

---

git bisect start
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect bad 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [c8ddb2713c624f432fa5fe3c7ecffcdda46ea0d4] Linux 2.6.37-rc1
git bisect good c8ddb2713c624f432fa5fe3c7ecffcdda46ea0d4
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# skip: [ecacc6c70cf77a52a22af66c879873202522d6ce] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
git bisect skip ecacc6c70cf77a52a22af66c879873202522d6ce
# skip: [22113efd00491310da802f3b1a9a66cfcf415fac] mmc: Test bus-width for old MMC devices
git bisect skip 22113efd00491310da802f3b1a9a66cfcf415fac
# good: [233cbe5b94096f95ba7bca2162d63275b0b90b5b] OMAP2+: hwmod: Update the sysc_cache in case module context is lost
git bisect good 233cbe5b94096f95ba7bca2162d63275b0b90b5b
# skip: [443e6221e465efa8efb752a8405a759ef1161af9] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
git bisect skip 443e6221e465efa8efb752a8405a759ef1161af9
# good: [9e3be1edbe5ca57df51140b523168237b3a01f4d] Merge branch 'for-2.6.37' into HEAD
git bisect good 9e3be1edbe5ca57df51140b523168237b3a01f4d
# good: [6c869e772c72d509d0db243a56c205ef48a29baf] Merge branch 'perf/urgent' into perf/core
git bisect good 6c869e772c72d509d0db243a56c205ef48a29baf
# skip: [f451171c5ac829e55581c81caf2cb01e1c0a5c5f] i2c-algo-bit: Refactor adapter registration
git bisect skip f451171c5ac829e55581c81caf2cb01e1c0a5c5f
# good: [aa5cbf8a70f57c5360ce1bfef692b357c866ae7f] [SCSI] qla2xxx: Use sg_next to fetch next sg element while walking sg list.
git bisect good aa5cbf8a70f57c5360ce1bfef692b357c866ae7f
# skip: [9b3ffe523af895f6b969b971079da4c06c2743af] ARM: ns9xxx: irq_data conversion.
git bisect skip 9b3ffe523af895f6b969b971079da4c06c2743af
# good: [c0b33bdc5b8d9c1120dece660480d4dd86b817ee] [media] gspca-stv06xx: support bandwidth changing
git bisect good c0b33bdc5b8d9c1120dece660480d4dd86b817ee
# good: [d7ae30f073a179a9cebd663e7502843ddf4ba672] mac80211: document workqueue
git bisect good d7ae30f073a179a9cebd663e7502843ddf4ba672
# skip: [949f6711b83d2809d1ccb9d830155a65fdacdff9] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect skip 949f6711b83d2809d1ccb9d830155a65fdacdff9
# good: [40e44399301b6dbd997408a184140b79b77f632d] omap2+: Add struct omap_board_data and use it for platform level serial init
git bisect good 40e44399301b6dbd997408a184140b79b77f632d
# skip: [8f9b54a35a70b604ebd2b2f2e7e04eabd0ff8a54] Decompressors: check for write errors in decompress_unlzo.c
git bisect skip 8f9b54a35a70b604ebd2b2f2e7e04eabd0ff8a54
# good: [190683a9d5457e6d962c232ffbecac3ab158dddd] net: net_families __rcu annotations
git bisect good 190683a9d5457e6d962c232ffbecac3ab158dddd
# skip: [c0afc916029c02a8650e533392893b3da6326d1e] ARM: ep93xx: irq_data conversion.
git bisect skip c0afc916029c02a8650e533392893b3da6326d1e
# good: [9ac4e613a88d7f6a7a9651d863e9c8f63b582718] mtd: OneNAND: OMAP2/3: prevent regulator sleeping while OneNAND is in use
git bisect good 9ac4e613a88d7f6a7a9651d863e9c8f63b582718
# skip: [2b6203bb7d85e6a2ca2088b8684f30be70246ddf] qeth: enable interface setup if LAN is offline
git bisect skip 2b6203bb7d85e6a2ca2088b8684f30be70246ddf
# good: [8ec98fe0b4ffdedce4c1caa9fb3d550f52ad1c6b] jz4740-battery: Protect against concurrent battery readings
git bisect good 8ec98fe0b4ffdedce4c1caa9fb3d550f52ad1c6b
# good: [dc69e2e9fcd7c613eb744ea3b9c4ee9ca554e822] ceph: associate requests with opening sessions
git bisect good dc69e2e9fcd7c613eb744ea3b9c4ee9ca554e822
# good: [3e2a037c1de79af999a54581cbf1e8a5c933fd95] ARM: PL08x: fix sparse warnings
git bisect good 3e2a037c1de79af999a54581cbf1e8a5c933fd95
# good: [0b97fee0ef9b0a0445520f90980410f905c6f9da] powerpc/mm: Avoid avoidable void* pointer
git bisect good 0b97fee0ef9b0a0445520f90980410f905c6f9da
# skip: [a7f5a5fcd9f13afd3471a0de8c1fdaa8f989497c] ixgbe: fix for link failure on SFP+ DA cables
git bisect skip a7f5a5fcd9f13afd3471a0de8c1fdaa8f989497c
# good: [ba5d1012292403c8037adf4a54c4ec50dfe846c4] xen/gntdev: stop using "token" argument
git bisect good ba5d1012292403c8037adf4a54c4ec50dfe846c4
# good: [b646d90053f887c1bc243191e693a9b02d09f2c2] r8169: magic.
git bisect good b646d90053f887c1bc243191e693a9b02d09f2c2



* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-08 12:33     ` Jiri Polach
@ 2011-11-10  1:52       ` Jonathan Nieder
  2011-11-11 13:50         ` Clarinet
  0 siblings, 1 reply; 16+ messages in thread
From: Jonathan Nieder @ 2011-11-10  1:52 UTC (permalink / raw)
  To: Jiri Polach; +Cc: 647095, Ben Hutchings, LKML, x86

Hi Jiri,

Jiri Polach wrote:

> On Ben's advice I am trying to locate the commit that causes the problem to
> appear more precisely using 'git bisect'. However, too many of generated
> revisions are unbootable so I have to use 'bisect skip' frequently.

Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and
this seems totally weird.  Have I described the symptoms correctly below?
(Warning: I am making some guesses, especially at step 5.  In case of
doubt, see the bug log just mentioned.)

	1. Disable SMT in the BIOS.

	2. Boot a bad kernel.  /proc/cpuinfo (correctly) shows one entry
	   per core.

	3. "shutdown -h now".  Enter BIOS.  SMT is still disabled.
	   Don't save.

	4. Boot any kernel.  /proc/cpuinfo shows two entries per core.

	5. "shutdown -h now".  Boot any kernel.  /proc/cpuinfo still shows
	   two entries per core.

	6. "shutdown -h now".  Enter BIOS.  SMT is still disabled.  Save.
	   Now /proc/cpuinfo will (correctly) show one entry per core.

Reproducible for Jiri with v3.0.4.
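
The "one entry per core" versus "two entries per core" check in the steps above can be done mechanically by comparing the number of `processor` entries with the number of distinct (physical id, core id) pairs. A rough sketch, assuming the usual x86 /proc/cpuinfo layout of blank-line-separated blocks of `key : value` lines; the helper names `count_cpus` and `smt_active` are hypothetical:

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo-style text.

    Assumes one blank-line-separated block per logical CPU, each with
    "processor", "physical id" and "core id" fields (x86 layout).
    """
    logical = 0
    cores = set()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a CPU block
        logical += 1
        # Two SMT siblings share the same socket and core id.
        cores.add((fields.get("physical id"), fields.get("core id")))
    return logical, len(cores)


def smt_active(cpuinfo_text):
    """True when there are more logical CPUs than physical cores."""
    logical, cores = count_cpus(cpuinfo_text)
    return logical > cores


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        text = f.read()
    print(count_cpus(text), "SMT active:", smt_active(text))
```

On the machine described in this thread, the good state would be expected to give 12 logical CPUs on 12 cores and the bad state 24 on 12.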

Result of bisecting: v2.6.38-rc1 exhibits the problem.  v2.6.37 and
many of the topic branches merged in the 2.6.38 merge window work ok.
Some other topic branches do not boot at all.

Jiri: if you have gitk installed, then "git bisect visualize" can help
get a sense of what's in the middle of the regression range.
"gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
to find mainline commits to test before finding a topic branch to delve
into.

x86 people: do the symptoms seem familiar?  Any hints for tracking it
down?

Thanks and hope that helps,
Jonathan


* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-10  1:52       ` Jonathan Nieder
@ 2011-11-11 13:50         ` Clarinet
  2011-11-16 22:49           ` Clarinet
  0 siblings, 1 reply; 16+ messages in thread
From: Clarinet @ 2011-11-11 13:50 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: 647095, Ben Hutchings, LKML, x86


Hi all,

> Hi Jiri,
>
> Jiri Polach wrote:
>
>> On Ben's advice I am trying to locate the commit that causes the problem to
>> appear more precisely using 'git bisect'. However, too many of generated
>> revisions are unbootable so I have to use 'bisect skip' frequently.
>
> Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and
> this seems totally weird.  Have I described the symptoms correctly below?
> (Warning: I am making some guesses, especially at step 5.  In case of
> doubt, see the bug log just mentioned.)
>
> 	1. Disable SMT in the BIOS.
>
> 	2. Boot a bad kernel.  /proc/cpuinfo (correctly) shows one entry
> 	   per core.
>
> 	3. "shutdown -h now".  Enter BIOS.  SMT is still disabled.
> 	   Don't save.
>
> 	4. Boot any kernel.  /proc/cpuinfo shows two entries per core.
>
> 	5. "shutdown -h now".  Boot any kernel.  /proc/cpuinfo still shows
> 	   two entries per core.
>
> 	6. "shutdown -h now".  Enter BIOS.  SMT is still disabled.  Save.
> 	   Now /proc/cpuinfo will (correctly) shows one entry per core.
>
> Reproducible for Jiri with v3.0.4.

Yes, this is exactly how it works. Something happens when the kernel
shuts down, not when it reboots.

> Result of bisecting: v2.6.38-rc1 exhibits the problem.  v2.6.37 and
> many of the topic branches merged in the 2.6.38 merge window work ok.
> Some other topic branches do not boot at all.
>
> Jiri: if you have gitk installed, then "git bisect visualize" can help
> get a sense of what's in the middle of the regression range.
> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> to find mainline commits to test before finding a topic branch to delve
> into.

I have been able to narrow the interval manually a little bit from the 
"top" (the bad side) and I will go on from the bottom now. However, 
there seems to be a large area where kernels are unbootable for me - 
they mostly stop when init is called and I do not know why.

> x86 people: do the symptoms seem familiar?  Any hints for tracking it
> down?

Please! I have spent more than a month trying to resolve it. I cannot 
revert back to 2.6.37 kernels and I cannot live with SMT changing on 
every shutdown - I have too many servers to allow such unusual behavior ...

Thank you,

Jiri Polach

> Thanks and hope that helps,
> Jonathan



* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-11 13:50         ` Clarinet
@ 2011-11-16 22:49           ` Clarinet
  2011-11-17 20:32             ` John Stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Clarinet @ 2011-11-16 22:49 UTC (permalink / raw)
  To: 647095; +Cc: Jonathan Nieder, Ben Hutchings, LKML, x86, john.stultz


Hi all,

>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
>> many of the topic branches merged in the 2.6.38 merge window work ok.
>> Some other topic branches do not boot at all.
>>
>> Jiri: if you have gitk installed, then "git bisect visualize" can help
>> get a sense of what's in the middle of the regression range.
>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
>> to find mainline commits to test before finding a topic branch to delve
>> into.
>
> I have been able to narrow the interval manually a little bit from the
> "top" (the bad side) and I will go on from the bottom now. However,
> there seems to be a large area where kernels are unbootable for me -
> they mostly stop when init is called and I do not know why.

Finally! After another 50+ compilations I have it! It took some time as 
first I had to find a reason why some revisions did not boot (almost 2/3 
were unbootable and the first bad commit was among them). Having this 
solved I have been able to bisect without "skipping". The result is 
surprising (at least for me) - believe it or not, the first bad commit 
is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from 
John Stultz (I am sending him a copy of this message).

I would never expect this would be a problem, but my understanding of 
this commit is very limited, so I am certainly missing the point. 
However, I have tried to compile 2.6.38 (which was "bad") with "Real 
Time Clock" configuration option turned off and it behaves "normally" 
then (= is "good").

Can you please comment on this result? What does it mean? Any idea what is 
"wrong" there?

Best regards,

Jiri Polach



* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-16 22:49           ` Clarinet
@ 2011-11-17 20:32             ` John Stultz
  2011-11-17 23:42               ` Jiri Polach
  0 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2011-11-17 20:32 UTC (permalink / raw)
  To: Clarinet; +Cc: 647095, Jonathan Nieder, Ben Hutchings, LKML, x86

On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
> Hi all,
> 
> >> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> >> many of the topic branches merged in the 2.6.38 merge window work ok.
> >> Some other topic branches do not boot at all.
> >>
> >> Jiri: if you have gitk installed, then "git bisect visualize" can help
> >> get a sense of what's in the middle of the regression range.
> >> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> >> to find mainline commits to test before finding a topic branch to delve
> >> into.
> >
> > I have been able to narrow the interval manually a little bit from the
> > "top" (the bad side) and I will go on from the bottom now. However,
> > there seems to be a large area where kernels are unbootable for me -
> > they mostly stop when init is called and I do not know why.
> 
> Finally! After another 50+ compilations I have it! It took some time as 
> first I had to find a reason why some revisions did not boot (almost 2/3 
> were unbootable and the first bad commit was among them). Having this 
> solved I have been able to bisect without "skipping". The result is 
> surprising (at least for me) - believe it or not, the first bad commit 
> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from 
> John Stultz (I am sending him a copy of this message).
> 
> I would never expect this would be a problem, but my understanding of 
> this commit is very limited, so I am certainly missing the point. 
> However, I have tried to compile 2.6.38 (which was "bad") with "Real 
> Time Clock" configuration option turned off and it behaves "normally" 
> then (= is "good").

Huh. That's *very* odd.  Is your system doing anything in particular
with the RTC?  I don't have a clue right off, so probably the next step
is doing a bit of instrumentation to try to figure out where exactly we
trigger the behavior. Could you checkout commit 6610e089 and apply the
patch below to see if we can't narrow it down?

Could you also send your .config to me?

thanks
-john

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 5856167..d049344 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -497,13 +497,13 @@ static int cmos_procfs(struct device *dev, struct seq_file *seq)
 static const struct rtc_class_ops cmos_rtc_ops = {
 	.read_time		= cmos_read_time,
 	.set_time		= cmos_set_time,
-	.read_alarm		= cmos_read_alarm,
-	.set_alarm		= cmos_set_alarm,
-	.proc			= cmos_procfs,
-	.irq_set_freq		= cmos_irq_set_freq,
-	.irq_set_state		= cmos_irq_set_state,
-	.alarm_irq_enable	= cmos_alarm_irq_enable,
-	.update_irq_enable	= cmos_update_irq_enable,
+//	.read_alarm		= cmos_read_alarm,
+//	.set_alarm		= cmos_set_alarm,
+//	.proc			= cmos_procfs,
+//	.irq_set_freq		= cmos_irq_set_freq,
+//	.irq_set_state		= cmos_irq_set_state,
+//	.alarm_irq_enable	= cmos_alarm_irq_enable,
+//	.update_irq_enable	= cmos_update_irq_enable,
 };
 
 /*----------------------------------------------------------------*/





* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-17 20:32             ` John Stultz
@ 2011-11-17 23:42               ` Jiri Polach
  2011-11-17 23:53                 ` John Stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Polach @ 2011-11-17 23:42 UTC (permalink / raw)
  To: John Stultz; +Cc: Clarinet, 647095, Jonathan Nieder, Ben Hutchings, LKML, x86

[-- Attachment #1: Type: text/plain, Size: 3498 bytes --]

On 11/17/2011 9:32 PM, John Stultz wrote:
> On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
>> Hi all,
>>
>>>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
>>>> many of the topic branches merged in the 2.6.38 merge window work ok.
>>>> Some other topic branches do not boot at all.
>>>>
>>>> Jiri: if you have gitk installed, then "git bisect visualize" can help
>>>> get a sense of what's in the middle of the regression range.
>>>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
>>>> to find mainline commits to test before finding a topic branch to delve
>>>> into.
>>>
>>> I have been able to narrow the interval manually a little bit from the
>>> "top" (the bad side) and I will go on from the bottom now. However,
>>> there seems to be a large area where kernels are unbootable for me -
>>> they mostly stop when init is called and I do not know why.
>>
>> Finally! After another 50+ compilations I have it! It took some time as
>> first I had to find a reason why some revisions did not boot (almost 2/3
>> were unbootable and the first bad commit was among them). Having this
>> solved I have been able to bisect without "skipping". The result is
>> surprising (at least for me) - believe it or not, the first bad commit
>> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
>> John Stultz (I am sending him a copy of this message).
>>
>> I would never expect this would be a problem, but my understanding of
>> this commit is very limited, so I am certainly missing the point.
>> However, I have tried to compile 2.6.38 (which was "bad") with "Real
>> Time Clock" configuration option turned off and it behaves "normally"
>> then (= is "good").
>
> Huh. That's *very* odd.  Is your system doing anything in particular
> with the RTC?  I don't have a clue right off, so probably the next step

Yes, it is very odd. The system does not do anything special with RTC. 
It is a diskless computational workstation.

> is doing a bit of instrumentation to try to figure out where exactly we
> trigger the behavior. Could you checkout commit 6610e089 and apply the
> patch below to see if we can't narrow it down?

With the patch applied the system does not show the strange behavior (= 
is "good").

> Could you also send your .config to me?

Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS 
off, the system behaves normally (= is "good") too.
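
When comparing "good" and "bad" builds like this, it helps to confirm what a given .config actually says for an option rather than relying on memory. A small sketch, assuming the standard Kconfig output format (`CONFIG_FOO=y` for enabled options, `# CONFIG_FOO is not set` for disabled ones); the function name `config_value` and the sample text are made up for illustration:

```python
def config_value(config_text, option):
    """Look up one option in kernel .config text.

    Returns the value after '=' (e.g. "y", "m"), "n" for an explicit
    "# CONFIG_X is not set" line, or None if the option is absent.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        # Match "OPTION=" exactly so CONFIG_RTC does not match CONFIG_RTC_CLASS.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Hypothetical fragment, not the .config attached to this message.
    sample = "CONFIG_RTC_CLASS=y\n# CONFIG_RTC_DRV_CMOS is not set\n"
    print(config_value(sample, "CONFIG_RTC_DRV_CMOS"))
```

For a gzipped config such as the attachment here, the text could first be read with Python's `gzip.open(path, "rt")`.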

Thank you.

Jiri Polach

> thanks
> -john
>
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 5856167..d049344 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -497,13 +497,13 @@ static int cmos_procfs(struct device *dev, struct seq_file *seq)
>   static const struct rtc_class_ops cmos_rtc_ops = {
>   	.read_time		= cmos_read_time,
>   	.set_time		= cmos_set_time,
> -	.read_alarm		= cmos_read_alarm,
> -	.set_alarm		= cmos_set_alarm,
> -	.proc			= cmos_procfs,
> -	.irq_set_freq		= cmos_irq_set_freq,
> -	.irq_set_state		= cmos_irq_set_state,
> -	.alarm_irq_enable	= cmos_alarm_irq_enable,
> -	.update_irq_enable	= cmos_update_irq_enable,
> +//	.read_alarm		= cmos_read_alarm,
> +//	.set_alarm		= cmos_set_alarm,
> +//	.proc			= cmos_procfs,
> +//	.irq_set_freq		= cmos_irq_set_freq,
> +//	.irq_set_state		= cmos_irq_set_state,
> +//	.alarm_irq_enable	= cmos_alarm_irq_enable,
> +//	.update_irq_enable	= cmos_update_irq_enable,
>   };
>
>   /*----------------------------------------------------------------*/



[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 13021 bytes --]


* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-17 23:42               ` Jiri Polach
@ 2011-11-17 23:53                 ` John Stultz
  2011-11-21 13:27                   ` Jiri Polach
  0 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2011-11-17 23:53 UTC (permalink / raw)
  To: Jiri Polach; +Cc: 647095, Jonathan Nieder, Ben Hutchings, LKML, x86

On Fri, 2011-11-18 at 00:42 +0100, Jiri Polach wrote:
> On 11/17/2011 9:32 PM, John Stultz wrote:
> > On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
> >> Hi all,
> >>
> >>>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> >>>> many of the topic branches merged in the 2.6.38 merge window work ok.
> >>>> Some other topic branches do not boot at all.
> >>>>
> >>>> Jiri: if you have gitk installed, then "git bisect visualize" can help
> >>>> get a sense of what's in the middle of the regression range.
> >>>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> >>>> to find mainline commits to test before finding a topic branch to delve
> >>>> into.
> >>>
> >>> I have been able to narrow the interval manually a little bit from the
> >>> "top" (the bad side) and I will go on from the bottom now. However,
> >>> there seems to be a large area where kernels are unbootable for me -
> >>> they mostly stop when init is called and I do not know why.
> >>
> >> Finally! After another 50+ compilations I have it! It took some time as
> >> first I had to find a reason why some revisions did not boot (almost 2/3
> >> were unbootable and the first bad commit was among them). Having this
> >> solved I have been able to bisect without "skipping". The result is
> >> surprising (at least for me) - believe it or not, the first bad commit
> >> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
> >> John Stultz (I am sending him a copy of this message).
> >>
> >> I would never expect this would be a problem, but my understanding of
> >> this commit is very limited, so I am certainly missing the point.
> >> However, I have tried to compile 2.6.38 (which was "bad") with "Real
> >> Time Clock" configuration option turned off and it behaves "normally"
> >> then (= is "good").
> >
> > Huh. That's *very* odd.  Is your system doing anything in particular
> > with the RTC?  I don't have a clue right off, so probably the next step
> 
> Yes, it is very odd. The system does not do anything special with RTC. 
> It is a diskless computational workstation.
> 
> > is doing a bit of instrumentation to try to figure out where exactly we
> > trigger the behavior. Could you checkout commit 6610e089 and apply the
> > patch below to see if we can't narrow it down?
> 
> With the patch applied the system does not show the strange behavior (= 
> is "good").
> 
> > Could you also send your .config to me?
> 
> Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS 
> off, the system behaves normally (= is "good") too.

Yea. My rough guess is that the BIOS is somehow sensitive to how the
CMOS RTC is touched.

Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?

thanks
-john






* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-17 23:53                 ` John Stultz
@ 2011-11-21 13:27                   ` Jiri Polach
  2011-11-21 20:02                     ` John Stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Polach @ 2011-11-21 13:27 UTC (permalink / raw)
  To: John Stultz
  Cc: Jiri Polach, 647095, Jonathan Nieder, Ben Hutchings, LKML, x86


>>>> Finally! After another 50+ compilations I have it! It took some time as
>>>> first I had to find a reason why some revisions did not boot (almost 2/3
>>>> were unbootable and the first bad commit was among them). Having this
>>>> solved I have been able to bisect without "skipping". The result is
>>>> surprising (at least for me) - believe it or not, the first bad commit
>>>> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
>>>> John Stultz (I am sending him a copy of this message).
>>>>
>>>> I would never expect this would be a problem, but my understanding of
>>>> this commit is very limited, so I am certainly missing the point.
>>>> However, I have tried to compile 2.6.38 (which was "bad") with "Real
>>>> Time Clock" configuration option turned off and it behaves "normally"
>>>> then (= is "good").
>>>
>>> Huh. That's *very* odd.  Is your system doing anything in particular
>>> with the RTC?  I don't have a clue right off, so probably the next step
>>
>> Yes, it is very odd. The system does not do anything special with RTC.
>> It is a diskless computational workstation.
>>
>>> is doing a bit of instrumentation to try to figure out where exactly we
>>> trigger the behavior. Could you checkout commit 6610e089 and apply the
>>> patch below to see if we can't narrow it down?
>>
>> With the patch applied the system does not show the strange behavior (=
>> is "good").
>>
>>> Could you also send your .config to me?
>>
>> Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
>> off, the system behaves normally (= is "good") too.
>
> Yea. My rough guess is that the BIOS is somehow sensitive to how the
> CMOS RTC is touched.
>
> Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?

But how do I do it? :-)

I have not found a way to disable it in "menuconfig". If I comment it 
out manually in .config, it is automatically set back to "y" as soon as 
compilation starts ...

Thanks,

Jiri




* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-21 13:27                   ` Jiri Polach
@ 2011-11-21 20:02                     ` John Stultz
  2011-11-21 21:31                       ` Jiri Polach
  0 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2011-11-21 20:02 UTC (permalink / raw)
  To: Jiri Polach
  Cc: Jiri Polach, 647095, Jonathan Nieder, Ben Hutchings, LKML, x86

On Mon, 2011-11-21 at 14:27 +0100, Jiri Polach wrote:
> >>>> Finally! After another 50+ compilations I have it! It took some time as
> >>>> first I had to find a reason why some revisions did not boot (almost 2/3
> >>>> were unbootable and the first bad commit was among them). Having this
> >>>> solved I have been able to bisect without "skipping". The result is
> >>>> surprising (at least for me) - believe it or not, the first bad commit
> >>>> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
> >>>> John Stultz (I am sending him a copy of this message).
> >>>>
> >>>> I would never expect this would be a problem, but my understanding of
> >>>> this commit is very limited, so I am certainly missing the point.
> >>>> However, I have tried to compile 2.6.38 (which was "bad") with "Real
> >>>> Time Clock" configuration option turned off and it behaves "normally"
> >>>> then (= is "good").
> >>>
> >>> Huh. That's *very* odd.  Is your system doing anything in particular
> >>> with the RTC?  I don't have a clue right off, so probably the next step
> >>
> >> Yes, it is very odd. The system does not do anything special with RTC.
> >> It is a diskless computational workstation.
> >>
> >>> is doing a bit of instrumentation to try to figure out where exactly we
> >>> trigger the behavior. Could you checkout commit 6610e089 and apply the
> >>> patch below to see if we can't narrow it down?
> >>
> >> With the patch applied the system does not show the strange behavior (=
> >> is "good").
> >>
> >>> Could you also send your .config to me?
> >>
> >> Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
> >> off, the system behaves normally (= is "good") too.
> >
> > Yea. My rough guess is that the BIOS is somehow sensitive to how the
> > CMOS RTC is touched.
> >
> > Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?
> 
> But how do I do it? :-)
> 
> I have not found a way to disable it in "menuconfig". If I comment it 
> out manually in .config, it is automatically set back to "y" as soon as 
> compilation starts ...

Good point. I forgot on x86_64 you can't disable HPET_TIMER.

Could you then use the following patch (and run make oldconfig before
building).

thanks
-john


diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cb9a104..77b5273 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -640,7 +640,7 @@ config HPET_TIMER
 	  Choose N to continue using the legacy 8254 timer.
 
 config HPET_EMULATE_RTC
-	def_bool y
+	def_bool n
 	depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
 
 config APB_TIMER
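
For anyone following along: since the option was being silently reset
before, it may be worth confirming that the change actually took effect
after "make oldconfig". Below is a minimal sketch in Python, not part of
the patch. The helper name kconfig_state is made up for illustration, and
it assumes the usual .config conventions ("CONFIG_FOO=y" for a set symbol,
"# CONFIG_FOO is not set" for a disabled one, nothing at all if the symbol
was not written).

```python
def kconfig_state(config_text, symbol):
    """Return the value of CONFIG_<symbol> found in .config text.

    Returns "n" for a '# CONFIG_<symbol> is not set' line, the text after
    '=' for an assignment, and None if the symbol does not appear.
    """
    unset_line = f"# CONFIG_{symbol} is not set"
    assign_prefix = f"CONFIG_{symbol}="
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return "n"
        if line.startswith(assign_prefix):
            return line[len(assign_prefix):]
    return None
```

Calling kconfig_state(open(".config").read(), "HPET_EMULATE_RTC") should
then give "n" or None on a tree with the patch applied, and "y" if the
option was switched back on.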





* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-21 20:02                     ` John Stultz
@ 2011-11-21 21:31                       ` Jiri Polach
  2011-11-29  2:31                         ` John Stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Polach @ 2011-11-21 21:31 UTC (permalink / raw)
  To: John Stultz
  Cc: Jiri Polach, 647095, Jonathan Nieder, Ben Hutchings, LKML, x86


>>> Yea. My rough guess is that the BIOS is somehow sensitive to how the
>>> CMOS RTC is touched.
>>>
>>> Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?
>>
>> But how do I do it? :-)
>>
>> I have not found a way to disable it in "menuconfig". If I comment it
>> out manually in .config, it is automatically set back to "y" as soon as
>> compilation starts ...
>
> Good point. I forgot on x86_64 you can't disable HPET_TIMER.
>
> Could you then use the following patch (and run make oldconfig before
> building).
>
> thanks
> -john
>
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index cb9a104..77b5273 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -640,7 +640,7 @@ config HPET_TIMER
>   	  Choose N to continue using the legacy 8254 timer.
>
>   config HPET_EMULATE_RTC
> -	def_bool y
> +	def_bool n
>   	depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
>
>   config APB_TIMER

Applying this patch does not change anything, this kernel is "bad".

Jiri Polach



* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-21 21:31                       ` Jiri Polach
@ 2011-11-29  2:31                         ` John Stultz
  2011-11-29 12:26                           ` Clarinet
  0 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2011-11-29  2:31 UTC (permalink / raw)
  To: Jiri Polach; +Cc: 647095, Jonathan Nieder, Ben Hutchings, LKML, x86

On Mon, 2011-11-21 at 22:31 +0100, Jiri Polach wrote:
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index cb9a104..77b5273 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -640,7 +640,7 @@ config HPET_TIMER
> >   	  Choose N to continue using the legacy 8254 timer.
> >
> >   config HPET_EMULATE_RTC
> > -	def_bool y
> > +	def_bool n
> >   	depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
> >
> >   config APB_TIMER
> 
> Applying this patch does not change anything, this kernel is "bad".

Using an older "known-good" kernel, could you build and run the test
case at the end of Documentation/rtc.txt a few times and see if it
triggers the same problem?

I suspect that setting the alarm is what's tripping the BIOS
into enabling the HT bit. Because with older kernels, we used PIE mode
irqs which hwclock usually uses at boot, but with newer kernels, we
emulate PIE via AIE alarm mode. So if the BIOS was broken before, you
wouldn't have noticed unless you tried to use AIE irqs.
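
To make the AIE path concrete, here is a rough userspace sketch in Python
of the kind of alarm exercise meant here: read the RTC time, program an
alarm a few seconds ahead, enable AIE, block until the interrupt arrives,
then disable AIE again. It is only loosely modelled on that sequence and
is not the rtctest source. The ioctl numbers are computed from the
<linux/rtc.h> definitions using the asm-generic encoding used on x86, the
device path /dev/rtc0 is an assumption, and the function names are made
up for illustration.

```python
import fcntl
import os
import struct

# struct rtc_time is nine C ints: sec, min, hour, mday, mon, year, wday, yday, isdst.
_RTC_TIME = struct.Struct("9i")


def _ioc(direction, nr, size=0):
    # asm-generic _IOC() encoding with ioctl type 'p' (the RTC ioctls).
    return (direction << 30) | (size << 16) | (ord("p") << 8) | nr


RTC_AIE_ON = _ioc(0, 0x01)                    # _IO('p', 0x01)
RTC_AIE_OFF = _ioc(0, 0x02)                   # _IO('p', 0x02)
RTC_ALM_SET = _ioc(1, 0x07, _RTC_TIME.size)   # _IOW('p', 0x07, struct rtc_time)
RTC_RD_TIME = _ioc(2, 0x09, _RTC_TIME.size)   # _IOR('p', 0x09, struct rtc_time)


def alarm_after(hour, minute, second, delay=5):
    """Return (hour, minute, second) `delay` seconds later, wrapping at midnight."""
    total = (hour * 3600 + minute * 60 + second + delay) % 86400
    return total // 3600, (total % 3600) // 60, total % 60


def fire_alarm(device="/dev/rtc0", delay=5):
    """Program an AIE alarm `delay` seconds ahead and wait for it (needs root)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        raw = fcntl.ioctl(fd, RTC_RD_TIME, bytes(_RTC_TIME.size))
        fields = list(_RTC_TIME.unpack(raw))
        # RTC_ALM_SET only looks at the hour, minute and second fields.
        fields[2], fields[1], fields[0] = alarm_after(
            fields[2], fields[1], fields[0], delay)
        fcntl.ioctl(fd, RTC_ALM_SET, _RTC_TIME.pack(*fields))
        fcntl.ioctl(fd, RTC_AIE_ON)
        try:
            os.read(fd, 8)   # blocks until the alarm interrupt is delivered
        finally:
            fcntl.ioctl(fd, RTC_AIE_OFF)
    finally:
        os.close(fd)
```

Running fire_alarm() as root and then halting should be roughly
equivalent, for this purpose, to the alarm part of the test case.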

If this doesn't work, I'll get some patches to both 2.6.37 and 2.6.38
kernels to debug the exact flow of how we're touching the hardware and
then we can further narrow it down.

thanks
-john





* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-29  2:31                         ` John Stultz
@ 2011-11-29 12:26                           ` Clarinet
  2011-11-29 23:34                             ` John Stultz
  0 siblings, 1 reply; 16+ messages in thread
From: Clarinet @ 2011-11-29 12:26 UTC (permalink / raw)
  To: John Stultz
  Cc: Jiri Polach, 647095, Jonathan Nieder, Ben Hutchings, LKML, x86


> Using an older "known-good" kernel, could you build and run the test
> case at the end of Documentation/rtc.txt a few times and see if it
> triggers the same problem?
>
> I suspect that setting the alarm is what's tripping the BIOS
> into enabling the HT bit. Because with older kernels, we used PIE mode
> irqs which hwclock usually uses at boot, but with newer kernels, we
> emulate PIE via AIE alarm mode. So if the BIOS was broken before, you
> wouldn't have noticed unless you tried to use AIE irqs.
>
> If this doesn't work, I'll get some patches to both 2.6.37 and 2.6.38
> kernels to debug the exact flow of how we're touching the hardware and
> then we can further narrow it down.

I ran the tests the following way:

- boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
- halt
- boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
- run rtctest
- reboot
- boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
- halt
- boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
- run rtctest
- halt
- boot 2.6.37.6 - check /proc/cpuinfo - 24 processors

So the conclusion is that the HT problem is triggered only if rtctest is
run and the machine is then halted. A reboot seems to "neutralize"
whatever rtctest did.
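
For reference, the "check /proc/cpuinfo" step can be scripted. The sketch
below is in Python and the function name summarize_cpuinfo is made up for
illustration. It assumes the x86 /proc/cpuinfo layout, where each logical
CPU is a blank-line-separated block carrying "processor", "siblings" and
"cpu cores" fields, and it treats siblings greater than cpu cores as a
sign that hyperthreading is active.

```python
def summarize_cpuinfo(text):
    """Return (logical_cpu_count, hyperthreading_seen) for /proc/cpuinfo text."""
    logical = 0
    hyperthreading = False
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue  # not a per-CPU block
        logical += 1
        if "siblings" in fields and "cpu cores" in fields:
            # More logical siblings than physical cores in the package
            # means each core is exposing more than one thread.
            if int(fields["siblings"]) > int(fields["cpu cores"]):
                hyperthreading = True
    return logical, hyperthreading
```

On this machine, summarize_cpuinfo(open("/proc/cpuinfo").read()) would be
expected to report 12 logical CPUs in the "good" state and 24 in the
"bad" one.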

Jiri


* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-29 12:26                           ` Clarinet
@ 2011-11-29 23:34                             ` John Stultz
  2011-12-02 10:44                               ` Clarinet
  0 siblings, 1 reply; 16+ messages in thread
From: John Stultz @ 2011-11-29 23:34 UTC (permalink / raw)
  To: Clarinet; +Cc: 647095, Jonathan Nieder, Ben Hutchings, LKML, x86

On Tue, 2011-11-29 at 13:26 +0100, Clarinet wrote:
> > Using an older "known-good" kernel, could you build and run the test
> > case at the end of Documentation/rtc.txt a few times and see if it
> > triggers the same problem?
> >
> > I suspect that setting the alarm is what's tripping the BIOS
> > into enabling the HT bit. Because with older kernels, we used PIE mode
> > irqs which hwclock usually uses at boot, but with newer kernels, we
> > emulate PIE via AIE alarm mode. So if the BIOS was broken before, you
> > wouldn't have noticed unless you tried to use AIE irqs.
> >
> > If this doesn't work, I'll get some patches to both 2.6.37 and 2.6.38
> > kernels to debug the exact flow of how we're touching the hardware and
> > then we can further narrow it down.
> 
> I ran the tests the following way:
> 
> - boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
> - halt
> - boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
> - run rtctest
> - reboot
> - boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
> - halt
> - boot 2.6.37.6 - check /proc/cpuinfo - 12 processors
> - run rtctest
> - halt
> - boot 2.6.37.6 - check /proc/cpuinfo - 24 processors
> 
> So the conclusion is that only if rtctest is run and the machine is 
> halted, it triggers the HT problem. Reboot seems to "neutralize" 
> whatever rtctest did.

Ok, this also confirms that the board had issues *before* any changes
were made to the RTC core. I'd push the board vendor to update the BIOS
to avoid this issue.

Even so, I'm curious as to what exactly trips it up. Maybe we can
provide a module option for the rtc-cmos driver to disable the alarm
functionality, so you can at least avoid the issue until the board
vendor fixes the problem (if ever).

Assuming it's the alarm being set, could you try the following on a
current kernel and let me know if it still shows the problem? hwclock
might throw some odd messages with this test patch, but those can be
ignored.

thanks
-john

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 05beb6c..d9814aa 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -305,8 +305,8 @@ static void cmos_irq_enable(struct cmos_rtc *cmos, unsigned char mask)
 	cmos_checkintr(cmos, rtc_control);
 
 	rtc_control |= mask;
-	CMOS_WRITE(rtc_control, RTC_CONTROL);
-	hpet_set_rtc_irq_bit(mask);
+//	CMOS_WRITE(rtc_control, RTC_CONTROL);
+//	hpet_set_rtc_irq_bit(mask);
 
 	cmos_checkintr(cmos, rtc_control);
 }






* Re: CPU hyperthreading turned on after soft power-cycle
  2011-11-29 23:34                             ` John Stultz
@ 2011-12-02 10:44                               ` Clarinet
  0 siblings, 0 replies; 16+ messages in thread
From: Clarinet @ 2011-12-02 10:44 UTC (permalink / raw)
  To: John Stultz; +Cc: Clarinet, 647095, Jonathan Nieder, Ben Hutchings, LKML, x86


> Ok, this also confirms that the board had issues *before* any changes
> were made to the RTC core. I'd push the board vendor to update the BIOS
> to avoid this issue.
>
> Even so, I'm curious as to what exactly trips it up. Maybe we can
> provide a module option for the rtc-cmos driver to disable the alarm
> functionality, so you can at least avoid the issue until the board
> vendor fixes the problem (if ever).
>
> Assuming it's the alarm being set, could you try the following on a
> current kernel and let me know if it still shows the problem? hwclock
> might throw some odd messages with this test patch, but those can be
> ignored.

John,

I applied the patch to 2.6.38 and tested the patched kernel - it is
"bad", i.e. it exhibits the strange behavior the same way as unpatched
2.6.38.

I understand that the BIOS is at fault, but I am also very curious about
what exactly in the kernel exposes the problem. Let's please continue testing.

By the way, why do you think the problem appeared only when "halt" was 
called after running rtctest, and did not appear when "reboot" was 
called after running rtctest?

Best regards,

Jiri

>
> thanks
> -john
>
> diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
> index 05beb6c..d9814aa 100644
> --- a/drivers/rtc/rtc-cmos.c
> +++ b/drivers/rtc/rtc-cmos.c
> @@ -305,8 +305,8 @@ static void cmos_irq_enable(struct cmos_rtc *cmos, unsigned char mask)
>   	cmos_checkintr(cmos, rtc_control);
>
>   	rtc_control |= mask;
> -	CMOS_WRITE(rtc_control, RTC_CONTROL);
> -	hpet_set_rtc_irq_bit(mask);
> +//	CMOS_WRITE(rtc_control, RTC_CONTROL);
> +//	hpet_set_rtc_irq_bit(mask);
>
>   	cmos_checkintr(cmos, rtc_control);
>   }



