LKML Archive on lore.kernel.org
 help / color / Atom feed
* intel_pstate divide error with v3.13-rc4-256-gb7000ad
@ 2013-12-24 14:36 Josh Boyer
  2013-12-24 16:06 ` Viresh Kumar
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Boyer @ 2013-12-24 14:36 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: cpufreq, Linux PM list, Linux-Kernel@Vger. Kernel. Org

Hi All,

We've had a report [1] that the pstate driver causes KVM guests to
fail to boot because of a divide error.  See the backtrace below.

    4.839784] Intel P-state driver initializing.
[    4.859972] Intel pstate controlling: cpu 0
[    4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
[    4.877269] divide error: 0000 [#1] SMP
[    4.878127] Modules linked in:
[    4.878127] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
3.13.0-0.rc4.git5.1.fc21.x86_64 #1
[    4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[    4.878127] task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti:
ffff88001e9bc000
[    4.878127] RIP: 0010:[<ffffffff815c551d>]  [<ffffffff815c551d>]
intel_pstate_timer_func+0x11d/0x2b0
[    4.878127] RSP: 0000:ffff88001ee03e18  EFLAGS: 00010246
[    4.878127] RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
[    4.878127] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    4.878127] RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
[    4.878127] R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
[    4.878127] R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
[    4.878127] FS:  0000000000000000(0000) GS:ffff88001ee00000(0000)
knlGS:0000000000000000
[    4.878127] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    4.878127] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
[    4.878127] Stack:
[    4.878127]  fffffffb1a454390 ffffffff821a4500 ffff88001a454390
0000000000000100
[    4.878127]  ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15
ffffffff82d5ed40
[    4.878127]  ffffffff8258cc60 0000000000000000 ffffffff81ac39de
0000000000000000
[    4.878127] Call Trace:
[    4.878127]  <IRQ>
[    4.878127]  [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
[    4.878127]  [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
[    4.878127]  [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
[    4.878127]  [<ffffffff81084354>] run_timer_softirq+0x234/0x380
[    4.878127]  [<ffffffff8107aee4>] __do_softirq+0x104/0x430
[    4.878127]  [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
[    4.878127]  [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
[    4.878127]  [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
[    4.878127]  <EOI>
[    4.878127]  [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
[    4.878127]  [<ffffffff81757719>] printk+0x67/0x69
[    4.878127]  [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
[    4.878127]  [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
[    4.878127]  [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
[    4.878127]  [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
[    4.878127]  [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
[    4.878127]  [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
[    4.878127]  [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
[    4.878127]  [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[    4.878127]  [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
[    4.878127]  [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
[    4.878127]  [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
[    4.878127]  [<ffffffff8174b530>] ? rest_init+0x150/0x150
[    4.878127]  [<ffffffff8174b53e>] kernel_init+0xe/0x130
[    4.878127]  [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
[    4.878127]  [<ffffffff8174b530>] ? rest_init+0x150/0x150
[    4.878127] Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00
00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1
e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa
1f 89
[    4.878127] RIP  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
[    4.878127]  RSP <ffff88001ee03e18>
[    5.438189] ---[ end trace f166110ed22cc37a ]---
[    5.446428] Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-24 14:36 intel_pstate divide error with v3.13-rc4-256-gb7000ad Josh Boyer
@ 2013-12-24 16:06 ` Viresh Kumar
  2013-12-27 12:24   ` One Thousand Gnomes
  2013-12-27 14:35   ` Rafael J. Wysocki
  0 siblings, 2 replies; 32+ messages in thread
From: Viresh Kumar @ 2013-12-24 16:06 UTC (permalink / raw)
  To: Josh Boyer, Dirk Brandewie
  Cc: Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

Adding Dirk..

On 24 December 2013 20:06, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> Hi All,
>
> We've had a report [1] that the pstate driver causes KVM guests to
> fail to boot because of a divide error.  See the backtrace below.
>
>     4.839784] Intel P-state driver initializing.
> [    4.859972] Intel pstate controlling: cpu 0
> [    4.867653] cpufreq: __cpufreq_add_dev: ->get() failed

After a call to ->init(), ->get() is supposed to work.
@Dirk: Any idea why it failed?

And then I don't know what made this divide by zero to happen :)

> [    4.877269] divide error: 0000 [#1] SMP
> [    4.878127] Modules linked in:
> [    4.878127] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
> 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
> [    4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [    4.878127] task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti:
> ffff88001e9bc000
> [    4.878127] RIP: 0010:[<ffffffff815c551d>]  [<ffffffff815c551d>]
> intel_pstate_timer_func+0x11d/0x2b0
> [    4.878127] RSP: 0000:ffff88001ee03e18  EFLAGS: 00010246
> [    4.878127] RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
> [    4.878127] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [    4.878127] RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
> [    4.878127] R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
> [    4.878127] R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
> [    4.878127] FS:  0000000000000000(0000) GS:ffff88001ee00000(0000)
> knlGS:0000000000000000
> [    4.878127] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    4.878127] CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
> [    4.878127] Stack:
> [    4.878127]  fffffffb1a454390 ffffffff821a4500 ffff88001a454390
> 0000000000000100
> [    4.878127]  ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15
> ffffffff82d5ed40
> [    4.878127]  ffffffff8258cc60 0000000000000000 ffffffff81ac39de
> 0000000000000000
> [    4.878127] Call Trace:
> [    4.878127]  <IRQ>
> [    4.878127]  [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
> [    4.878127]  [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
> [    4.878127]  [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
> [    4.878127]  [<ffffffff81084354>] run_timer_softirq+0x234/0x380
> [    4.878127]  [<ffffffff8107aee4>] __do_softirq+0x104/0x430
> [    4.878127]  [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
> [    4.878127]  [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
> [    4.878127]  [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
> [    4.878127]  <EOI>
> [    4.878127]  [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
> [    4.878127]  [<ffffffff81757719>] printk+0x67/0x69
> [    4.878127]  [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
> [    4.878127]  [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
> [    4.878127]  [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
> [    4.878127]  [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
> [    4.878127]  [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
> [    4.878127]  [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
> [    4.878127]  [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
> [    4.878127]  [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
> [    4.878127]  [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
> [    4.878127]  [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
> [    4.878127]  [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
> [    4.878127]  [<ffffffff8174b530>] ? rest_init+0x150/0x150
> [    4.878127]  [<ffffffff8174b53e>] kernel_init+0xe/0x130
> [    4.878127]  [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
> [    4.878127]  [<ffffffff8174b530>] ? rest_init+0x150/0x150
> [    4.878127] Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00
> 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1
> e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa
> 1f 89
> [    4.878127] RIP  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
> [    4.878127]  RSP <ffff88001ee03e18>
> [    5.438189] ---[ end trace f166110ed22cc37a ]---
> [    5.446428] Kernel panic - not syncing: Fatal exception in interrupt

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-24 16:06 ` Viresh Kumar
@ 2013-12-27 12:24   ` One Thousand Gnomes
  2013-12-27 12:47     ` Gleb Natapov
  2013-12-27 14:35   ` Rafael J. Wysocki
  1 sibling, 1 reply; 32+ messages in thread
From: One Thousand Gnomes @ 2013-12-27 12:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Josh Boyer, Dirk Brandewie, Rafael J. Wysocki, cpufreq,
	Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Tue, 24 Dec 2013 21:36:01 +0530
Viresh Kumar <viresh.kumar@linaro.org> wrote:

> Adding Dirk..
> 
> On 24 December 2013 20:06, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> > Hi All,
> >
> > We've had a report [1] that the pstate driver causes KVM guests to
> > fail to boot because of a divide error.  See the backtrace below.
> >
> >     4.839784] Intel P-state driver initializing.
> > [    4.859972] Intel pstate controlling: cpu 0
> > [    4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
> 
> After a call to ->init(), ->get() is supposed to work.
> @Dirk: Any idea why it failed?
> 
> And then I don't know what made this divide by zero to happen :)

> > [    4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011

Emulator bug perhaps ?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 12:24   ` One Thousand Gnomes
@ 2013-12-27 12:47     ` Gleb Natapov
  2013-12-27 13:46       ` Josh Boyer
  0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-12-27 12:47 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Viresh Kumar, Josh Boyer, Dirk Brandewie, Rafael J. Wysocki,
	cpufreq, Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Fri, Dec 27, 2013 at 12:24:22PM +0000, One Thousand Gnomes wrote:
> On Tue, 24 Dec 2013 21:36:01 +0530
> Viresh Kumar <viresh.kumar@linaro.org> wrote:
> 
> > Adding Dirk..
> > 
> > On 24 December 2013 20:06, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> > > Hi All,
> > >
> > > We've had a report [1] that the pstate driver causes KVM guests to
> > > fail to boot because of a divide error.  See the backtrace below.
> > >
> > >     4.839784] Intel P-state driver initializing.
> > > [    4.859972] Intel pstate controlling: cpu 0
> > > [    4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
> > 
> > After a call to ->init(), ->get() is supposed to work.
> > @Dirk: Any idea why it failed?
> > 
> > And then I don't know what made this divide by zero to happen :)
> 
> > > [    4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> 
> Emulator bug perhaps ?
>
KVM does not emulate P-states at all. intel_pstate_init() calls
intel_pstate_msrs_not_valid() before printing "Intel P-state driver
initializing."  which suppose to fail since it checks that two reads of
MSR_IA32_APERF return different values, but KVM does not emulate this msr
at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
are patched to be rdmsrl_safe in a guest).

Anything interesting in host dmesg? Is it reproducible? 

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 12:47     ` Gleb Natapov
@ 2013-12-27 13:46       ` Josh Boyer
  2013-12-27 14:15         ` Kashyap Chamarthy
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Boyer @ 2013-12-27 13:46 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: One Thousand Gnomes, Viresh Kumar, Dirk Brandewie,
	Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Kashyap Chamarthy,
	Richard W.M. Jones

On Fri, Dec 27, 2013 at 7:47 AM, Gleb Natapov <gleb@kernel.org> wrote:
> On Fri, Dec 27, 2013 at 12:24:22PM +0000, One Thousand Gnomes wrote:
>> On Tue, 24 Dec 2013 21:36:01 +0530
>> Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>
>> > Adding Dirk..
>> >
>> > On 24 December 2013 20:06, Josh Boyer <jwboyer@fedoraproject.org> wrote:
>> > > Hi All,
>> > >
>> > > We've had a report [1] that the pstate driver causes KVM guests to
>> > > fail to boot because of a divide error.  See the backtrace below.
>> > >
>> > >     4.839784] Intel P-state driver initializing.
>> > > [    4.859972] Intel pstate controlling: cpu 0
>> > > [    4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
>> >
>> > After a call to ->init(), ->get() is supposed to work.
>> > @Dirk: Any idea why it failed?
>> >
>> > And then I don't know what made this divide by zero to happen :)
>>
>> > > [    4.878127] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>>
>> Emulator bug perhaps ?
>>
> KVM does not emulate P-states at all. intel_pstate_init() calls
> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> initializing."  which suppose to fail since it checks that two reads of
> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> are patched to be rdmsrl_safe in a guest).
>
> Anything interesting in host dmesg? Is it reproducible?

Seems reproducible.  I forgot to link to the actual bug in my original
report, but you can find it here:

https://bugzilla.redhat.com/show_bug.cgi?id=1046317

Adding Richard and Kashyap on CC.

josh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 13:46       ` Josh Boyer
@ 2013-12-27 14:15         ` Kashyap Chamarthy
  2013-12-27 14:34           ` Richard W.M. Jones
  2013-12-27 16:52           ` Gleb Natapov
  0 siblings, 2 replies; 32+ messages in thread
From: Kashyap Chamarthy @ 2013-12-27 14:15 UTC (permalink / raw)
  To: Josh Boyer
  Cc: Gleb Natapov, One Thousand Gnomes, Viresh Kumar, Dirk Brandewie,
	Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

[. . .]

>> KVM does not emulate P-states at all. intel_pstate_init() calls
>> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
>> initializing."  which suppose to fail since it checks that two reads of
>> MSR_IA32_APERF return different values, but KVM does not emulate this msr
>> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
>> are patched to be rdmsrl_safe in a guest).
>>
>> Anything interesting in host dmesg?

Heya Gleb,

Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
=====
.
.
.
[    3.462741] Intel pstate controlling: cpu 0
[    3.463972] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.465399] Intel pstate controlling: cpu 1
[    3.466505] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.468056] Intel pstate controlling: cpu 2
[    3.469208] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.470751] Intel pstate controlling: cpu 3
[    3.471870] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.473289] Intel pstate controlling: cpu 4
[    3.474545] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.475957] Intel pstate controlling: cpu 5
[    3.477135] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.478544] Intel pstate controlling: cpu 6
[    3.479670] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.481124] Intel pstate controlling: cpu 7
[    3.482288] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.483752] Intel pstate controlling: cpu 8
[    3.484986] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.486396] Intel pstate controlling: cpu 9
[    3.488000] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.489684] Intel pstate controlling: cpu 10
[    3.491027] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.492647] Intel pstate controlling: cpu 11
[    3.494026] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.495662] Intel pstate controlling: cpu 12
[    3.497065] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.498733] Intel pstate controlling: cpu 13
[    3.500071] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.501787] Intel pstate controlling: cpu 14
[    3.503105] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.504809] Intel pstate controlling: cpu 15
[    3.506142] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.507825] Intel pstate controlling: cpu 16
[    3.509175] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.510780] Intel pstate controlling: cpu 17
[    3.512158] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.513765] Intel pstate controlling: cpu 18
[    3.515225] cpufreq: __cpufreq_add_dev: ->get() failed
[    3.516898] Intel pstate controlling: cpu 19
.
.
.
=====

Full dmesg output: https://bugzilla.kernel.org/attachment.cgi?id=119701

  (Kernel bug: https://bugzilla.kernel.org/show_bug.cgi?id=67761)

>> Is it reproducible?

Yes, just now I was able to reproduce it again. It's trivial with
libguestfs-test-tool:

  $ export LIBGUESTFS_BACKEND=direct

  $ guestfish get-backend
  direct

  $ libguestfs-test-tool

You see the Kernel panic on stdout.

NOTE: You can reproduce the issue without setting LIBGUESTFS_BACKEND=direct,
      in which case, it'll default to 'libvirt' back-end.


Version:

  $ uname -r; rpm -q libguestfs libvirt qemu-system-x86
  3.13.0-0.rc4.git5.1.fc21.x86_64
  libguestfs-1.25.18-1.fc21.x86_64
  libvirt-1.1.3.1-2.fc20.x86_64
  qemu-system-x86-1.6.1-2.fc20.x86_64


> 
> Seems reproducible.  I forgot to link to the actual bug in my original
> report, but you can find it here:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=1046317
> 
> Adding Richard and Kashyap on CC.
> 
> josh
> 


-- 
/kashyap

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 14:15         ` Kashyap Chamarthy
@ 2013-12-27 14:34           ` Richard W.M. Jones
  2013-12-27 16:52           ` Gleb Natapov
  1 sibling, 0 replies; 32+ messages in thread
From: Richard W.M. Jones @ 2013-12-27 14:34 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Josh Boyer, Gleb Natapov, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

Probably the qemu command line is more interesting, which is in this
comment and reproduced below.

https://bugzilla.redhat.com/show_bug.cgi?id=1046317#c1

/usr/bin/qemu-kvm \
    -global virtio-blk-pci.scsi=off \
    -nodefconfig \
    -enable-fips \
    -nodefaults \
    -display none \
    -machine accel=kvm:tcg \
    -cpu host,+kvmclock \
    -m 500 \
    -no-reboot \
    -no-hpet \
    -kernel /var/tmp/.guestfs-0/kernel.32370 \
    -initrd /var/tmp/.guestfs-0/initrd.32370 \
    -device virtio-scsi-pci,id=scsi \
    -drive file=/tmp/libguestfswnVOx7/scratch.1,cache=unsafe,format=raw,id=hd0,if=none \
    -device scsi-hd,drive=hd0 \
    -drive file=/var/tmp/.guestfs-0/root.32370,snapshot=on,id=appliance,cache=unsafe,if=none \
    -device scsi-hd,drive=appliance \
    -device virtio-serial-pci \
    -serial stdio \
    -device sga \
    -chardev socket,path=/tmp/libguestfswnVOx7/guestfsd.sock,id=channel0 \
    -device virtserialport,chardev=channel0,name=org.libguestfs.channel.0 \
    -append 'panic=1 console=ttyS0 udevtimeout=600 no_timer_check acpi=off printk.time=1 cgroup_disable=memory root=/dev/sdb selinux=0 guestfs_verbose=1 TERM=screen'

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-24 16:06 ` Viresh Kumar
  2013-12-27 12:24   ` One Thousand Gnomes
@ 2013-12-27 14:35   ` Rafael J. Wysocki
  1 sibling, 0 replies; 32+ messages in thread
From: Rafael J. Wysocki @ 2013-12-27 14:35 UTC (permalink / raw)
  To: Viresh Kumar, Josh Boyer
  Cc: Dirk Brandewie, cpufreq, Linux PM list, Linux-Kernel@Vger. Kernel. Org

On Tuesday, December 24, 2013 09:36:01 PM Viresh Kumar wrote:
> Adding Dirk..
> 
> On 24 December 2013 20:06, Josh Boyer <jwboyer@fedoraproject.org> wrote:
> > Hi All,
> >
> > We've had a report [1] that the pstate driver causes KVM guests to
> > fail to boot because of a divide error.  See the backtrace below.
> >
> >     4.839784] Intel P-state driver initializing.
> > [    4.859972] Intel pstate controlling: cpu 0
> > [    4.867653] cpufreq: __cpufreq_add_dev: ->get() failed
> 
> After a call to ->init(), ->get() is supposed to work.
> @Dirk: Any idea why it failed?

Well, it looks like sample->freq is 0 in intel_pstate_get().

> And then I don't know what made this divide by zero to happen :)

>From code inspection it looks like that is caused by the
intel_pstate_get_scaled_busy() called from intel_pstate_adjust_busy_pstate(),
so it appears that cpu->pstate.current_state is 0 at that point.

I'm wondering if something like the (untested) patch below helps, then?

Rafael


---
 drivers/cpufreq/intel_pstate.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -652,6 +652,11 @@ static int intel_pstate_init_cpu(unsigne
 	cpu = all_cpu_data[cpunum];
 
 	intel_pstate_get_cpu_pstates(cpu);
+	if (!cpu->pstate.current_state) {
+		all_cpu_data[cpunum] = NULL;
+		kfree(cpu);
+		return -ENODATA;
+	}
 
 	cpu->cpu = cpunum;
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 14:15         ` Kashyap Chamarthy
  2013-12-27 14:34           ` Richard W.M. Jones
@ 2013-12-27 16:52           ` Gleb Natapov
  2013-12-27 17:01             ` Gleb Natapov
  1 sibling, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2013-12-27 16:52 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Josh Boyer, One Thousand Gnomes, Viresh Kumar, Dirk Brandewie,
	Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> [. . .]
> 
> >> KVM does not emulate P-states at all. intel_pstate_init() calls
> >> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> >> initializing."  which suppose to fail since it checks that two reads of
> >> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> >> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> >> are patched to be rdmsrl_safe in a guest).
> >>
> >> Anything interesting in host dmesg?
> 
> Heya Gleb,
> 
> Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
That's guest dmesg. What about host one? Can you ftrace the failure?

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 16:52           ` Gleb Natapov
@ 2013-12-27 17:01             ` Gleb Natapov
  2013-12-27 17:17               ` Richard W.M. Jones
  2013-12-27 17:17               ` Kashyap Chamarthy
  0 siblings, 2 replies; 32+ messages in thread
From: Gleb Natapov @ 2013-12-27 17:01 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Josh Boyer, One Thousand Gnomes, Viresh Kumar, Dirk Brandewie,
	Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
> On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> > [. . .]
> > 
> > >> KVM does not emulate P-states at all. intel_pstate_init() calls
> > >> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> > >> initializing."  which suppose to fail since it checks that two reads of
> > >> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> > >> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> > >> are patched to be rdmsrl_safe in a guest).
> > >>
> > >> Anything interesting in host dmesg?
> > 
> > Heya Gleb,
> > 
> > Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
> That's guest dmesg. What about host one? Can you ftrace the failure?
> 
Ugh, it looks like guest dmesg but there are KVM messages there too ("[
281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
to MSR_IA32_APERF I was talking about above), so I guess this is nested
guest invocation?  Does it happen in non nested guest?

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 17:01             ` Gleb Natapov
@ 2013-12-27 17:17               ` Richard W.M. Jones
  2013-12-27 17:17               ` Kashyap Chamarthy
  1 sibling, 0 replies; 32+ messages in thread
From: Richard W.M. Jones @ 2013-12-27 17:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Kashyap Chamarthy, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org

On Fri, Dec 27, 2013 at 07:01:48PM +0200, Gleb Natapov wrote:
> On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
> > On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> > > [. . .]
> > > 
> > > >> KVM does not emulate P-states at all. intel_pstate_init() calls
> > > >> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> > > >> initializing."  which suppose to fail since it checks that two reads of
> > > >> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> > > >> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> > > >> are patched to be rdmsrl_safe in a guest).
> > > >>
> > > >> Anything interesting in host dmesg?
> > > 
> > > Heya Gleb,
> > > 
> > > Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
> > That's guest dmesg. What about host one? Can you ftrace the failure?
> > 
> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> to MSR_IA32_APERF I was talking about above), so I guess this is nested
> guest invocation?  Does it happen in non nested guest?

FWIW I could not reproduce the reported bug.  I am also using a guest
for testing, but *not* using nested KVM.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 17:01             ` Gleb Natapov
  2013-12-27 17:17               ` Richard W.M. Jones
@ 2013-12-27 17:17               ` Kashyap Chamarthy
  2013-12-27 21:51                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 32+ messages in thread
From: Kashyap Chamarthy @ 2013-12-27 17:17 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Josh Boyer, One Thousand Gnomes, Viresh Kumar, Dirk Brandewie,
	Rafael J. Wysocki, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On 12/27/2013 06:01 PM, Gleb Natapov wrote:
> On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
>> On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
>>> [. . .]
>>>
>>>>> KVM does not emulate P-states at all. intel_pstate_init() calls
>>>>> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
>>>>> initializing."  which suppose to fail since it checks that two reads of
>>>>> MSR_IA32_APERF return different values, but KVM does not emulate this msr
>>>>> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
>>>>> are patched to be rdmsrl_safe in a guest).
>>>>>
>>>>> Anything interesting in host dmesg?
>>>
>>> Heya Gleb,
>>>
>>> Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
>> That's guest dmesg. What about host one? 

Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751

>> Can you ftrace the failure?

Can try, need some time (rest of the day I'll be away travelling,
will try to do it over the weekend, and update the Kernel
bugzilla with observations).

>>
> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> to MSR_IA32_APERF I was talking about above), so I guess this is nested
> guest invocation? 

Yeah -- sorry, I forgot to note it's in a nested environment :(

> Does it happen in non nested guest?

I need to that.

Note to self: Also try with a newer Kernel on the host.



-- 
/kashyap

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 17:17               ` Kashyap Chamarthy
@ 2013-12-27 21:51                 ` Rafael J. Wysocki
  2013-12-29 12:12                   ` Kashyap Chamarthy
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2013-12-27 21:51 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Gleb Natapov, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Friday, December 27, 2013 06:17:47 PM Kashyap Chamarthy wrote:
> On 12/27/2013 06:01 PM, Gleb Natapov wrote:
> > On Fri, Dec 27, 2013 at 06:52:48PM +0200, Gleb Natapov wrote:
> >> On Fri, Dec 27, 2013 at 03:15:39PM +0100, Kashyap Chamarthy wrote:
> >>> [. . .]
> >>>
> >>>>> KVM does not emulate P-states at all. intel_pstate_init() calls
> >>>>> intel_pstate_msrs_not_valid() before printing "Intel P-state driver
> >>>>> initializing."  which suppose to fail since it checks that two reads of
> >>>>> MSR_IA32_APERF return different values, but KVM does not emulate this msr
> >>>>> at all, so both calls should return zero (KVM suppose to inject #GP, all rdmsrl
> >>>>> are patched to be rdmsrl_safe in a guest).
> >>>>>
> >>>>> Anything interesting in host dmesg?
> >>>
> >>> Heya Gleb,
> >>>
> >>> Here's the relevant dmesg snippet (full dmesg, refer the attachment below):
> >> That's guest dmesg. What about host one? 
> 
> Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751
> 
> >> Can you ftrace the failure?
> 
> Can try, need some time (rest of the day I'll be away travelling,
> will try to do it over the weekend, and update the Kernel
> bugzilla with observations).
> 
> >>
> > Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> > 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> > to MSR_IA32_APERF I was talking about above), so I guess this is nested
> > guest invocation? 
> 
> Yeah -- sorry, I forgot to note it's in a nested environment :(
> 
> > Does it happen in non nested guest?
> 
> I need to that.
> 
> Note to self: Also try with a newer Kernel on the host.

Please try the patch I posted earlier today when you're at it:

https://patchwork.kernel.org/patch/3411991/

Rafael


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-27 21:51                 ` Rafael J. Wysocki
@ 2013-12-29 12:12                   ` Kashyap Chamarthy
  2013-12-29 15:04                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Kashyap Chamarthy @ 2013-12-29 12:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Gleb Natapov, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

[. . .]

>> Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751
>>
>>>> Can you ftrace the failure?
>>
>> Can try, need some time (rest of the day I'll be away travelling,
>> will try to do it over the weekend, and update the Kernel
>> bugzilla with observations).
>>
>>>>
>>> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
>>> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
>>> to MSR_IA32_APERF I was talking about above), so I guess this is nested
>>> guest invocation? 
>>
>> Yeah -- sorry, I forgot to note it's in a nested environment :(
>>
>>> Does it happen in non nested guest?
>>
>> I need to that.
>>
>> Note to self: Also try with a newer Kernel on the host.
> 
> Please try the patch I posted earlier today when you're at it:
> 
> https://patchwork.kernel.org/patch/3411991/

I applied the patch & tried to build the Kernel, it failed with:

=======
.
.
.
Generating a 4096 bit RSA private key
............................................................................................................drivers/cpufreq/intel_pstate.c:
In function 'intel_pstate_init_cpu':
drivers/cpufreq/intel_pstate.c:617:18: error: 'struct pstate_data' has no member named
'current_state'
  if (!cpu->pstate.current_state) {
                  ^
...make[2]: *** [drivers/cpufreq/intel_pstate.o] Error 1
make[1]: *** [drivers/cpufreq] Error 2
make[1]: *** Waiting for unfinished jobs....
.....................................make: *** [drivers] Error 2
make: *** Waiting for unfinished jobs....
...................................++
=======

I'll try it on Fedora Kernel git, do a test build and re-try again.

-- 
/kashyap

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-29 12:12                   ` Kashyap Chamarthy
@ 2013-12-29 15:04                     ` Rafael J. Wysocki
  2013-12-30 14:58                       ` Kashyap Chamarthy
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2013-12-29 15:04 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Gleb Natapov, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Sunday, December 29, 2013 01:12:18 PM Kashyap Chamarthy wrote:
> [. . .]
> 
> >> Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751
> >>
> >>>> Can you ftrace the failure?
> >>
> >> Can try, need some time (rest of the day I'll be away travelling,
> >> will try to do it over the weekend, and update the Kernel
> >> bugzilla with observations).
> >>
> >>>>
> >>> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> >>> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> >>> to MSR_IA32_APERF I was talking about above), so I guess this is nested
> >>> guest invocation? 
> >>
> >> Yeah -- sorry, I forgot to note it's in a nested environment :(
> >>
> >>> Does it happen in non nested guest?
> >>
> >> I need to that.
> >>
> >> Note to self: Also try with a newer Kernel on the host.
> > 
> > Please try the patch I posted earlier today when you're at it:
> > 
> > https://patchwork.kernel.org/patch/3411991/
> 
> I applied the patch & tried to build the Kernel, it failed with:
> 
> =======
> .
> .
> .
> Generating a 4096 bit RSA private key
> ............................................................................................................drivers/cpufreq/intel_pstate.c:
> In function 'intel_pstate_init_cpu':
> drivers/cpufreq/intel_pstate.c:617:18: error: 'struct pstate_data' has no member named
> 'current_state'
>   if (!cpu->pstate.current_state) {

My bad, that should have been current_pstate.  Updated patch is appended.

Thanks,
Rafael


---
 drivers/cpufreq/intel_pstate.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
 	cpu = all_cpu_data[cpunum];
 
 	intel_pstate_get_cpu_pstates(cpu);
+	if (!cpu->pstate.current_pstate) {
+		all_cpu_data[cpunum] = NULL;
+		kfree(cpu);
+		return -ENODATA;
+	}
 
 	cpu->cpu = cpunum;
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-29 15:04                     ` Rafael J. Wysocki
@ 2013-12-30 14:58                       ` Kashyap Chamarthy
  2013-12-31  2:07                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Kashyap Chamarthy @ 2013-12-30 14:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Gleb Natapov, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On 12/29/2013 04:04 PM, Rafael J. Wysocki wrote:
> On Sunday, December 29, 2013 01:12:18 PM Kashyap Chamarthy wrote:
>> [. . .]
>>
>>>> Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751
>>>>
>>>>>> Can you ftrace the failure?
>>>>
>>>> Can try, need some time (rest of the day I'll be away travelling,
>>>> will try to do it over the weekend, and update the Kernel
>>>> bugzilla with observations).
>>>>
>>>>>>
>>>>> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
>>>>> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
>>>>> to MSR_IA32_APERF I was talking about above), so I guess this is nested
>>>>> guest invocation? 
>>>>
>>>> Yeah -- sorry, I forgot to note it's in a nested environment :(
>>>>
>>>>> Does it happen in non nested guest?
>>>>
>>>> I need to that.
>>>>
>>>> Note to self: Also try with a newer Kernel on the host.
>>>
>>> Please try the patch I posted earlier today when you're at it:
>>>
>>> https://patchwork.kernel.org/patch/3411991/
>>
>> I applied the patch & tried to build the Kernel, it failed with:
>>
>> =======
>> .
>> .
>> .
>> Generating a 4096 bit RSA private key
>> ............................................................................................................drivers/cpufreq/intel_pstate.c:
>> In function 'intel_pstate_init_cpu':
>> drivers/cpufreq/intel_pstate.c:617:18: error: 'struct pstate_data' has no member named
>> 'current_state'
>>   if (!cpu->pstate.current_state) {
> 
> My bad, that should have been current_pstate.  Updated patch is appended.
> 
> Thanks,
> Rafael
> 
> 
> ---
>  drivers/cpufreq/intel_pstate.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
>  	cpu = all_cpu_data[cpunum];
>  
>  	intel_pstate_get_cpu_pstates(cpu);
> +	if (!cpu->pstate.current_pstate) {
> +		all_cpu_data[cpunum] = NULL;
> +		kfree(cpu);
> +		return -ENODATA;
> +	}
>  
>  	cpu->cpu = cpunum;
>  
> 


Thanks Rafel, I can confirm this patch helps.

I re-built the Kernel on L0 (physical host) and L1 (guest hypervisor) with
the above patch, and re-ran the libguestfs test (which invokes an L2 appliance),
it now successfully completes:

   http://kashyapc.fedorapeople.org/temp/libguestfs-test-tool-stdout.txt


Here's a Fedora Kernel scratch build (not retained for more than 2 weeks)
with the above patch I used -- http://koji.fedoraproject.org/koji/taskinfo?taskID=6342414



-- 
/kashyap

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-30 14:58                       ` Kashyap Chamarthy
@ 2013-12-31  2:07                         ` Rafael J. Wysocki
  2014-01-03 17:30                           ` Dirk Brandewie
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2013-12-31  2:07 UTC (permalink / raw)
  To: Kashyap Chamarthy
  Cc: Gleb Natapov, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	Dirk Brandewie, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Monday, December 30, 2013 03:58:43 PM Kashyap Chamarthy wrote:
> On 12/29/2013 04:04 PM, Rafael J. Wysocki wrote:
> > On Sunday, December 29, 2013 01:12:18 PM Kashyap Chamarthy wrote:
> >> [. . .]
> >>
> >>>> Here's host dmesg: https://bugzilla.kernel.org/attachment.cgi?id=119751
> >>>>
> >>>>>> Can you ftrace the failure?
> >>>>
> >>>> Can try, need some time (rest of the day I'll be away travelling,
> >>>> will try to do it over the weekend, and update the Kernel
> >>>> bugzilla with observations).
> >>>>
> >>>>>>
> >>>>> Ugh, it looks like guest dmesg but there are KVM messages there too ("[
> >>>>> 281.443662] kvm [2452]: vcpu0 unhandled rdmsr: 0xe8" is unhandled access
> >>>>> to MSR_IA32_APERF I was talking about above), so I guess this is nested
> >>>>> guest invocation? 
> >>>>
> >>>> Yeah -- sorry, I forgot to note it's in a nested environment :(
> >>>>
> >>>>> Does it happen in non nested guest?
> >>>>
> >>>> I need to that.
> >>>>
> >>>> Note to self: Also try with a newer Kernel on the host.
> >>>
> >>> Please try the patch I posted earlier today when you're at it:
> >>>
> >>> https://patchwork.kernel.org/patch/3411991/
> >>
> >> I applied the patch & tried to build the Kernel, it failed with:
> >>
> >> =======
> >> .
> >> .
> >> .
> >> Generating a 4096 bit RSA private key
> >> ............................................................................................................drivers/cpufreq/intel_pstate.c:
> >> In function 'intel_pstate_init_cpu':
> >> drivers/cpufreq/intel_pstate.c:617:18: error: 'struct pstate_data' has no member named
> >> 'current_state'
> >>   if (!cpu->pstate.current_state) {
> > 
> > My bad, that should have been current_pstate.  Updated patch is appended.
> > 
> > Thanks,
> > Rafael
> > 
> > 
> > ---
> >  drivers/cpufreq/intel_pstate.c |    5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > Index: linux-pm/drivers/cpufreq/intel_pstate.c
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> > +++ linux-pm/drivers/cpufreq/intel_pstate.c
> > @@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
> >  	cpu = all_cpu_data[cpunum];
> >  
> >  	intel_pstate_get_cpu_pstates(cpu);
> > +	if (!cpu->pstate.current_pstate) {
> > +		all_cpu_data[cpunum] = NULL;
> > +		kfree(cpu);
> > +		return -ENODATA;
> > +	}
> >  
> >  	cpu->cpu = cpunum;
> >  
> > 
> 
> 
> Thanks Rafel, I can confirm this patch helps.

Awesome, thanks!

Below is an official version with a changelog.  I'll queue it up as a fix
for 3.13.

Thanks,
Rafael


---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: intel_pstate: Fail initialization if P-state information is missing

If pstate.current_pstate is 0 after the initial
intel_pstate_get_cpu_pstates(), this means that we were unable to
obtain any useful P-state information and there is no reason to
continue, so free memory and return an error in that case.

This fixes the following divide error occuring in a nested KVM
guest:

Intel P-state driver initializing.
Intel pstate controlling: cpu 0
cpufreq: __cpufreq_add_dev: ->get() failed
divide error: 0000 [#1] SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000
RIP: 0010:[<ffffffff815c551d>]  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
RSP: 0000:ffff88001ee03e18  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
FS:  0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
Stack:
 fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100
 ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40
 ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000
Call Trace:
 <IRQ>
 [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
 [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
 [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
 [<ffffffff81084354>] run_timer_softirq+0x234/0x380
 [<ffffffff8107aee4>] __do_softirq+0x104/0x430
 [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
 [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
 [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
 <EOI>
 [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
 [<ffffffff81757719>] printk+0x67/0x69
 [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
 [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
 [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
 [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
 [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
 [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
 [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
 [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
 [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
 [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
 [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
 [<ffffffff8174b530>] ? rest_init+0x150/0x150
 [<ffffffff8174b53e>] kernel_init+0xe/0x130
 [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8174b530>] ? rest_init+0x150/0x150
Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89
RIP  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
 RSP <ffff88001ee03e18>
---[ end trace f166110ed22cc37a ]---
Kernel panic - not syncing: Fatal exception in interrupt

Reported-and-tested-by: Kashyap Chamarthy <kchamart@redhat.com>
Cc: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
 	cpu = all_cpu_data[cpunum];
 
 	intel_pstate_get_cpu_pstates(cpu);
+	if (!cpu->pstate.current_pstate) {
+		all_cpu_data[cpunum] = NULL;
+		kfree(cpu);
+		return -ENODATA;
+	}
 
 	cpu->cpu = cpunum;
 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2013-12-31  2:07                         ` Rafael J. Wysocki
@ 2014-01-03 17:30                           ` Dirk Brandewie
  2014-01-03 18:04                             ` Gleb Natapov
  0 siblings, 1 reply; 32+ messages in thread
From: Dirk Brandewie @ 2014-01-03 17:30 UTC (permalink / raw)
  To: Rafael J. Wysocki, Kashyap Chamarthy
  Cc: Gleb Natapov, Josh Boyer, One Thousand Gnomes, Viresh Kumar,
	cpufreq, Linux PM list, Linux-Kernel@Vger. Kernel. Org,
	Richard W.M. Jones

Hi All,

Sorry for being late to the party but I just got back from vacation.

There is something deeply wrong here.  We should have never gotten to
intel_pstate_init_cpu().  The VM had to have returned value from the read
of the max pstate at driver init time and 0 when the CPU was being brought up.

intel_pstate_msrs_not_valid() was added to solve this issue early on
if I remember correctly it was Josh that reported it then.  Is there
a definative way to detect whether we are running in a VM?

Can some one tell me how the nested environment differs in regards to
reading MSRs?

TIA
--Dirk

On 12/30/2013 06:07 PM, Rafael J. Wysocki wrote:
>>> ---
>>>   drivers/cpufreq/intel_pstate.c |    5 +++++
>>>   1 file changed, 5 insertions(+)
>>>
>>> Index: linux-pm/drivers/cpufreq/intel_pstate.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
>>> +++ linux-pm/drivers/cpufreq/intel_pstate.c
>>> @@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
>>>   	cpu = all_cpu_data[cpunum];
>>>
>>>   	intel_pstate_get_cpu_pstates(cpu);
>>> +	if (!cpu->pstate.current_pstate) {
>>> +		all_cpu_data[cpunum] = NULL;
>>> +		kfree(cpu);
>>> +		return -ENODATA;
>>> +	}
>>>
>>>   	cpu->cpu = cpunum;
>>>
>>>
>>
>>
>> Thanks Rafel, I can confirm this patch helps.
>
> Awesome, thanks!
>
> Below is an official version with a changelog.  I'll queue it up as a fix
> for 3.13.
>
> Thanks,
> Rafael
>
>
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: intel_pstate: Fail initialization if P-state information is missing
>
> If pstate.current_pstate is 0 after the initial
> intel_pstate_get_cpu_pstates(), this means that we were unable to
> obtain any useful P-state information and there is no reason to
> continue, so free memory and return an error in that case.
>
> This fixes the following divide error occuring in a nested KVM
> guest:
>
> Intel P-state driver initializing.
> Intel pstate controlling: cpu 0
> cpufreq: __cpufreq_add_dev: ->get() failed
> divide error: 0000 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000
> RIP: 0010:[<ffffffff815c551d>]  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
> RSP: 0000:ffff88001ee03e18  EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
> R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
> R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
> FS:  0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
> Stack:
>   fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100
>   ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40
>   ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000
> Call Trace:
>   <IRQ>
>   [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
>   [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
>   [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
>   [<ffffffff81084354>] run_timer_softirq+0x234/0x380
>   [<ffffffff8107aee4>] __do_softirq+0x104/0x430
>   [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
>   [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
>   [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
>   <EOI>
>   [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
>   [<ffffffff81757719>] printk+0x67/0x69
>   [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
>   [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
>   [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
>   [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
>   [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
>   [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
>   [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
>   [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
>   [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
>   [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
>   [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
>   [<ffffffff8174b530>] ? rest_init+0x150/0x150
>   [<ffffffff8174b53e>] kernel_init+0xe/0x130
>   [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff8174b530>] ? rest_init+0x150/0x150
> Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89
> RIP  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
>   RSP <ffff88001ee03e18>
> ---[ end trace f166110ed22cc37a ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
> Reported-and-tested-by: Kashyap Chamarthy <kchamart@redhat.com>
> Cc: Josh Boyer <jwboyer@fedoraproject.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   drivers/cpufreq/intel_pstate.c |    5 +++++
>   1 file changed, 5 insertions(+)
>
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
>   	cpu = all_cpu_data[cpunum];
>
>   	intel_pstate_get_cpu_pstates(cpu);
> +	if (!cpu->pstate.current_pstate) {
> +		all_cpu_data[cpunum] = NULL;
> +		kfree(cpu);
> +		return -ENODATA;
> +	}
>
>   	cpu->cpu = cpunum;
>
>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-03 17:30                           ` Dirk Brandewie
@ 2014-01-03 18:04                             ` Gleb Natapov
  2014-01-03 20:00                               ` Dirk Brandewie
  2014-01-03 22:46                               ` Rafael J. Wysocki
  0 siblings, 2 replies; 32+ messages in thread
From: Gleb Natapov @ 2014-01-03 18:04 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Rafael J. Wysocki, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Fri, Jan 03, 2014 at 09:30:28AM -0800, Dirk Brandewie wrote:
> Hi All,
> 
> Sorry for being late to the party but I just got back from vacation.
> 
> There is something deeply wrong here.  We should have never gotten to
> intel_pstate_init_cpu().  The VM had to have returned value from the read
> of the max pstate at driver init time and 0 when the CPU was being brought up.
> 
> intel_pstate_msrs_not_valid() was added to solve this issue early on
> if I remember correctly it was Josh that reported it then.  Is there
> a definative way to detect whether we are running in a VM?
> 
Checking for VM is a wrong thing to do here. KVM should behave like it
does not support p-state.

> Can some one tell me how the nested environment differs in regards to
> reading MSRs?
> 
It shouldn't differ, but there may be bug somewhere in nested emulation.
We shouldn't try and hind the bug by doing more checks in Linux but
rather fixing KVM bug that causes Linux to behave incorrectly.

> TIA
> --Dirk
> 
> On 12/30/2013 06:07 PM, Rafael J. Wysocki wrote:
> >>>---
> >>>  drivers/cpufreq/intel_pstate.c |    5 +++++
> >>>  1 file changed, 5 insertions(+)
> >>>
> >>>Index: linux-pm/drivers/cpufreq/intel_pstate.c
> >>>===================================================================
> >>>--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> >>>+++ linux-pm/drivers/cpufreq/intel_pstate.c
> >>>@@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
> >>>  	cpu = all_cpu_data[cpunum];
> >>>
> >>>  	intel_pstate_get_cpu_pstates(cpu);
> >>>+	if (!cpu->pstate.current_pstate) {
> >>>+		all_cpu_data[cpunum] = NULL;
> >>>+		kfree(cpu);
> >>>+		return -ENODATA;
> >>>+	}
> >>>
> >>>  	cpu->cpu = cpunum;
> >>>
> >>>
> >>
> >>
> >>Thanks Rafel, I can confirm this patch helps.
> >
> >Awesome, thanks!
> >
> >Below is an official version with a changelog.  I'll queue it up as a fix
> >for 3.13.
> >
> >Thanks,
> >Rafael
> >
> >
> >---
> >From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >Subject: intel_pstate: Fail initialization if P-state information is missing
> >
> >If pstate.current_pstate is 0 after the initial
> >intel_pstate_get_cpu_pstates(), this means that we were unable to
> >obtain any useful P-state information and there is no reason to
> >continue, so free memory and return an error in that case.
> >
> >This fixes the following divide error occuring in a nested KVM
> >guest:
> >
> >Intel P-state driver initializing.
> >Intel pstate controlling: cpu 0
> >cpufreq: __cpufreq_add_dev: ->get() failed
> >divide error: 0000 [#1] SMP
> >Modules linked in:
> >CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-0.rc4.git5.1.fc21.x86_64 #1
> >Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> >task: ffff88001ea20000 ti: ffff88001e9bc000 task.ti: ffff88001e9bc000
> >RIP: 0010:[<ffffffff815c551d>]  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
> >RSP: 0000:ffff88001ee03e18  EFLAGS: 00010246
> >RAX: 0000000000000000 RBX: ffff88001a454348 RCX: 0000000000006100
> >RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> >RBP: ffff88001ee03e38 R08: 0000000000000000 R09: 0000000000000000
> >R10: ffff88001ea20000 R11: 0000000000000000 R12: 00000c0a1ea20000
> >R13: 1ea200001ea20000 R14: ffffffff815c5400 R15: ffff88001a454348
> >FS:  0000000000000000(0000) GS:ffff88001ee00000(0000) knlGS:0000000000000000
> >CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >CR2: 0000000000000000 CR3: 0000000001c0c000 CR4: 00000000000006f0
> >Stack:
> >  fffffffb1a454390 ffffffff821a4500 ffff88001a454390 0000000000000100
> >  ffff88001ee03ea8 ffffffff81083e9a ffffffff81083e15 ffffffff82d5ed40
> >  ffffffff8258cc60 0000000000000000 ffffffff81ac39de 0000000000000000
> >Call Trace:
> >  <IRQ>
> >  [<ffffffff81083e9a>] call_timer_fn+0x8a/0x310
> >  [<ffffffff81083e15>] ? call_timer_fn+0x5/0x310
> >  [<ffffffff815c5400>] ? pid_param_set+0x130/0x130
> >  [<ffffffff81084354>] run_timer_softirq+0x234/0x380
> >  [<ffffffff8107aee4>] __do_softirq+0x104/0x430
> >  [<ffffffff8107b5fd>] irq_exit+0xcd/0xe0
> >  [<ffffffff81770645>] smp_apic_timer_interrupt+0x45/0x60
> >  [<ffffffff8176efb2>] apic_timer_interrupt+0x72/0x80
> >  <EOI>
> >  [<ffffffff810e15cd>] ? vprintk_emit+0x1dd/0x5e0
> >  [<ffffffff81757719>] printk+0x67/0x69
> >  [<ffffffff815c1493>] __cpufreq_add_dev.isra.13+0x883/0x8d0
> >  [<ffffffff815c14f0>] cpufreq_add_dev+0x10/0x20
> >  [<ffffffff814a14d1>] subsys_interface_register+0xb1/0xf0
> >  [<ffffffff815bf5cf>] cpufreq_register_driver+0x9f/0x210
> >  [<ffffffff81fb19af>] intel_pstate_init+0x27d/0x3be
> >  [<ffffffff81761e3e>] ? mutex_unlock+0xe/0x10
> >  [<ffffffff81fb1732>] ? cpufreq_gov_dbs_init+0x12/0x12
> >  [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
> >  [<ffffffff8109dbf5>] ? parse_args+0x225/0x3f0
> >  [<ffffffff81f64193>] kernel_init_freeable+0x1fc/0x287
> >  [<ffffffff81f638d0>] ? do_early_param+0x88/0x88
> >  [<ffffffff8174b530>] ? rest_init+0x150/0x150
> >  [<ffffffff8174b53e>] kernel_init+0xe/0x130
> >  [<ffffffff8176e27c>] ret_from_fork+0x7c/0xb0
> >  [<ffffffff8174b530>] ? rest_init+0x150/0x150
> >Code: c1 e0 05 48 63 bc 03 10 01 00 00 48 63 83 d0 00 00 00 48 63 d6 48 c1 e2 08 c1 e1 08 4c 63 c2 48 c1 e0 08 48 98 48 c1 e0 08 48 99 <49> f7 f8 48 98 48 0f af f8 48 c1 ff 08 29 f9 89 ca c1 fa 1f 89
> >RIP  [<ffffffff815c551d>] intel_pstate_timer_func+0x11d/0x2b0
> >  RSP <ffff88001ee03e18>
> >---[ end trace f166110ed22cc37a ]---
> >Kernel panic - not syncing: Fatal exception in interrupt
> >
> >Reported-and-tested-by: Kashyap Chamarthy <kchamart@redhat.com>
> >Cc: Josh Boyer <jwboyer@fedoraproject.org>
> >Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >---
> >  drivers/cpufreq/intel_pstate.c |    5 +++++
> >  1 file changed, 5 insertions(+)
> >
> >Index: linux-pm/drivers/cpufreq/intel_pstate.c
> >===================================================================
> >--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> >+++ linux-pm/drivers/cpufreq/intel_pstate.c
> >@@ -614,6 +614,11 @@ static int intel_pstate_init_cpu(unsigne
> >  	cpu = all_cpu_data[cpunum];
> >
> >  	intel_pstate_get_cpu_pstates(cpu);
> >+	if (!cpu->pstate.current_pstate) {
> >+		all_cpu_data[cpunum] = NULL;
> >+		kfree(cpu);
> >+		return -ENODATA;
> >+	}
> >
> >  	cpu->cpu = cpunum;
> >
> >
> 

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-03 18:04                             ` Gleb Natapov
@ 2014-01-03 20:00                               ` Dirk Brandewie
  2014-01-03 22:06                                 ` Paolo Bonzini
  2014-01-03 22:46                               ` Rafael J. Wysocki
  1 sibling, 1 reply; 32+ messages in thread
From: Dirk Brandewie @ 2014-01-03 20:00 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Rafael J. Wysocki, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On 01/03/2014 10:04 AM, Gleb Natapov wrote:
> On Fri, Jan 03, 2014 at 09:30:28AM -0800, Dirk Brandewie wrote:
>> Hi All,
>>
>> Sorry for being late to the party but I just got back from vacation.
>>
>> There is something deeply wrong here.  We should have never gotten to
>> intel_pstate_init_cpu().  The VM had to have returned value from the read
>> of the max pstate at driver init time and 0 when the CPU was being brought up.
>>
>> intel_pstate_msrs_not_valid() was added to solve this issue early on
>> if I remember correctly it was Josh that reported it then.  Is there
>> a definative way to detect whether we are running in a VM?
>>
> Checking for VM is a wrong thing to do here. KVM should behave like it
> does not support p-state.
>
>> Can some one tell me how the nested environment differs in regards to
>> reading MSRs?
>>
> It shouldn't differ, but there may be bug somewhere in nested emulation.
> We shouldn't try and hind the bug by doing more checks in Linux but
> rather fixing KVM bug that causes Linux to behave incorrectly.

Based on the unhandled MSR messages in the host dmesg the following patch
should make sure the correct values are returned for PLAT_FORM_INFO, APERF
and MPERF.  intel_pstate and turbostat are the only users of these registers.

Could you try the folloinw patch minus Rafael's patch please.
Compile tested only.

TIA
--Dirk

commit 5594b89bee7f83200c1a70bf95d50ac35e4fe3f8
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Fri Jan 3 11:44:15 2014 -0800

     x86/kvm: Handle MSR_PLATFORM_INFO, MSR_IA32_MPERF and MSR_IA32_APERF MSRs

     Handle MSRs correctly when read from a nested KVM

     Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
  arch/x86/kvm/x86.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5d004da..390ef27 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2336,6 +2336,9 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 
*pdata)

  	switch (msr) {
  	case MSR_IA32_PLATFORM_ID:
+	case MSR_IA32_MPERF:
+	case MSR_IA32_APERF:
+	case MSR_PLATFORM_INFO:
  	case MSR_IA32_EBL_CR_POWERON:
  	case MSR_IA32_DEBUGCTLMSR:
  	case MSR_IA32_LASTBRANCHFROMIP:


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-03 20:00                               ` Dirk Brandewie
@ 2014-01-03 22:06                                 ` Paolo Bonzini
  2014-01-06 17:18                                   ` Dirk Brandewie
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2014-01-03 22:06 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Gleb Natapov, Rafael J. Wysocki, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

Il 03/01/2014 21:00, Dirk Brandewie ha scritto:
> +    case MSR_IA32_MPERF:
> +    case MSR_IA32_APERF:

These should never be accessed.  A KVM VM will always have
CPUID[06H].ECX = 0, and the Intel manual says that the MSRs are only
present if CPUID returns that value with bit 0 set.

I think the actual bug is that intel_pstate_init does not check the
feature bits in CPUID (either manually or via x86_match_cpu).

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-03 18:04                             ` Gleb Natapov
  2014-01-03 20:00                               ` Dirk Brandewie
@ 2014-01-03 22:46                               ` Rafael J. Wysocki
  2014-01-04  8:35                                 ` Paolo Bonzini
  1 sibling, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2014-01-03 22:46 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Friday, January 03, 2014 08:04:36 PM Gleb Natapov wrote:
> On Fri, Jan 03, 2014 at 09:30:28AM -0800, Dirk Brandewie wrote:
> > Hi All,
> > 
> > Sorry for being late to the party but I just got back from vacation.
> > 
> > There is something deeply wrong here.  We should have never gotten to
> > intel_pstate_init_cpu().  The VM had to have returned value from the read
> > of the max pstate at driver init time and 0 when the CPU was being brought up.
> > 
> > intel_pstate_msrs_not_valid() was added to solve this issue early on
> > if I remember correctly it was Josh that reported it then.  Is there
> > a definative way to detect whether we are running in a VM?
> > 
> Checking for VM is a wrong thing to do here. KVM should behave like it
> does not support p-state.
> 
> > Can some one tell me how the nested environment differs in regards to
> > reading MSRs?
> > 
> It shouldn't differ, but there may be bug somewhere in nested emulation.
> We shouldn't try and hind the bug by doing more checks in Linux but
> rather fixing KVM bug that causes Linux to behave incorrectly.

Well, fixing the KVM bug is surely welcome.

That said, adding checks to ensure that your assumptions are valid is rarely
wrong, especially if they are done once per kernel boot.  And the kernel only
should panic if it cannot continue to run, which isn't the case here.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-03 22:46                               ` Rafael J. Wysocki
@ 2014-01-04  8:35                                 ` Paolo Bonzini
  2014-01-04 14:38                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2014-01-04  8:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Gleb Natapov, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

Il 03/01/2014 23:46, Rafael J. Wysocki ha scritto:
> Well, fixing the KVM bug is surely welcome.
> 
> That said, adding checks to ensure that your assumptions are valid is rarely
> wrong, especially if they are done once per kernel boot.  And the kernel only
> should panic if it cannot continue to run, which isn't the case here.

I agree, but I suspect even your check is already late.  Your patch is
welcome but perhaps it should have a WARN_ON too.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-04  8:35                                 ` Paolo Bonzini
@ 2014-01-04 14:38                                   ` Rafael J. Wysocki
  2014-01-04 17:38                                     ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2014-01-04 14:38 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Saturday, January 04, 2014 09:35:58 AM Paolo Bonzini wrote:
> Il 03/01/2014 23:46, Rafael J. Wysocki ha scritto:
> > Well, fixing the KVM bug is surely welcome.
> > 
> > That said, adding checks to ensure that your assumptions are valid is rarely
> > wrong, especially if they are done once per kernel boot.  And the kernel only
> > should panic if it cannot continue to run, which isn't the case here.
> 
> I agree, but I suspect even your check is already late.

Well, it's just a sanity check and it makes the problem go away for the reporter.

> Your patch is welcome but perhaps it should have a WARN_ON too.

It has been pulled in already, so the WARN_ON() can only be added via a separate
patch now.  Would you like to prepare that patch?

Rafael


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-04 14:38                                   ` Rafael J. Wysocki
@ 2014-01-04 17:38                                     ` Paolo Bonzini
  2014-01-04 17:48                                       ` Gleb Natapov
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2014-01-04 17:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Gleb Natapov, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> Well, it's just a sanity check and it makes the problem go away for the reporter.
> 
>> > Your patch is welcome but perhaps it should have a WARN_ON too.
> It has been pulled in already, so the WARN_ON() can only be added via a separate
> patch now.  Would you like to prepare that patch?

Yes, I'll add it together with the CPUID check.  I'll send the patch so
that it can get into 3.14.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-04 17:38                                     ` Paolo Bonzini
@ 2014-01-04 17:48                                       ` Gleb Natapov
  2014-01-04 21:38                                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Gleb Natapov @ 2014-01-04 17:48 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Rafael J. Wysocki, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> > Well, it's just a sanity check and it makes the problem go away for the reporter.
> > 
> >> > Your patch is welcome but perhaps it should have a WARN_ON too.
> > It has been pulled in already, so the WARN_ON() can only be added via a separate
> > patch now.  Would you like to prepare that patch?
> 
> Yes, I'll add it together with the CPUID check.  I'll send the patch so
> that it can get into 3.14.
> 
CPUID check, while correct, will sweep the problem under the rug. Current
Linux logic should detect non working pstate in KVM. We should look into
why this is not happening for nested.

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-04 17:48                                       ` Gleb Natapov
@ 2014-01-04 21:38                                         ` Rafael J. Wysocki
  2014-01-06 11:20                                           ` Paolo Bonzini
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2014-01-04 21:38 UTC (permalink / raw)
  To: Gleb Natapov
  Cc: Paolo Bonzini, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
> > Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> > > Well, it's just a sanity check and it makes the problem go away for the reporter.
> > > 
> > >> > Your patch is welcome but perhaps it should have a WARN_ON too.
> > > It has been pulled in already, so the WARN_ON() can only be added via a separate
> > > patch now.  Would you like to prepare that patch?
> > 
> > Yes, I'll add it together with the CPUID check.  I'll send the patch so
> > that it can get into 3.14.
> > 
> CPUID check, while correct, will sweep the problem under the rug. Current
> Linux logic should detect non working pstate in KVM. We should look into
> why this is not happening for nested.

I agree.  It's better not to use CPUID for that in my opinion.

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-04 21:38                                         ` Rafael J. Wysocki
@ 2014-01-06 11:20                                           ` Paolo Bonzini
  2014-01-06 11:37                                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2014-01-06 11:20 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Gleb Natapov, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

Il 04/01/2014 22:38, Rafael J. Wysocki ha scritto:
> On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
>> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
>>> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
>>>> Well, it's just a sanity check and it makes the problem go away for the reporter.
>>>>
>>>>>> Your patch is welcome but perhaps it should have a WARN_ON too.
>>>> It has been pulled in already, so the WARN_ON() can only be added via a separate
>>>> patch now.  Would you like to prepare that patch?
>>>
>>> Yes, I'll add it together with the CPUID check.  I'll send the patch so
>>> that it can get into 3.14.
>>>
>> CPUID check, while correct, will sweep the problem under the rug. Current
>> Linux logic should detect non working pstate in KVM. We should look into
>> why this is not happening for nested.
> 
> I agree.  It's better not to use CPUID for that in my opinion.

Among hypervisors, RHEL5's Xen is probably one of the oldest in actual
use with new hardware and new kernels, and the CPUID bit has been fixed
in 2011.  Older versions wouldn't run new kernels due to other CPUID
bits not being cleared properly in VMs.

Is there real hardware that has the CPUID bit set and non-working
pstate?  If there's no such real hardware, CPUID is what the SDM says
you should use to detect presence of the APERF/MPERF msrs.

Having extra safety checks is fine on top of what the SDM says, but IMO
they should be WARN_ONs.  Otherwise you are sweeping bugs under the rug
just as much.

Paolo

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-06 11:20                                           ` Paolo Bonzini
@ 2014-01-06 11:37                                             ` Rafael J. Wysocki
  2014-01-06 18:40                                               ` Dirk Brandewie
  0 siblings, 1 reply; 32+ messages in thread
From: Rafael J. Wysocki @ 2014-01-06 11:37 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, Dirk Brandewie, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Monday, January 06, 2014 12:20:32 PM Paolo Bonzini wrote:
> Il 04/01/2014 22:38, Rafael J. Wysocki ha scritto:
> > On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
> >> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
> >>> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
> >>>> Well, it's just a sanity check and it makes the problem go away for the reporter.
> >>>>
> >>>>>> Your patch is welcome but perhaps it should have a WARN_ON too.
> >>>> It has been pulled in already, so the WARN_ON() can only be added via a separate
> >>>> patch now.  Would you like to prepare that patch?
> >>>
> >>> Yes, I'll add it together with the CPUID check.  I'll send the patch so
> >>> that it can get into 3.14.
> >>>
> >> CPUID check, while correct, will sweep the problem under the rug. Current
> >> Linux logic should detect non working pstate in KVM. We should look into
> >> why this is not happening for nested.
> > 
> > I agree.  It's better not to use CPUID for that in my opinion.
> 
> Among hypervisors, RHEL5's Xen is probably one of the oldest in actual
> use with new hardware and new kernels, and the CPUID bit has been fixed
> in 2011.  Older versions wouldn't run new kernels due to other CPUID
> bits not being cleared properly in VMs.
> 
> Is there real hardware that has the CPUID bit set and non-working
> pstate?  If there's no such real hardware, CPUID is what the SDM says
> you should use to detect presence of the APERF/MPERF msrs.

OK

> Having extra safety checks is fine on top of what the SDM says, but IMO
> they should be WARN_ONs.  Otherwise you are sweeping bugs under the rug
> just as much.

As I said I'm not against adding WARN_ON()s there. :-)

Thanks!

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-03 22:06                                 ` Paolo Bonzini
@ 2014-01-06 17:18                                   ` Dirk Brandewie
  2014-01-07 16:11                                     ` Gleb Natapov
  0 siblings, 1 reply; 32+ messages in thread
From: Dirk Brandewie @ 2014-01-06 17:18 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Gleb Natapov, Rafael J. Wysocki, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On 01/03/2014 02:06 PM, Paolo Bonzini wrote:
> Il 03/01/2014 21:00, Dirk Brandewie ha scritto:
>> +    case MSR_IA32_MPERF:
>> +    case MSR_IA32_APERF:
>
OK I will spin the patch to only add MSR_PLATFORM_INFO.

> These should never be accessed.  A KVM VM will always have
> CPUID[06H].ECX = 0, and the Intel manual says that the MSRs are only
> present if CPUID returns that value with bit 0 set.
>
> I think the actual bug is that intel_pstate_init does not check the
> feature bits in CPUID (either manually or via x86_match_cpu).

I will add the feature check.

What are the differences between the first and the nested KVM's?
At load time intel_pstate checks that APERF and MPERF are incrementing
and that PLATFORM_INFO has some value.  Somehow these checks pass
in the nested environment and we fall over when the CPU is being added
by cpufreq.

--Dirk


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-06 11:37                                             ` Rafael J. Wysocki
@ 2014-01-06 18:40                                               ` Dirk Brandewie
  0 siblings, 0 replies; 32+ messages in thread
From: Dirk Brandewie @ 2014-01-06 18:40 UTC (permalink / raw)
  To: Rafael J. Wysocki, Paolo Bonzini
  Cc: Gleb Natapov, Kashyap Chamarthy, Josh Boyer, One Thousand Gnomes,
	Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On 01/06/2014 03:37 AM, Rafael J. Wysocki wrote:
> On Monday, January 06, 2014 12:20:32 PM Paolo Bonzini wrote:
>> Il 04/01/2014 22:38, Rafael J. Wysocki ha scritto:
>>> On Saturday, January 04, 2014 07:48:13 PM Gleb Natapov wrote:
>>>> On Sat, Jan 04, 2014 at 06:38:59PM +0100, Paolo Bonzini wrote:
>>>>> Il 04/01/2014 15:38, Rafael J. Wysocki ha scritto:
>>>>>> Well, it's just a sanity check and it makes the problem go away for the reporter.
>>>>>>
>>>>>>>> Your patch is welcome but perhaps it should have a WARN_ON too.
>>>>>> It has been pulled in already, so the WARN_ON() can only be added via a separate
>>>>>> patch now.  Would you like to prepare that patch?
>>>>>
>>>>> Yes, I'll add it together with the CPUID check.  I'll send the patch so
>>>>> that it can get into 3.14.
>>>>>
>>>> CPUID check, while correct, will sweep the problem under the rug. Current
>>>> Linux logic should detect non working pstate in KVM. We should look into
>>>> why this is not happening for nested.
>>>
>>> I agree.  It's better not to use CPUID for that in my opinion.
>>
>> Among hypervisors, RHEL5's Xen is probably one of the oldest in actual
>> use with new hardware and new kernels, and the CPUID bit has been fixed
>> in 2011.  Older versions wouldn't run new kernels due to other CPUID
>> bits not being cleared properly in VMs.
>>
>> Is there real hardware that has the CPUID bit set and non-working
>> pstate?  If there's no such real hardware, CPUID is what the SDM says
>> you should use to detect presence of the APERF/MPERF msrs.
>
> OK
>
>> Having extra safety checks is fine on top of what the SDM says, but IMO
>> they should be WARN_ONs.  Otherwise you are sweeping bugs under the rug
>> just as much.
>
> As I said I'm not against adding WARN_ON()s there. :-)
>

The patch below adds a feature check for APERF/MPERF.  With this patch
you should NOT see "Intel P-state driver initializing." in dmesg for KVM.

commit 4279f36818bd3ac42f077de114b17eb27d81d482
Author: Dirk Brandewie <dirk.j.brandewie@intel.com>
Date:   Mon Jan 6 10:19:38 2014 -0800

     intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.


     KVM environments do not support APERF/MPERF MSRs. intel_pstate cannot
     operate without these registers.

     The previous validity checks in intel_pstate_msrs_not_valid() are
     insufficent in nested KVMs.

     Fixes:
     https://bugzilla.redhat.com/show_bug.cgi?id=1046317

     Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
---
  drivers/cpufreq/intel_pstate.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 0f63f5d..fe91dad 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -619,7 +619,8 @@ static void intel_pstate_timer_func(unsigned long __data)
  }

  #define ICPU(model, policy) \
-	{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_ANY, (unsigned long)&policy }
+	{ X86_VENDOR_INTEL, 6, model, X86_FEATURE_APERFMPERF,\
+			(unsigned long)&policy }

  static const struct x86_cpu_id intel_pstate_cpu_ids[] = {
  	ICPU(0x2a, core_params),


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: intel_pstate divide error with v3.13-rc4-256-gb7000ad
  2014-01-06 17:18                                   ` Dirk Brandewie
@ 2014-01-07 16:11                                     ` Gleb Natapov
  0 siblings, 0 replies; 32+ messages in thread
From: Gleb Natapov @ 2014-01-07 16:11 UTC (permalink / raw)
  To: Dirk Brandewie
  Cc: Paolo Bonzini, Rafael J. Wysocki, Kashyap Chamarthy, Josh Boyer,
	One Thousand Gnomes, Viresh Kumar, cpufreq, Linux PM list,
	Linux-Kernel@Vger. Kernel. Org, Richard W.M. Jones

On Mon, Jan 06, 2014 at 09:18:44AM -0800, Dirk Brandewie wrote:
> On 01/03/2014 02:06 PM, Paolo Bonzini wrote:
> >Il 03/01/2014 21:00, Dirk Brandewie ha scritto:
> >>+    case MSR_IA32_MPERF:
> >>+    case MSR_IA32_APERF:
> >
> OK I will spin the patch to only add MSR_PLATFORM_INFO.
> 
> >These should never be accessed.  A KVM VM will always have
> >CPUID[06H].ECX = 0, and the Intel manual says that the MSRs are only
> >present if CPUID returns that value with bit 0 set.
> >
> >I think the actual bug is that intel_pstate_init does not check the
> >feature bits in CPUID (either manually or via x86_match_cpu).
> 
> I will add the feature check.
> 
> What are the differences between the first and the nested KVM's?
There shouldn't be any. There is a bug in nested emulation probably.

> At load time intel_pstate checks that APERF and MPERF are incrementing
> and that PLATFORM_INFO has some value.  Somehow these checks pass
> in the nested environment and we fall over when the CPU is being added
> by cpufreq.
> 
KVM does not emulate either of those and inject #GP if one is accessed. Linux
catches those #GPs and fixs up rdmsr to return zero. It would be interesting to
see ftrace for nested kvm run.

--
			Gleb.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, back to index

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-24 14:36 intel_pstate divide error with v3.13-rc4-256-gb7000ad Josh Boyer
2013-12-24 16:06 ` Viresh Kumar
2013-12-27 12:24   ` One Thousand Gnomes
2013-12-27 12:47     ` Gleb Natapov
2013-12-27 13:46       ` Josh Boyer
2013-12-27 14:15         ` Kashyap Chamarthy
2013-12-27 14:34           ` Richard W.M. Jones
2013-12-27 16:52           ` Gleb Natapov
2013-12-27 17:01             ` Gleb Natapov
2013-12-27 17:17               ` Richard W.M. Jones
2013-12-27 17:17               ` Kashyap Chamarthy
2013-12-27 21:51                 ` Rafael J. Wysocki
2013-12-29 12:12                   ` Kashyap Chamarthy
2013-12-29 15:04                     ` Rafael J. Wysocki
2013-12-30 14:58                       ` Kashyap Chamarthy
2013-12-31  2:07                         ` Rafael J. Wysocki
2014-01-03 17:30                           ` Dirk Brandewie
2014-01-03 18:04                             ` Gleb Natapov
2014-01-03 20:00                               ` Dirk Brandewie
2014-01-03 22:06                                 ` Paolo Bonzini
2014-01-06 17:18                                   ` Dirk Brandewie
2014-01-07 16:11                                     ` Gleb Natapov
2014-01-03 22:46                               ` Rafael J. Wysocki
2014-01-04  8:35                                 ` Paolo Bonzini
2014-01-04 14:38                                   ` Rafael J. Wysocki
2014-01-04 17:38                                     ` Paolo Bonzini
2014-01-04 17:48                                       ` Gleb Natapov
2014-01-04 21:38                                         ` Rafael J. Wysocki
2014-01-06 11:20                                           ` Paolo Bonzini
2014-01-06 11:37                                             ` Rafael J. Wysocki
2014-01-06 18:40                                               ` Dirk Brandewie
2013-12-27 14:35   ` Rafael J. Wysocki

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git