* High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
@ 2013-03-13 20:50 Marek Marczykowski
  2013-03-15  3:00 ` Dario Faggioli
  2013-03-15 13:02 ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-13 20:50 UTC
  To: xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 1447 bytes --]

Hi,

I still have problems with ACPI(?) on Xen. After some system startups or
resumes, the CPU temperature goes high although all domUs (and dom0) are idle.
On a "good" system startup it is about 50-55C, on a "bad" one above 67C (most
of the time above 70C). I've noticed a difference in the C-states reported by
Xen (attached files). On "bad" startups, suspend also doesn't work: the system
restarts during suspend (I still haven't managed to get console messages, as I
don't have a serial port on this system). Note that sometimes the system boots
fine ("good" state), but the problem occurs after some suspend/resume cycles.
Some time ago I saw other symptoms: only CPU0 was used, for all VCPUs
(according to xl vcpu-list). Maybe it is related?

Hardware: Dell Latitude E6420
CPU: Intel i5-2520M

Software:
xen stable-4.1 as of 15.02 (last commit: "xen: sched_credit: improve picking
up the idle CPU for a VCPU"), with the commit "Introduce system_state
variable." reverted.
The same problem also occurs on vanilla xen 4.1.2.

Linux 3.7.6 - happens almost every boot. On Linux 3.7.4 it happens much more
rarely (but still occurs).
Kernel config:
http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
I've tried bisecting from 3.7.4 to 3.7.6, but without success because the
problem isn't 100% reproducible.

Any ideas?

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

[-- Attachment #1.1.2: xl-dmesg-4.1.5-pre-bad.txt --]
[-- Type: text/plain, Size: 10305 bytes --]

 __  __            _  _    _   ____                    
 \ \/ /___ _ __   | || |  / | | ___|    _ __  _ __ ___ 
  \  // _ \ '_ \  | || |_ | | |___ \ __| '_ \| '__/ _ \
  /  \  __/ | | | |__   _|| |_ ___) |__| |_) | | |  __/
 /_/\_\___|_| |_|    |_|(_)_(_)____/   | .__/|_|  \___|
                                       |_|             
(XEN) Xen version 4.1.5-pre (marmarek@marmarek.net) (gcc version 4.7.2 20120921 (Red Hat 4.7.2-2) (GCC) ) Sun Dec 23 03:10:15 CET 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder console=none
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000020000000 (usable)
(XEN)  0000000020000000 - 0000000020200000 (reserved)
(XEN)  0000000020200000 - 0000000040000000 (usable)
(XEN)  0000000040000000 - 0000000040200000 (reserved)
(XEN)  0000000040200000 - 00000000ca61e000 (usable)
(XEN)  00000000ca61e000 - 00000000ca662000 (reserved)
(XEN)  00000000ca662000 - 00000000ca9b7000 (usable)
(XEN)  00000000ca9b7000 - 00000000ca9e7000 (reserved)
(XEN)  00000000ca9e7000 - 00000000cabe7000 (ACPI NVS)
(XEN)  00000000cabe7000 - 00000000cabff000 (ACPI data)
(XEN)  00000000cabff000 - 00000000cac00000 (usable)
(XEN)  00000000cb800000 - 00000000cfa00000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000ffc00000 - 00000000ffc20000 (reserved)
(XEN)  0000000100000000 - 000000042e000000 (usable)
(XEN) ACPI: RSDP 000FE300, 0024 (r2 DELL  )
(XEN) ACPI: XSDT CABFDE18, 007C (r1 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: FACP CAB77D98, 00F4 (r4 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DSDT CAB45018, 885D (r2 INT430 SYSFexxx     1001 INTL 20090903)
(XEN) ACPI: FACS CABD4D40, 0040
(XEN) ACPI: APIC CABFCF18, 00CC (r2 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: TCPA CABD5D18, 0032 (r2                        0             0)
(XEN) ACPI: SSDT CAB78A98, 02F9 (r1 DELLTP      TPM     3000 INTL 20090903)
(XEN) ACPI: MCFG CABD5C98, 003C (r1 DELL   SNDYBRDG  6222004 MSFT       97)
(XEN) ACPI: HPET CABD5C18, 0038 (r1 A M I   PCHHPET  6222004 AMI.        3)
(XEN) ACPI: BOOT CABD5B98, 0028 (r1 DELL   CBX3      6222004 AMI     10013)
(XEN) ACPI: SSDT CAB5C018, 0804 (r1  PmRef  Cpu0Ist     3000 INTL 20090903)
(XEN) ACPI: SSDT CAB5B018, 0996 (r1  PmRef    CpuPm     3000 INTL 20090903)
(XEN) ACPI: DMAR CAB77C18, 00E8 (r1 INTEL      SNB         1 INTL        1)
(XEN) ACPI: SLIC CAB65C18, 0176 (r3 DELL    CBX3     6222004 MSFT    10013)
(XEN) System RAM: 16261MB (16651320kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT - cabd4e40/00000000cabd4d40, using 32
(XEN) Processor #0 6:10 APIC version 21
(XEN) Processor #2 6:10 APIC version 21
(XEN) Processor #1 6:10 APIC version 21
(XEN) Processor #3 6:10 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Table is not found!
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2494.398 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d supported page sizes: 4kB.
(XEN) Intel VT-d supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB
(XEN) Brought up 4 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2021000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000410000000->0000000418000000 (4025908 pages to be allocated)
(XEN)  Init. ramdisk: 000000042a3a0000->000000042dfff200
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82021000
(XEN)  Init. ramdisk: ffffffff82021000->ffffffff85c80200
(XEN)  Phys-Mach map: ffffffff85c81000->ffffffff87b964a0
(XEN)  Start info:    ffffffff87b97000->ffffffff87b974b4
(XEN)  Page tables:   ffffffff87b98000->ffffffff87bdb000
(XEN)  Boot stack:    ffffffff87bdb000->ffffffff87bdc000
(XEN)  TOTAL:         ffffffff80000000->ffffffff88000000
(XEN)  ENTRY ADDRESS: ffffffff81a94210
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 244kB init memory.
(XEN) no cpu_id for acpi_id 5
(XEN) no cpu_id for acpi_id 6
(XEN) no cpu_id for acpi_id 7
(XEN) no cpu_id for acpi_id 8
(XEN) physdev.c:168: dom0: wrong map_pirq type 3
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 8
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 1
(XEN) Broke affinity for irq 9
(XEN) Broke affinity for irq 12
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 27
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 17
(XEN) Broke affinity for irq 20
(XEN) Broke affinity for irq 31
(XEN) Entering ACPI S3 state.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) CPU0 CMCI LVT vector (0xf7) already installed
(XEN) CPU0: Thermal LVT vector (0xfa) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 1
(XEN) Broke affinity for irq 8
(XEN) Broke affinity for irq 9
(XEN) Broke affinity for irq 12
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 20
(XEN) Entering ACPI S3 state.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) CPU0 CMCI LVT vector (0xf7) already installed
(XEN) CPU0: Thermal LVT vector (0xfa) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 1
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 20
(XEN) Entering ACPI S3 state.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) CPU0 CMCI LVT vector (0xf7) already installed
(XEN) CPU0: Thermal LVT vector (0xfa) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) 'h' pressed -> showing installed handlers
(XEN)  key '%' (ascii '25') => trap to xendbg
(XEN)  key '*' (ascii '2a') => print all diagnostics
(XEN)  key '0' (ascii '30') => dump Dom0 registers
(XEN)  key 'A' (ascii '41') => toggle alternative key handling
(XEN)  key 'C' (ascii '43') => trigger a crashdump
(XEN)  key 'D' (ascii '44') => dump ept p2m table
(XEN)  key 'H' (ascii '48') => dump heap info
(XEN)  key 'M' (ascii '4d') => dump MSI state
(XEN)  key 'N' (ascii '4e') => trigger an NMI
(XEN)  key 'Q' (ascii '51') => dump PCI devices
(XEN)  key 'R' (ascii '52') => reboot machine
(XEN)  key 'V' (ascii '56') => dump iommu info
(XEN)  key 'a' (ascii '61') => dump timer queues
(XEN)  key 'c' (ascii '63') => dump ACPI Cx structures
(XEN)  key 'd' (ascii '64') => dump registers
(XEN)  key 'e' (ascii '65') => dump evtchn info
(XEN)  key 'g' (ascii '67') => print grant table usage
(XEN)  key 'h' (ascii '68') => show this message
(XEN)  key 'i' (ascii '69') => dump interrupt bindings
(XEN)  key 'm' (ascii '6d') => memory info
(XEN)  key 'n' (ascii '6e') => NMI statistics
(XEN)  key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN)  key 'r' (ascii '72') => dump run queues
(XEN)  key 's' (ascii '73') => dump softtsc stats
(XEN)  key 't' (ascii '74') => display multi-cpu clock info
(XEN)  key 'u' (ascii '75') => dump numa info
(XEN)  key 'v' (ascii '76') => dump Intel's VMCS
(XEN)  key 'z' (ascii '7a') => print ioapic info
(XEN) 'c' pressed -> printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[15192725] method[ HALT] duration[2739156035871]
(XEN)     C2:	type[C2] latency[080] usage[02562493] method[SYSIO] duration[1089067679480]
(XEN)    *C3:	type[C3] latency[109] usage[44305151] method[SYSIO] duration[139082530623812]
(XEN)     C0:	usage[62060369] duration[13336595352900]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu1==
(XEN) active state:		C1
(XEN) max_cstate:		C7
(XEN) states:
(XEN)    *C1:	type[C1] latency[000] usage[39917049] method[ HALT] duration[128878627800503]
(XEN)     C0:	usage[39917049] duration[27368721899990]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu2==
(XEN) active state:		C1
(XEN) max_cstate:		C7
(XEN) states:
(XEN)    *C1:	type[C1] latency[000] usage[39887295] method[ HALT] duration[128898918584254]
(XEN)     C0:	usage[39887295] duration[27348431121369]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu3==
(XEN) active state:		C1
(XEN) max_cstate:		C7
(XEN) states:
(XEN)    *C1:	type[C1] latency[000] usage[40037671] method[ HALT] duration[128853784642915]
(XEN)     C0:	usage[40037671] duration[27393565068537]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]

[-- Attachment #1.1.3: xl-dmesg-4.1.5-pre-good.txt --]
[-- Type: text/plain, Size: 12421 bytes --]

 __  __            _  _    _   ____                    
 \ \/ /___ _ __   | || |  / | | ___|    _ __  _ __ ___ 
  \  // _ \ '_ \  | || |_ | | |___ \ __| '_ \| '__/ _ \
  /  \  __/ | | | |__   _|| |_ ___) |__| |_) | | |  __/
 /_/\_\___|_| |_|    |_|(_)_(_)____/   | .__/|_|  \___|
                                       |_|             
(XEN) Xen version 4.1.5-pre (marmarek@marmarek.net) (gcc version 4.7.2 20120921 (Red Hat 4.7.2-2) (GCC) ) Sun Dec 23 03:10:15 CET 2012
(XEN) Latest ChangeSet: unavailable
(XEN) Bootloader: GRUB 2.00
(XEN) Command line: placeholder console=none
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000007eea000 (usable)
(XEN)  0000000007eea000 - 0000000007f67000 (ACPI NVS)
(XEN)  0000000007f67000 - 0000000007f70000 (usable)
(XEN)  0000000007f70000 - 0000000007f76000 (ACPI NVS)
(XEN)  0000000007f76000 - 0000000007f78000 (usable)
(XEN)  0000000007f78000 - 0000000008000000 (ACPI NVS)
(XEN)  0000000008000000 - 000000000dffd000 (usable)
(XEN)  000000000dffd000 - 000000000e000000 (ACPI data)
(XEN)  000000000e000000 - 0000000020000000 (usable)
(XEN)  0000000020000000 - 0000000020200000 (reserved)
(XEN)  0000000020200000 - 0000000040000000 (usable)
(XEN)  0000000040000000 - 0000000040200000 (reserved)
(XEN)  0000000040200000 - 00000000c83b4000 (usable)
(XEN)  00000000c83b4000 - 00000000c840a000 (reserved)
(XEN)  00000000c840a000 - 00000000c840e000 (usable)
(XEN)  00000000c840e000 - 00000000c840f000 (reserved)
(XEN)  00000000c840f000 - 00000000c8411000 (usable)
(XEN)  00000000c8411000 - 00000000c8414000 (reserved)
(XEN)  00000000c8414000 - 00000000c841e000 (usable)
(XEN)  00000000c841e000 - 00000000c8428000 (reserved)
(XEN)  00000000c8428000 - 00000000c8432000 (usable)
(XEN)  00000000c8432000 - 00000000c8436000 (reserved)
(XEN)  00000000c8436000 - 00000000cac00000 (usable)
(XEN)  00000000cb800000 - 00000000cfa00000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000ffc00000 - 00000000ffc20000 (reserved)
(XEN)  0000000100000000 - 000000042e000000 (usable)
(XEN) ACPI: RSDP 000FE300, 0024 (r2 DELL  )
(XEN) ACPI: XSDT 0DFFEE18, 007C (r1 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: FACP 07F90D98, 00F4 (r4 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DSDT 07F5E018, 8834 (r2 INT430 SYSFexxx     1001 INTL 20090903)
(XEN) ACPI: FACS 07FEDD40, 0040
(XEN) ACPI: APIC 0DFFDF18, 00CC (r2 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: TCPA 07FEED18, 0032 (r2                        0             0)
(XEN) ACPI: SSDT 07F91A98, 02F9 (r1 DELLTP      TPM     3000 INTL 20090903)
(XEN) ACPI: MCFG 07FEEC98, 003C (r1 DELL   SNDYBRDG  6222004 MSFT       97)
(XEN) ACPI: HPET 07FEEC18, 0038 (r1 A M I   PCHHPET  6222004 AMI.        3)
(XEN) ACPI: BOOT 07FEEB98, 0028 (r1 DELL   CBX3      6222004 AMI     10013)
(XEN) ACPI: SSDT 07F75018, 0804 (r1  PmRef  Cpu0Ist     3000 INTL 20090903)
(XEN) ACPI: SSDT 07F74018, 0996 (r1  PmRef    CpuPm     3000 INTL 20090903)
(XEN) ACPI: DMAR 07F90C18, 00E8 (r1 INTEL      SNB         1 INTL        1)
(XEN) ACPI: SLIC 07F7EC18, 0176 (r3 DELL    CBX3     6222004 MSFT    10013)
(XEN) System RAM: 16262MB (16652432kB)
(XEN) Domain heap initialised
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 07fede40/0000000007fedd40, using 32
(XEN) Processor #0 6:10 APIC version 21
(XEN) Processor #2 6:10 APIC version 21
(XEN) Processor #1 6:10 APIC version 21
(XEN) Processor #3 6:10 APIC version 21
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) Table is not found!
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2494.430 MHz processor.
(XEN) Initing memory sharing.
(XEN) Intel VT-d supported page sizes: 4kB.
(XEN) Intel VT-d supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 16 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB
(XEN) Brought up 4 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2021000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000410000000->0000000418000000 (4026407 pages to be allocated)
(XEN)  Init. ramdisk: 000000042a47e000->000000042dfffc00
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82021000
(XEN)  Init. ramdisk: ffffffff82021000->ffffffff85ba2c00
(XEN)  Phys-Mach map: ffffffff85ba3000->ffffffff87ab8d48
(XEN)  Start info:    ffffffff87ab9000->ffffffff87ab94b4
(XEN)  Page tables:   ffffffff87aba000->ffffffff87afb000
(XEN)  Boot stack:    ffffffff87afb000->ffffffff87afc000
(XEN)  TOTAL:         ffffffff80000000->ffffffff87c00000
(XEN)  ENTRY ADDRESS: ffffffff81a94210
(XEN) Dom0 has maximum 4 VCPUs
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: Errors and warnings
(XEN) Guest Loglevel: Nothing (Rate-limited: Errors and warnings)
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 244kB init memory.
(XEN) no cpu_id for acpi_id 5
(XEN) no cpu_id for acpi_id 6
(XEN) no cpu_id for acpi_id 7
(XEN) no cpu_id for acpi_id 8
(XEN) physdev.c:168: dom0: wrong map_pirq type 3
(XEN) traps.c:2489:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90004085030.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000408d030.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90004095030.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000409d030.
(XEN) traps.c:2489:d1 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900005a6030.
(XEN) traps.c:2489:d1 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900005ae030.
(XEN) traps.c:2489:d2 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000042a030.
(XEN) traps.c:2489:d2 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90000432030.
(XEN) traps.c:2489:d3 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900004ac030.
(XEN) traps.c:2489:d3 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc900004b4030.
(XEN) traps.c:2489:d4 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000048d030.
(XEN) traps.c:2489:d4 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000049e030.
(XEN) traps.c:2489:d5 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90000285030.
(XEN) traps.c:2489:d5 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000028d030.
(XEN) traps.c:2489:d5 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90000295030.
(XEN) traps.c:2489:d5 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000029d030.
(XEN) traps.c:2489:d6 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc90000433030.
(XEN) traps.c:2489:d6 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000043e030.
(XEN) traps.c:2489:d7 Domain attempted WRMSR 0000000000000079 from 0x0000000000000000 to 0xffffc9000042f030.
(XEN) 'h' pressed -> showing installed handlers
(XEN)  key '%' (ascii '25') => trap to xendbg
(XEN)  key '*' (ascii '2a') => print all diagnostics
(XEN)  key '0' (ascii '30') => dump Dom0 registers
(XEN)  key 'A' (ascii '41') => toggle alternative key handling
(XEN)  key 'C' (ascii '43') => trigger a crashdump
(XEN)  key 'D' (ascii '44') => dump ept p2m table
(XEN)  key 'H' (ascii '48') => dump heap info
(XEN)  key 'M' (ascii '4d') => dump MSI state
(XEN)  key 'N' (ascii '4e') => trigger an NMI
(XEN)  key 'Q' (ascii '51') => dump PCI devices
(XEN)  key 'R' (ascii '52') => reboot machine
(XEN)  key 'V' (ascii '56') => dump iommu info
(XEN)  key 'a' (ascii '61') => dump timer queues
(XEN)  key 'c' (ascii '63') => dump ACPI Cx structures
(XEN)  key 'd' (ascii '64') => dump registers
(XEN)  key 'e' (ascii '65') => dump evtchn info
(XEN)  key 'g' (ascii '67') => print grant table usage
(XEN)  key 'h' (ascii '68') => show this message
(XEN)  key 'i' (ascii '69') => dump interrupt bindings
(XEN)  key 'm' (ascii '6d') => memory info
(XEN)  key 'n' (ascii '6e') => NMI statistics
(XEN)  key 'q' (ascii '71') => dump domain (and guest debug) info
(XEN)  key 'r' (ascii '72') => dump run queues
(XEN)  key 's' (ascii '73') => dump softtsc stats
(XEN)  key 't' (ascii '74') => display multi-cpu clock info
(XEN)  key 'u' (ascii '75') => dump numa info
(XEN)  key 'v' (ascii '76') => dump Intel's VMCS
(XEN)  key 'z' (ascii '7a') => print ioapic info
(XEN) 'c' pressed -> printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00351121] method[ HALT] duration[29325442472]
(XEN)     C2:	type[C2] latency[080] usage[00020757] method[SYSIO] duration[8547926696]
(XEN)    *C3:	type[C3] latency[109] usage[00284345] method[SYSIO] duration[1196126840025]
(XEN)     C0:	usage[00656223] duration[113236848228]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu1==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00369615] method[ HALT] duration[32806291155]
(XEN)     C2:	type[C2] latency[080] usage[00019487] method[SYSIO] duration[8378413506]
(XEN)    *C3:	type[C3] latency[109] usage[00261760] method[SYSIO] duration[1193713036771]
(XEN)     C0:	usage[00650862] duration[112339324923]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu2==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00359461] method[ HALT] duration[32061419293]
(XEN)     C2:	type[C2] latency[080] usage[00018403] method[SYSIO] duration[8204215406]
(XEN)    *C3:	type[C3] latency[109] usage[00223523] method[SYSIO] duration[1191309166155]
(XEN)     C0:	usage[00601387] duration[115662277832]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu3==
(XEN) active state:		C1
(XEN) max_cstate:		C7
(XEN) states:
(XEN)    *C1:	type[C1] latency[000] usage[00352970] method[ HALT] duration[30640717660]
(XEN)     C2:	type[C2] latency[080] usage[00020061] method[SYSIO] duration[8441869588]
(XEN)     C3:	type[C3] latency[109] usage[00232669] method[SYSIO] duration[1196294921785]
(XEN)     C0:	usage[00605700] duration[111859579671]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) printk: 5 messages suppressed.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 000000000000083f from 0x0000000000000000 to 0x00000000000000f6.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 000000000000083f from 0x0000000000000000 to 0x00000000000000f6.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 000000000000083f from 0x0000000000000000 to 0x00000000000000f6.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 000000000000083f from 0x0000000000000000 to 0x00000000000000f6.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 000000000000083f from 0x0000000000000000 to 0x00000000000000f6.
(XEN) traps.c:2489:d0 Domain attempted WRMSR 000000000000083f from 0x0000000000000000 to 0x00000000000000f6.

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-13 20:50 High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x Marek Marczykowski
@ 2013-03-15  3:00 ` Dario Faggioli
  2013-03-15  3:22   ` Marek Marczykowski
  2013-03-15 13:02 ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 68+ messages in thread
From: Dario Faggioli @ 2013-03-15  3:00 UTC
  To: Marek Marczykowski; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 991 bytes --]

On mer, 2013-03-13 at 21:50 +0100, Marek Marczykowski wrote:
> Hi,
> 
> I still have problems with ACPI(?) on Xen. After some system startups or
> resumes, the CPU temperature goes high although all domUs (and dom0) are idle.
>
Resume? Sorry for going a bit off-topic (or, if you want, for not being
able to help with the issue you're seeing), but does that mean
suspend/resume works for you under Xen?

That would be really nice, as I've never seen it working properly... Is
it me who is missing something? :-O

Actually, now that I think of it, there was a guy at FOSDEM, with
QubesOS installed on his laptop, telling us suspend was working for him,
but I've never had the chance to try it yet.

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-15  3:00 ` Dario Faggioli
@ 2013-03-15  3:22   ` Marek Marczykowski
  0 siblings, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-15  3:22 UTC
  To: Dario Faggioli; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1066 bytes --]

On 15.03.2013 04:00, Dario Faggioli wrote:
> On mer, 2013-03-13 at 21:50 +0100, Marek Marczykowski wrote:
>> Hi,
>>
>> I still have problems with ACPI(?) on Xen. After some system startups or
>> resumes, the CPU temperature goes high although all domUs (and dom0) are idle.
>>
> Resume? Sorry for going a bit off-topic (or, if you want, for not being
> able to help with the issue you're seeing), but does that mean
> suspend/resume works for you under Xen?

Yes, with patches from Konrad's devel/acpi-s3.v10 branch. Actually, one of
those patches appears to be in upstream Linux already, but the two remaining
ones still need to be applied.

> 
> That would be really nice, as I've never seen it working properly... Is
> it me who is missing something? :-O
> 
> Actually, now that I think of it, there was a guy at FOSDEM, with
> QubesOS installed on his laptop, telling us suspend was working for him,
> but I've never had the chance to try it yet.
> 
> Regards,
> Dario
> 


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-13 20:50 High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x Marek Marczykowski
  2013-03-15  3:00 ` Dario Faggioli
@ 2013-03-15 13:02 ` Konrad Rzeszutek Wilk
  2013-03-22 15:34   ` Marek Marczykowski
  1 sibling, 1 reply; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-15 13:02 UTC
  To: Marek Marczykowski; +Cc: xen-devel

On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
> Hi,
> 
> I still have problems with ACPI(?) on Xen. After some system startups or
> resumes, the CPU temperature goes high although all domUs (and dom0) are idle.
> On a "good" system startup it is about 50-55C, on a "bad" one above 67C (most
> of the time above 70C). I've noticed a difference in the C-states reported by
> Xen (attached files). On "bad" startups, suspend also doesn't work: the system
> restarts during suspend (I still haven't managed to get console messages, as I
> don't have a serial port on this system). Note that sometimes the system boots
> fine ("good" state), but the problem occurs after some suspend/resume cycles.
> Some time ago I saw other symptoms: only CPU0 was used, for all VCPUs
> (according to xl vcpu-list). Maybe it is related?
> 
> Hardware: Dell Latitude E6420
> CPU: Intel i5-2520M
> 
> Software:
> xen stable-4.1 as of 15.02 (last commit: "xen: sched_credit: improve picking
> up the idle CPU for a VCPU"), with the commit "Introduce system_state
> variable." reverted.
> The same problem also occurs on vanilla xen 4.1.2.
> 
> Linux 3.7.6 - happens almost every boot. On Linux 3.7.4 it happens much more
> rarely (but still occurs).
> Kernel config:
> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
> I've tried bisecting from 3.7.4 to 3.7.6, but without success because the
> problem isn't 100% reproducible.
> 
> Any ideas?

That C-states difference is important. The SYSIO part on your box means that the
CPU ends up doing an MWAIT. A HALT, on the other hand, is not so power-saving
friendly.
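
One way to put numbers on that difference, as a rough sketch rather than
anything from the Xen tree: the script below parses the 'c' debug-key dump as
it appears in `xl dmesg` and reports the share of time each CPU spent in each
state. The function name `cstate_residency` and the embedded SAMPLE (copied
from cpu0 and cpu1 of the "bad" log) are illustrative assumptions, and the
parser assumes the dump format shown in the attachments.

```python
import re
from collections import defaultdict

# Excerpt of the "bad" dump (cpu0 and cpu1), used only as demo input.
SAMPLE = """\
(XEN) ==cpu0==
(XEN) active state:  C3
(XEN) max_cstate:  C7
(XEN) states:
(XEN)     C1: type[C1] latency[000] usage[15192725] method[ HALT] duration[2739156035871]
(XEN)     C2: type[C2] latency[080] usage[02562493] method[SYSIO] duration[1089067679480]
(XEN)    *C3: type[C3] latency[109] usage[44305151] method[SYSIO] duration[139082530623812]
(XEN)     C0: usage[62060369] duration[13336595352900]
(XEN) ==cpu1==
(XEN) active state:  C1
(XEN) max_cstate:  C7
(XEN) states:
(XEN)    *C1: type[C1] latency[000] usage[39917049] method[ HALT] duration[128878627800503]
(XEN)     C0: usage[39917049] duration[27368721899990]
"""

CPU_RE = re.compile(r"==cpu(\d+)==")
# A state line is "C<n>:" followed somewhere by "duration[<number>]".
STATE_RE = re.compile(r"\b(C\d+):.*duration\[(\d+)\]")


def cstate_residency(dump_text):
    """Return {cpu: {state: fraction of total duration}} for a Cx dump."""
    durations = defaultdict(dict)
    cpu = None
    for line in dump_text.splitlines():
        cpu_match = CPU_RE.search(line)
        if cpu_match:
            cpu = int(cpu_match.group(1))
            continue
        state_match = STATE_RE.search(line)
        if state_match and cpu is not None:
            durations[cpu][state_match.group(1)] = int(state_match.group(2))

    residency = {}
    for cpu_id, states in durations.items():
        total = sum(states.values())
        residency[cpu_id] = (
            {name: value / total for name, value in states.items()} if total else {}
        )
    return residency


if __name__ == "__main__":
    for cpu_id, shares in sorted(cstate_residency(SAMPLE).items()):
        summary = ", ".join(f"{s}={v:.1%}" for s, v in sorted(shares.items()))
        print(f"cpu{cpu_id}: {summary}")
```

Feeding it the full "good" and "bad" dumps side by side should make it obvious
which CPUs never got anything deeper than C1.
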

Looking at this:
> (XEN) no cpu_id for acpi_id 5
> (XEN) no cpu_id for acpi_id 6
> (XEN) no cpu_id for acpi_id 7
> (XEN) no cpu_id for acpi_id 8

.. means that xen-acpi-processor was trying to probe for the ACPI IDs of the
other CPUs that the machine theoretically can support. That means it got
the ACPI information for the first four CPUs (which is good).

As a first step in trying to figure this out, you can add #define DEBUG 1
in xen-acpi-processor.c right before any of the #includes, and also boot
Xen with 'cpufreq=verbose'. That should tell you what kind of C-states
xen-acpi-processor uploaded (and whether it did so for all of the vCPUs).

If both bootups show that we do upload the C-states for all the CPUs but they
vary, that means digging a bit deeper into the ACPI code, specifically in
acpi_processor_get_power_info_cst, and seeing if it hits any of the 'continue'
statements.

Then I would also take the DSDT from both bootups and compare them. It might
be that the BIOS is using a scratch register at reboot to construct the C-states
and somehow it ends up being corrupted, which means that on the next warm reboot
the C-states have bogus data. This does show up in the field :-(
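
For the DSDT comparison, a minimal sketch, assuming the table from each boot
has been saved to a file (on Linux the raw table is normally readable as root
from /sys/firmware/acpi/tables/DSDT; the file names dsdt-good.bin and
dsdt-bad.bin below are hypothetical). It only reports whether the two dumps
differ and where the first difference is; it does not decode AML. Note that
the two attached logs already list different DSDT lengths (8834 vs. 885D), so
a difference would not be surprising.

```python
import hashlib
import sys
from pathlib import Path


def diff_tables(a: bytes, b: bytes) -> dict:
    """Summarise how two raw ACPI table dumps differ."""
    # Offset of the first differing byte within the common prefix, if any.
    first_diff = next(
        (i for i, (x, y) in enumerate(zip(a, b)) if x != y), None
    )
    # Same prefix but different lengths: they diverge where the shorter ends.
    if first_diff is None and len(a) != len(b):
        first_diff = min(len(a), len(b))
    return {
        "identical": a == b,
        "len_a": len(a),
        "len_b": len(b),
        "first_diff_offset": first_diff,
        "sha256_a": hashlib.sha256(a).hexdigest(),
        "sha256_b": hashlib.sha256(b).hexdigest(),
    }


if __name__ == "__main__":
    # Usage (hypothetical file names): python dsdt_diff.py dsdt-good.bin dsdt-bad.bin
    if len(sys.argv) != 3:
        sys.exit("usage: dsdt_diff.py <dump-a> <dump-b>")
    report = diff_tables(Path(sys.argv[1]).read_bytes(),
                         Path(sys.argv[2]).read_bytes())
    for key, value in report.items():
        print(f"{key}: {value}")
```

If the dumps do differ, disassembling both with iasl and diffing the resulting
source is the usual next step for finding which objects changed.
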


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-15 13:02 ` Konrad Rzeszutek Wilk
@ 2013-03-22 15:34   ` Marek Marczykowski
  2013-03-22 16:56     ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-22 15:34 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4044 bytes --]

On 15.03.2013 14:02, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
>> Hi,
>>
>> I've still have problems with ACPI(?) on Xen. After some system startup or
>> resume CPU temperature goes high although all domUs (and dom0) are idle. On
>> "good" system startup it is about 50-55C, on "bad" - above 67C (most time
>> above 70C). I've noticed difference in C-states repored by Xen (attached
>> files). On "bad" startups in addition suspend doesn't work - system restarts
>> during suspend (still didn't managed to get console messages - I don't have
>> serial port on this system). Note that sometimes system boots fine ("good"
>> state), but problem occurs after some suspend/resume cycles. Some time ago
>> I've got other symptoms: only CPU0 was used - for all VCPUs (according to xl
>> vcpu-list). Maybe it is related?
>>
>> Hardware: Dell Latitude E6420
>> CPU: Intel i5-2520M
>>
>> Software:
>> xen stable-4.1 as of 15.02 (last commit: "xen: sched_creadit: improve picking
>> up the idle CPU for a VCPU"), with reverted commit "Introduce system_state
>> variable."
>> But the same problem on vanilla xen 4.1.2.
>>
>> Linux 3.7.6 - happens almost every boot. On Linux 3.7.4 happens much rarer
>> (but still occurs).
>> Kernel config:
>> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
>> I've tried some bisect from 3.7.4 to 3.7.6, but without success because
>> problem isn't 100% reproducible.
>>
>> Any ideas?
> 
> That C-states difference is important. The SYSIO part on your box means that the
> CPU ends up doing an MWAIT. An HALT on the other hand is not so power-saving
> friendly.
> 
> Looking at this:
>> (XEN) no cpu_id for acpi_id 5
>> (XEN) no cpu_id for acpi_id 6
>> (XEN) no cpu_id for acpi_id 7
>> (XEN) no cpu_id for acpi_id 8
> 
> .. means that xen-acpi-processor was trying to probe for the ACPI IDs of the
> the other CPUs that the machine theoritcally can support. That means it got
> the ACPI information for the first four CPUs (which is good).
> 
> You can as the first step in trying to figure this out, add #define DEBUG 1
> in xen-acpi-processor.c right before any of the #includes. And also boot
> Xen with 'cpufreq=verbose'. That should tell you what kind of C-states the
> xen-acpi-processor uploaded (And if it did it for all of the vCPUS).
> 
> If both bootups show that we do upload the C-states for all the CPUs but they
> vary that means digging a bit deeper in the ACPI code. Specifically in 
> acpi_processor_get_power_info_cst and seeing if it hits any of the 'continue'.
> 
> Then I would say take also the DSDT for both bootups and compare them. It might
> be that the BIOS is using a scratch register at reboot to construct the C-states
> and somehow it ends up being corrupted. Which means that on the next warm reboot
> the C-states has bogus data. This does show up in the field :-(

I've finally found some time to debug this further, and it looks like
a deeper ACPI code problem...

I've switched to 3.8.4, on which the problem is much easier to reproduce (almost
every startup).

On a bad bootup, xen-acpi-processor didn't find any C-states: for each CPU
_pr->flags.power and _pr->power.count were 0 (but flags.power_setup_done=1). In
this case suspend (or shutdown) always ends in a reset.

On a good one, xen-acpi-processor got C1-C3 states for each CPU and suspend
succeeded, but after resume CPU0 had C1-C3 while the others had only C1. Reloading
xen-acpi-processor (rmmod -f...) fixes this (according to xl debug-key c), but
the temperature still stays high. Regardless of reloading xen-acpi-processor, the
next suspend always fails.

I'm not sure how C-states can be related to S3 suspend, but perhaps something more
general is wrong with ACPI?

The DSDT (taken from /sys/firmware/acpi/tables) is exactly the same each time.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-22 15:34   ` Marek Marczykowski
@ 2013-03-22 16:56     ` Konrad Rzeszutek Wilk
  2013-03-25 11:36       ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-22 16:56 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel

On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
> On 15.03.2013 14:02, Konrad Rzeszutek Wilk wrote:
> > On Wed, Mar 13, 2013 at 09:50:39PM +0100, Marek Marczykowski wrote:
> >> Hi,
> >>
> >> I've still have problems with ACPI(?) on Xen. After some system startup or
> >> resume CPU temperature goes high although all domUs (and dom0) are idle. On
> >> "good" system startup it is about 50-55C, on "bad" - above 67C (most time
> >> above 70C). I've noticed difference in C-states repored by Xen (attached
> >> files). On "bad" startups in addition suspend doesn't work - system restarts
> >> during suspend (still didn't managed to get console messages - I don't have
> >> serial port on this system). Note that sometimes system boots fine ("good"
> >> state), but problem occurs after some suspend/resume cycles. Some time ago
> >> I've got other symptoms: only CPU0 was used - for all VCPUs (according to xl
> >> vcpu-list). Maybe it is related?
> >>
> >> Hardware: Dell Latitude E6420
> >> CPU: Intel i5-2520M
> >>
> >> Software:
> >> xen stable-4.1 as of 15.02 (last commit: "xen: sched_creadit: improve picking
> >> up the idle CPU for a VCPU"), with reverted commit "Introduce system_state
> >> variable."
> >> But the same problem on vanilla xen 4.1.2.
> >>
> >> Linux 3.7.6 - happens almost every boot. On Linux 3.7.4 happens much rarer
> >> (but still occurs).
> >> Kernel config:
> >> http://git.qubes-os.org/gitweb/?p=marmarek/kernel.git;a=blob;f=config-pvops;h=a6e953f71cdc84556571b592b8af87a5a4f9a8d0;hb=HEAD
> >> I've tried some bisect from 3.7.4 to 3.7.6, but without success because
> >> problem isn't 100% reproducible.
> >>
> >> Any ideas?
> > 
> > That C-states difference is important. The SYSIO part on your box means that the
> > CPU ends up doing an MWAIT. An HALT on the other hand is not so power-saving
> > friendly.
> > 
> > Looking at this:
> >> (XEN) no cpu_id for acpi_id 5
> >> (XEN) no cpu_id for acpi_id 6
> >> (XEN) no cpu_id for acpi_id 7
> >> (XEN) no cpu_id for acpi_id 8
> > 
> > .. means that xen-acpi-processor was trying to probe for the ACPI IDs of the
> > the other CPUs that the machine theoritcally can support. That means it got
> > the ACPI information for the first four CPUs (which is good).
> > 
> > You can as the first step in trying to figure this out, add #define DEBUG 1
> > in xen-acpi-processor.c right before any of the #includes. And also boot
> > Xen with 'cpufreq=verbose'. That should tell you what kind of C-states the
> > xen-acpi-processor uploaded (And if it did it for all of the vCPUS).
> > 
> > If both bootups show that we do upload the C-states for all the CPUs but they
> > vary that means digging a bit deeper in the ACPI code. Specifically in 
> > acpi_processor_get_power_info_cst and seeing if it hits any of the 'continue'.
> > 
> > Then I would say take also the DSDT for both bootups and compare them. It might
> > be that the BIOS is using a scratch register at reboot to construct the C-states
> > and somehow it ends up being corrupted. Which means that on the next warm reboot
> > the C-states has bogus data. This does show up in the field :-(
> 
> Finally I've found some time for further debugging this. And it looks like
> some deeper ACPI code problem...
> 
> I've switched to 3.8.4, on which problem is much easier to reproduce (almost
> every startup).
> 
> On bad bootup, xen-acpi-processor didn't found any C-state: for each CPU
> _pr.flags.power and _pr->power.count was 0 (but flags.power_setup_done=1). In
> this case suspend (or shutdown) always ends up with reset.

Is this when booting the machine from a cold state or a warm one?

There are some BIOSes out there that I know use the scratchpad registers in
the IOH (so depending on the platform that can be 0:0e.1, Reg 0x84). If Xen or Linux
touches it, then the P-states and C-states that the BIOS generates are buggy.

But that is not the case here - you are saying that after disassembling the DSDT
(so cat /sys/firmware/acpi/tables/DSDT, or SSDT*, and then iasl -d on them), the
_PSD, _PSS, and _PCT look the same?

You could also look at the FACP table and see if it differs between bootups.
> 
> On good one xen-acpi-processor got C1-C3 states for each CPU, then suspend
> succeeded, but after resume CPU0 had C1-C3, but others only C1. Reloading
> xen-acpi-processor (rmmod -f...) fixes this (according to xl debug-key c), but
> still temperature keep high. Regardless of xen-acpi-processor reloading, next
> suspend always fails.

If you reload and look at the runqueues, are all of them using the ACPI
idler or the default one?

> 
> Not sure how C-states can be related to S3 suspend, but perhaps something more
> general with ACPI is wrong?

This reminds me of something. I recall seeing something like this a long, long time ago....
Completely forgot about it until now. The difference was whether Xen's cpu_idle
was running a) the acpi_idle (so using the different C-states), or b) the default one
(so just using HLT).

With b), during resume it would get half-way through
(http://darnok.org/xen/devel.acpi-s3.v1.serial.log), while with a) it would actually
continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log

This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.

Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html

And it looks like Kevin's recommendation was to use the a) case with max_cstate=1
to narrow it down.

> 
> Each time DSDT (get from /sys/firmware/acpi/tables) is exactly the same.
> 
> -- 
> Best Regards / Pozdrawiam,
> Marek Marczykowski
> Invisible Things Lab
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-22 16:56     ` Konrad Rzeszutek Wilk
@ 2013-03-25 11:36       ` Marek Marczykowski
  2013-03-25 14:17         ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-25 11:36 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3759 bytes --]

On 22.03.2013 17:56, Konrad Rzeszutek Wilk wrote:
> On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
>> I've switched to 3.8.4, on which problem is much easier to reproduce (almost
>> every startup).
>>
>> On bad bootup, xen-acpi-processor didn't found any C-state: for each CPU
>> _pr.flags.power and _pr->power.count was 0 (but flags.power_setup_done=1). In
>> this case suspend (or shutdown) always ends up with reset.
> 
> This is you booting the machine from a cold-state or a warm one?

It doesn't matter - the result is the same in both cases.

> There are some BIOSes out there that I know that use the scratchpad registers in
> IOH (so depending on the platform that can be 0:0e.1 , Reg 0x84). If Xen or Linux
> touch it then the P-states and C-states that the BIOS generates are buggy.
> 
> But that is not the case here - you are saying that the DSDT after disassembling
> (so cat /sys/firmware/acpi/tables/DSDT, or SSDT* and the iasl -d on them), the
> _PSD, _PSS, and _PCT look the same?

The binary versions are the same, so I assume the disassembled ones are too. I've
copied the full /sys/firmware/acpi/tables at several startups, and in all cases
(both cold and warm startups) everything was the same.
If I notice any difference, I will check the disassembled versions.
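
A binary comparison of two saved copies of /sys/firmware/acpi/tables can be
sketched as follows. This is not the procedure actually used in this thread,
just one way to do it; the function names and the snapshot directory names in
the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def table_digests(snapshot_dir):
    """Map each file path (relative to snapshot_dir) to its SHA-256 digest."""
    root = Path(snapshot_dir)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_tables(dir_a, dir_b):
    """Return sorted names of tables that differ or exist in only one snapshot."""
    a, b = table_digests(dir_a), table_digests(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

For example, differing_tables("tables-good", "tables-bad") returns an empty
list when two snapshots are byte-for-byte identical, which matches the result
reported above.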

> You could also look at the FACP table and see if they are different.
>>
>> On good one xen-acpi-processor got C1-C3 states for each CPU, then suspend
>> succeeded, but after resume CPU0 had C1-C3, but others only C1. Reloading
>> xen-acpi-processor (rmmod -f...) fixes this (according to xl debug-key c), but
>> still temperature keep high. Regardless of xen-acpi-processor reloading, next
>> suspend always fails.
> 
> If you reload, and look at the runqeueus, are all of them using the ACPI
> idler or the default one?

The ACPI one (before reload and after).

>> Not sure how C-states can be related to S3 suspend, but perhaps something more
>> general with ACPI is wrong?
> 
> This reminds me of something. I recall a long long time ago seeing something like this....
> Completly forgot about this until now. The difference was whether the Xen's cpu_idle 
> as running a) the acpi_idle (so using the different C-states), or b) the default one
> (so just using HLT).
> 
> With the b), during resume it would get half-way through
> (http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would actually
> continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log
> 
> This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.
> 
> Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html
> 
> And it looks Kevin's recommendation was use the a) case with max_cstates=1
> to narrow it down.

When default_idle is used, resume doesn't work at all (even the first one). Details:
(1) With max_cstate=1, without the xen-acpi-processor module: default_idle used.
Suspend succeeds, but resume always hangs.

(2) With max_cstate=1, with the xen-acpi-processor module loaded: acpi_idle used.
Suspend succeeds, resume also, but after resume the above problem exists (high
temperature, C2-C3 states present only on CPU0, subsequent suspends always
end in a reboot).

(3) Without max_cstate=1, with the xen-acpi-processor module loaded: same as (2).

(4) Without max_cstate=1, without the xen-acpi-processor module loaded: same as (1).

One more observation: when Xen is compiled with debug=y, cases (2) and (4)
behave the same as (1).

Hopefully I will have a real serial console sometime this week and will be
able to get more details from the hang and reboot cases.

BTW, any chance of getting the Xen ACPI S3 patches into the upstream kernel?

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-25 11:36       ` Marek Marczykowski
@ 2013-03-25 14:17         ` Konrad Rzeszutek Wilk
  2013-03-25 14:56           ` Marek Marczykowski
  2013-03-26 12:17           ` Marek Marczykowski
  0 siblings, 2 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-25 14:17 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel

On Mon, Mar 25, 2013 at 12:36:31PM +0100, Marek Marczykowski wrote:
> On 22.03.2013 17:56, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
> >> I've switched to 3.8.4, on which problem is much easier to reproduce (almost
> >> every startup).
> >>
> >> On bad bootup, xen-acpi-processor didn't found any C-state: for each CPU
> >> _pr.flags.power and _pr->power.count was 0 (but flags.power_setup_done=1). In
> >> this case suspend (or shutdown) always ends up with reset.
> > 
> > This is you booting the machine from a cold-state or a warm one?
> 
> Doesn't matter - in both cases the same result.
> 
> > There are some BIOSes out there that I know that use the scratchpad registers in
> > IOH (so depending on the platform that can be 0:0e.1 , Reg 0x84). If Xen or Linux
> > touch it then the P-states and C-states that the BIOS generates are buggy.
> > 
> > But that is not the case here - you are saying that the DSDT after disassembling
> > (so cat /sys/firmware/acpi/tables/DSDT, or SSDT* and the iasl -d on them), the
> > _PSD, _PSS, and _PCT look the same?
> 
> Binary versions are the same so assume disassembled also. I've copied full
> /sys/firmware/acpi/tables at some startups and in all cases (both cold and
> warm startups) all were the same.
> In case of any noticed difference will check disassembled versions.

<sigh> Was hoping it was something as simple as that :-)
.. snip..
> > This reminds me of something. I recall a long long time ago seeing something like this....
> > Completly forgot about this until now. The difference was whether the Xen's cpu_idle 
> > as running a) the acpi_idle (so using the different C-states), or b) the default one
> > (so just using HLT).
> > 
> > With the b), during resume it would get half-way through
> > (http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would actually
> > continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log
> > 
> > This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.
> > 
> > Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html
> > 
> > And it looks Kevin's recommendation was use the a) case with max_cstates=1
> > to narrow it down.
> 
> When default_idle used, resume doesn't work at all (even the first one). Details:
> (1) With max_cstates=1, without xen-acpi-processor module: default_idle used.
> Suspend succeed, but always hang at resume.

AHA! So the bug persists.
> 
> (2) With max_cstate=1, with xen-acpi-processor module loaded: acpi_idle used.
> Suspend succeed, resume also, but after resume above problem exists (high
> temperature, C2-C3 states only present on CPU0, subsequent suspends always
> ends up with reboot).
> 
> (3) Without max_cstate=1, with xen-acpi-processor module loaded: same as (2).
> 
> (4) Without max_cstate=1, without xen-acpi-processor module loaded: same as (1).
> 
> One more observation: when xen compiled with debug=y, (2) and (4) cases
> behaves the same as (1).

Oh, that is something new.
> 
> Hopefully I will have real serial console somehow in this week and will be
> able to get more details from hang and reboot cases.
> 
> BTW Any chances for Xen ACPI S3 patches in upstream kernel?

<sigh> Now that the regression storm of v3.9 has subsided I should have
some breathing room to address that. 


> 
> -- 
> Best Regards / Pozdrawiam,
> Marek Marczykowski
> Invisible Things Lab
> 

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-25 14:17         ` Konrad Rzeszutek Wilk
@ 2013-03-25 14:56           ` Marek Marczykowski
  2013-03-26 12:17           ` Marek Marczykowski
  1 sibling, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-25 14:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2821 bytes --]

On 25.03.2013 15:17, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 12:36:31PM +0100, Marek Marczykowski wrote:
>> On 22.03.2013 17:56, Konrad Rzeszutek Wilk wrote:
>>> This reminds me of something. I recall a long long time ago seeing something like this....
>>> Completly forgot about this until now. The difference was whether the Xen's cpu_idle 
>>> as running a) the acpi_idle (so using the different C-states), or b) the default one
>>> (so just using HLT).
>>>
>>> With the b), during resume it would get half-way through
>>> (http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would actually
>>> continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log
>>>
>>> This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.
>>>
>>> Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html
>>>
>>> And it looks Kevin's recommendation was use the a) case with max_cstates=1
>>> to narrow it down.
>>
>> When default_idle used, resume doesn't work at all (even the first one). Details:
>> (1) With max_cstates=1, without xen-acpi-processor module: default_idle used.
>> Suspend succeed, but always hang at resume.
> 
> AHA! So the bug persist.
>>
>> (2) With max_cstate=1, with xen-acpi-processor module loaded: acpi_idle used.
>> Suspend succeed, resume also, but after resume above problem exists (high
>> temperature, C2-C3 states only present on CPU0, subsequent suspends always
>> ends up with reboot).
>>
>> (3) Without max_cstate=1, with xen-acpi-processor module loaded: same as (2).
>>
>> (4) Without max_cstate=1, without xen-acpi-processor module loaded: same as (1).
>>
>> One more observation: when xen compiled with debug=y, (2) and (4) cases
>> behaves the same as (1).
> 
> Oh, that is something new.

I've also tried some (automated :) ) bisection on Xen from 4.1.2 to 4.1.4, but
unfortunately the results weren't deterministic... My script doesn't distinguish
between different symptoms (reboot at suspend, hang at resume, incomplete C-states
after resume, etc.), so this may be the reason for such non-deterministic results...

One time I got this commit as the first bad one:
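
One way to make an automated bisection less sensitive to mixed symptoms is to
bisect for a single target symptom and skip commits that only show a different
failure. The sketch below is not the script used here; the symptom names come
from the list above, everything else is hypothetical, and detecting the symptom
of a given run is left out entirely. The exit codes are the ones `git bisect
run` understands: 0 for good, 1 for bad, 125 for "cannot be tested, skip".

```python
from enum import Enum


class Symptom(Enum):
    NONE = "none"
    REBOOT_AT_SUSPEND = "reboot at suspend"
    HANG_AT_RESUME = "hang at resume"
    INCOMPLETE_CSTATES = "incomplete C-states after resume"


# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(observations, target):
    """Turn the symptoms seen over several runs of one commit into a verdict."""
    if any(seen is target for seen in observations):
        return BAD  # the target symptom showed up at least once
    if observations and all(seen is Symptom.NONE for seen in observations):
        return GOOD  # every run was clean
    return SKIP  # only other failures (or no runs): no statement about target
```

Because the problem is not 100% reproducible, several runs per commit are fed
in, and a commit is only called good if every one of them was clean.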
commit 329d4280255ff44300913f24119f52d3459c1ed0
Author: Jan Beulich <jbeulich@suse.com>
Date:   Tue Apr 17 08:33:33 2012 +0100

    XENPF_set_processor_pminfo XEN_PM_CX overflows states array

Maybe related?

>>
>> Hopefully I will have real serial console somehow in this week and will be
>> able to get more details from hang and reboot cases.
>>
>> BTW Any chances for Xen ACPI S3 patches in upstream kernel?
> 
> <sigh> Now that the regression storm of v3.9 has subsided I should have
> some breathing room to address that. 

I'm keeping my fingers crossed.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-25 14:17         ` Konrad Rzeszutek Wilk
  2013-03-25 14:56           ` Marek Marczykowski
@ 2013-03-26 12:17           ` Marek Marczykowski
  2013-03-26 13:11             ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-26 12:17 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 4093 bytes --]

On 25.03.2013 15:17, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 25, 2013 at 12:36:31PM +0100, Marek Marczykowski wrote:
>> On 22.03.2013 17:56, Konrad Rzeszutek Wilk wrote:
>>> On Fri, Mar 22, 2013 at 04:34:11PM +0100, Marek Marczykowski wrote:
>>> This reminds me of something. I recall a long long time ago seeing something like this....
>>> Completly forgot about this until now. The difference was whether the Xen's cpu_idle 
>>> as running a) the acpi_idle (so using the different C-states), or b) the default one
>>> (so just using HLT).
>>>
>>> With the b), during resume it would get half-way through
>>> (http://darnok.org/xen/devel.acpi-s3.v1.serial.log) while with a) it would actually
>>> continue on - http://darnok.org/xen/devel.acpi-s3.v0.serial.log
>>>
>>> This was on some MSI MS-7680/H61M-P23 (MS-7680) motherboard.
>>>
>>> Oh look: http://lists.xen.org/archives/html/xen-devel/2011-06/msg02059.html
>>>
>>> And it looks Kevin's recommendation was use the a) case with max_cstates=1
>>> to narrow it down.
>>
>> When default_idle used, resume doesn't work at all (even the first one). Details:
>> (1) With max_cstates=1, without xen-acpi-processor module: default_idle used.
>> Suspend succeed, but always hang at resume.
> 
> AHA! So the bug persist.
>>
>> (2) With max_cstate=1, with xen-acpi-processor module loaded: acpi_idle used.
>> Suspend succeed, resume also, but after resume above problem exists (high
>> temperature, C2-C3 states only present on CPU0, subsequent suspends always
>> ends up with reboot).
>>
>> (3) Without max_cstate=1, with xen-acpi-processor module loaded: same as (2).
>>
>> (4) Without max_cstate=1, without xen-acpi-processor module loaded: same as (1).
>>
>> One more observation: when xen compiled with debug=y, (2) and (4) cases
>> behaves the same as (1).
> 
> Oh, that is something new.

Finally got serial console :)
The debug=y problem is (actually at resume):
(XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
(XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
(XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
(XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
(XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
(XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000300b81000   cr2: ffff880402070198
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029feb8:
(XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
(XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
(XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
(XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
(XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
(XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
(XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
(XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
(XEN) ****************************************


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 12:17           ` Marek Marczykowski
@ 2013-03-26 13:11             ` Jan Beulich
  2013-03-26 13:50               ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-26 13:11 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, xen-devel

>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
> Finally got serial console :)
> The debug=y problem is (actually at resume):
> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
> (XEN)    0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
> (XEN) ****************************************

To make sense of this, we need to know the register (and maybe
stack) allocation at this point, to know which vector it was that
triggered the assertion. You can either do this analysis for us, or
point us at the xen-syms binary matching the xen.gz you used.

From the register values, the most likely candidates are vectors 0xe9
and 0x2a. The former having two registers set to this value seems
more likely from that angle, but vectors in the 0xe? range should
never end up in smp_irq_move_cleanup_interrupt().

And if it's the 0x2a one, then we'd need to know what IRQ it was
last used for. That can't be reconstructed from the data above, so
would require you being able to reproduce this and adding some
instrumentation to the code.
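
The register-based guess above can be reproduced mechanically. This is only a
rough sketch: it scans a register dump for values that could be interrupt
vectors, assuming the range 0x20-0xff (on x86, vectors below 0x20 are reserved
for exceptions). The function name is made up, and it does not replace looking
at the disassembly to see which register actually holds 'vector'.

```python
import re

# Matches "name: 16-hex-digit value" pairs as printed in the Xen register dump.
REGISTER = re.compile(r"\b(\w+):\s+([0-9a-f]{16})\b")


def vector_candidates(dump, lowest=0x20, highest=0xFF):
    """Return registers whose value lies in the assumed interrupt-vector range."""
    registers = {name: int(value, 16) for name, value in REGISTER.findall(dump)}
    return {name: v for name, v in registers.items() if lowest <= v <= highest}


dump = """\
(XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
(XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
(XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
(XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
"""
for name, value in vector_candidates(dump).items():
    print(f"{name} = {value:#x}")
# prints:
# rbx = 0xe9
# rdx = 0xe9
# rsi = 0x2a
```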

Jan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 13:11             ` Jan Beulich
@ 2013-03-26 13:50               ` Marek Marczykowski
  2013-03-26 15:47                 ` Andrew Cooper
  2013-03-26 16:03                 ` Jan Beulich
  0 siblings, 2 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-26 13:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3323 bytes --]

On 26.03.2013 14:11, Jan Beulich wrote:
>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>> Finally got serial console :)
>> The debug=y problem is (actually at resume):
>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>> (XEN) CPU:    0
>> (XEN) RIP:    e008:[<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>> (XEN)    0000000000000000
>> (XEN) Xen call trace:
>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>> (XEN)
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 0:
>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>> (XEN) ****************************************
> 
> To make sense of this, we need to know the register (and maybe
> stack) allocation at this point, to know which vector it was that
> triggered the assertion. You can either do this analysis for us, or
> point us at the xen-syms binary matching the xen.gz you used.

"info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.

> From the register values, the most likely candidates are vector 0xe9
> and 0x2a. The former having two registers set to this value seems
> more likely from that angle, but vectors in the 0xe? range should
> never end up in smp_irq_move_cleanup_interrupt().
> 
> And if it's the 0x2a one, then we'd need to know what IRQ it was
> last used for. That can't be reconstructed from the data above, so
> would require you being able to reproduce this and adding some
> instrumentation to the code.
> 
> Jan
> 


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 13:50               ` Marek Marczykowski
@ 2013-03-26 15:47                 ` Andrew Cooper
  2013-03-26 16:12                   ` Andrew Cooper
  2013-03-26 16:03                 ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-26 15:47 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 26/03/2013 13:50, Marek Marczykowski wrote:
> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>>> Finally got serial console :)
>>> The debug=y problem is (actually at resume):
>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>> (XEN)    0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>> (XEN) ****************************************
>> To make sense of this, we need to know the register (and maybe
>> stack) allocation at this point, to know which vector it was that
>> triggered the assertion. You can either do this analysis for us, or
>> point us at the xen-syms binary matching the xen.gz you used.
> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>
>> From the register values, the most likely candidates are vector 0xe9
>> and 0x2a. The former having two registers set to this value seems
>> more likely from that angle, but vectors in the 0xe? range should
>> never end up in smp_irq_move_cleanup_interrupt().
>>
>> And if it's the 0x2a one, then we'd need to know what IRQ it was
>> last used for. That can't be reconstructed from the data above, so
>> would require you being able to reproduce this and adding some
>> instrumentation to the code.
>>
>> Jan
>>
>

Could it be something to do with switching virtual wire mode, and having
PIC compatibility stuff left in the IO-APIC after leaving the BIOS but
before starting back up again?

Looking at the stack dump, there is an extra exception frame under what
is printed by the assertion failure.

0000002000000000 TRAP_syscall
ffffffff81a01db8 guest kernel addr
0000000000000246 FLAGS
000000000000e033 FLAT_RING3_CS64
ffffffff8105dd5a guest kernel addr
000000000000e02b FLAT_RING3_SS{64,32}

So it appears that we are already executing a guest (presumably dom0) by
the time this assertion occurs.  From the serial, is there any indication
that dom0 has started up again?

I would have thought that we should have successfully reset the IO-APIC
back up properly before we would ever get back around to executing dom0.

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 13:50               ` Marek Marczykowski
  2013-03-26 15:47                 ` Andrew Cooper
@ 2013-03-26 16:03                 ` Jan Beulich
  2013-03-26 16:45                   ` Marek Marczykowski
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-26 16:03 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, xen-devel

>>> On 26.03.13 at 14:50, Marek Marczykowski <marmarek@invisiblethingslab.com>
wrote:
> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> 
> wrote:
>>> Finally got serial console :)
>>> The debug=y problem is (actually at resume):
>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>> (XEN) CPU:    0
>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>> (XEN)    0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>> (XEN) ****************************************
>> 
>> To make sense of this, we need to know the register (and maybe
>> stack) allocation at this point, to know which vector it was that
>> triggered the assertion. You can either do this analysis for us, or
>> point us at the xen-syms binary matching the xen.gz you used.
> 
> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.

And that system isn't using a strange mixed mode IO-APIC/legacy
PIC model, where particularly IRQ 9 (usually ACPI SCI) gets
channeled through the legacy PIC?

Could you attach the complete log, ideally with 'i' output logged
right before suspending?

Is this reproducible with 4.2.x or 4.3-unstable? If not, but if readily
reproducible with 4.1.5-rc1, could you try changing the containing
loop's upper bound from "< NR_VECTORS" to
"<= LAST_DYNAMIC_VECTOR"?
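
To illustrate what the proposed bound change would do, here is a small
sketch of the two scan ranges. Only 0x20 as the first dynamic vector
comes from this thread; the values used for LAST_DYNAMIC_VECTOR (0xdf)
and NR_VECTORS (256), and the assumption that the loop starts at the
first dynamic vector, are not confirmed here and should be checked
against the tree being built.

```python
FIRST_DYNAMIC_VECTOR = 0x20   # from the thread: also the cleanup IPI vector
LAST_DYNAMIC_VECTOR = 0xDF    # assumption, check against your Xen tree
NR_VECTORS = 256              # assumption: one entry per x86 IDT vector


def scanned(upper_exclusive):
    """Vectors visited by a loop starting at the first dynamic vector."""
    return range(FIRST_DYNAMIC_VECTOR, upper_exclusive)


current = scanned(NR_VECTORS)                 # "< NR_VECTORS"
proposed = scanned(LAST_DYNAMIC_VECTOR + 1)   # "<= LAST_DYNAMIC_VECTOR"

for vector in (0x2A, 0xE9):
    print(f"0x{vector:02x}: current={vector in current} "
          f"proposed={vector in proposed}")
```

Under these assumed values, 0xe9 is visited by the current loop but not
by the proposed one, while 0x2a is visited by both.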

Jan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 15:47                 ` Andrew Cooper
@ 2013-03-26 16:12                   ` Andrew Cooper
  2013-03-26 16:47                     ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-26 16:12 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel

On 26/03/2013 15:47, Andrew Cooper wrote:
> On 26/03/2013 13:50, Marek Marczykowski wrote:
>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>>>> Finally got serial console :)
>>>> The debug=y problem is (actually at resume):
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>> (XEN)    0000000000000000
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ****************************************
>>> To make sense of this, we need to know the register (and maybe
>>> stack) allocation at this point, to know which vector it was that
>>> triggered the assertion. You can either do this analysis for us, or
>>> point us at the xen-syms binary matching the xen.gz you used.
>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>>
>>> From the register values, the most likely candidates are vector 0xe9
>>> and 0x2a. The former having two registers set to this value seems
>>> more likely from that angle, but vectors in the 0xe? range should
>>> never end up in smp_irq_move_cleanup_interrupt().
>>>
>>> And if it's the 0x2a one, then we'd need to know what IRQ it was
>>> last used for. That can't be reconstructed from the data above, so
>>> would require you being able to reproduce this and adding some
>>> instrumentation to the code.
>>>
>>> Jan
>>>
> Could it be something to do with switching virtual wire mode, and having
> PIC compatibility stuff left in the IO-APIC after leaving the BIOS but
> before starting back up again?
>
> Looking at the stack dump, there is an extra exception frame under what
> is printed by the assertion failure.
>
> 0000002000000000 TRAP_syscall

Apologies - this is a vector 0x20 interrupt, not TRAP_syscall, which
makes sense as 0x20 is FIRST_DYNAMIC_IRQ which is also the cleanup IPI
vector.

The other comments still stand, especially as we appear to be
interrupting dom0 which is already running.
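
For reference, the frame at the bottom of the stack dump can be decoded
as in the sketch below. It assumes the first word packs a 32-bit error
code in the low half and the entry vector in the high half, followed by
the usual x86-64 rip, cs, rflags, rsp, ss order; that layout is an
assumption here, although it matches the vector 0x20 reading above. The
helper name decode_frame is illustrative.

```python
# Tail of the quoted stack dump, in the order the words appear.
WORDS = [
    0x0000002000000000,
    0xFFFFFFFF8105DD5A, 0x000000000000E033,
    0x0000000000000246, 0xFFFFFFFF81A01DB8,
    0x000000000000E02B,
]

# Assumed order of the words that follow the vector/error-code word.
FIELDS = ("rip", "cs", "rflags", "rsp", "ss")


def decode_frame(words):
    """Split the first word into vector and error code, label the rest."""
    head, rest = words[0], words[1:]
    frame = {
        "entry_vector": head >> 32,       # high 32 bits (assumed layout)
        "error_code": head & 0xFFFFFFFF,  # low 32 bits (assumed layout)
    }
    frame.update(zip(FIELDS, rest))
    return frame


frame = decode_frame(WORDS)
for name, value in frame.items():
    print(f"{name:>12}: {value:#x}")
```

With this reading the entry vector is 0x20 and the error code is 0, and
cs/ss are the 0xe033/0xe02b guest selectors listed in the quoted
breakdown.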

~Andrew

> ffffffff81a01db8 guest kernel addr
> 0000000000000246 FLAGS
> 000000000000e033 FLAT_RING3_CS64
> ffffffff8105dd5a guest kernel addr
> 000000000000e02b FLAT_RING3_SS{64,32}
>
> So it appears that we are already executing a guest (presumably dom0) by the time this assertion occurs.  From the serial, is there any indication that dom0 has started up again?
>
> I would have thought that we should have successfully reset the IO-APIC back up properly before we would ever get back around to executing dom0.
>
> ~Andrew
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 16:03                 ` Jan Beulich
@ 2013-03-26 16:45                   ` Marek Marczykowski
  2013-03-26 17:02                     ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-26 16:45 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1.1: Type: text/plain, Size: 3705 bytes --]

On 26.03.2013 17:03, Jan Beulich wrote:
>>>> On 26.03.13 at 14:50, Marek Marczykowski <marmarek@invisiblethingslab.com>
> wrote:
>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> 
>> wrote:
>>>> Finally got serial console :)
>>>> The debug=y problem is (actually at resume):
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>> (XEN) CPU:    0
>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>> (XEN)    0000000000000000
>>>> (XEN) Xen call trace:
>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>> (XEN)
>>>> (XEN)
>>>> (XEN) ****************************************
>>>> (XEN) Panic on CPU 0:
>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>> (XEN) ****************************************
>>>
>>> To make sense of this, we need to know the register (and maybe
>>> stack) allocation at this point, to know which vector it was that
>>> triggered the assertion. You can either do this analysis for us, or
>>> point us at the xen-syms binary matching the xen.gz you used.
>>
>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
> 
> And that system isn't using a strange mixed mode IO-APIC/legacy
> PIC model, where particularly IRQ 9 (usually ACPI SCI) gets
> channeled through the legacy PIC?

I don't know...

> Could you attach the complete log, ideally with 'i' output logged
> right before suspending?

Sure, attached.

> Is this reproducible with 4.2.x or 4.3-unstable? If not, but if readily
> reproducible with 4.1.5-rc1, could you try changing the containing
> loop's upper bound from "< NR_VECTORS" to
> "<= LAST_DYNAMIC_VECTOR"?

I tried 4.2.x some time ago and the bug also exists there (but I had no
console, so I'm not sure if it is exactly the same). 4.3 seems not to be
affected.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.1.2: console-failed-resume.log --]
[-- Type: text/x-log; name="console-failed-resume.log", Size: 85172 bytes --]

 __  __            _  _    _   ____              _ 
 \ \/ /___ _ __   | || |  / | | ___|    _ __ ___/ |
  \  // _ \ '_ \  | || |_ | | |___ \ __| '__/ __| |
  /  \  __/ | | | |__   _|| |_ ___) |__| | | (__| |
 /_/\_\___|_| |_|    |_|(_)_(_)____/   |_|  \___|_|
                                                   
(XEN) Xen version 4.1.5-rc1 (marmarek@marmarek.net) (gcc version 4.7.2 20120921 (Red Hat 4.7.2-2) (GCC) ) Tue Mar 26 13:07:36 CET 2013
(XEN) Latest ChangeSet: unavailable
(XEN) Console output is synchronous.
(XEN) Bootloader: PXELINUX 4.05 2011-12-09 
(XEN) Command line: cpufreq=verbose loglvl=all guest_loglvl=all com1=115200,8n1 sync_console console=com1,vga
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009a800 (usable)
(XEN)  000000000009a800 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000020000000 (usable)
(XEN)  0000000020000000 - 0000000020200000 (reserved)
(XEN)  0000000020200000 - 0000000040000000 (usable)
(XEN)  0000000040000000 - 0000000040200000 (reserved)
(XEN)  0000000040200000 - 00000000ca61e000 (usable)
(XEN)  00000000ca61e000 - 00000000ca662000 (reserved)
(XEN)  00000000ca662000 - 00000000ca9b7000 (usable)
(XEN)  00000000ca9b7000 - 00000000ca9e7000 (reserved)
(XEN)  00000000ca9e7000 - 00000000cabe7000 (ACPI NVS)
(XEN)  00000000cabe7000 - 00000000cabff000 (ACPI data)
(XEN)  00000000cabff000 - 00000000cac00000 (usable)
(XEN)  00000000cb800000 - 00000000cfa00000 (reserved)
(XEN)  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN)  00000000ffc00000 - 00000000ffc20000 (reserved)
(XEN)  0000000100000000 - 000000042e000000 (usable)
(XEN) ACPI: RSDP 000FE300, 0024 (r2 DELL  )
(XEN) ACPI: XSDT CABFDE18, 007C (r1 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: FACP CAB77D98, 00F4 (r4 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: DSDT CAB45018, 885D (r2 INT430 SYSFexxx     1001 INTL 20090903)
(XEN) ACPI: FACS CABD4D40, 0040
(XEN) ACPI: APIC CABFCF18, 00CC (r2 DELL    CBX3     6222004 MSFT    10013)
(XEN) ACPI: TCPA CABD5D18, 0032 (r2                        0             0)
(XEN) ACPI: SSDT CAB78A98, 02F9 (r1 DELLTP      TPM     3000 INTL 20090903)
(XEN) ACPI: MCFG CABD5C98, 003C (r1 DELL   SNDYBRDG  6222004 MSFT       97)
(XEN) ACPI: HPET CABD5C18, 0038 (r1 A M I   PCHHPET  6222004 AMI.        3)
(XEN) ACPI: BOOT CABD5B98, 0028 (r1 DELL   CBX3      6222004 AMI     10013)
(XEN) ACPI: SSDT CAB5C018, 0804 (r1  PmRef  Cpu0Ist     3000 INTL 20090903)
(XEN) ACPI: SSDT CAB5B018, 0996 (r1  PmRef    CpuPm     3000 INTL 20090903)
(XEN) ACPI: DMAR CAB77C18, 00E8 (r1 INTEL      SNB         1 INTL        1)
(XEN) ACPI: SLIC CAB65C18, 0176 (r3 DELL    CBX3     6222004 MSFT    10013)
(XEN) System RAM: 16261MB (16651320kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-000000042e000000
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000f1e00
(XEN) DMI 2.6 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408
(XEN) ACPI: ACPI SLEEP INFO: pm1x_cnt[404,0], pm1x_evt[400,0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - cabd4e40/00000000cabd4d40, using 32
(XEN) ACPI:                  wakeup_vec[cabd4e4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 6:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 6:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
(XEN) Processor #1 6:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 6:10 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) PCI: MCFG configuration 0: base f8000000 segment 0 buses 0 - 63
(XEN) PCI: Not using MMCONFIG.
(XEN) Table is not found!
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) IRQ limits: 24 GSI, 760 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Detected 2494.356 MHz processor.
(XEN) Initing memory sharing.
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) Intel machine check reporting enabled
(XEN) Intel VT-d supported page sizes: 4kB.
(XEN) Intel VT-d supported page sizes: 4kB.
(XEN) Intel VT-d Snoop Control not enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Shared EPT tables not enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Platform timer is 14.318MHz HPET
(XEN) Allocated console ring of 32 KiB.
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB
(XEN) Brought up 4 CPUs
(XEN) ACPI sleep modes: S3
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) *** LOADING DOMAIN 0 ***
(XEN) elf_parse_binary: phdr: paddr=0x1000000 memsz=0x815000
(XEN) elf_parse_binary: phdr: paddr=0x1a00000 memsz=0x7e0f0
(XEN) elf_parse_binary: phdr: paddr=0x1a7f000 memsz=0x14680
(XEN) elf_parse_binary: phdr: paddr=0x1a94000 memsz=0x58d000
(XEN) elf_parse_binary: memory: 0x1000000 -> 0x2021000
(XEN) elf_xen_parse_note: GUEST_OS = "linux"
(XEN) elf_xen_parse_note: GUEST_VERSION = "2.6"
(XEN) elf_xen_parse_note: XEN_VERSION = "xen-3.0"
(XEN) elf_xen_parse_note: VIRT_BASE = 0xffffffff80000000
(XEN) elf_xen_parse_note: ENTRY = 0xffffffff81a94210
(XEN) elf_xen_parse_note: HYPERCALL_PAGE = 0xffffffff81001000
(XEN) elf_xen_parse_note: FEATURES = "!writable_page_tables|pae_pgdir_above_4gb"
(XEN) elf_xen_parse_note: PAE_MODE = "yes"
(XEN) elf_xen_parse_note: LOADER = "generic"
(XEN) elf_xen_parse_note: unknown xen elf note (0xd)
(XEN) elf_xen_parse_note: SUSPEND_CANCEL = 0x1
(XEN) elf_xen_parse_note: HV_START_LOW = 0xffff800000000000
(XEN) elf_xen_parse_note: PADDR_OFFSET = 0x0
(XEN) elf_xen_addr_calc_check: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff81000000
(XEN)     virt_kend        = 0xffffffff82021000
(XEN)     virt_entry       = 0xffffffff81a94210
(XEN)     p2m_base         = 0xffffffffffffffff
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x2021000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.:   0000000410000000->0000000418000000 (4025724 pages to be allocated)
(XEN)  Init. ramdisk: 000000042a30f000->000000042dfffa00
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff81000000->ffffffff82021000
(XEN)  Init. ramdisk: ffffffff82021000->ffffffff85d11a00
(XEN)  Phys-Mach map: ffffffff85d12000->ffffffff87c27368
(XEN)  Start info:    ffffffff87c28000->ffffffff87c284b4
(XEN)  Page tables:   ffffffff87c29000->ffffffff87c6c000
(XEN)  Boot stack:    ffffffff87c6c000->ffffffff87c6d000
(XEN)  TOTAL:         ffffffff80000000->ffffffff88000000
(XEN)  ENTRY ADDRESS: ffffffff81a94210
(XEN) Dom0 has maximum 4 VCPUs
(XEN) elf_load_binary: phdr 0 at 0xffffffff81000000 -> 0xffffffff81815000
(XEN) elf_load_binary: phdr 1 at 0xffffffff81a00000 -> 0xffffffff81a7e0f0
(XEN) elf_load_binary: phdr 2 at 0xffffffff81a7f000 -> 0xffffffff81a93680
(XEN) elf_load_binary: phdr 3 at 0xffffffff81a94000 -> 0xffffffff81b1a000
(XEN) Scrubbing Free RAM: .done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) **********************************************
(XEN) ******* WARNING: CONSOLE OUTPUT IS SYNCHRONOUS
(XEN) ******* This option is intended to aid debugging of Xen by ensuring
(XEN) ******* that all output is synchronously delivered on the serial line.
(XEN) ******* However it can introduce SIGNIFICANT latencies and affect
(XEN) ******* timekeeping. It is NOT recommended for production use!
(XEN) **********************************************
(XEN) 3... 2... 1... 
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> DOM0 (type 'CTRL-a' three times to switch input to Xen)
(XEN) Freed 232kB init memory.
mapping kernel into physical memory
about to get started...
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.7.4-3.pvops.qubes.x86_64 (user@devel17) (gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC) ) #1 SMP Fri Mar 22 19:58:09 UTC 2013
[    0.000000] Command line: root=/dev/mapper/qubes_test500-root_test ro rd.lvm.lv=qubes_test500/root_test rd.lvm.lv=qubes_test500/lv_swap rd.luks=0 rd.md.0 rd.dm=0 console=tty0 console=hvc0 no_console_suspend testrun=d682e0a testcount=2
[    0.000000] Freeing 9a-100 pfn range: 102 pages freed
[    0.000000] Freeing 20000-20200 pfn range: 512 pages freed
[    0.000000] Freeing 40000-40200 pfn range: 512 pages freed
[    0.000000] Freeing ca61e-ca662 pfn range: 68 pages freed
[    0.000000] Freeing ca9b7-cabff pfn range: 584 pages freed
[    0.000000] Freeing cac00-100000 pfn range: 218112 pages freed
[    0.000000] Released 219890 pages of unused memory
[    0.000000] Set 219890 page(s) to 1-1 mapping
[    0.000000] Populating 3e2a6d-41855f pfn range: 219890 pages added
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
[    0.000000] Xen: [mem 0x000000000009a800-0x00000000000fffff] reserved
[    0.000000] Xen: [mem 0x0000000000100000-0x000000001fffffff] usable
[    0.000000] Xen: [mem 0x0000000020000000-0x00000000201fffff] reserved
[    0.000000] Xen: [mem 0x0000000020200000-0x000000003fffffff] usable
[    0.000000] Xen: [mem 0x0000000040000000-0x00000000401fffff] reserved
[    0.000000] Xen: [mem 0x0000000040200000-0x00000000ca61dfff] usable
[    0.000000] Xen: [mem 0x00000000ca61e000-0x00000000ca661fff] reserved
[    0.000000] Xen: [mem 0x00000000ca662000-0x00000000ca9b6fff] usable
[    0.000000] Xen: [mem 0x00000000ca9b7000-0x00000000ca9e6fff] reserved
[    0.000000] Xen: [mem 0x00000000ca9e7000-0x00000000cabe6fff] ACPI NVS
[    0.000000] Xen: [mem 0x00000000cabe7000-0x00000000cabfefff] ACPI data
[    0.000000] Xen: [mem 0x00000000cabff000-0x00000000cabfffff] usable
[    0.000000] Xen: [mem 0x00000000cb800000-0x00000000cf9fffff] reserved
[    0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[    0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[    0.000000] Xen: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
[    0.000000] Xen: [mem 0x00000000ffc00000-0x00000000ffc1ffff] reserved
[    0.000000] Xen: [mem 0x0000000100000000-0x000000042dffffff] usable
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] DMI 2.6 present.
[    0.000000] No AGP bridge found
[    0.000000] e820: last_pfn = 0x42e000 max_arch_pfn = 0x400000000
[    0.000000] x2apic enabled by BIOS, switching to x2apic ops
[    0.000000] e820: last_pfn = 0xcac00 max_arch_pfn = 0x400000000
[    0.000000] init_memory_mapping: [mem 0x00000000-0xcabfffff]
[    0.000000] init_memory_mapping: [mem 0x100000000-0x42dffffff]
[    0.000000] RAMDISK: [mem 0x02021000-0x05d11fff]
[    0.000000] ACPI: RSDP 00000000000fe300 00024 (v02 DELL  )
[    0.000000] ACPI: XSDT 00000000cabfde18 0007C (v01 DELL    CBX3    06222004 MSFT 00010013)
[    0.000000] ACPI: FACP 00000000cab77d98 000F4 (v04 DELL    CBX3    06222004 MSFT 00010013)
[    0.000000] ACPI Warning: 32/64 FACS address mismatch in FADT - two FACS tables! (20120913/tbfadt-394)
[    0.000000] ACPI BIOS Bug: Warning: 32/64X FACS address mismatch in FADT - 0xCABD4E40/0x00000000CABD4D40, using 32 (20120913/tbfadt-521)
[    0.000000] ACPI: DSDT 00000000cab45018 0885D (v02 INT430 SYSFexxx 00001001 INTL 20090903)
[    0.000000] ACPI: FACS 00000000cabd4e40 00040
[    0.000000] ACPI: APIC 00000000cabfcf18 000CC (v02 DELL    CBX3    06222004 MSFT 00010013)
[    0.000000] ACPI: TCPA 00000000cabd5d18 00032 (v02                 00000000      00000000)
[    0.000000] ACPI: SSDT 00000000cab78a98 002F9 (v01 DELLTP      TPM 00003000 INTL 20090903)
[    0.000000] ACPI: MCFG 00000000cabd5c98 0003C (v01 DELL   SNDYBRDG 06222004 MSFT 00000097)
[    0.000000] ACPI: HPET 00000000cabd5c18 00038 (v01 A M I   PCHHPET 06222004 AMI. 00000003)
[    0.000000] ACPI: BOOT 00000000cabd5b98 00028 (v01 DELL   CBX3     06222004 AMI  00010013)
[    0.000000] ACPI: SSDT 00000000cab5c018 00804 (v01  PmRef  Cpu0Ist 00003000 INTL 20090903)
[    0.000000] ACPI: SSDT 00000000cab5b018 00996 (v01  PmRef    CpuPm 00003000 INTL 20090903)
[    0.000000] ACPI: XMAR 00000000cab77c18 000E8 (v01 INTEL      SNB  00000001 INTL 00000001)
[    0.000000] ACPI: SLIC 00000000cab65c18 00176 (v03 DELL    CBX3    06222004 MSFT 00010013)
[    0.000000] Setting APIC routing to cluster x2apic.
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x00010000-0x00ffffff]
[    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
[    0.000000]   Normal   [mem 0x100000000-0x42dffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x00010000-0x00099fff]
[    0.000000]   node   0: [mem 0x00100000-0x1fffffff]
[    0.000000]   node   0: [mem 0x20200000-0x3fffffff]
[    0.000000]   node   0: [mem 0x40200000-0xca61dfff]
[    0.000000]   node   0: [mem 0xca662000-0xca9b6fff]
[    0.000000]   node   0: [mem 0xcabff000-0xcabfffff]
[    0.000000]   node   0: [mem 0x100000000-0x42dffffff]
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
[    0.000000] ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
[    0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[    0.000000] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[    0.000000] smpboot: Allowing 16 CPUs, 12 hotplug CPUs
[    0.000000] e820: [mem 0xcfa00000-0xfebfffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on Xen
[    0.000000] Xen version: 4.1.5-rc1 (preserve-AD)
[    0.000000] setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:16 nr_node_ids:1
[    0.000000] PERCPU: Embedded 28 pages/cpu @ffff880418200000 s83584 r8192 d22912 u131072
[    6.678990] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 4101195
[    6.678993] Kernel command line: root=/dev/mapper/qubes_test500-root_test ro rd.lvm.lv=qubes_test500/root_test rd.lvm.lv=qubes_test500/lv_swap rd.luks=0 rd.md.0 rd.dm=0 console=tty0 console=hvc0 no_console_suspend testrun=d682e0a testcount=2
[    6.679082] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    6.680036] Dentry cache hash table entries: 2097152 (order: 12, 16777216 bytes)
[    6.683103] Inode-cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    6.684423] __ex_table already sorted, skipping sort
[    6.708045] software IO TLB [mem 0x402400000-0x4063fffff] (64MB) mapped at [ffff880402400000-ffff8804063fffff]
[    6.749141] Memory: 15833904k/17530880k available (4699k kernel code, 879624k absent, 817352k reserved, 6043k data, 600k init)
[    6.749188] Hierarchical RCU implementation.
[    6.749188] 	RCU dyntick-idle grace-period acceleration is enabled.
[    6.749189] 	RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=4.
[    6.749195] NR_IRQS:33024 nr_irqs:712 16
[    6.749255] xen: sci override: global_irq=9 trigger=0 polarity=0
[    6.749286] xen: acpi sci 9
[    6.750932] Console: colour VGA+ 80x25
[    6.757258] console [tty0] enabled
[    7.514612] console [hvc0] enabled
[    7.518141] installing Xen timer for CPU 0
[    7.522307] tsc: Detected 2494.356 MHz processor
[    7.526981] Calibrating delay loop (skipped), value calculated using timer frequency.. 4988.71 BogoMIPS (lpj=9977424)
[    7.537626] pid_max: default: 32768 minimum: 301
[    7.542338] Mount-cache hash table entries: 256
[    7.547035] Initializing cgroup subsys cpuacct
[    7.551441] Initializing cgroup subsys devices
[    7.555945] Initializing cgroup subsys freezer
[    7.560452] Initializing cgroup subsys net_cls
[    7.564959] Initializing cgroup subsys blkio
[    7.569342] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[    7.569342] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
[    7.582509] CPU: Physical Processor ID: 0
[    7.586534] CPU: Processor Core ID: 0
[    7.590264] mce: CPU supports 7 MCE banks
[    7.594360] Last level iTLB entries: 4KB 512, 2MB 0, 4MB 0
[    7.594360] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32
[    7.594360] tlb_flushall_shift: 5
[    7.609120] Freeing SMP alternatives: 20k freed
[    7.614752] ACPI: Core revision 20120913
[    7.628886] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
[    7.638014] installing Xen timer for CPU 1
[    7.642325] installing Xen timer for CPU 2
[    7.646590] installing Xen timer for CPU 3
[    7.650775] Brought up 4 CPUs
[    7.653942] devtmpfs: initialized
[    7.658818] PM: Registering ACPI NVS region [mem 0xca9e7000-0xcabe6fff] (2097152 bytes)
[    7.666843] reboot: PCI series board detected. Selecting Dell Latitude E6420-method for reboots.
[    7.675708] Grant tables using version 2 layout.
[    7.680312] Grant table initialized
[    7.683889] regulator-dummy: no parameters
[    7.688062] RTC time: 16:41:09, date: 03/26/13
[    7.692571] NET: Registered protocol family 16
[    7.697242] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[    7.704801] ACPI: bus type pci registered
[    7.708992] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[    7.718289] PCI: not using MMCONFIG
[    7.721799] PCI: Using configuration type 1 for base access
[    7.727446] dmi type 0xB1 record - unknown flag
[    7.732579] bio: create slab <bio-0> at 0
[    7.736685] ACPI: Added _OSI(Module Device)
[    7.740833] ACPI: Added _OSI(Processor Device)
[    7.745336] ACPI: Added _OSI(3.0 _SCP Extensions)
[    7.750181] ACPI: Added _OSI(Processor Aggregator Device)
[    7.762185] [Firmware Bug]: ACPI: BIOS _OSI(Linux) query ignored
[    7.996748] ACPI: SSDT 00000000ca9d7798 00727 (v01  PmRef  Cpu0Cst 00003001 INTL 20090903)
[    8.005333] ACPI: Dynamic OEM Table Load:
[    8.009293] ACPI: SSDT           (null) 00727 (v01  PmRef  Cpu0Cst 00003001 INTL 20090903)
[    8.018079] ACPI: SSDT 00000000ca9d8a98 00303 (v01  PmRef    ApIst 00003000 INTL 20090903)
[    8.027113] ACPI: Dynamic OEM Table Load:
[    8.031073] ACPI: SSDT           (null) 00303 (v01  PmRef    ApIst 00003000 INTL 20090903)
[    8.039591] ACPI: SSDT 00000000ca9d6d98 00119 (v01  PmRef    ApCst 00003000 INTL 20090903)
[    8.048170] ACPI: Dynamic OEM Table Load:
[    8.052131] ACPI: SSDT           (null) 00119 (v01  PmRef    ApCst 00003000 INTL 20090903)
[    8.062710] ACPI: Interpreter enabled
[    8.066340] ACPI: (supports S0 S3 S5)
[    8.070052] ACPI: Using IOAPIC for interrupt routing
[    8.075118] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[    8.085030] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in ACPI motherboard resources
[    8.164309] ACPI: EC: GPE = 0x10, I/O: command/status = 0x934, data = 0x930
[    8.176343] ACPI: No dock devices found.
[    8.180236] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    8.190035] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
[    8.197216] PCI host bridge to bus 0000:00
[    8.201279] pci_bus 0000:00: root bus resource [bus 00-3e]
[    8.206825] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7]
[    8.213063] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff]
[    8.219303] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[    8.226238] pci_bus 0000:00: root bus resource [mem 0xcfa00000-0xfeafffff]
[    8.233170] pci_bus 0000:00: root bus resource [mem 0xfed40000-0xfed44fff]
[    8.244014] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    8.257247] pci 0000:00:1c.1: PCI bridge to [bus 02]
[    8.270362] pci 0000:00:1c.2: PCI bridge to [bus 03-08]
[    8.275693] pci 0000:00:1c.3: PCI bridge to [bus 09]
[    8.288858] pci 0000:00:1c.5: PCI bridge to [bus 0a]
[    8.294253]  pci0000:00: ACPI _OSC support notification failed, disabling PCIe ASPM
[    8.301899]  pci0000:00: Unable to request _OSC control (_OSC support mask: 0x08)
(XEN) PCI add device 00:00.0
(XEN) PCI add device 00:02.0
(XEN) PCI add device 00:16.0
(XEN) PCI add device 00:19.0
(XEN) PCI add device 00:1a.0
(XEN) PCI add device 00:1b.0
(XEN) PCI add device 00:1c.0
(XEN) PCI add device 00:1c.1
(XEN) PCI add device 00:1c.2
(XEN) PCI add device 00:1c.3
(XEN) PCI add device 00:1c.5
(XEN) PCI add device 00:1d.0
(XEN) PCI add device 00:1f.0
(XEN) PCI add device 00:1f.2
(XEN) PCI add device 00:1f.3
(XEN) PCI add device 02:00.0
(XEN) PCI add device 03:00.0
(XEN) PCI add device 0a:00.0
(XEN) PCI add device 0a:00.1
[    8.360114] ACPI: PCI Interrupt Link [LNKA] (IRQs 1 3 4 5 6 7 10 12 14 15) *11
[    8.367421] ACPI: PCI Interrupt Link [LNKB] (IRQs 1 3 4 5 6 7 11 12 14 15) *10
[    8.374697] ACPI: PCI Interrupt Link [LNKC] (IRQs 1 3 4 5 6 7 10 12 14 15) *11
[    8.381978] ACPI: PCI Interrupt Link [LNKD] (IRQs 1 3 4 5 6 7 11 12 14 15) *10
[    8.389257] ACPI: PCI Interrupt Link [LNKE] (IRQs 1 3 4 *5 6 7 10 12 14 15)
[    8.396280] ACPI: PCI Interrupt Link [LNKF] (IRQs 1 3 4 5 6 7 11 12 14 15) *0, disabled.
[    8.404422] ACPI: PCI Interrupt Link [LNKG] (IRQs 1 *3 4 5 6 7 10 12 14 15)
[    8.411441] ACPI: PCI Interrupt Link [LNKH] (IRQs 1 3 4 5 6 7 11 12 14 15) *0, disabled.
[    8.419561] xen/balloon: Initialising balloon driver.
[    8.425498] xen-balloon: Initialising balloon driver.
[    8.430559] xen/balloon: Xen selfballooning driver disabled for domain0.
[    8.437342] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[    8.445456] vgaarb: loaded
[    8.448184] vgaarb: bridge control possible 0000:00:02.0
[    8.453693] PCI: Using ACPI for IRQ routing
[    8.463110] Switching to clocksource xen
[    8.468416] pnp: PnP ACPI init
[    8.471450] ACPI: bus type pnp registered
[    8.476316] system 00:03: [mem 0xfed00000-0xfed003ff] has been reserved
[    8.483020] system 00:05: [io  0x0680-0x069f] has been reserved
[    8.488904] system 00:05: [io  0x1000-0x100f] has been reserved
[    8.494881] system 00:05: [io  0xffff] has been reserved
[    8.500253] system 00:05: [io  0xffff] has been reserved
[    8.505625] system 00:05: [io  0x0400-0x047f] has been reserved
[    8.511607] system 00:05: [io  0x0500-0x057f] has been reserved
[    8.517586] system 00:05: [io  0x164e-0x164f] has been reserved
[    8.524522] Already setup the GSI :4
[    8.529963] system 00:0c: [mem 0xfed1c000-0xfed1ffff] has been reserved
[    8.536548] system 00:0c: [mem 0xfed10000-0xfed17fff] has been reserved
[    8.543212] system 00:0c: [mem 0xfed18000-0xfed18fff] has been reserved
[    8.549887] system 00:0c: [mem 0xfed19000-0xfed19fff] has been reserved
[    8.556559] system 00:0c: [mem 0xf8000000-0xfbffffff] has been reserved
[    8.563231] system 00:0c: [mem 0xfed20000-0xfed3ffff] has been reserved
[    8.569905] system 00:0c: [mem 0xfed90000-0xfed93fff] has been reserved
[    8.576577] system 00:0c: [mem 0xfed45000-0xfed8ffff] has been reserved
[    8.583249] system 00:0c: [mem 0xff000000-0xffffffff] could not be reserved
[    8.590271] system 00:0c: [mem 0xfee00000-0xfeefffff] could not be reserved
[    8.612224] system 00:0e: [mem 0x20000000-0x201fffff] has been reserved
[    8.618804] system 00:0e: [mem 0x40000000-0x401fffff] has been reserved
[    8.625483] pnp: PnP ACPI: found 15 devices
[    8.629714] ACPI: ACPI bus type pnp unregistered
[    8.641256] PM-Timer failed consistency check  (0x0xffffff) - aborting.
[    8.647923] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    8.652879] pci 0000:00:1c.1: PCI bridge to [bus 02]
[    8.657891] pci 0000:00:1c.1:   bridge window [mem 0xe2d00000-0xe2dfffff]
[    8.664745] pci 0000:00:1c.2: PCI bridge to [bus 03-08]
[    8.670017] pci 0000:00:1c.2:   bridge window [io  0x3000-0x3fff]
[    8.676174] pci 0000:00:1c.2:   bridge window [mem 0xe2200000-0xe2bfffff]
[    8.683022] pci 0000:00:1c.2:   bridge window [mem 0xe0a00000-0xe13fffff 64bit pref]
[    8.690854] pci 0000:00:1c.3: PCI bridge to [bus 09]
[    8.695841] pci 0000:00:1c.3:   bridge window [io  0x2000-0x2fff]
[    8.702002] pci 0000:00:1c.3:   bridge window [mem 0xe1800000-0xe21fffff]
[    8.708848] pci 0000:00:1c.3:   bridge window [mem 0xe0000000-0xe09fffff 64bit pref]
[    8.716679] pci 0000:00:1c.5: PCI bridge to [bus 0a]
[    8.721670] pci 0000:00:1c.5:   bridge window [mem 0xe2c00000-0xe2cfffff]
[    8.728672] Already setup the GSI :17
[    8.732397] NET: Registered protocol family 2
[    8.736898] TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
[    8.744934] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    8.751748] TCP: Hash tables configured (established 262144 bind 65536)
[    8.758341] TCP: reno registered
[    8.761615] UDP hash table entries: 8192 (order: 6, 262144 bytes)
[    8.767806] UDP-Lite hash table entries: 8192 (order: 6, 262144 bytes)
[    8.774493] NET: Registered protocol family 1
[    8.778856] Already setup the GSI :16
[    8.782632] Already setup the GSI :17
[    8.786365] Already setup the GSI :18
[    8.790120] Unpacking initramfs...
[    8.840431] Freeing initrd memory: 62404k freed
[    8.855935] Simple Boot Flag at 0xf3 set to 0x1
[    8.860890] audit: initializing netlink socket (disabled)
[    8.866267] type=2000 audit(1364316070.634:1): initialized
[    8.890892] VFS: Disk quotas dquot_6.5.2
[    8.894812] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    8.901450] msgmni has been set to 31047
[    8.905558] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    8.912948] io scheduler noop registered
[    8.916896] io scheduler deadline registered
[    8.921256] io scheduler cfq registered (default)
[    8.926652] ACPI: Requesting acpi_cpufreq
(XEN) no cpu_id for acpi_id 5
(XEN) no cpu_id for acpi_id 6
(XEN) no cpu_id for acpi_id 7
(XEN) no cpu_id for acpi_id 8
[    8.977651] Non-volatile memory driver v1.3
[    8.982953] loop: module loaded
[    8.986081] libphy: Fixed MDIO Bus: probed
[    8.990238] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f13:PS2M] at 0x60,0x64 irq 1,12
[    8.998845] i8042: Warning: Keylock active
[    9.004271] serio: i8042 KBD port at 0x60,0x64 irq 1
[    9.009194] serio: i8042 AUX port at 0x60,0x64 irq 12
[    9.014494] mousedev: PS/2 mouse device common for all mice
[    9.020226] rtc_cmos 00:06: RTC can wake from S4
[    9.025310] rtc_cmos 00:06: rtc core: registered rtc_cmos as rtc0
[    9.031456] rtc0: alarms up to one year, y3k, 242 bytes nvram
[    9.037239] device-mapper: uevent: version 1.0.3
[    9.042024] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[    9.050643] device-mapper: ioctl: 4.23.0-ioctl (2012-07-25) initialised: dm-devel@redhat.com
[    9.059228] TCP: cubic registered
[    9.062599] NET: Registered protocol family 10
[    9.067281] Key type dns_resolver registered
[    9.071791] registered taskstats version 1
[    9.077346]   Magic number: 5:764:692
[    9.081364] rtc_cmos 00:06: setting system clock to 2013-03-26 16:41:11 UTC (1364316071)
[    9.090824] Freeing unused kernel memory: 600k freed
[    9.096008] Write protecting the kernel read-only data: 10240k
[    9.104400] Freeing unused kernel memory: 1436k freed
[    9.109988] Freeing unused kernel memory: 1964k freed
[    9.155642] systemd[1]: systemd 197 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ)
[    9.168793] systemd[1]: Running in initial RAM disk.

Welcome to Qubes 2 (R2) dracut-024-25.git20130205.fc18 (Initramfs)!

[    9.182572] systemd[1]: Inserted module 'autofs4'
[    9.187327] systemd[1]: No hostname configured.
[    9.191840] systemd[1]: Set hostname to <localhost>.
[    9.197025] systemd[1]: Initializing machine ID from random generator.
[    9.230251] systemd[1]: Starting Encrypted Volumes.
[  OK  ] Reached target Encrypted Volumes.
[    9.240161] systemd[1]: Reached target Encrypted Volumes.
[    9.245632] systemd[1]: Starting udev Kernel Socket.
[  OK  ] Listening on udev Kernel Socket.
[    9.255356] systemd[1]: Listening on udev Kernel Socket.
[    9.260800] systemd[1]: Starting udev Control Socket.
[  OK  ] Listening on udev Control Socket.
[    9.270670] systemd[1]: Listening on udev Control Socket.
[    9.276175] systemd[1]: Starting Journal Socket.
[  OK  ] Listening on Journal Socket.
[    9.285237] systemd[1]: Listening on Journal Socket.
[    9.290319] systemd[1]: Starting dracut cmdline hook...
         Starting dracut cmdline hook...
[    9.316394] systemd[1]: Started Load Kernel Modules.
[    9.321365] systemd[1]: Starting Journal Service...
         Starting Journal Service...
[  OK  ] Started Journal Service.
[    9.359272] systemd[1]: Started Journal Service.
[    9.364277] systemd[1]: Starting Sockets.
[  OK  ] Reached target Sockets.
[    9.372228] systemd[1]: Reached target Sockets.
[    9.376882] systemd[1]: Starting Swap.
[  OK  ] Reached target Swap.
[    9.384347] systemd[1]: Reached target Swap.
[    9.388752] systemd[1]: Starting Local File Systems.
[  OK  ] Reached target Local File Systems.
[    9.399136] systemd[1]: Reached target Local File Systems.
[    9.438492] pciback 0000:00:19.0: seizing device
[    9.443120] pciback 0000:02:00.0: seizing device
[    9.448176] Already setup the GSI :17
[    9.659793] xen-pciback: backend is vpci
[  OK  ] Started dracut cmdline hook.
         Starting Setup Virtual Console...
         Starting dracut pre-udev hook...
[  OK  ] Started Setup Virtual Console.
[  OK  ] Reached target System Initialization.
[  OK  ] Started dracut pre-udev hook.
         Starting udev Kernel Device Manager...
[   10.109354] systemd-udevd[123]: starting version 197
[  OK  ] Started udev Kernel Device Manager.
         Starting dracut pre-trigger hook...
[   10.208145] input: DualPoint Stick as /devices/platform/i8042/serio1/input/input1
[   10.230525] input: AlpsPS/2 ALPS DualPoint TouchPad as /devices/platform/i8042/serio1/input/input2
[  OK  ] Started dracut pre-trigger hook.
         Starting udev Coldplug all Devices...
[   10.509816] input: Lid Switch as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0D:00/input/input3
[   10.518644] ACPI: Lid Switch [LID]
[   10.522122] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input4
[   10.530555] ACPI: Power Button [PBTN]
[   10.534259] input: Sleep Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input5
[   10.542702] ACPI: Sleep Button [SBTN]
[   10.546418] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input6
[   10.553891] ACPI: Power Button [PWRF]
[  OK  ] Started udev Coldplug all Devices.
         Starting Show Plymouth Boot Screen...
[   10.580778] wmi: Mapper loaded
[   10.582203] Linux agpgart interface v0.103
[   10.582760] ACPI: bus type usb registered
[   10.582786] usbcore: registered new interface driver usbfs
[   10.582795] usbcore: registered new interface driver hub
[   10.584646] usbcore: registered new device driver usb
[   10.585169] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[   10.585207] Already setup the GSI :16
[   10.598870] Already setup the GSI :18
[   10.622141] xhci_hcd 0000:03:00.0: xHCI Host Controller
[   10.622143] ehci_hcd 0000:00:1a.0: EHCI Host Controller
[   10.623879] ehci_hcd 0000:00:1a.0: new USB bus registered, assigned bus number 1
[   10.623920] ehci_hcd 0000:00:1a.0: debug port 2
[   10.624411] sdhci: Secure Digital Host Controller Interface driver
[   10.624411] sdhci: Copyright(c) Pierre Ossman
[   10.624589] sdhci-pci 0000:0a:00.0: SDHCI controller found [1217:8221] (rev 5)
[   10.624622] Already setup the GSI :17
[   10.628247] [drm] Initialized drm 1.1.0 20060810
[   10.636471] ehci_hcd 0000:00:1a.0: irq 16, io mem 0xe2e70000
[   10.676831] 0000:0a:00.0 supply vqmmc not found, using dummy regulator
[   10.676914] SCSI subsystem initialized
[   10.679718] ehci_hcd 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[   10.679746] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[   10.679747] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   10.679748] usb usb1: Product: EHCI Host Controller
[   10.679750] usb usb1: Manufacturer: Linux 3.7.4-3.pvops.qubes.x86_64 ehci_hcd
[   10.679751] usb usb1: SerialNumber: 0000:00:1a.0
[   10.679888] hub 1-0:1.0: USB hub found
[   10.679893] hub 1-0:1.0: 2 ports detected
[   10.680046] Already setup the GSI :17
[   10.683760] ACPI: bus type scsi registered
[   10.687358] xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 2
[   10.693191] xhci_hcd 0000:03:00.0: irq 18, io mem 0xe2200000
[   10.752921] 0000:0a:00.0 supply vmmc not found, using dummy regulator
(XEN) physdev.c:168: dom0: wrong map_pirq type 3
[   10.752934] ehci_hcd 0000:00:1d.0: EHCI Host Controller
[   10.752942] ehci_hcd 0000:00:1d.0: new USB bus registered, assigned bus number 3
[   10.752991] ehci_hcd 0000:00:1d.0: debug port 2
[   10.760366] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002
[   10.760368] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   10.760370] usb usb2: Product: xHCI Host Controller
[   10.760371] usb usb2: Manufacturer: Linux 3.7.4-3.pvops.qubes.x86_64 xhci_hcd
[   10.760372] usb usb2: SerialNumber: 0000:03:00.0
[   10.760552] hub 2-0:1.0: USB hub found
[   10.760567] hub 2-0:1.0: 2 ports detected
[   10.760849] xhci_hcd 0000:03:00.0: xHCI Host Controller
[   10.760853] xhci_hcd 0000:03:00.0: new USB bus registered, assigned bus number 4
[   10.760914] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003
[   10.760916] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   10.760918] usb usb4: Product: xHCI Host Controller
[   10.760919] usb usb4: Manufacturer: Linux 3.7.4-3.pvops.qubes.x86_64 xhci_hcd
[   10.760920] usb usb4: SerialNumber: 0000:03:00.0
[   10.761038] hub 4-0:1.0: USB hub found
[   10.761053] hub 4-0:1.0: 2 ports detected
[   10.765383] ehci_hcd 0000:00:1d.0: irq 17, io mem 0xe2e50000
[   10.775094] ehci_hcd 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[   10.775113] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002
[   10.775114] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[   10.775115] usb usb3: Product: EHCI Host Controller
[   10.775116] usb usb3: Manufacturer: Linux 3.7.4-3.pvops.qubes.x86_64 ehci_hcd
[   10.775117] usb usb3: SerialNumber: 0000:00:1d.0
[   10.775287] hub 3-0:1.0: USB hub found
[   10.775294] hub 3-0:1.0: 2 ports detected
[   10.775532] Already setup the GSI :16
[   10.775737] pci 0000:00:00.0: Intel Sandybridge Chipset
[   10.930875] pci 0000:00:00.0: detected gtt size: 2097152K total, 262144K mappable
[   10.931209] mmc0: Hardware doesn't report any support voltages.
[   10.945568] pci 0000:00:00.0: detected 65536K stolen memory
         Starting dracut initqueue hook...
[   10.997002] [drm] Supports vblank timestamp caching Rev 1 (10.10.2010).
[   11.003583] [drm] Driver supports precise vblank timestamp query.
[   11.009806] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[   11.074669] usb 4-1: new SuperSpeed USB device number 2 using xhci_hcd
[   11.095618] usb 4-1: Parent hub missing LPM exit latency info.  Power management will be impacted.
[   11.105280] usb 4-1: New USB device found, idVendor=125f, idProduct=a11a
[   11.111989] usb 4-1: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[   11.119169] usb 4-1: Product: CH11
[   11.122604] usb 4-1: Manufacturer: ADATA
[   11.126593] usb 4-1: SerialNumber: C11A0302012071100127
[   11.134062] Initializing USB Mass Storage driver...
[   11.139095] scsi0 : usb-storage 4-1:1.0
[   11.143192] usbcore: registered new interface driver usb-storage
[   11.149257] USB Mass Storage support registered.
[   11.222402] [drm] Enabling RC6 states: RC6 on, RC6p off, RC6pp off
[   11.243216] usb 3-1: new high-speed USB device number 2 using ehci_hcd
[   11.379901] usb 3-1: New USB device found, idVendor=8087, idProduct=0024
[   11.386705] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   11.394539] hub 3-1:1.0: USB hub found
[   11.398576] hub 3-1:1.0: 8 ports detected
[   11.515246] usb 1-1: new high-speed USB device number 2 using ehci_hcd
[   11.623811] fbcon: inteldrmfb (fb0) is primary device
[   11.652022] usb 1-1: New USB device found, idVendor=8087, idProduct=0024
[   11.652027] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   11.652617] hub 1-1:1.0: USB hub found
[   11.652811] hub 1-1:1.0: 6 ports detected
[   11.727590] usb 3-1.6: new high-speed USB device number 3 using ehci_hcd
[   11.829626] usb 3-1.6: New USB device found, idVendor=413c, idProduct=818d
[   11.829631] usb 3-1.6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   11.829635] usb 3-1.6: Product: DW5550
[   11.829638] usb 3-1.6: Manufacturer: Dell
[   11.829641] usb 3-1.6: SerialNumber: 88FA653FDF944970
[   11.939525] usb 3-1.8: new full-speed USB device number 4 using ehci_hcd
[   12.054482] usb 3-1.8: New USB device found, idVendor=0a5c, idProduct=5801
[   12.054487] usb 3-1.8: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   12.054491] usb 3-1.8: Product: 5880
[   12.054494] usb 3-1.8: Manufacturer: Broadcom Corp
[   12.054497] usb 3-1.8: SerialNumber: 0123456789ABCD
[   12.054953] usb 3-1.8: config 0 descriptor??
[   12.135475] usb 1-1.1: new high-speed USB device number 3 using ehci_hcd
[   12.145874] scsi 0:0:0:0: Direct-Access     ADATA    CH11             AX00 PQ: 0 ANSI: 5
[   12.150080] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[   12.150794] sd 0:0:0:0: [sda] Write Protect is off
[   12.151449] sd 0:0:0:0: [sda] No Caching mode page present
[   12.151452] sd 0:0:0:0: [sda] Assuming drive cache: write through
[   12.153415] sd 0:0:0:0: [sda] No Caching mode page present
[   12.153418] sd 0:0:0:0: [sda] Assuming drive cache: write through
[   12.228013] usb 1-1.1: New USB device found, idVendor=413c, idProduct=2513
[   12.228018] usb 1-1.1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[   12.228618] hub 1-1.1:1.0: USB hub found
[   12.228866] hub 1-1.1:1.0: 2 ports detected
[   12.299515] usb 1-1.4: new full-speed USB device number 4 using ehci_hcd
[   12.396416] usb 1-1.4: New USB device found, idVendor=413c, idProduct=8187
[   12.396421] usb 1-1.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   12.396425] usb 1-1.4: Product: DW375 Bluetooth Module
[   12.396427] usb 1-1.4: Manufacturer: Dell Computer Corp
[   12.396430] usb 1-1.4: SerialNumber: D0DF9A40FE60
[   12.435262]  sda: sda1 sda2
[   12.437911] sd 0:0:0:0: [sda] No Caching mode page present
[   12.437915] sd 0:0:0:0: [sda] Assuming drive cache: write through
[   12.437920] sd 0:0:0:0: [sda] Attached SCSI disk
[   12.467406] usb 1-1.5: new high-speed USB device number 5 using ehci_hcd
[   12.567527] usb 1-1.5: New USB device found, idVendor=05ca, idProduct=181c
[   12.567532] usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   12.567536] usb 1-1.5: Product: Laptop_Integrated_Webcam_FHD
[   12.567539] usb 1-1.5: Manufacturer: CN0CJ3P27248717F040SA01
[   12.656041] Console: switching to colour frame buffer device 160x56
[   12.940944] fb0: inteldrmfb frame buffer device
[   12.940945] drm: registered panic notifier
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Basic System.
[   13.080883] acpi device:37: registered as cooling_device4
[   13.125027] ACPI: Video Device [VID] (multi-head: yes  rom: no  post: no)
[   13.135242] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input7
[   13.147924] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
[   13.158515] Already setup the GSI :18
[   13.164510] ahci: SSS flag set, parallel bus scan disabled
[   13.173329] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3b impl SATA mode
[   13.182481] ahci 0000:00:1f.2: flags: 64bit ncq sntf stag pm led clo pio slum part ems sxs apst 
[   13.225340] scsi1 : ahci
[   13.231562] scsi2 : ahci
[   13.237717] scsi3 : ahci
[   13.241424] scsi4 : ahci
[   13.247411] scsi5 : ahci
[   13.251151] scsi6 : ahci
[   13.254650] ata1: SATA max UDMA/133 abar m2048@0xe2e40000 port 0xe2e40100 irq 74
[   13.262979] ata2: SATA max UDMA/133 abar m2048@0xe2e40000 port 0xe2e40180 irq 74
[   13.271303] ata3: DUMMY
[   13.274668] ata4: SATA max UDMA/133 abar m2048@0xe2e40000 port 0xe2e40280 irq 74
[   13.283009] ata5: SATA max UDMA/133 abar m2048@0xe2e40000 port 0xe2e40300 irq 74
[   13.291334] ata6: SATA max UDMA/133 abar m2048@0xe2e40000 port 0xe2e40380 irq 74
[   13.619246] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   13.663828] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[   13.676006] ata1.00: ATA-9: KINGSTON SV200S3256G, E111008a, max UDMA/100
[   13.683694] ata1.00: 500118192 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[   13.705802] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[   13.715873] ata1.00: configured for UDMA/100
[   13.727632] scsi 1:0:0:0: Direct-Access     ATA      KINGSTON SV200S3 E111 PQ: 0 ANSI: 5
[   13.741630] sd 1:0:0:0: [sdb] 500118192 512-byte logical blocks: (256 GB/238 GiB)
[   13.752707] sd 1:0:0:0: [sdb] Write Protect is off
[   13.760410] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   13.771155]  sdb: sdb1 sdb2
[   13.776218] sd 1:0:0:0: [sdb] Attached SCSI disk
[   14.123252] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   14.166750] ata2.00: ATAPI: HL-DT-ST DVD+/-RW GU60N, A103, max UDMA/133
[   14.195345] ata2.00: configured for UDMA/133
[   14.215519] scsi 2:0:0:0: CD-ROM            HL-DT-ST DVD+-RW GU60N    A103 PQ: 0 ANSI: 5
[   14.559333] ata4: SATA link down (SStatus 0 SControl 300)
[   14.887252] ata5: SATA link down (SStatus 0 SControl 300)
[   15.215248] ata6: SATA link down (SStatus 0 SControl 300)
[   15.243268] sr0: scsi3-mmc drive: 24x/8x writer dvd-ram cd/rw xa/form2 cdda tray
[   15.254152] cdrom: Uniform CD-ROM driver Revision: 3.20
[   15.685628] bio: create slab <bio-1> at 1
[   16.126307] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[   16.449579] EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
[   16.957050] systemd-journald[57]: Received SIGTERM

Welcome to Qubes 2 (R2)!

         Starting Replay Read-Ahead Data...
         Starting Collect Read-Ahead Data...
         Expecting device dev-hvc0.device...
[  OK  ] Reached target Remote File Systems.
[  OK  ] Listening on Syslog Socket.
[  OK  ] Reached target Syslog.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on Delayed Shutdown Socket.
[  OK  ] Set up automount Arbitrary Executable File Formats F...utomount Point.
[  OK  ] Reached target Encrypted Volumes.
         Mounting Debug File System...
         Mounting POSIX Message Queue File System...
[  OK  ] Listening on LVM2 metadata daemon socket.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
         Expecting device dev-mapper-qubes_test500\x2dlv_swap.device...
         Mounting Temporary Directory...
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on udev Control Socket.
         Starting udev Coldplug all Devices...
         Starting udev Kernel Device Manager...
[  OK  ] Started Collect Read-Ahead Data.
[  OK  ] Stopped Trigger Flushing of Journal to Persistent Storage.
         Stopping Journal Service...
[  OK  ] Stopped Journal Service.
         Starting Journal Service...
[  OK  ] Started Journal Service.
[   19.382184] systemd-udevd[385]: starting version 197
[  OK  ] Started Replay Read-Ahead Data.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Mounted Temporary Directory.
         Starting Load legacy module configuration...
         Starting Remount Root and Kernel File Systems...
         Starting Apply Kernel Variables...
         Starting Setup Virtual Console...
[  OK  ] Started udev Coldplug all Devices.
         Starting udev Wait for Complete Device Initialization...
[   20.181173] EXT4-fs (dm-1): re-mounted. Opts: (null)
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Reached target Local File Systems (Pre).
         Starting Configure read-only root support...
         Starting Load Random Seed...
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Found device /dev/hvc0.
[  OK  ] Started Load Random Seed.
[   21.106473] microcode: CPU0 sig=0x206a7, pf=0x10, revision=0x28
[   21.362567] microcode: CPU1 sig=0x206a7, pf=0x10, revision=0x28
[   21.371365] microcode: CPU2 sig=0x206a7, pf=0x10, revision=0x28
[   21.380875] microcode: CPU3 sig=0x206a7, pf=0x10, revision=0x28
[   21.389794] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[   21.566282] ACPI: Deprecated procfs I/F for AC is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[   21.581082] ACPI: AC Adapter [AC] (on-line)
[   21.816001] dcdbas dcdbas: Dell Systems Management Base Driver (version 5.6.0-3.2)
[   21.824446] tpm_tis 00:0b: 1.2 TPM (device-id 0x2001, rev-id 32)
[   21.858550] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[   21.869982] ACPI: Battery Slot [BAT0] (battery absent)
[   21.900587] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[   21.915363] ACPI: Battery Slot [BAT1] (battery absent)
[   21.931898] input: PC Speaker as /devices/platform/pcspkr/input/input8
[   21.942478] thermal LNXTHERM:00: registered as thermal_zone0
[   21.951326] ACPI: Thermal Zone [THM] (25 C)
[   21.972130] ACPI: Deprecated procfs I/F for battery is loaded, please retry with CONFIG_ACPI_PROCFS_POWER cleared
[   21.983723] ACPI: Battery Slot [BAT2] (battery absent)
[   22.033106] Already setup the GSI :16
[   22.047522] ACPI Warning: 0x0000000000000428-0x000000000000042f SystemIO conflicts with Region \PMIO 1 (20120913/utaddress-251)
[   22.063073] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   22.077752] ACPI Warning: 0x0000000000000540-0x000000000000054f SystemIO conflicts with Region \GPIO 1 (20120913/utaddress-251)
[   22.090205] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   22.101455] ACPI Warning: 0x0000000000000530-0x000000000000053f SystemIO conflicts with Region \GPIO 1 (20120913/utaddress-251)
[   22.114329] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   22.125972] ACPI Warning: 0x0000000000000500-0x000000000000052f SystemIO conflicts with Region \GPIO 1 (20120913/utaddress-251)
[   22.138744] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[   22.150021] lpc_ich: Resource conflict(s) found affecting gpio_ich
[   22.157511] Already setup the GSI :18
[   22.165125] ACPI Warning: 0x0000000000004040-0x000000000000405f SystemIO conflicts with Region \_SB_.PCI0.SBUS.SMBI 1 (20120913/utaddress-251)
[   22.167593] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[  OK  ] Started Setup Virtual Console.
         Starting Show Plymouth Boot Screen...
[   22.334874] parport_pc 00:09: reported by Plug and Play ACPI
[   22.344893] parport0: PC-style at 0x378, irq 7 [PCSPP,EPP]
[   22.351005] Serial: 8250/16550 driver, 8 ports, IRQ sharing disabled
[   22.578683] iTCO_vendor_support: vendor-support=0
[   22.704806] ppdev: user-space parallel port driver
[  OK  ] Started Configure read-only root support.
[  OK  ] Started Show Plymouth Boot Screen.
[   23.047579] input: Dell WMI hotkeys as /devices/virtual/input/input9
[   23.056453] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.10
[   23.063945] iTCO_wdt: Found a Cougar Point TCO device (Version=2, TCOBASE=0x0460)
[   23.073388] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
[   23.362380] Event-channel device installed.
[  OK  ] Found device /dev/mapper/qubes_test500-lv_swap.
         Activating swap /dev/mapper/qubes_test500-lv_swap...
[  OK  ] Started Load legacy module configuration.
[   24.166707] Adding 4947964k swap on /dev/mapper/qubes_test500-lv_swap.  Priority:-1 extents:1 across:4947964k 
[   24.183272] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:1b.0/input/input10
[  OK  ] Activated swap /dev/mapper/qubes_test500-lv_swap.
[  OK  ] Reached target Swap.
[   24.338497] input: HDA Intel PCH HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1b.0/sound/card0/input11
[   24.353237] input: HDA Intel PCH HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1b.0/sound/card0/input12
[   24.367655] input: HDA Intel PCH HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1b.0/sound/card0/input13
[   24.380740] input: HDA Intel PCH Dock Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input14
[   24.391069] input: HDA Intel PCH Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input15
[   24.401174] input: HDA Intel PCH Headphone as /devices/pci0000:00/0000:00:1b.0/sound/card0/input16
[   24.411816] input: HDA Intel PCH Dock Line Out as /devices/pci0000:00/0000:00:1b.0/sound/card0/input17
[  OK  ] Reached target Sound Card.
[   53.064668] cdc_acm 3-1.6:1.1: ttyACM0: USB ACM device
[   53.074976] Linux video capture interface: v2.00
[   53.085225] uvcvideo: Found UVC 1.00 device Laptop_Integrated_Webcam_FHD (05ca:181c)
[   53.098384] cdc_wdm 3-1.6:1.5: cdc-wdm0: USB WDM device
[   53.099719] Bluetooth: Core ver 2.16
[   53.099731] NET: Registered protocol family 31
[   53.099732] Bluetooth: HCI device and connection manager initialized
[   53.099738] Bluetooth: HCI socket layer initialized
[   53.099740] Bluetooth: L2CAP socket layer initialized
[   53.099744] Bluetooth: SCO socket layer initialized
[   53.103964] usbcore: registered new interface driver btusb
[  OK  ] Reached target Bluetooth.
[   53.158010] input: Laptop_Integrated_Webcam_FHD as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5/1-1.5:1.0/input/input18
[   53.158130] usbcore: registered new interface driver uvcvideo
[   53.158131] USB Video Class driver (1.1.1)
[   53.185687] usb 3-1.6: MAC-Address: 02:80:37:ec:02:00
[   53.185859] cdc_ncm 3-1.6:1.6 wwan0: register 'cdc_ncm' at usb-0000:00:1d.0-1.6, Mobile Broadband Network Device, 02:80:37:ec:02:00
[   53.185891] usbcore: registered new interface driver cdc_ncm
[   53.190943] cdc_acm 3-1.6:1.3: ttyACM1: USB ACM device
[   53.212171] cdc_acm 3-1.6:1.9: ttyACM2: USB ACM device
[   53.220872] usbcore: registered new interface driver cdc_acm
[   53.220873] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[   53.221062] cdc_wdm 3-1.6:1.8: cdc-wdm1: USB WDM device
[   53.221086] usbcore: registered new interface driver cdc_wdm
[  OK  ] Started udev Wait for Complete Device Initialization.
         Starting Initialize storage subsystems (RAID, LVM, etc.)...
[  OK  ] Started Initialize storage subsystems (RAID, LVM, etc.).
         Starting Initialize storage subsystems (RAID, LVM, etc.)...
[  OK  ] Started Initialize storage subsystems (RAID, LVM, etc.).
         Starting Monitoring of LVM2 mirrors, snapshots etc. ...ress polling...
[  OK  ] Started Monitoring of LVM2 mirrors, snapshots etc. u...ogress polling.
[  OK  ] Reached target Local File Systems.
         Starting Recreate Volatile Files and Directories...
         Starting Security Auditing Service...
         Starting Trigger Flushing of Journal to Persistent Storage...
         Starting Tell Plymouth To Write Out Runtime Data...
[  OK  ] Started Recreate Volatile Files and Directories.
[  OK  ] Started Security Auditing Service.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Reached target System Initialization.
         Starting Console System Startup Logging...
[  OK  ] Listening on D-Bus System Message Bus Socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
         Starting LSB: Start/stop xenstored...
         Starting firewalld - dynamic firewall daemon...
         Starting Machine Check Exception Logging Daemon...
         Starting Self Monitoring and Reporting Technology (SMART) Daemon...
[  OK  ] Started Self Monitoring and Reporting Technology (SMART) Daemon.
         Starting Qubes Dom0 startup setup...
         Starting Initialize hardware monitoring sensors...
         Starting watchdog daemon...
         Starting irqbalance daemon...
         Starting Login Service...
         Starting RealtimeKit Scheduling Policy Service...
         Starting System Logging Service...
[  OK  ] Started System Logging Service.
         Starting D-Bus System Message Bus...
[  OK  ] Started D-Bus System Message Bus.
         Starting Accounts Service...
[   55.158503] ip_tables: (C) 2000-2006 Netfilter Core Team
[  OK  ] Started Console System Startup Logging.
[  OK  ] Started LSB: Start/stop xenstored.
[   55.193003] ip6_tables: (C) 2000-2006 Netfilter Core Team
[  OK  ] Started Machine Check Exception Logging Daemon.
[   55.224259] Ebtables v2.0 registered
[  OK  ] Started Qubes Dom0 startup setup.
[  OK  ] Started watchdog daemon.
[  OK  ] Started irqbalance daemon.
         Starting Qubes memory management daemon...
         Starting Qubes DispVM startup setup...
         Starting Qubes memory information reporter...
[  OK  ] Started Qubes memory information reporter.
         Starting Qubes block device cleaner (xen front/back)...
[  OK  ] Started Qubes block device cleaner (xen front/back).
         Starting LSB: Start/stop xenconsoled...
[   55.418436] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[   55.454133] systemd-journald[387]: Received SIGUSR1
[  OK  ] Started Initialize hardware monitoring sensors.
[  OK  ] Started Trigger Flushing of Journal to Persistent Storage.
[  OK  ] Started LSB: Start/stop xenconsoled.
         Starting Permit User Sessions...
         Starting Authorization Manager...
[  OK  ] Started Login Service.
[  OK  ] Started firewalld - dynamic firewall daemon.
[  OK  ] Started RealtimeKit Scheduling Policy Service.
[  OK  ] Started Permit User Sessions.
         Starting Command Scheduler...
[  OK  ] Started Command Scheduler.
         Starting Job spooling tools...
[  OK  ] Started Job spooling tools.
         Starting Wait for Plymouth Boot Screen to Quit...
         Starting Terminate Plymouth Boot Screen...
[  OK  ] Started Authorization Manager.
[  OK  ] Started Accounts Service.
[   57.154790] systemd-readahead[378]: Failed to read event: Value too large for defined data type
[   58.461861] xen-pciback: vpci: 0000:00:19.0: assign to virtual slot 0
[   58.468912] xen-pciback: vpci: 0000:02:00.0: assign to virtual slot 1
mapping kernel into physical memory
about to get started...
[   59.057861] pciback 0000:00:19.0: Driver tried to write to a read-only configuration space field at offset 0xd2, size 2. This may be harmless, but if you have problems with your device:
[   59.057861] 1) see permissive attribute in sysfs
[   59.057861] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
[   59.093423] pciback 0000:02:00.0: Driver tried to write to a read-only configuration space field at offset 0xd2, size 2. This may be harmless, but if you have problems with your device:
[   59.093423] 1) see permissive attribute in sysfs
[   59.093423] 2) report problems to the xen-devel mailing list along with details of your device obtained from lspci.
[   60.485484] xen-blkback:ring-ref 9, event-channel 28, protocol 1 (x86_64-abi)
[   60.501922] xen-blkback:ring-ref 10, event-channel 29, protocol 1 (x86_64-abi)
[   60.514535] xen-blkback:ring-ref 11, event-channel 30, protocol 1 (x86_64-abi)
[   60.526625] xen-blkback:ring-ref 12, event-channel 31, protocol 1 (x86_64-abi)

Qubes release 2 (R2)
Kernel 3.7.4-3.pvops.qubes.x86_64 on an x86_64 (hvc0)

dom0 login: [   67.255529] pciback 0000:00:19.0: enabling device (0000 -> 0003)
[   67.261583] Already setup the GSI :20
[   67.896825] pciback 0000:02:00.0: enabling device (0000 -> 0002)
[   67.902820] Already setup the GSI :17
(XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0)
mapping kernel into physical memory
about to get started...
[   72.543916] dell_wmi: Received unknown WMI event (0x0)
[   74.235014] xen-blkback:ring-ref 9, event-channel 28, protocol 1 (x86_64-abi)
[   74.259436] xen-blkback:ring-ref 10, event-channel 29, protocol 1 (x86_64-abi)
[   74.273215] xen-blkback:ring-ref 11, event-channel 30, protocol 1 (x86_64-abi)
[   74.295309] xen-blkback:ring-ref 12, event-channel 31, protocol 1 (x86_64-abi)
(XEN) 'd' pressed -> dumping registers
(XEN) 
(XEN) *** Dumping CPU0 guest state (d0:v0): ***
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e033:[<ffffffff8100146a>]
(XEN) RFLAGS: 0000000000000286   EM: 0   CONTEXT: pv guest
(XEN) rax: 0000000000000023   rbx: 00007fffff8c48c0   rcx: ffffffff8100146a
(XEN) rdx: 000000000061b0b0   rsi: 00000000004133ad   rdi: 0000000000a4e004
(XEN) rbp: ffff8803a8a05e98   rsp: ffff8803a8a05e00   r8:  00007f20f9e824f0
(XEN) r9:  ffff880401e08680   r10: 0000000000000000   r11: 0000000000000286
(XEN) r12: 0000000000000003   r13: ffff8803fad914d0   r14: 00007fffff8c48c0
(XEN) r15: 0000000000000003   cr0: 0000000080050033   cr4: 00000000000026f0
(XEN) cr3: 000000025ff45000   cr2: 00007f20f8bb6150
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e02b   cs: e033
(XEN) Guest stack trace from rsp=ffff8803a8a05e00:
(XEN)    0000000000000202 0000000000000000 ffffffff812cf062 ffff8803fc80f780
(XEN)    ffff8803a8a05f58 ffff8803a8a05f38 ffffffff8148f07c 0000000000000000
(XEN)    ffff8803a8878100 0000000000000023 0000000000a4e004 00000000004133ad
(XEN)    000000000061b0b0 0000000000000000 00007f20f9e824f0 ffff8803af91ae80
(XEN)    0000000000000003 ffff8803fad914d0 00007fffff8c48c0 ffff8803a8a05f28
(XEN)    ffffffff81154b73 ffff88041820b210 0000000000a40000 00007fffff8c4efb
(XEN)    0000000000a4b900 ffff8803a8a05f18 ffffffff810040b3 ffff88041820b210
(XEN)    ffff88041820ba10 ffff88041820b210 ffff88041820ba10 ffff8803fc96fe78
(XEN)    ffff8803af91ae80 0000000000000003 0000000000305000 00007fffff8c48c0
(XEN)    0000000000000000 ffff8803a8a05f78 ffffffff81154cbb ffff8803a8a05f48
(XEN)    000000008148f2d9 0000000000000000 0000000000a4b050 00007fffff8c4890
(XEN)    00007f20f8deb284 00007fffff8c4efb 0000000000a4b900 00007fffff8c4860
(XEN)    ffffffff814933a9 0000000000000202 00007fffff8c45f0 00000000000c0000
(XEN)    0000000000000000 0000000000000010 0000000000000000 00007fffff8c48c0
(XEN)    0000000000305000 0000000000000003 0000000000000010 00007f20f8b250c7
(XEN)    000000000000e033 0000000000000202 00007fffff8c4858 000000000000e02b
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 
(XEN) *** Dumping CPU1 host state: ***
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    1
(XEN) RIP:    e008:[<ffff82c4801975b0>] lapic_timer_nop+0x0/0x6
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000003f40   rbx: ffff830421043650   rcx: 0000000000000001
(XEN) rdx: 0000000000000000   rsi: ffff830421041c80   rdi: 0000000098a26889
(XEN) rbp: ffff83042102fef0   rsp: ffff83042102fe68   r8:  00000019ae0a4d09
(XEN) r9:  ffff830007ef0060   r10: 00000019ee7364ba   r11: 0000ffff0000ffff
(XEN) r12: ffff830421043710   r13: 00000019eaa4c0c8   r14: 00000019eb3b6cc0
(XEN) r15: ffff830421041080   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000265d1e000   cr2: 00007f30abbfb960
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83042102fe68:
(XEN)    ffff82c480198427 ffff830421041200 00000000ca662000 ffffffffffffffff
(XEN)    aaaaaaaaaaaaaa00 aaaaaaaaaaaaaaaa 00000019ee7364ba 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff 000023f5000172da 0000000000000000
(XEN)    ffff83042102ff18 ffff83042102ff18 00000000ffffffff 0000000000000002
(XEN)    ffff830421041080 ffff83042102ff10 ffff82c4801549ce ffff8300ca662000
(XEN)    ffff8300ca666000 ffff83042102fdc8 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000001 ffff880402105f00 ffff880402105fd8
(XEN)    0000000000000246 0000000000000001 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff810013aa ffffffff81a2a858 00000000deadbeef
(XEN)    00000000deadbeef 0000010000000000 ffffffff810013aa 000000000000e033
(XEN)    0000000000000246 ffff880402105ee8 000000000000e02b 000000000020fea1
(XEN)    000000000020fea0 000000000020fe9f 000000000020fe9e 0000000000000001
(XEN)    ffff8300ca662000 0000003fa0d63a80 000000000020fe9a
(XEN) Xen call trace:
(XEN)    [<ffff82c4801975b0>] lapic_timer_nop+0x0/0x6
(XEN)    [<ffff82c4801549ce>] idle_loop+0x4b/0x59
(XEN)    
(XEN) *** Dumping CPU2 host state: ***
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    2
(XEN) RIP:    e008:[<ffff82c4801975b0>] lapic_timer_nop+0x0/0x6
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000003f40   rbx: ffff830421043a60   rcx: 0000000000000001
(XEN) rdx: 0000000000000000   rsi: ffff830421026c80   rdi: 00000000b296c539
(XEN) rbp: ffff83042101fef0   rsp: ffff83042101fe68   r8:  00000019ae0a4d09
(XEN) r9:  ffff830007a76060   r10: 0000001a04e49482   r11: 0000ffff0000ffff
(XEN) r12: ffff830421043b20   r13: 00000019f1935d2f   r14: 00000019f5a32eb7
(XEN) r15: ffff830421026080   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000040b51d000   cr2: ffff88000a593310
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff83042101fe68:
(XEN)    ffff82c480198427 ffff88040210fd30 00000000ca2fc000 ffffffffffffffff
(XEN)    aaaaaaaaaaaaaa00 aaaaaaaaaaaaaaaa 0000001a04e49482 0000000000000000
(XEN)    0000000000000000 ffffffffffffffff 00000b5a00004eda 0000000000000000
(XEN)    ffff83042101ff18 ffff83042101ff18 00000000ffffffff 0000000000000002
(XEN)    ffff830421026080 ffff83042101ff10 ffff82c4801549ce ffff8300ca2fc000
(XEN)    ffff8300ca61a000 ffff83042101fdc8 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000003 ffff88040210ff00 ffff88040210ffd8
(XEN)    0000000000000246 0000000000000001 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff810013aa ffffffff81a2a858 00000000deadbeef
(XEN)    00000000deadbeef 0000010000000000 ffffffff810013aa 000000000000e033
(XEN)    0000000000000246 ffff88040210fee8 000000000000e02b 000000000020dea1
(XEN)    000000000020dea0 000000000020de9f 000000000020de9e 0000000000000002
(XEN)    ffff8300ca2fc000 0000003fa0d48a80 000000000020de9a
(XEN) Xen call trace:
(XEN)    [<ffff82c4801975b0>] lapic_timer_nop+0x0/0x6
(XEN)    [<ffff82c4801549ce>] idle_loop+0x4b/0x59
(XEN)    
(XEN) *** Dumping CPU3 host state: ***
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    3
(XEN) RIP:    e008:[<ffff82c4801975b0>] lapic_timer_nop+0x0/0x6
(XEN) RFLAGS: 0000000000000246   CONTEXT: hypervisor
(XEN) rax: 0000000000003f40   rbx: ffff83042100a010   rcx: 0000000000000001
(XEN) rdx: 0000000000000000   rsi: ffff830421020c80   rdi: 00000000cc88483a
(XEN) rbp: ffff830421017ef0   rsp: ffff830421017e68   r8:  00000019ae0a4d09
(XEN) r9:  ffff830007a78060   r10: 0000001a0f45c0d0   r11: 0000ffff0000ffff
(XEN) r12: ffff83042100a0d0   r13: 00000019fe1b130f   r14: 0000001a0009cc25
(XEN) r15: ffff830421020080   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000116b88000   cr2: 00007f30abbfb960
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff830421017e68:
(XEN)    ffff82c480198427 ffff830007ef4000 00000000ca9a2000 ffffffffffffffff
(XEN)    ffff82c4802b8a00 ffff82c4802b8880 ffffffffffffffff 0000000000000000
(XEN)    0000000000000000 ffff830421017ee0 00000a7c000021ef ffff830421017f18
(XEN)    ffff830421017f18 ffff830421017f18 00000000ffffffff 0000000000000002
(XEN)    ffff830421020080 ffff830421017f10 ffff82c4801549ce ffff8300ca9a2000
(XEN)    ffff830007ef4000 ffff830421017dc8 0000000000000000 ffff88000c766c80
(XEN)    ffffffff81ada2e0 0000000000000000 ffffffff81a01e88 ffffffff81a01fd8
(XEN)    0000000000000246 0000000000000001 0000000000000000 0000000000000000
(XEN)    0000000000000000 ffffffff810013aa ffffffff81a2a858 00000000deadbeef
(XEN)    00000000deadbeef 0000010000000000 ffffffff810013aa 000000000000e033
(XEN)    0000000000000246 ffffffff81a01e70 000000000000e02b 000000000020cea1
(XEN)    000000000020cea0 000000000020ce9f 000000000020ce9e 0000000000000003
(XEN)    ffff8300ca9a2000 0000003fa0d42a80 000000000020ce9a
(XEN) Xen call trace:
(XEN)    [<ffff82c4801975b0>] lapic_timer_nop+0x0/0x6
(XEN)    [<ffff82c4801549ce>] idle_loop+0x4b/0x59
(XEN)    
(XEN) 'c' pressed -> printing ACPI Cx structures
(XEN) ==cpu0==
(XEN) active state:		C1
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00015106] method[ HALT] duration[3487008611]
(XEN)    *C2:	type[C2] latency[080] usage[00002882] method[SYSIO] duration[1309079505]
(XEN)    *C3:	type[C3] latency[109] usage[00025035] method[SYSIO] duration[81515998645]
(XEN)     C0:	usage[00043018] duration[25573511918]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu1==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00017879] method[ HALT] duration[4365237847]
(XEN)     C2:	type[C2] latency[080] usage[00003062] method[SYSIO] duration[1658518403]
(XEN)    *C3:	type[C3] latency[109] usage[00021490] method[SYSIO] duration[80308964622]
(XEN)     C0:	usage[00042430] duration[25591765423]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu2==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00023770] method[ HALT] duration[5010549092]
(XEN)     C2:	type[C2] latency[080] usage[00003092] method[SYSIO] duration[1851960502]
(XEN)    *C3:	type[C3] latency[109] usage[00022457] method[SYSIO] duration[80283005727]
(XEN)     C0:	usage[00049319] duration[24817617896]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) ==cpu3==
(XEN) active state:		C3
(XEN) max_cstate:		C7
(XEN) states:
(XEN)     C1:	type[C1] latency[000] usage[00017217] method[ HALT] duration[4455153364]
(XEN)     C2:	type[C2] latency[080] usage[00002789] method[SYSIO] duration[1354471895]
(XEN)     C3:	type[C3] latency[109] usage[00022043] method[SYSIO] duration[81489684715]
(XEN)     C0:	usage[00042046] duration[24702472966]
(XEN) PC3[0] PC6[0] PC7[0]
(XEN) CC3[0] CC6[0]
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:00000000,00000000,00000000,00000008 vec:22 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(----),
(XEN)    IRQ:   2 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:00000000,00000000,00000000,00000001 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:00000000,00000000,00000000,00000001 vec:f1 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   5 affinity:00000000,00000000,00000000,00000001 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   6 affinity:00000000,00000000,00000000,00000001 vec:50 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:00000000,00000000,00000000,00000002 vec:ba type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  7(----),
(XEN)    IRQ:   8 affinity:00000000,00000000,00000000,00000001 vec:60 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(----),
(XEN)    IRQ:   9 affinity:00000000,00000000,00000000,00000001 vec:b0 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0:  9(----),
(XEN)    IRQ:  10 affinity:00000000,00000000,00000000,00000001 vec:70 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:00000000,00000000,00000000,00000001 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:00000000,00000000,00000000,00000002 vec:21 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0: 12(----),
(XEN)    IRQ:  13 affinity:00000000,00000000,00000000,0000000f vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:00000000,00000000,00000000,00000001 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:00000000,00000000,00000000,00000001 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000002 vec:83 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(-S--),
(XEN)    IRQ:  17 affinity:00000000,00000000,00000000,00000008 vec:3b type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 17(-S--),
(XEN)    IRQ:  18 affinity:00000000,00000000,00000000,00000008 vec:39 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  19 affinity:00000000,00000000,00000000,0000000f vec:c8 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:00000000,00000000,00000000,00000008 vec:9d type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 20(-S--),
(XEN)    IRQ:  22 affinity:00000000,00000000,00000000,0000000f vec:c2 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  23 affinity:00000000,00000000,00000000,0000000f vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  24 affinity:00000000,00000000,00000000,00000001 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:00000000,00000000,00000000,00000001 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:00000000,00000000,00000000,00000001 vec:25 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:279(PS--),
(XEN)    IRQ:  27 affinity:00000000,00000000,00000000,00000008 vec:71 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:278(-S--),
(XEN)    IRQ:  28 affinity:00000000,00000000,00000000,00000008 vec:79 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:277(-S--),
(XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000008 vec:81 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:276(-S--),
(XEN)    IRQ:  30 affinity:00000000,00000000,00000000,00000008 vec:89 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:275(-S--),
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000008 vec:5c type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:274(-S--),
(XEN)    IRQ:  32 affinity:00000000,00000000,00000000,00000004 vec:50 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
(XEN)    IRQ:  33 affinity:00000000,00000000,00000000,00000002 vec:b2 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:272(-S--),
(XEN)    IRQ:  34 affinity:00000000,00000000,00000000,00000002 vec:d2 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:271(-S--),
(XEN)    IRQ:  35 affinity:00000000,00000000,00000000,00000004 vec:dc type=PCI-MSI         status=00000010 in-flight=0 domain-list=1: 55(----),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec 34:
(XEN)       Apic 0x00, Pin  1: vec=22 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec241:
(XEN)       Apic 0x00, Pin  4: vec=f1 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 72:
(XEN)       Apic 0x00, Pin  5: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 80:
(XEN)       Apic 0x00, Pin  6: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec186:
(XEN)       Apic 0x00, Pin  7: vec=ba delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec 96:
(XEN)       Apic 0x00, Pin  8: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec176:
(XEN)       Apic 0x00, Pin  9: vec=b0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec112:
(XEN)       Apic 0x00, Pin 10: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec120:
(XEN)       Apic 0x00, Pin 11: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec 33:
(XEN)       Apic 0x00, Pin 12: vec=21 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec144:
(XEN)       Apic 0x00, Pin 13: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN)     IRQ 14 Vec152:
(XEN)       Apic 0x00, Pin 14: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec160:
(XEN)       Apic 0x00, Pin 15: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec131:
(XEN)       Apic 0x00, Pin 16: vec=83 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 17 Vec 59:
(XEN)       Apic 0x00, Pin 17: vec=3b delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec 57:
(XEN)       Apic 0x00, Pin 18: vec=39 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 19 Vec200:
(XEN)       Apic 0x00, Pin 19: vec=c8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec157:
(XEN)       Apic 0x00, Pin 20: vec=9d delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 22 Vec194:
(XEN)       Apic 0x00, Pin 22: vec=c2 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 23 Vec168:
(XEN)       Apic 0x00, Pin 23: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=1 dest_id:0
[  123.064409] PM: Syncing filesystems ... done.
[  123.124075] Freezing user space processes ... (elapsed 0.01 seconds) done.
[  123.145929] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
[  123.166637] sd 1:0:0:0: [sdb] Synchronizing SCSI cache
[  123.168019] xhci_hcd 0000:03:00.0: WARN Event TRB for slot 1 ep 3 with no TDs queued?
[  123.168090] xhci_hcd 0000:03:00.0: WARN Event TRB for slot 1 ep 2 with no TDs queued?
[  123.168168] xhci_hcd 0000:03:00.0: WARN Event TRB for slot 1 ep 0 with no TDs queued?
[  123.195516] sd 1:0:0:0: [sdb] Stopping disk
[  123.591705] parport_pc 00:09: disabled
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0 extended MCE MSR 0
(XEN) CPU0 CMCI LVT vector (0xf7) already installed
(XEN) CPU0: Thermal LVT vector (0xfa) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
[  124.460159] Broke affinity for irq 16
[  124.460175] Broke affinity for irq 17
[  124.460177] Broke affinity for irq 68
[  124.460179] Broke affinity for irq 74
[  125.605393] ACPI: Low-level resume complete
[  125.624465] PM: Restoring platform NVS memory
[  125.709964] Enabling non-boot CPUs ...
[  125.713783] installing Xen timer for CPU 1
[  125.723180] CPU1 is up
[  125.755647] installing Xen timer for CPU 2
[  125.764981] CPU2 is up
[  125.797455] installing Xen timer for CPU 3
[  125.806777] CPU3 is up
[  125.810157] ACPI: Waking up from system sleep state S3
[  126.431228] ehci_hcd 0000:00:1a.0: wake-up capability disabled by ACPI
[  126.503226] ehci_hcd 0000:00:1d.0: wake-up capability disabled by ACPI
[  126.585218] xhci_hcd 0000:03:00.0: wake-up capability disabled by ACPI
[  126.592091] PM: noirq resume of devices complete after 238.720 msecs
[  126.598468] PM: early resume of devices complete after 0.079 msecs
[  126.636074] Already setup the GSI :20
[  126.639672] Already setup the GSI :16
[  126.643401] Already setup the GSI :22
[  126.647320] Already setup the GSI :17
[  126.650997] Already setup the GSI :18
[  126.654741] usb usb2: root hub lost power or was reset
[  126.659818] usb usb4: root hub lost power or was reset
(XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
(XEN) rdx: 00000000000000e9   rsi: 0000000000000286   rdi: ffff830421060538
(XEN) rbp: ffff82c48029fb28   rsp: ffff82c48029fad8   r8:  0000000000000008
(XEN) r9:  00000000ffffffff   r10: ffff82c48021d160   r11: 0000000000000246
(XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
(XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000265c5d000   cr2: ffff8804020701f8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fad8:
(XEN)    0000000000000000 00000000020701f0 ffff82c48029ff18 ffff82c4802dd9e0
(XEN)    ffff82c48029fe58 0000000000000001 0000000000000050 0000000080030034
(XEN)    0000000000000000 0000000000000003 00007d3b7fd604a7 ffff82c48014de60
(XEN)    0000000000000003 0000000000000000 0000000080030034 0000000000000050
(XEN)    ffff82c48029fbe8 0000000000000001 0000000000000246 ffff82c48021d160
(XEN)    00000000ffffffff 0000000000000008 0000000000000000 0000000000000001
(XEN)    0000000000000cfc 0000000000000286 ffff82c48025a9c0 0000002000000000
(XEN)    ffff82c4801226c0 000000000000e008 0000000000000286 ffff82c48029fbe8
(XEN)    000000000000e010 0000000000000286 ffff82c48029fc18 ffff82c480175890
(XEN)    ffff82c48029fc18 0000000000000034 0000000000000030 0000000000000000
(XEN)    ffff82c48029fc38 ffff82c48021042e ffff82c48029fc58 ffff82c480175890
(XEN)    ffff82c48029fc88 ffff82c48013db18 000000000000002f 1100000000000082
(XEN)    ffff82c48029fc78 0000000000000003 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff82c48029fcc8 ffff82c48015fd16
(XEN)    ffff82c48029fe10 ffff830421050ac0 ffff82c48029fe10 0000000000000003
(XEN)    0000000000000000 0000000000000000 ffff82c48029fd58 ffff82c4801601bc
(XEN)    000000000000002f 0000000000000082 000782c48029fd08 ffff82c48029fe10
(XEN)    0000006a00000008 ffff82c48029fe78 0000000300000068 0000000000000000
(XEN)    0000000000000202 ffff82c48029fe78 ffff82c48029fe10 ffff82c48029fe78
(XEN)    ffff82c48029fe10 ffff830421050ac0 0000000000000000 000000000000001d
(XEN) Xen call trace:
(XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
(XEN)    [<ffff82c48014de60>] irq_move_cleanup_interrupt+0x30/0x40
(XEN)    [<ffff82c4801226c0>] _spin_unlock_irqrestore+0x22/0x24
(XEN)    [<ffff82c480175890>] pci_conf_read+0xb0/0xc1
(XEN)    [<ffff82c48021042e>] pci_conf_read8+0x7e/0x80
(XEN)    [<ffff82c48013db18>] pci_find_cap_offset+0x58/0xaf
(XEN)    [<ffff82c48015fd16>] msix_set_enable+0x4e/0xb2
(XEN)    [<ffff82c4801601bc>] msix_capability_init+0xa6/0x5fa
(XEN)    [<ffff82c48016102f>] pci_enable_msi+0x19b/0x49b
(XEN)    [<ffff82c4801642fd>] map_domain_pirq+0x281/0x3df
(XEN)    [<ffff82c48017650b>] do_physdev_op+0xa2b/0x1508
(XEN)    [<ffff82c480209ef8>] syscall_enter+0xc8/0x122
(XEN)    
(XEN) 
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
(XEN) ****************************************
(XEN) 
(XEN) Reboot in five seconds...
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 16:12                   ` Andrew Cooper
@ 2013-03-26 16:47                     ` Marek Marczykowski
  0 siblings, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-26 16:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 5031 bytes --]

On 26.03.2013 17:12, Andrew Cooper wrote:
> On 26/03/2013 15:47, Andrew Cooper wrote:
>> On 26/03/2013 13:50, Marek Marczykowski wrote:
>>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>>>>> Finally got serial console :)
>>>>> The debug=y problem is (actually at resume):
>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>>> (XEN) CPU:    0
>>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>>> (XEN)    0000000000000000
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>> (XEN)
>>>>> (XEN)
>>>>> (XEN) ****************************************
>>>>> (XEN) Panic on CPU 0:
>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>> (XEN) ****************************************
>>>> To make sense of this, we need to know the register (and maybe
>>>> stack) allocation at this point, to know which vector it was that
>>>> triggered the assertion. You can either do this analysis for us, or
>>>> point us at the xen-syms binary matching the xen.gz you used.
>>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>>>
>>>> From the register values, the most likely candidates are vector 0xe9
>>>> and 0x2a. The former having two registers set to this value seems
>>>> more likely from that angle, but vectors in the 0xe? range should
>>>> never end up in smp_irq_move_cleanup_interrupt().
>>>>
>>>> And if it's the 0x2a one, then we'd need to know what IRQ it was
>>>> last used for. That can't be reconstructed from the data above, so
>>>> would require you being able to reproduce this and adding some
>>>> instrumentation to the code.
>>>>
>>>> Jan
>>>>
>> Could it be something to do with switching virtual wire mode, and having
>> PIC compatibility stuff left in the IO-APIC after leaving the BIOS but
>> before starting back up again?
>>
>> Looking at the stack dump, there is an extra exception frame under what
>> is printed by the assertion failure.
>>
>> 0000002000000000 TRAP_syscall
> 
> Apologies - this is a vector 0x20 interrupt, not TRAP_syscall, which
> makes sense as 0x20 is FIRST_DYNAMIC_IRQ which is also the cleanup IPI
> vector.
> 
> The other comments still stand, especially as we appear to be
> interrupting dom0 which is already running.

Indeed, dom0 is running at this stage (see log in my second email).

> 
> ~Andrew
> 
>> ffffffff81a01db8 guest kernel addr
>> 0000000000000246 FLAGS
>> 000000000000e033 FLAT_RING3_CS64
>> ffffffff8105dd5a guest kernel addr
>> 000000000000e02b FLAT_RING3_SS{64,32}
>>
>> So it appears that we are already executing a guest (presumably dom0) by the time this assertion occurs.  From the serial, is there any indication that dom0 has started up again?
>>
>> I would have thought that we should have successfully reset the IO-APIC back up properly before we would ever get back around to executing dom0.
>>
>> ~Andrew
>>
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> http://lists.xen.org/xen-devel
> 
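
For reference, the exception frame picked out of the stack dump quoted above
can be decoded mechanically. What follows is only a minimal Python sketch, not
Xen code. It assumes the usual x86-64 interrupt frame layout (a word holding
error_code in the low half and entry_vector in the high half, then rip, cs,
rflags, rsp, ss); decode_frame and ExceptionFrame are hypothetical names. The
six words are copied from the quoted stack trace.

```python
from typing import NamedTuple, Sequence


class ExceptionFrame(NamedTuple):
    error_code: int
    entry_vector: int
    rip: int
    cs: int
    rflags: int
    rsp: int
    ss: int


def decode_frame(words: Sequence[int]) -> ExceptionFrame:
    """Decode six consecutive 64-bit stack words into an exception frame.

    Assumed layout: [error_code | entry_vector << 32], rip, cs, rflags, rsp, ss.
    """
    if len(words) != 6:
        raise ValueError("expected exactly six 64-bit words")
    vec_word, rip, cs, rflags, rsp, ss = words
    return ExceptionFrame(
        error_code=vec_word & 0xFFFFFFFF,  # low 32 bits
        entry_vector=vec_word >> 32,       # high 32 bits
        rip=rip,
        cs=cs,
        rflags=rflags,
        rsp=rsp,
        ss=ss,
    )


# Words copied from the quoted stack trace, in stack order.
STACK_WORDS = [
    int(word, 16)
    for word in (
        "0000002000000000 ffffffff8105dd5a 000000000000e033 "
        "0000000000000246 ffffffff81a01db8 000000000000e02b"
    ).split()
]

if __name__ == "__main__":
    frame = decode_frame(STACK_WORDS)
    # The low two bits of a code segment selector give the privilege ring.
    print(f"vector={frame.entry_vector:#x} cs={frame.cs:#x} ring={frame.cs & 3}")
```

With these words the decoded vector is 0x20 and the code segment selector has
ring 3, which agrees with the reading that a guest was interrupted.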


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 16:45                   ` Marek Marczykowski
@ 2013-03-26 17:02                     ` Andrew Cooper
  2013-03-26 17:42                       ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-26 17:02 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 26/03/2013 16:45, Marek Marczykowski wrote:
> On 26.03.2013 17:03, Jan Beulich wrote:
>>>>> On 26.03.13 at 14:50, Marek Marczykowski <marmarek@invisiblethingslab.com>
>> wrote:
>>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> 
>>> wrote:
>>>>> Finally got serial console :)
>>>>> The debug=y problem is (actually at resume):
>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>>> (XEN) CPU:    0
>>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>>> (XEN)    0000000000000000
>>>>> (XEN) Xen call trace:
>>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>> (XEN)
>>>>> (XEN)
>>>>> (XEN) ****************************************
>>>>> (XEN) Panic on CPU 0:
>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>> (XEN) ****************************************
>>>> To make sense of this, we need to know the register (and maybe
>>>> stack) allocation at this point, to know which vector it was that
>>>> triggered the assertion. You can either do this analysis for us, or
>>>> point us at the xen-syms binary matching the xen.gz you used.
>>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>> And that system isn't using a strange mixed mode IO-APIC/legacy
>> PIC model, where particularly IRQ 9 (usually ACPI SCI) gets
>> channeled through the legacy PIC?
> I don't know...
>
>> Could you attach the complete log, ideally with 'i' output logged
>> right before suspending?
> Sure, attached.
>
>> Is this reproducible with 4.2.x or 4.3-unstable? If not, but if readily
>> reproducible with 4.1.5-rc1, could you try changing the containing
>> loop's upper bound from "< NR_VECTORS" to
>> "<= LAST_DYNAMIC_VECTOR"?
> I tried 4.2.x some time ago and the bug also exists there (but I had no
> console, so I'm not sure it is exactly the same). 4.3 seems not to be affected.
>

Can you replace the ASSERT() with code similar to that in

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668

Which should call dump_irqs() before dying because of the ASSERT.
You might also need to take the latest version of dump_irqs() from
unstable, as I seem to remember there was another assertion failure due
to xfree()'ing in IRQ context.
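
The shape of that change (report the offending vector and dump state first,
then die) can be modelled outside the hypervisor. This is a Python model only,
not Xen code: check_vector, IrqBug, the dump callback and the sample vector
set are all hypothetical names and values chosen for illustration.

```python
from typing import Callable, Set


class IrqBug(RuntimeError):
    """Stands in for the point where the hypervisor would stop with a BUG."""


def check_vector(vector: int, used_vectors: Set[int],
                 dump: Callable[[str], None]) -> None:
    """Replace a bare assertion with 'report, dump, then fail'.

    If the vector is in the used set, nothing happens. Otherwise the
    diagnostic callback runs before the failure is raised, so the state
    is recorded even though execution stops right afterwards.
    """
    if vector in used_vectors:
        return
    dump(
        f"vector {vector} (0x{vector:x}) not in used set "
        f"{sorted(used_vectors)}"
    )
    raise IrqBug(f"unexpected vector 0x{vector:x}")


if __name__ == "__main__":
    sample_used = {0x27, 0x2F, 0x3F}  # hypothetical bitmap contents
    check_vector(0x27, sample_used, print)  # passes silently
    try:
        check_vector(0xE9, sample_used, print)
    except IrqBug as exc:
        print("stopped:", exc)
```

The point of passing the dump routine in as a callback is that the diagnostic
output is guaranteed to be produced before the failure path is taken.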

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 17:02                     ` Andrew Cooper
@ 2013-03-26 17:42                       ` Marek Marczykowski
  2013-03-26 17:54                         ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-26 17:42 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 15318 bytes --]

On 26.03.2013 18:02, Andrew Cooper wrote:
> On 26/03/2013 16:45, Marek Marczykowski wrote:
>> On 26.03.2013 17:03, Jan Beulich wrote:
>>>>>> On 26.03.13 at 14:50, Marek Marczykowski <marmarek@invisiblethingslab.com>
>>> wrote:
>>>> On 26.03.2013 14:11, Jan Beulich wrote:
>>>>>>>> On 26.03.13 at 13:17, Marek Marczykowski <marmarek@invisiblethingslab.com> 
>>>> wrote:
>>>>>> Finally got serial console :)
>>>>>> The debug=y problem is (actually at resume):
>>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>>> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
>>>>>> (XEN) CPU:    0
>>>>>> (XEN) RIP:    e008:[<ffff82c48015e288>] 
>>>>>> smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>>> (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
>>>>>> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: ffff82c48029ff18
>>>>>> (XEN) rdx: 00000000000000e9   rsi: 000000000000002a   rdi: ffff830421060538
>>>>>> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  ffff88041820eb60
>>>>>> (XEN) r9:  0000000000000000   r10: 0000000000007ff0   r11: 0000000000000000
>>>>>> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
>>>>>> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
>>>>>> (XEN) cr3: 0000000300b81000   cr2: ffff880402070198
>>>>>> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
>>>>>> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
>>>>>> (XEN)    0000000000000000 000000000000e030 ffff82c48029ff18 ffff82c4802dd9e0
>>>>>> (XEN)    ffff8802cac3c7c0 00000000ffff3729 00000000ffff3729 000000013fff3728
>>>>>> (XEN)    ffffffff81b907c0 00000000ffff3729 00007d3b7fd600c7 ffff82c48014de60
>>>>>> (XEN)    00000000ffff3729 ffffffff81b907c0 000000013fff3728 00000000ffff3729
>>>>>> (XEN)    ffffffff81a01e18 00000000ffff3729 0000000000000000 0000000000007ff0
>>>>>> (XEN)    0000000000000000 ffff88041820eb60 ffff8803fd1820a8 ffffffff81b90a88
>>>>>> (XEN)    000000000000002a 000000000000002a 00000000ffff372a 0000002000000000
>>>>>> (XEN)    ffffffff8105dd5a 000000000000e033 0000000000000246 ffffffff81a01db8
>>>>>> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
>>>>>> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
>>>>>> (XEN)    0000000000000000
>>>>>> (XEN) Xen call trace:
>>>>>> (XEN)    [<ffff82c48015e288>] smp_irq_move_cleanup_interrupt+0x1c3/0x23d
>>>>>> (XEN)
>>>>>> (XEN)
>>>>>> (XEN) ****************************************
>>>>>> (XEN) Panic on CPU 0:
>>>>>> (XEN) Assertion 'test_bit(vector, cfg->used_vectors)' failed at io_apic.c:542
>>>>>> (XEN) ****************************************
>>>>> To make sense of this, we need to know the register (and maybe
>>>>> stack) allocation at this point, to know which vector it was that
>>>>> triggered the assertion. You can either do this analysis for us, or
>>>>> point us at the xen-syms binary matching the xen.gz you used.
>>>> "info scope smp_irq_move_cleanup_interrupt" said vector is in %rbx, so 0xe9.
>>> And that system isn't using a strange mixed mode IO-APIC/legacy
>>> PIC model, where particularly IRQ 9 (usually ACPI SCI) gets
>>> channeled through the legacy PIC?
>> I don't know...
>>
>>> Could you attach the complete log, ideally with 'i' output logged
>>> right before suspending?
>> Sure, attached.
>>
>>> Is this reproducible with 4.2.x or 4.3-unstable? If not, but if readily
>>> reproducible with 4.1.5-rc1, could you try changing the containing
>>> loop's upper bound from "< NR_VECTORS" to
>>> "<= LAST_DYNAMIC_VECTOR"?
>> I tried 4.2.x some time ago and the bug also exists there (but I had no
>> console, so I'm not sure it is exactly the same). 4.3 seems not to be affected.

I checked 4.2 and it indeed also hits an assertion in a similar place. If
anyone is interested, the log is here:
http://duch.mimuw.edu.pl/~marmarek/qubes/console-4.2-failed-resume.log

>>
> 
> Can you replace the ASSERT() with code similar to that in
> 
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668
> 
> Which should call dump_irqs() before dying because of the ASSERT.
> You might also need to take the latest version of dump_irqs() from
> unstable, as I seem to remember there was another assertion failure due
> to xfree()'ing in IRQ context.

Full log here:
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs.log
Interesting part:
(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 233 from bitmap 39,47,63-65,72,80,88,96,98,112,120,125,144,152,160,168,174,182-183,190,192,198,200,208,214,222
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:00000000,00000000,00000000,00000001 vec:f0 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:00000000,00000000,00000000,00000002 vec:c6 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(-S--),
(XEN)    IRQ:   2 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2 type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:00000000,00000000,00000000,00000001 vec:40 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:00000000,00000000,00000000,00000001 vec:f1 type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   5 affinity:00000000,00000000,00000000,00000001 vec:48 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   6 affinity:00000000,00000000,00000000,00000001 vec:50 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:00000000,00000000,00000000,00000001 vec:58 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  7(-S--),
(XEN)    IRQ:   8 affinity:00000000,00000000,00000000,00000001 vec:60 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(-S--),
(XEN)    IRQ:   9 affinity:00000000,00000000,00000000,00000001 vec:de type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0:  9(-S--),
(XEN)    IRQ:  10 affinity:00000000,00000000,00000000,00000001 vec:70 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:00000000,00000000,00000000,00000001 vec:78 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:00000000,00000000,00000000,00000001 vec:27 type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0: 12(-S--),
(XEN)    IRQ:  13 affinity:00000000,00000000,00000000,0000000f vec:90 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:00000000,00000000,00000000,00000001 vec:98 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:00000000,00000000,00000000,00000001 vec:a0 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:2f type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(-S--),
(XEN)    IRQ:  17 affinity:00000000,00000000,00000000,00000001 vec:3f type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 17(-S--),
(XEN)    IRQ:  18 affinity:00000000,00000000,00000000,00000008 vec:41 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  19 affinity:00000000,00000000,00000000,0000000f vec:c8 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:00000000,00000000,00000000,00000002 vec:b7 type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 20(-S--),
(XEN)    IRQ:  22 affinity:00000000,00000000,00000000,0000000f vec:62 type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  23 affinity:00000000,00000000,00000000,0000000f vec:a8 type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  24 affinity:00000000,00000000,00000000,00000001 vec:28 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:00000000,00000000,00000000,00000001 vec:30 type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:6f type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  27 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:77 type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  28 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:7f type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  29 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:87 type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000002 vec:a6 type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  32 affinity:00000000,00000000,00000000,00000001 vec:47 type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
(XEN)    IRQ:  33 affinity:00000000,00000000,00000000,00000002 vec:5f type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:272(PS--),
(XEN)    IRQ:  34 affinity:00000000,00000000,00000000,00000001 vec:67 type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:271(-S--),
(XEN)    IRQ:  35 affinity:00000000,00000000,00000000,00000001 vec:4f type=PCI-MSI         status=00000050 in-flight=0 domain-list=1: 55(-S--),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec198:
(XEN)       Apic 0x00, Pin  1: vec=c6 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec241:
(XEN)       Apic 0x00, Pin  4: vec=f1 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 72:
(XEN)       Apic 0x00, Pin  5: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 80:
(XEN)       Apic 0x00, Pin  6: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec 88:
(XEN)       Apic 0x00, Pin  7: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec 96:
(XEN)       Apic 0x00, Pin  8: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec222:
(XEN)       Apic 0x00, Pin  9: vec=de delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec112:
(XEN)       Apic 0x00, Pin 10: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec120:
(XEN)       Apic 0x00, Pin 11: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec 39:
(XEN)       Apic 0x00, Pin 12: vec=27 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec144:
(XEN)       Apic 0x00, Pin 13: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN)     IRQ 14 Vec152:
(XEN)       Apic 0x00, Pin 14: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec160:
(XEN)       Apic 0x00, Pin 15: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec 47:
(XEN)       Apic 0x00, Pin 16: vec=2f delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 17 Vec 63:
(XEN)       Apic 0x00, Pin 17: vec=3f delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec 65:
(XEN)       Apic 0x00, Pin 18: vec=41 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 19 Vec200:
(XEN)       Apic 0x00, Pin 19: vec=c8 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec183:
(XEN)       Apic 0x00, Pin 20: vec=b7 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 22 Vec 98:
(XEN)       Apic 0x00, Pin 22: vec=62 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 23 Vec168:
(XEN)       Apic 0x00, Pin 23: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN) Xen BUG at io_apic.c:554
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015e2d6>] smp_irq_move_cleanup_interrupt+0x211/0x289
(XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: 0000000000000000
(XEN) rdx: 0000000000000016   rsi: 000000000000000a   rdi: ffff82c4802592e0
(XEN) rbp: ffff82c48029fd08   rsp: ffff82c48029fcb8   r8:  0000000000000018
(XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000001
(XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
(XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000119a96000   cr2: ffff880402070198
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fcb8:
(XEN)    0000000000000000 ffff82c48029ff18 ffff82c4802dd9e0 000000e900000000
(XEN)    ffff83042109ba04 ffff830421008000 0000000000000114 000000000000001d
(XEN)    0000000000000114 0000000000000000 00007d3b7fd602c7 ffff82c48014de60
(XEN)    0000000000000000 0000000000000114 000000000000001d 0000000000000114
(XEN)    ffff82c48029fdc8 ffff830421008000 0000000000000246 ffff82c48025c1f0
(XEN)    0000000000000003 0000001944602466 0000000000000000 0000000000000001
(XEN)    0000000000000000 0000000000000286 ffff830421060f34 0000002000000000
(XEN)    ffff82c4801226c0 000000000000e008 0000000000000286 ffff82c48029fdc8
(XEN)    000000000000e010 0000000000000286 ffff82c48029fe48 ffff82c480164446
(XEN)    ffff82c4802dd9e0 0000000000000286 ffff830421060f00 ffff830421060f34
(XEN)    ffff830421050ac0 000000000000001d 0000000000000246 ffff8301108fd140
(XEN)    ffff82c4801226d3 ffff82c48029fe78 000000000000001d ffff8803fa889af0
(XEN)    0000000000000114 ffff8804023be000 ffff82c48029fef8 ffff82c48017655b
(XEN)    ffff830114c7f300 ffffffff81381646 ffff82f600000008 ffff830421008000
(XEN)    0000000000000003 000000030000001d 00000000e2200000 0000000100a0fb00
(XEN)    0000000000007ff0 ffffffffffffffff 0000000000000003 0000000000000003
(XEN)    00000000e2200000 c390ed90d1ffffff 0000000000000202 ffff8300ca666000
(XEN)    ffff8803fc880240 0000000000000011 ffff8804023be858 ffff8804023be000
(XEN)    00007d3b7fd600c7 ffff82c480209f38 ffffffff8100142a 0000000000000021
(XEN)    ffff8804023be000 ffff8804023be858 0000000000000011 ffff8803fc880240
(XEN) Xen call trace:
(XEN)    [<ffff82c48015e2d6>] smp_irq_move_cleanup_interrupt+0x211/0x289
(XEN)    [<ffff82c48014de60>] irq_move_cleanup_interrupt+0x30/0x40
(XEN)    [<ffff82c4801226c0>] _spin_unlock_irqrestore+0x22/0x24
(XEN)    [<ffff82c480164446>] map_domain_pirq+0x37a/0x3df
(XEN)    [<ffff82c48017655b>] do_physdev_op+0xa2b/0x1508
(XEN)    [<ffff82c480209f38>] syscall_enter+0xc8/0x122


> 
> ~Andrew
> 


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 17:42                       ` Marek Marczykowski
@ 2013-03-26 17:54                         ` Andrew Cooper
  2013-03-26 18:21                           ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-26 17:54 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


>> Can you replace the ASSERT() with code similar to that in
>>
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668
>>
>> which should call dump_irqs() before dying because of the ASSERT.
>> You might need to also take the latest version of dump_irqs() from
>> unstable, as I seem to remember there was another assertion failure due
>> to xfree()'ing in IRQ context.
> Full log here:
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs.log
> Interesting part:
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 233 from bitmap
> 39,47,63-65,72,80,88,96,98,112,120,125,144,152,160,168,174,182-183,190,192,198,200,208,214,222
> (XEN) Guest interrupt information:
> (XEN)    IRQ:   0 affinity:00000000,00000000,00000000,00000001 vec:f0
> type=IO-APIC-edge    status=00000000 mapped, unbound
> (XEN)    IRQ:   1 affinity:00000000,00000000,00000000,00000002 vec:c6
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(-S--),
> (XEN)    IRQ:   2 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2
> type=XT-PIC          status=00000000 mapped, unbound
> (XEN)    IRQ:   3 affinity:00000000,00000000,00000000,00000001 vec:40
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   4 affinity:00000000,00000000,00000000,00000001 vec:f1
> type=IO-APIC-edge    status=00000000 mapped, unbound
> (XEN)    IRQ:   5 affinity:00000000,00000000,00000000,00000001 vec:48
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   6 affinity:00000000,00000000,00000000,00000001 vec:50
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   7 affinity:00000000,00000000,00000000,00000001 vec:58
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  7(-S--),
> (XEN)    IRQ:   8 affinity:00000000,00000000,00000000,00000001 vec:60
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(-S--),
> (XEN)    IRQ:   9 affinity:00000000,00000000,00000000,00000001 vec:de
> type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0:  9(-S--),
> (XEN)    IRQ:  10 affinity:00000000,00000000,00000000,00000001 vec:70
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  11 affinity:00000000,00000000,00000000,00000001 vec:78
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  12 affinity:00000000,00000000,00000000,00000001 vec:27
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0: 12(-S--),
> (XEN)    IRQ:  13 affinity:00000000,00000000,00000000,0000000f vec:90
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  14 affinity:00000000,00000000,00000000,00000001 vec:98
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  15 affinity:00000000,00000000,00000000,00000001 vec:a0
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:2f
> type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(-S--),
> (XEN)    IRQ:  17 affinity:00000000,00000000,00000000,00000001 vec:3f
> type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 17(-S--),
> (XEN)    IRQ:  18 affinity:00000000,00000000,00000000,00000008 vec:41
> type=IO-APIC-level   status=00000002 mapped, unbound
> (XEN)    IRQ:  19 affinity:00000000,00000000,00000000,0000000f vec:c8
> type=IO-APIC-level   status=00000002 mapped, unbound
> (XEN)    IRQ:  20 affinity:00000000,00000000,00000000,00000002 vec:b7
> type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 20(-S--),
> (XEN)    IRQ:  22 affinity:00000000,00000000,00000000,0000000f vec:62
> type=IO-APIC-level   status=00000002 mapped, unbound
> (XEN)    IRQ:  23 affinity:00000000,00000000,00000000,0000000f vec:a8
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  24 affinity:00000000,00000000,00000000,00000001 vec:28
> type=DMA_MSI         status=00000000 mapped, unbound
> (XEN)    IRQ:  25 affinity:00000000,00000000,00000000,00000001 vec:30
> type=DMA_MSI         status=00000000 mapped, unbound
> (XEN)    IRQ:  26 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:6f
> type=PCI-MSI         status=00000042 mapped, unbound
> (XEN)    IRQ:  27 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:77
> type=PCI-MSI         status=00000042 mapped, unbound
> (XEN)    IRQ:  28 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:7f
> type=PCI-MSI         status=00000042 mapped, unbound
> (XEN)    IRQ:  29 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:87
> type=PCI-MSI         status=00000042 mapped, unbound
> (XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000002 vec:a6
> type=PCI-MSI         status=00000002 mapped, unbound
> (XEN)    IRQ:  32 affinity:00000000,00000000,00000000,00000001 vec:47
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
> (XEN)    IRQ:  33 affinity:00000000,00000000,00000000,00000002 vec:5f
> type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:272(PS--),
> (XEN)    IRQ:  34 affinity:00000000,00000000,00000000,00000001 vec:67
> type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:271(-S--),
> (XEN)    IRQ:  35 affinity:00000000,00000000,00000000,00000001 vec:4f
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=1: 55(-S--),
> (XEN) IO-APIC interrupt information:
> (XEN)     IRQ  0 Vec240:
> (XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  1 Vec198:
> (XEN)       Apic 0x00, Pin  1: vec=c6 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  3 Vec 64:
> (XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  4 Vec241:
> (XEN)       Apic 0x00, Pin  4: vec=f1 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  5 Vec 72:
> (XEN)       Apic 0x00, Pin  5: vec=48 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  6 Vec 80:
> (XEN)       Apic 0x00, Pin  6: vec=50 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  7 Vec 88:
> (XEN)       Apic 0x00, Pin  7: vec=58 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  8 Vec 96:
> (XEN)       Apic 0x00, Pin  8: vec=60 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  9 Vec222:
> (XEN)       Apic 0x00, Pin  9: vec=de delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 10 Vec112:
> (XEN)       Apic 0x00, Pin 10: vec=70 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 11 Vec120:
> (XEN)       Apic 0x00, Pin 11: vec=78 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 12 Vec 39:
> (XEN)       Apic 0x00, Pin 12: vec=27 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 13 Vec144:
> (XEN)       Apic 0x00, Pin 13: vec=90 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=1 dest_id:0
> (XEN)     IRQ 14 Vec152:
> (XEN)       Apic 0x00, Pin 14: vec=98 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 15 Vec160:
> (XEN)       Apic 0x00, Pin 15: vec=a0 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 16 Vec 47:
> (XEN)       Apic 0x00, Pin 16: vec=2f delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 17 Vec 63:
> (XEN)       Apic 0x00, Pin 17: vec=3f delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 18 Vec 65:
> (XEN)       Apic 0x00, Pin 18: vec=41 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 19 Vec200:
> (XEN)       Apic 0x00, Pin 19: vec=c8 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 20 Vec183:
> (XEN)       Apic 0x00, Pin 20: vec=b7 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 22 Vec 98:
> (XEN)       Apic 0x00, Pin 22: vec=62 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 23 Vec168:
> (XEN)       Apic 0x00, Pin 23: vec=a8 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=1 dest_id:0
> (XEN) Xen BUG at io_apic.c:554
> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48015e2d6>] smp_irq_move_cleanup_interrupt+0x211/0x289
> (XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: 0000000000000000
> (XEN) rdx: 0000000000000016   rsi: 000000000000000a   rdi: ffff82c4802592e0
> (XEN) rbp: ffff82c48029fd08   rsp: ffff82c48029fcb8   r8:  0000000000000018
> (XEN) r9:  0000000000000000   r10: 0000000000000000   r11: 0000000000000001
> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 0000000119a96000   cr2: ffff880402070198
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c48029fcb8:
> (XEN)    0000000000000000 ffff82c48029ff18 ffff82c4802dd9e0 000000e900000000
> (XEN)    ffff83042109ba04 ffff830421008000 0000000000000114 000000000000001d
> (XEN)    0000000000000114 0000000000000000 00007d3b7fd602c7 ffff82c48014de60
> (XEN)    0000000000000000 0000000000000114 000000000000001d 0000000000000114
> (XEN)    ffff82c48029fdc8 ffff830421008000 0000000000000246 ffff82c48025c1f0
> (XEN)    0000000000000003 0000001944602466 0000000000000000 0000000000000001
> (XEN)    0000000000000000 0000000000000286 ffff830421060f34 0000002000000000
> (XEN)    ffff82c4801226c0 000000000000e008 0000000000000286 ffff82c48029fdc8
> (XEN)    000000000000e010 0000000000000286 ffff82c48029fe48 ffff82c480164446
> (XEN)    ffff82c4802dd9e0 0000000000000286 ffff830421060f00 ffff830421060f34
> (XEN)    ffff830421050ac0 000000000000001d 0000000000000246 ffff8301108fd140
> (XEN)    ffff82c4801226d3 ffff82c48029fe78 000000000000001d ffff8803fa889af0
> (XEN)    0000000000000114 ffff8804023be000 ffff82c48029fef8 ffff82c48017655b
> (XEN)    ffff830114c7f300 ffffffff81381646 ffff82f600000008 ffff830421008000
> (XEN)    0000000000000003 000000030000001d 00000000e2200000 0000000100a0fb00
> (XEN)    0000000000007ff0 ffffffffffffffff 0000000000000003 0000000000000003
> (XEN)    00000000e2200000 c390ed90d1ffffff 0000000000000202 ffff8300ca666000
> (XEN)    ffff8803fc880240 0000000000000011 ffff8804023be858 ffff8804023be000
> (XEN)    00007d3b7fd600c7 ffff82c480209f38 ffffffff8100142a 0000000000000021
> (XEN)    ffff8804023be000 ffff8804023be858 0000000000000011 ffff8803fc880240
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48015e2d6>] smp_irq_move_cleanup_interrupt+0x211/0x289
> (XEN)    [<ffff82c48014de60>] irq_move_cleanup_interrupt+0x30/0x40
> (XEN)    [<ffff82c4801226c0>] _spin_unlock_irqrestore+0x22/0x24
> (XEN)    [<ffff82c480164446>] map_domain_pirq+0x37a/0x3df
> (XEN)    [<ffff82c48017655b>] do_physdev_op+0xa2b/0x1508
> (XEN)    [<ffff82c480209f38>] syscall_enter+0xc8/0x122
>
>
>> ~Andrew
>>
>

Even more curious.  Vector e9 does not appear to be programmed in.  Can
you extend the debugging to also call __print_IO_APIC()?

The i debug key and z debug key list IO-APIC entries from different
sources of information.
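
The observation about vector e9 can be checked against the quoted dump with a
minimal sketch. It assumes the comma-separated list printed after "Testing
vector 233 from bitmap" is a plain range list; `parse_vector_bitmap` is a
hypothetical helper written for this illustration, not part of Xen.

```python
def parse_vector_bitmap(text):
    """Expand a range list such as '39,47,63-65' into a set of integers."""
    vectors = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        # A single value such as '47' has no upper bound, so reuse the lower one.
        vectors.update(range(int(low), int(high or low) + 1))
    return vectors


# Bitmap copied from the quoted log above.
BITMAP = ("39,47,63-65,72,80,88,96,98,112,120,125,144,152,160,168,174,"
          "182-183,190,192,198,200,208,214,222")

used = parse_vector_bitmap(BITMAP)
vector = 0xe9  # value of rbx in the register dump

print(vector, vector in used)  # prints: 233 False
```

So 0xe9 is the same vector 233 that the BUG message names, and it is absent
from the printed bitmap.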

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 17:54                         ` Andrew Cooper
@ 2013-03-26 18:21                           ` Marek Marczykowski
  2013-03-26 18:50                             ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-26 18:21 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 13374 bytes --]

On 26.03.2013 18:54, Andrew Cooper wrote:
> 
>>> Can you replace the ASSERT() with code similar to that in
>>>
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668
>>>
>>> which should call dump_irqs() before dying because of the ASSERT.
>>> You might need to also take the latest version of dump_irqs() from
>>> unstable, as I seem to remember there was another assertion failure due
>>> to xfree()'ing in IRQ context.
>> Full log here:
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs.log
>> Interesting part:
(...)
> Even more curious.  Vector e9 does not appear to be programmed in.  Can
> you extend the debugging to also call __print_IO_APIC()?
> 
> The i debug key and z debug key list IO-APIC entries from different
> sources of information.

As you wish, full log:
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs2.log

Final part:
(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 233 from bitmap
43,49,64,72,80,87-88,95-96,103,112,119-121,127,135,143-144,151-152,159-160,168,192,197,200,211,216,218
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:00000000,00000000,00000000,00000001 vec:f0
type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:00000000,00000000,00000000,00000001 vec:7f
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(-S--),
(XEN)    IRQ:   2 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2
type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:00000000,00000000,00000000,00000001 vec:40
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:00000000,00000000,00000000,00000001 vec:f1
type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   5 affinity:00000000,00000000,00000000,00000001 vec:48
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   6 affinity:00000000,00000000,00000000,00000001 vec:50
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:00000000,00000000,00000000,00000008 vec:da
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  7(-S--),
(XEN)    IRQ:   8 affinity:00000000,00000000,00000000,00000004 vec:d8
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(-S--),
(XEN)    IRQ:   9 affinity:00000000,00000000,00000000,00000001 vec:87
type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0:  9(-S--),
(XEN)    IRQ:  10 affinity:00000000,00000000,00000000,00000001 vec:70
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:00000000,00000000,00000000,00000001 vec:78
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:00000000,00000000,00000000,00000001 vec:8f
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0: 12(-S--),
(XEN)    IRQ:  13 affinity:00000000,00000000,00000000,0000000f vec:90
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:00000000,00000000,00000000,00000001 vec:98
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:00000000,00000000,00000000,00000001 vec:a0
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:97
type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(-S--),
(XEN)    IRQ:  17 affinity:00000000,00000000,00000000,00000001 vec:9f
type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 17(-S--),
(XEN)    IRQ:  18 affinity:00000000,00000000,00000000,00000004 vec:79
type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  19 affinity:00000000,00000000,00000000,0000000f vec:c8
type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:00000000,00000000,00000000,00000002 vec:d3
type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 20(-S--),
(XEN)    IRQ:  22 affinity:00000000,00000000,00000000,0000000f vec:2b
type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  23 affinity:00000000,00000000,00000000,0000000f vec:a8
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  24 affinity:00000000,00000000,00000000,00000001 vec:28
type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:00000000,00000000,00000000,00000001 vec:30
type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:00000000,00000000,00000000,00000001 vec:c7
type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:279(-S--),
(XEN)    IRQ:  27 affinity:00000000,00000000,00000000,00000001 vec:cf
type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:278(-S--),
(XEN)    IRQ:  28 affinity:00000000,00000000,00000000,00000001 vec:d7
type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:277(-S--),
(XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000001 vec:df
type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:276(-S--),
(XEN)    IRQ:  30 affinity:00000000,00000000,00000000,00000001 vec:38
type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:275(-S--),
(XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000004 vec:47
type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  32 affinity:00000000,00000000,00000000,00000001 vec:a7
type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
(XEN)    IRQ:  33 affinity:00000000,00000000,00000000,00000001 vec:b7
type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:272(-S--),
(XEN)    IRQ:  34 affinity:00000000,00000000,00000000,00000004 vec:40
type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:271(PS--),
(XEN)    IRQ:  35 affinity:00000000,00000000,00000000,00000001 vec:af
type=PCI-MSI         status=00000050 in-flight=0 domain-list=1: 55(-S--),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec127:
(XEN)       Apic 0x00, Pin  1: vec=7f delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec241:
(XEN)       Apic 0x00, Pin  4: vec=f1 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 72:
(XEN)       Apic 0x00, Pin  5: vec=48 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 80:
(XEN)       Apic 0x00, Pin  6: vec=50 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec218:
(XEN)       Apic 0x00, Pin  7: vec=da delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec216:
(XEN)       Apic 0x00, Pin  8: vec=d8 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec135:
(XEN)       Apic 0x00, Pin  9: vec=87 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec112:
(XEN)       Apic 0x00, Pin 10: vec=70 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec120:
(XEN)       Apic 0x00, Pin 11: vec=78 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec143:
(XEN)       Apic 0x00, Pin 12: vec=8f delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec144:
(XEN)       Apic 0x00, Pin 13: vec=90 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN)     IRQ 14 Vec152:
(XEN)       Apic 0x00, Pin 14: vec=98 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec160:
(XEN)       Apic 0x00, Pin 15: vec=a0 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec151:
(XEN)       Apic 0x00, Pin 16: vec=97 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 17 Vec159:
(XEN)       Apic 0x00, Pin 17: vec=9f delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec121:
(XEN)       Apic 0x00, Pin 18: vec=79 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 19 Vec200:
(XEN)       Apic 0x00, Pin 19: vec=c8 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec211:
(XEN)       Apic 0x00, Pin 20: vec=d3 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 22 Vec 43:
(XEN)       Apic 0x00, Pin 22: vec=2b delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 23 Vec168:
(XEN)       Apic 0x00, Pin 23: vec=a8 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 0DC 0C  1    0    0   0   0    1    2    87
(XEN)  01 000 00  0    0    0   0   0    1    1    7F
(XEN)  02 000 00  0    0    0   0   0    1    1    F0
(XEN)  03 000 00  0    0    0   0   0    1    1    40
(XEN)  04 000 00  0    0    0   0   0    1    1    F1
(XEN)  05 000 00  0    0    0   0   0    1    1    48
(XEN)  06 000 00  0    0    0   0   0    1    1    50
(XEN)  07 000 00  0    0    0   0   0    1    1    DA
(XEN)  08 000 00  0    0    0   0   0    1    1    D8
(XEN)  09 000 00  0    1    0   0   0    1    1    87
(XEN)  0a 000 00  0    0    0   0   0    1    1    70
(XEN)  0b 000 00  0    0    0   0   0    1    1    78
(XEN)  0c 000 00  0    0    0   0   0    1    1    8F
(XEN)  0d 000 00  1    0    0   0   0    1    1    90
(XEN)  0e 000 00  0    0    0   0   0    1    1    98
(XEN)  0f 000 00  0    0    0   0   0    1    1    A0
(XEN)  10 000 00  0    1    0   1   0    1    1    97
(XEN)  11 000 00  0    1    0   1   0    1    1    9F
(XEN)  12 000 00  1    1    0   1   0    1    1    79
(XEN)  13 000 00  1    1    0   1   0    1    1    C8
(XEN)  14 000 00  0    1    0   1   0    1    1    D3
(XEN)  15 000 00  1    0    0   0   0    0    0    00
(XEN)  16 000 00  1    1    0   1   0    1    1    2B
(XEN)  17 000 00  1    0    0   0   0    1    1    A8
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ127 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ241 -> 0:4
(XEN) IRQ72 -> 0:5
(XEN) IRQ80 -> 0:6
(XEN) IRQ218 -> 0:7
(XEN) IRQ216 -> 0:8
(XEN) IRQ135 -> 0:9
(XEN) IRQ112 -> 0:10
(XEN) IRQ120 -> 0:11
(XEN) IRQ143 -> 0:12
(XEN) IRQ144 -> 0:13
(XEN) IRQ152 -> 0:14
(XEN) IRQ160 -> 0:15
(XEN) IRQ151 -> 0:16
(XEN) IRQ159 -> 0:17
(XEN) IRQ121 -> 0:18
(XEN) IRQ200 -> 0:19
(XEN) IRQ211 -> 0:20
(XEN) IRQ43 -> 0:22
(XEN) IRQ168 -> 0:23
(XEN) .................................... done.
(XEN) Xen BUG at io_apic.c:556
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015e2db>] smp_irq_move_cleanup_interrupt+0x216/0x28e
(XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 000000000000000a   rdi: ffff82c4802592e0
(XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  0000000000000004
(XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000002
(XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
(XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000026582c000   cr2: ffff8804020701d8
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029feb8:
(XEN)    0000000000000000 ffff82c48029ff18 ffff82c4802dd9e0 000000e900000000
(XEN)    000000000000e02b 0000000000000000 000000004bf51982 00000000000060a9
(XEN)    0000000000000000 0000000000000000 00007d3b7fd600c7 ffff82c48014de60
(XEN)    0000000000000000 0000000000000000 00000000000060a9 000000004bf51982
(XEN)    ffff8802d2665b28 0000000000000000 0000000000000000 0000000000007ff0
(XEN)    0000000000000022 0000000000000000 000000024bf57322 0000000001307da0
(XEN)    00000000000059a0 0000000000000000 00000000000060a9 0000002000000000
(XEN)    ffffffff8123c51a 000000000000e033 0000000000000293 ffff8802d2665b08
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
(XEN)    0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c48015e2db>] smp_irq_move_cleanup_interrupt+0x216/0x28e
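
The "IRQ redirection table" section of the __print_IO_APIC() output can be
searched for a vector in the same way. This is only a sketch: it assumes
every table row has the eleven columns shown in the header (NR ... Vect),
and `redirection_vectors` is a hypothetical helper, not a Xen function. The
excerpt holds a few rows copied from the log above.

```python
import re

# One table row: "(XEN)", the pin number, nine middle columns, the vector.
ROW = re.compile(
    r"\(XEN\)\s+([0-9a-fA-F]{2})(?:\s+[0-9a-fA-F]+){9}\s+([0-9a-fA-F]{2})")


def redirection_vectors(log_text):
    """Map each redirection-table pin number to its programmed vector."""
    vectors = {}
    for line in log_text.splitlines():
        match = ROW.fullmatch(line.strip())
        if match:
            vectors[int(match.group(1), 16)] = int(match.group(2), 16)
    return vectors


EXCERPT = """\
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 0DC 0C  1    0    0   0   0    1    2    87
(XEN)  01 000 00  0    0    0   0   0    1    1    7F
(XEN)  07 000 00  0    0    0   0   0    1    1    DA
(XEN)  15 000 00  1    0    0   0   0    0    0    00
"""

table = redirection_vectors(EXCERPT)
print(0xe9 in table.values())  # prints: False
```

Feeding the full log to the same function would show whether any pin is
programmed with vector e9.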



-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 18:21                           ` Marek Marczykowski
@ 2013-03-26 18:50                             ` Andrew Cooper
  2013-03-27  8:50                               ` Marek Marczykowski
  2013-03-27  8:52                               ` Jan Beulich
  0 siblings, 2 replies; 68+ messages in thread
From: Andrew Cooper @ 2013-03-26 18:50 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 26/03/2013 18:21, Marek Marczykowski wrote:
> On 26.03.2013 18:54, Andrew Cooper wrote:
>>>> Can you replace the ASSERT() with code similar to that in
>>>>
>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668
>>>>
>>>> which should call dump_irqs() before dying because of the ASSERT.
>>>> You might need to also take the latest version of dump_irqs() from
>>>> unstable, as I seem to remember there was another assertion failure due
>>>> to xfree()'ing in IRQ context.
>>> Full log here:
>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs.log
>>> Interesting part:
> (...)
>> Even more curious.  Vector e9 does not appear to be programmed in.  Can
>> you extend the debugging to also call __print_IO_APIC()?
>>
>> The i debug key and z debug key list IO-APIC entries from different
>> sources of information.
> As you wish, full log:
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs2.log
>
> Final part:
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 233 from bitmap
> 43,49,64,72,80,87-88,95-96,103,112,119-121,127,135,143-144,151-152,159-160,168,192,197,200,211,216,218
> (XEN) Guest interrupt information:
> (XEN)    IRQ:   0 affinity:00000000,00000000,00000000,00000001 vec:f0
> type=IO-APIC-edge    status=00000000 mapped, unbound
> (XEN)    IRQ:   1 affinity:00000000,00000000,00000000,00000001 vec:7f
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(-S--),
> (XEN)    IRQ:   2 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2
> type=XT-PIC          status=00000000 mapped, unbound
> (XEN)    IRQ:   3 affinity:00000000,00000000,00000000,00000001 vec:40
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   4 affinity:00000000,00000000,00000000,00000001 vec:f1
> type=IO-APIC-edge    status=00000000 mapped, unbound
> (XEN)    IRQ:   5 affinity:00000000,00000000,00000000,00000001 vec:48
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   6 affinity:00000000,00000000,00000000,00000001 vec:50
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:   7 affinity:00000000,00000000,00000000,00000008 vec:da
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  7(-S--),
> (XEN)    IRQ:   8 affinity:00000000,00000000,00000000,00000004 vec:d8
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(-S--),
> (XEN)    IRQ:   9 affinity:00000000,00000000,00000000,00000001 vec:87
> type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0:  9(-S--),
> (XEN)    IRQ:  10 affinity:00000000,00000000,00000000,00000001 vec:70
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  11 affinity:00000000,00000000,00000000,00000001 vec:78
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  12 affinity:00000000,00000000,00000000,00000001 vec:8f
> type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0: 12(-S--),
> (XEN)    IRQ:  13 affinity:00000000,00000000,00000000,0000000f vec:90
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  14 affinity:00000000,00000000,00000000,00000001 vec:98
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  15 affinity:00000000,00000000,00000000,00000001 vec:a0
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:97
> type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(-S--),
> (XEN)    IRQ:  17 affinity:00000000,00000000,00000000,00000001 vec:9f
> type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 17(-S--),
> (XEN)    IRQ:  18 affinity:00000000,00000000,00000000,00000004 vec:79
> type=IO-APIC-level   status=00000002 mapped, unbound
> (XEN)    IRQ:  19 affinity:00000000,00000000,00000000,0000000f vec:c8
> type=IO-APIC-level   status=00000002 mapped, unbound
> (XEN)    IRQ:  20 affinity:00000000,00000000,00000000,00000002 vec:d3
> type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 20(-S--),
> (XEN)    IRQ:  22 affinity:00000000,00000000,00000000,0000000f vec:2b
> type=IO-APIC-level   status=00000002 mapped, unbound
> (XEN)    IRQ:  23 affinity:00000000,00000000,00000000,0000000f vec:a8
> type=IO-APIC-edge    status=00000002 mapped, unbound
> (XEN)    IRQ:  24 affinity:00000000,00000000,00000000,00000001 vec:28
> type=DMA_MSI         status=00000000 mapped, unbound
> (XEN)    IRQ:  25 affinity:00000000,00000000,00000000,00000001 vec:30
> type=DMA_MSI         status=00000000 mapped, unbound
> (XEN)    IRQ:  26 affinity:00000000,00000000,00000000,00000001 vec:c7
> type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:279(-S--),
> (XEN)    IRQ:  27 affinity:00000000,00000000,00000000,00000001 vec:cf
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:278(-S--),
> (XEN)    IRQ:  28 affinity:00000000,00000000,00000000,00000001 vec:d7
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:277(-S--),
> (XEN)    IRQ:  29 affinity:00000000,00000000,00000000,00000001 vec:df
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:276(-S--),
> (XEN)    IRQ:  30 affinity:00000000,00000000,00000000,00000001 vec:38
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:275(-S--),
> (XEN)    IRQ:  31 affinity:00000000,00000000,00000000,00000004 vec:47
> type=PCI-MSI         status=00000002 mapped, unbound
> (XEN)    IRQ:  32 affinity:00000000,00000000,00000000,00000001 vec:a7
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
> (XEN)    IRQ:  33 affinity:00000000,00000000,00000000,00000001 vec:b7
> type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:272(-S--),
> (XEN)    IRQ:  34 affinity:00000000,00000000,00000000,00000004 vec:40
> type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:271(PS--),
> (XEN)    IRQ:  35 affinity:00000000,00000000,00000000,00000001 vec:af
> type=PCI-MSI         status=00000050 in-flight=0 domain-list=1: 55(-S--),
> (XEN) IO-APIC interrupt information:
> (XEN)     IRQ  0 Vec240:
> (XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  1 Vec127:
> (XEN)       Apic 0x00, Pin  1: vec=7f delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  3 Vec 64:
> (XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  4 Vec241:
> (XEN)       Apic 0x00, Pin  4: vec=f1 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  5 Vec 72:
> (XEN)       Apic 0x00, Pin  5: vec=48 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  6 Vec 80:
> (XEN)       Apic 0x00, Pin  6: vec=50 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  7 Vec218:
> (XEN)       Apic 0x00, Pin  7: vec=da delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  8 Vec216:
> (XEN)       Apic 0x00, Pin  8: vec=d8 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ  9 Vec135:
> (XEN)       Apic 0x00, Pin  9: vec=87 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 10 Vec112:
> (XEN)       Apic 0x00, Pin 10: vec=70 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 11 Vec120:
> (XEN)       Apic 0x00, Pin 11: vec=78 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 12 Vec143:
> (XEN)       Apic 0x00, Pin 12: vec=8f delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 13 Vec144:
> (XEN)       Apic 0x00, Pin 13: vec=90 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=1 dest_id:0
> (XEN)     IRQ 14 Vec152:
> (XEN)       Apic 0x00, Pin 14: vec=98 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 15 Vec160:
> (XEN)       Apic 0x00, Pin 15: vec=a0 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN)     IRQ 16 Vec151:
> (XEN)       Apic 0x00, Pin 16: vec=97 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 17 Vec159:
> (XEN)       Apic 0x00, Pin 17: vec=9f delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 18 Vec121:
> (XEN)       Apic 0x00, Pin 18: vec=79 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 19 Vec200:
> (XEN)       Apic 0x00, Pin 19: vec=c8 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 20 Vec211:
> (XEN)       Apic 0x00, Pin 20: vec=d3 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN)     IRQ 22 Vec 43:
> (XEN)       Apic 0x00, Pin 22: vec=2b delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN)     IRQ 23 Vec168:
> (XEN)       Apic 0x00, Pin 23: vec=a8 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=1 dest_id:0
> (XEN) number of MP IRQ sources: 15.
> (XEN) number of IO-APIC #2 registers: 24.
> (XEN) testing the IO APIC.......................
> (XEN) IO APIC #2......
> (XEN) .... register #00: 02000000
> (XEN) .......    : physical APIC id: 02
> (XEN) .......    : Delivery Type: 0
> (XEN) .......    : LTS          : 0
> (XEN) .... register #01: 00170020
> (XEN) .......     : max redirection entries: 0017
> (XEN) .......     : PRQ implemented: 0
> (XEN) .......     : IO APIC version: 0020
> (XEN) .... IRQ redirection table:
> (XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> (XEN)  00 0DC 0C  1    0    0   0   0    1    2    87
> (XEN)  01 000 00  0    0    0   0   0    1    1    7F
> (XEN)  02 000 00  0    0    0   0   0    1    1    F0
> (XEN)  03 000 00  0    0    0   0   0    1    1    40
> (XEN)  04 000 00  0    0    0   0   0    1    1    F1
> (XEN)  05 000 00  0    0    0   0   0    1    1    48
> (XEN)  06 000 00  0    0    0   0   0    1    1    50
> (XEN)  07 000 00  0    0    0   0   0    1    1    DA
> (XEN)  08 000 00  0    0    0   0   0    1    1    D8
> (XEN)  09 000 00  0    1    0   0   0    1    1    87
> (XEN)  0a 000 00  0    0    0   0   0    1    1    70
> (XEN)  0b 000 00  0    0    0   0   0    1    1    78
> (XEN)  0c 000 00  0    0    0   0   0    1    1    8F
> (XEN)  0d 000 00  1    0    0   0   0    1    1    90
> (XEN)  0e 000 00  0    0    0   0   0    1    1    98
> (XEN)  0f 000 00  0    0    0   0   0    1    1    A0
> (XEN)  10 000 00  0    1    0   1   0    1    1    97
> (XEN)  11 000 00  0    1    0   1   0    1    1    9F
> (XEN)  12 000 00  1    1    0   1   0    1    1    79
> (XEN)  13 000 00  1    1    0   1   0    1    1    C8
> (XEN)  14 000 00  0    1    0   1   0    1    1    D3
> (XEN)  15 000 00  1    0    0   0   0    0    0    00
> (XEN)  16 000 00  1    1    0   1   0    1    1    2B
> (XEN)  17 000 00  1    0    0   0   0    1    1    A8
> (XEN) Using vector-based indexing
> (XEN) IRQ to pin mappings:
> (XEN) IRQ240 -> 0:2
> (XEN) IRQ127 -> 0:1
> (XEN) IRQ64 -> 0:3
> (XEN) IRQ241 -> 0:4
> (XEN) IRQ72 -> 0:5
> (XEN) IRQ80 -> 0:6
> (XEN) IRQ218 -> 0:7
> (XEN) IRQ216 -> 0:8
> (XEN) IRQ135 -> 0:9
> (XEN) IRQ112 -> 0:10
> (XEN) IRQ120 -> 0:11
> (XEN) IRQ143 -> 0:12
> (XEN) IRQ144 -> 0:13
> (XEN) IRQ152 -> 0:14
> (XEN) IRQ160 -> 0:15
> (XEN) IRQ151 -> 0:16
> (XEN) IRQ159 -> 0:17
> (XEN) IRQ121 -> 0:18
> (XEN) IRQ200 -> 0:19
> (XEN) IRQ211 -> 0:20
> (XEN) IRQ43 -> 0:22
> (XEN) IRQ168 -> 0:23
> (XEN) .................................... done.
> (XEN) Xen BUG at io_apic.c:556
> (XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82c48015e2db>] smp_irq_move_cleanup_interrupt+0x216/0x28e
> (XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: 0000000000000000
> (XEN) rdx: 0000000000000000   rsi: 000000000000000a   rdi: ffff82c4802592e0
> (XEN) rbp: ffff82c48029ff08   rsp: ffff82c48029feb8   r8:  0000000000000004
> (XEN) r9:  0000000000000004   r10: 0000000000000004   r11: 0000000000000002
> (XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
> (XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
> (XEN) cr3: 000000026582c000   cr2: ffff8804020701d8
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen stack trace from rsp=ffff82c48029feb8:
> (XEN)    0000000000000000 ffff82c48029ff18 ffff82c4802dd9e0 000000e900000000
> (XEN)    000000000000e02b 0000000000000000 000000004bf51982 00000000000060a9
> (XEN)    0000000000000000 0000000000000000 00007d3b7fd600c7 ffff82c48014de60
> (XEN)    0000000000000000 0000000000000000 00000000000060a9 000000004bf51982
> (XEN)    ffff8802d2665b28 0000000000000000 0000000000000000 0000000000007ff0
> (XEN)    0000000000000022 0000000000000000 000000024bf57322 0000000001307da0
> (XEN)    00000000000059a0 0000000000000000 00000000000060a9 0000002000000000
> (XEN)    ffffffff8123c51a 000000000000e033 0000000000000293 ffff8802d2665b08
> (XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 ffff8300ca9a0000 0000000000000000
> (XEN)    0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82c48015e2db>] smp_irq_move_cleanup_interrupt+0x216/0x28e
>
>
>

So vector e9 doesn't appear to be programmed in anywhere.
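
As an illustration of that check, here is a minimal sketch (a standalone helper, not Xen code) that expands the list printed on the "Testing vector 233 from bitmap" line of the quoted log and tests membership. It assumes the list holds decimal vector numbers with inclusive ranges; the names parse_bitmap and USED are hypothetical.

```python
# Illustrative helper, not part of Xen: expand the vector list printed in the
# log and check whether a given vector is present in it.

def parse_bitmap(text):
    """Expand a range list such as '43,49,87-88' into a set of integers."""
    vectors = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            vectors.update(range(int(lo), int(hi) + 1))  # ranges are inclusive
        else:
            vectors.add(int(part))
    return vectors

# List copied from the "Testing vector 233 from bitmap" line in the log.
USED = parse_bitmap(
    "43,49,64,72,80,87-88,95-96,103,112,119-121,127,135,143-144,"
    "151-152,159-160,168,192,197,200,211,216,218"
)

print(hex(233))      # 0xe9
print(233 in USED)   # False: vector 0xe9 is not in the printed list
print(0x87 in USED)  # True: the IO-APIC vector shown for IRQ 9 is
```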

I am getting further into the realm of guessing here, but can you
use apic_verbosity=debug on the command line and copy this extra
debugging logic to send_cleanup_vector()?

You should be able to conditionally trigger it on "desc->arch.vector ==
0xe9".  You will probably also want to change the BUG() to a WARN(), so
we get the interrupt and ioapic information on both sides of the cleanup
vector, as well as getting the stack trace of the codepath through Xen
as a result of vector 0xe9.

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 18:50                             ` Andrew Cooper
@ 2013-03-27  8:50                               ` Marek Marczykowski
  2013-03-27  8:58                                 ` Jan Beulich
  2013-03-27  8:52                               ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27  8:50 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 14917 bytes --]

On 26.03.2013 19:50, Andrew Cooper wrote:
> On 26/03/2013 18:21, Marek Marczykowski wrote:
>> On 26.03.2013 18:54, Andrew Cooper wrote:
>>>>> Can you replace the ASSERT() with code similar to that in
>>>>>
>>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/irq.c;h=5e0f463c381750090373dabd8967635bc297d457;hb=refs/heads/staging#l668
>>>>>
>>>>> which should call dump_irqs() before dying because of the ASSERT.
>>>>> You might need to also take the latest version of dump_irqs() from
>>>>> unstable, as I seem to remember there was another assertion failure due
>>>>> to xfree()'ing in IRQ context.
>>>> Full log here:
>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs.log
>>>> Interesting part:
>> (...)
>>> Even more curious.  vector e9 does not appear to be programmed in.  Can
>>> you extend the debugging to also call __print_IO_APIC().
>>>
>>> The i debug key and z debug key list IO-APIC entries from different
>>> sources of information.
>> As you wish, full log:
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs2.log
>>
>> Final part:
>> (...)
> 
> So vector e9 doesn't appear to be programmed in anywhere.
> 
> I am getting further into the realm of guessing here, but can you
> use apic_verbosity=debug on the command line and copy this extra
> debugging logic to send_cleanup_vector()?
> 
> You should be able to conditionally trigger it on "desc->arch.vector ==
> 0xe9".  You will probably also want to change the BUG() to a WARN(), so
> we get the interrupt and ioapic information on both sides of the cleanup
> vector, as well as getting the stack trace of the codepath through Xen
> as a result of vector 0xe9.

send_cleanup_vector() doesn't seem to be called with cfg->vector == 0xe9...
Could dom0 be interfering with something here?

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-26 18:50                             ` Andrew Cooper
  2013-03-27  8:50                               ` Marek Marczykowski
@ 2013-03-27  8:52                               ` Jan Beulich
  2013-03-27  9:03                                 ` Jan Beulich
  2013-03-27 14:31                                 ` Marek Marczykowski
  1 sibling, 2 replies; 68+ messages in thread
From: Jan Beulich @ 2013-03-27  8:52 UTC (permalink / raw)
  To: Andrew Cooper, Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, xen-devel

>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> So vector e9 doesn't appear to be programmed in anywhere.

Quite obviously, as it's the 8259A vector for IRQ 9. The question
really is why an IRQ appears on that vector in the first place. The
8259A resume code _should_ leave all IRQs masked on a fully
IO-APIC system (see my question raised yesterday).
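
The vector arithmetic behind that statement can be sketched as follows. This is a standalone illustration, not Xen code; it assumes the legacy 8259A IRQs occupy vectors 0xe0-0xef, which agrees with the "IRQ: 2 ... vec:e2 type=XT-PIC" line in the dump. The function name is hypothetical.

```python
# Assumption: the 16 legacy (8259A) IRQs use vectors 0xe0-0xef, matching the
# XT-PIC entry for IRQ 2 (vec:e2) in the dump.
FIRST_LEGACY_VECTOR = 0xE0
LAST_LEGACY_VECTOR = 0xEF

def legacy_irq_for_vector(vector):
    """Return the legacy IRQ for a vector in the 8259A range, else None."""
    if FIRST_LEGACY_VECTOR <= vector <= LAST_LEGACY_VECTOR:
        return vector - FIRST_LEGACY_VECTOR
    return None

print(legacy_irq_for_vector(0xE9))  # 9
print(legacy_irq_for_vector(0xE2))  # 2
print(legacy_irq_for_vector(0x87))  # None (an IO-APIC vector, not a legacy one)
```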

And that's also why I suggested, for an experiment, to fiddle with
the loop exit condition to exclude legacy vectors (which wouldn't
be a final solution, but would at least tell us whether the direction
is the right one). In the end, besides understanding why an
interrupt on vector E9 gets raised at all, we may also need to
tweak the IRQ migration logic to not do anything on legacy IRQs,
but that would need to happen earlier than in
smp_irq_move_cleanup_interrupt(). Considering that 4.3
apparently doesn't have this problem, we may need to go hunt for
a change that isn't directly connected to this, yet deals with the
problem as a side effect (at least I don't recall any particular fix
since 4.2). One aspect here is the double mapping of legacy IRQs
(once to their IO-APIC vector, and once to their legacy vector,
i.e. vector_irq[] having two entries pointing to the same IRQ).
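
To make the double mapping and the suggested experiment concrete, here is a small standalone model (not the Xen implementation). It assumes legacy vectors 0xe0-0xef and a scan starting at vector 0x20; build_vector_irq and scan are hypothetical names, and the only real data used is IRQ 9's IO-APIC vector 0x87 from the dump.

```python
# Simplified model of a vector -> IRQ table in which each legacy IRQ appears
# twice. All constants are assumptions made for this illustration.
FIRST_DYNAMIC_VECTOR = 0x20  # assumed lower bound of the scan
FIRST_LEGACY_VECTOR = 0xE0   # assumed start of the 8259A vector range
LAST_LEGACY_VECTOR = 0xEF
NR_VECTORS = 256

def build_vector_irq(ioapic_vectors):
    """Map vector -> IRQ, entering every legacy IRQ under two vectors."""
    table = {}
    for irq, vec in ioapic_vectors.items():
        table[vec] = irq                         # IO-APIC vector
    for irq in range(16):
        table[FIRST_LEGACY_VECTOR + irq] = irq   # legacy 8259A vector
    return table

def scan(table, skip_legacy):
    """List (vector, irq) pairs in the order a cleanup-style loop visits them."""
    hits = []
    for vector in range(FIRST_DYNAMIC_VECTOR, NR_VECTORS):
        if skip_legacy and FIRST_LEGACY_VECTOR <= vector <= LAST_LEGACY_VECTOR:
            continue  # the experiment: leave legacy vectors out of the scan
        if vector in table:
            hits.append((vector, table[vector]))
    return hits

table = build_vector_irq({9: 0x87})  # IRQ 9 uses IO-APIC vector 0x87 in the dump
print([hex(v) for v, irq in scan(table, False) if irq == 9])  # ['0x87', '0xe9']
print([hex(v) for v, irq in scan(table, True) if irq == 9])   # ['0x87']
```

With the legacy range skipped, IRQ 9 is reached only through its IO-APIC vector, which is the behaviour the experiment is meant to test.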

Jan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27  8:50                               ` Marek Marczykowski
@ 2013-03-27  8:58                                 ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2013-03-27  8:58 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 27.03.13 at 09:50, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
> send_cleanup_vector() doesn't seem to be called with cfg->vector == 0xe9...
> Could dom0 be interfering with something here?

Of course not - I suppose it is being called for IRQ9 (with whatever
vector the IO-APIC has set for that IRQ at that point in time).

Jan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27  8:52                               ` Jan Beulich
@ 2013-03-27  9:03                                 ` Jan Beulich
  2013-03-27 14:01                                   ` Marek Marczykowski
  2013-03-27 14:31                                 ` Marek Marczykowski
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-27  9:03 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 27.03.13 at 09:52, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> So vector e9 doesn't appear to be programmed in anywhere.
> 
> Quite obviously, as it's the 8259A vector for IRQ 9. The question
> really is why an IRQ appears on that vector in the first place. The
> 8259A resume code _should_ leave all IRQs masked on a fully
> IO-APIC system (see my question raised yesterday).

So to put this in consumable form: Please log what i8259A_resume()
writes to ports 21 and A1 (i.e. cached_21 and cached_A1), and also
dump those ports' contents at the crash point (i.e. alongside the
dump_irqs()).

Jan

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27  9:03                                 ` Jan Beulich
@ 2013-03-27 14:01                                   ` Marek Marczykowski
  0 siblings, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27 14:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 16174 bytes --]

On 27.03.2013 10:03, Jan Beulich wrote:
>>>> On 27.03.13 at 09:52, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> So vector e9 doesn't appear to be programmed in anywhere.
>>
>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>> really is why an IRQ appears on that vector in the first place. The
>> 8259A resume code _should_ leave all IRQs masked on a fully
>> IO-APIC system (see my question raised yesterday).
> 
> So to put this in consumable form: Please log what i8259A_resume()
> writes to ports 21 and A1 (i.e. cached_21 and cached_A1), and also
> dump those ports' contents at the crash point (i.e. alongside the
> dump_irqs()).

I've noticed that not all messages reach the serial console; in particular,
nothing printed from inside i8259A_resume() shows up. So I changed the BUG to
a WARN and got some additional lines.

Ports: 21:0xfb, A1:0xff (the same in i8259A_resume() as at crash point).
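
For reference, a minimal sketch of how these two mask bytes decode. It assumes
the standard 8259A layout, where a set bit in the interrupt mask register
masks the corresponding line. The helper names are made up for illustration
and are not Xen code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of Xen): report whether a legacy IRQ is
 * masked, given the master (port 0x21) and slave (port 0xA1) 8259A
 * interrupt mask register values. A set bit means the line is masked.
 */
static bool pic_irq_masked(uint8_t mask_21, uint8_t mask_a1, unsigned int irq)
{
    assert(irq < 16);
    if (irq < 8)
        return (mask_21 >> irq) & 1;      /* IRQ 0-7 live on the master */
    return (mask_a1 >> (irq - 8)) & 1;    /* IRQ 8-15 live on the slave */
}

/* Print the mask state of all 16 legacy IRQs. */
static void pic_dump_masks(uint8_t mask_21, uint8_t mask_a1)
{
    for (unsigned int irq = 0; irq < 16; irq++)
        printf("IRQ %2u: %s\n", irq,
               pic_irq_masked(mask_21, mask_a1, irq) ? "masked" : "unmasked");
}
```

With 21:0xfb and A1:0xff, only IRQ 2 (the cascade input) comes out unmasked,
so IRQ 9 is masked at the 8259A.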

Part of
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-failed-resume-dump-irqs3.log:
(XEN) Preparing system for ACPI S3 state.
(XEN) Disabling non-boot CPUs ...
(XEN) Broke affinity for irq 1
(XEN) Broke affinity for irq 12
(XEN) Broke affinity for irq 17
(XEN) [VT-D]intremap.c:552: remap_entry_to_msi_msg: index (65535) get an empty
entry!
(XEN) Broke affinity for irq 27
(XEN) Broke affinity for irq 1
(XEN) Broke affinity for irq 7
(XEN) Broke affinity for irq 9
(XEN) Broke affinity for irq 16
(XEN) Broke affinity for irq 20
(XEN) [VT-D]intremap.c:552: remap_entry_to_msi_msg: index (65535) get an empty
entry!
(XEN) Broke affinity for irq 32
(XEN) Broke affinity for irq 36
(XEN) Broke affinity for irq 1
(XEN) Broke affinity for irq 7
(XEN) Broke affinity for irq 20
(XEN) [VT-D]intremap.c:552: remap_entry_to_msi_msg: index (65535) get an empty
entry!
(XEN) Broke affinity for irq 28
(XEN) Broke affinity for irq 29
(XEN) Broke affinity for irq 30
(XEN) Broke affinity for irq 31
(XEN) Entering ACPI S3 state.
(XEN) i8259A_suspend: cached_21: 0xfb, cached_A1: 0xff
(XEN) i8259A_resume: cached_21: 0xfb, cached_A1: 0xff
(XEN) mce_intel.c:1162: MCA Capability: BCAST 1 SER 0 CMCI 1 firstbank 0
extended MCE MSR 0
(XEN) CPU0 CMCI LVT vector (0xf7) already installed
(XEN) CPU0: Thermal LVT vector (0xfa) already installed
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Suppress EOI broadcast on CPU#1
(XEN) masked ExtINT on CPU#1
(XEN) Suppress EOI broadcast on CPU#2
(XEN) masked ExtINT on CPU#2
(XEN) Suppress EOI broadcast on CPU#3
(XEN) masked ExtINT on CPU#3
(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 233 from bitmap
44,49,57,64,68,72,76,80,84,88,96,100,108,112,120,122,144,152,154,160,168,192,194,200,208,211,218-219
(XEN) Guest interrupt information:
(XEN)    IRQ:   0 affinity:00000000,00000000,00000000,00000001 vec:f0
type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   1 affinity:00000000,00000000,00000000,00000002 vec:db
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  1(-S--),
(XEN)    IRQ:   2 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:e2
type=XT-PIC          status=00000000 mapped, unbound
(XEN)    IRQ:   3 affinity:00000000,00000000,00000000,00000001 vec:40
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   4 affinity:00000000,00000000,00000000,00000001 vec:f1
type=IO-APIC-edge    status=00000000 mapped, unbound
(XEN)    IRQ:   5 affinity:00000000,00000000,00000000,00000001 vec:48
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   6 affinity:00000000,00000000,00000000,00000001 vec:50
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:   7 affinity:00000000,00000000,00000000,00000004 vec:7a
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  7(-S--),
(XEN)    IRQ:   8 affinity:00000000,00000000,00000000,00000001 vec:60
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0:  8(-S--),
(XEN)    IRQ:   9 affinity:00000000,00000000,00000000,00000001 vec:64
type=IO-APIC-level   status=00000010 in-flight=0 domain-list=0:  9(-S--),
(XEN)    IRQ:  10 affinity:00000000,00000000,00000000,00000001 vec:70
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  11 affinity:00000000,00000000,00000000,00000001 vec:78
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  12 affinity:00000000,00000000,00000000,00000001 vec:4c
type=IO-APIC-edge    status=00000050 in-flight=0 domain-list=0: 12(-S--),
(XEN)    IRQ:  13 affinity:00000000,00000000,00000000,0000000f vec:90
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  14 affinity:00000000,00000000,00000000,00000001 vec:98
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  15 affinity:00000000,00000000,00000000,00000001 vec:a0
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  16 affinity:00000000,00000000,00000000,00000001 vec:6c
type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 16(-S--),
(XEN)    IRQ:  17 affinity:00000000,00000000,00000000,00000001 vec:54
type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 17(-S--),
(XEN)    IRQ:  18 affinity:00000000,00000000,00000000,00000008 vec:39
type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  19 affinity:00000000,00000000,00000000,0000000f vec:c8
type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  20 affinity:00000000,00000000,00000000,00000004 vec:da
type=IO-APIC-level   status=00000050 in-flight=0 domain-list=0: 20(-S--),
(XEN)    IRQ:  22 affinity:00000000,00000000,00000000,0000000f vec:9a
type=IO-APIC-level   status=00000002 mapped, unbound
(XEN)    IRQ:  23 affinity:00000000,00000000,00000000,0000000f vec:a8
type=IO-APIC-edge    status=00000002 mapped, unbound
(XEN)    IRQ:  24 affinity:00000000,00000000,00000000,00000001 vec:28
type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  25 affinity:00000000,00000000,00000000,00000001 vec:30
type=DMA_MSI         status=00000000 mapped, unbound
(XEN)    IRQ:  26 affinity:00000000,00000000,00000000,00000004 vec:3c
type=PCI-MSI         status=00000002 mapped, unbound
(XEN)    IRQ:  27 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:9c
type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  28 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:a4
type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  29 affinity:ffffffff,ffffffff,ffffffff,ffffffff vec:ac
type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  32 affinity:00000000,00000000,00000000,00000001 vec:74
type=PCI-MSI         status=00000050 in-flight=0 domain-list=0:273(-S--),
(XEN)    IRQ:  33 affinity:00000000,00000000,00000000,00000004 vec:8c
type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:272(PS--),
(XEN)    IRQ:  34 affinity:00000000,00000000,00000000,00000001 vec:94
type=PCI-MSI         status=00000010 in-flight=0 domain-list=0:271(-S--),
(XEN)    IRQ:  35 affinity:00000000,00000000,00000000,00000004 vec:d9
type=PCI-MSI         status=00000042 mapped, unbound
(XEN)    IRQ:  36 affinity:00000000,00000000,00000000,00000001 vec:7c
type=PCI-MSI         status=00000050 in-flight=0 domain-list=1: 54(-S--),
(XEN) IO-APIC interrupt information:
(XEN)     IRQ  0 Vec240:
(XEN)       Apic 0x00, Pin  2: vec=f0 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  1 Vec219:
(XEN)       Apic 0x00, Pin  1: vec=db delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  3 Vec 64:
(XEN)       Apic 0x00, Pin  3: vec=40 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  4 Vec241:
(XEN)       Apic 0x00, Pin  4: vec=f1 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  5 Vec 72:
(XEN)       Apic 0x00, Pin  5: vec=48 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  6 Vec 80:
(XEN)       Apic 0x00, Pin  6: vec=50 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  7 Vec122:
(XEN)       Apic 0x00, Pin  7: vec=7a delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  8 Vec 96:
(XEN)       Apic 0x00, Pin  8: vec=60 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ  9 Vec100:
(XEN)       Apic 0x00, Pin  9: vec=64 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 10 Vec112:
(XEN)       Apic 0x00, Pin 10: vec=70 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 11 Vec120:
(XEN)       Apic 0x00, Pin 11: vec=78 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 12 Vec 76:
(XEN)       Apic 0x00, Pin 12: vec=4c delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 13 Vec144:
(XEN)       Apic 0x00, Pin 13: vec=90 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN)     IRQ 14 Vec152:
(XEN)       Apic 0x00, Pin 14: vec=98 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 15 Vec160:
(XEN)       Apic 0x00, Pin 15: vec=a0 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN)     IRQ 16 Vec108:
(XEN)       Apic 0x00, Pin 16: vec=6c delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 17 Vec 84:
(XEN)       Apic 0x00, Pin 17: vec=54 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 18 Vec 57:
(XEN)       Apic 0x00, Pin 18: vec=39 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 19 Vec200:
(XEN)       Apic 0x00, Pin 19: vec=c8 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 20 Vec218:
(XEN)       Apic 0x00, Pin 20: vec=da delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN)     IRQ 22 Vec154:
(XEN)       Apic 0x00, Pin 22: vec=9a delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN)     IRQ 23 Vec168:
(XEN)       Apic 0x00, Pin 23: vec=a8 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) .......    : physical APIC id: 02
(XEN) .......    : Delivery Type: 0
(XEN) .......    : LTS          : 0
(XEN) .... register #01: 00170020
(XEN) .......     : max redirection entries: 0017
(XEN) .......     : PRQ implemented: 0
(XEN) .......     : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN)  NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN)  00 000 00  1    0    0   0   0    0    0    00
(XEN)  01 000 00  0    0    0   0   0    1    1    DB
(XEN)  02 000 00  0    0    0   0   0    1    1    F0
(XEN)  03 000 00  0    0    0   0   0    1    1    40
(XEN)  04 000 00  0    0    0   0   0    1    1    F1
(XEN)  05 000 00  0    0    0   0   0    1    1    48
(XEN)  06 000 00  0    0    0   0   0    1    1    50
(XEN)  07 000 00  0    0    0   0   0    1    1    7A
(XEN)  08 000 00  0    0    0   0   0    1    1    60
(XEN)  09 000 00  0    1    0   0   0    1    1    64
(XEN)  0a 000 00  0    0    0   0   0    1    1    70
(XEN)  0b 000 00  0    0    0   0   0    1    1    78
(XEN)  0c 000 00  0    0    0   0   0    1    1    4C
(XEN)  0d 000 00  1    0    0   0   0    1    1    90
(XEN)  0e 000 00  0    0    0   0   0    1    1    98
(XEN)  0f 000 00  0    0    0   0   0    1    1    A0
(XEN)  10 000 00  0    1    0   1   0    1    1    6C
(XEN)  11 000 00  0    1    0   1   0    1    1    54
(XEN)  12 000 00  1    1    0   1   0    1    1    39
(XEN)  13 000 00  1    1    0   1   0    1    1    C8
(XEN)  14 000 00  0    1    0   1   0    1    1    DA
(XEN)  15 000 00  1    0    0   0   0    0    0    00
(XEN)  16 000 00  1    1    0   1   0    1    1    9A
(XEN)  17 000 00  1    0    0   0   0    1    1    A8
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ219 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ241 -> 0:4
(XEN) IRQ72 -> 0:5
(XEN) IRQ80 -> 0:6
(XEN) IRQ122 -> 0:7
(XEN) IRQ96 -> 0:8
(XEN) IRQ100 -> 0:9
(XEN) IRQ112 -> 0:10
(XEN) IRQ120 -> 0:11
(XEN) IRQ76 -> 0:12
(XEN) IRQ144 -> 0:13
(XEN) IRQ152 -> 0:14
(XEN) IRQ160 -> 0:15
(XEN) IRQ108 -> 0:16
(XEN) IRQ84 -> 0:17
(XEN) IRQ57 -> 0:18
(XEN) IRQ200 -> 0:19
(XEN) IRQ218 -> 0:20
(XEN) IRQ154 -> 0:22
(XEN) IRQ168 -> 0:23
(XEN) .................................... done.
(XEN) i8259: 21: 0xfb, A1: 0xff
(XEN) Xen WARN at io_apic.c:558
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015e341>] smp_irq_move_cleanup_interrupt+0x23c/0x2bc
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000e9   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 000000000000000a   rdi: ffff82c4802592e0
(XEN) rbp: ffff82c48029fb58   rsp: ffff82c48029fb08   r8:  0000000000000004
(XEN) r9:  0000000000000001   r10: 00000000000000ff   r11: 0000000000000002
(XEN) r12: ffff830421080250   r13: ffff830421060534   r14: ffff82c48029ff18
(XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 000000037e7a8000   cr2: ffff880402070318
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fb08:
(XEN)    0000000000000000 0000000000000008 ffff82c48029ff18 ffff82c4802dd9e0
(XEN)    ffff82c48029fb58 0000000000000004 0000000000000000 0000000080030014
(XEN)    0000000000000000 0000000000000000 00007d3b7fd60477 ffff82c48014de60
(XEN)    0000000000000000 0000000000000000 0000000080030014 0000000000000000
(XEN)    ffff82c48029fc18 0000000000000004 0000000000000246 0000000000000000
(XEN)    00000000ffffffff 00000000ffffffff 0000000000000000 0000000000000001
(XEN)    0000000000000cfc 0000000000000282 ffff82c48025a9c0 0000002000000000
(XEN)    ffff82c4801226c0 000000000000e008 0000000000000282 ffff82c48029fc18
(XEN)    000000000000e010 0000000000000282 ffff82c48029fc48 ffff82c480175950
(XEN)    0000000000000202 0000000000000006 0000000000000010 00000000e2200004
(XEN)    ffff82c48029fc68 ffff82c4802105dc ffff82c48029fc78 ffff82c480122614
(XEN)    ffff82c48029fcc8 ffff82c480160183 ffff82c48029fca8 ffff82c480175950
(XEN)    000082c4ffffffff 0000000000000003 ffff8301108fd1c0 ffff830421050ac0
(XEN)    ffff8301108fd1c0 0000000000000000 0000000000000000 0000000000000003
(XEN)    ffff82c48029fd58 ffff82c48016033a 000000000000002f 0000000000000082
(XEN)    000782c48029fd08 ffff82c48029fe10 0000006a00000008 ffff82c48029fe78
(XEN)    0000000300000068 0000000000000000 0000000000002000 ffff82c4ffffffff
(XEN)    ffff82c48029fe10 ffff82c48029fe78 ffff82c48029fe10 ffff830421050ac0
(XEN)    0000000000000000 000000000000001e ffff82c48029fdc8 ffff82c4801610ef
(XEN)    ffff82c48029fdb8 ffff82c480115ec5 0000000000000293 ffff83042100a1f8
(XEN) Xen call trace:
(XEN)    [<ffff82c48015e341>] smp_irq_move_cleanup_interrupt+0x23c/0x2bc
(XEN)    [<ffff82c48014de60>] irq_move_cleanup_interrupt+0x30/0x40
(XEN)    [<ffff82c4801226c0>] _spin_unlock_irqrestore+0x22/0x24
(XEN)    [<ffff82c480175950>] pci_conf_read+0xb0/0xc1
(XEN)    [<ffff82c4802105dc>] pci_conf_read32+0x7c/0x7e
(XEN)    [<ffff82c480160183>] read_pci_mem_bar+0x2b0/0x303
(XEN)    [<ffff82c48016033a>] msix_capability_init+0x164/0x5fa
(XEN)    [<ffff82c4801610ef>] pci_enable_msi+0x19b/0x49b
(XEN)    [<ffff82c4801643bd>] map_domain_pirq+0x281/0x3df
(XEN)    [<ffff82c4801765cb>] do_physdev_op+0xa2b/0x1508
(XEN)    [<ffff82c480209fa8>] syscall_enter+0xc8/0x122
(XEN)


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27  8:52                               ` Jan Beulich
  2013-03-27  9:03                                 ` Jan Beulich
@ 2013-03-27 14:31                                 ` Marek Marczykowski
  2013-03-27 14:46                                   ` Andrew Cooper
                                                     ` (2 more replies)
  1 sibling, 3 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27 14:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2044 bytes --]

On 27.03.2013 09:52, Jan Beulich wrote:
>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> So vector e9 doesn't appear to be programmed in anywhere.
> 
> Quite obviously, as it's the 8259A vector for IRQ 9. The question
> really is why an IRQ appears on that vector in the first place. The
> 8259A resume code _should_ leave all IRQs masked on a fully
> IO-APIC system (see my question raised yesterday).
> 
> And that's also why I suggested, for an experiment, to fiddle with
> the loop exit condition to exclude legacy vectors (which wouldn't
> be a final solution, but would at least tell us whether the direction
> is the right one). In the end, besides understanding why an
> interrupt on vector E9 gets raised at all, we may also need to
> tweak the IRQ migration logic to not do anything on legacy IRQs,
> but that would need to happen earlier than in
> smp_irq_move_cleanup_interrupt(). Considering that 4.3
> apparently doesn't have this problem, we may need to go hunt for
> a change that isn't directly connected to this, yet deals with the
> problem as a side effect (at least I don't recall any particular fix
> since 4.2). One aspect here is the double mapping of legacy IRQs
> (once to their IO-APIC vector, and once to their legacy vector,
> i.e. vector_irq[] having two entries pointing to the same IRQ).

So I tried changing the loop condition to LAST_DYNAMIC_VECTOR, and it no
longer hits that BUG/ASSERT. But it still doesn't work: only CPU0 is used by
the scheduler, and there are also some errors from the dom0 kernel and errors
about PCI devices used by domU (1).

Messages from resume (different tries):
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log

Also, one time I got a fatal page fault earlier in resume (it isn't
deterministic):
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:31                                 ` Marek Marczykowski
@ 2013-03-27 14:46                                   ` Andrew Cooper
  2013-03-27 14:49                                     ` Marek Marczykowski
  2013-03-27 14:52                                     ` Andrew Cooper
  2013-03-28 16:13                                   ` Jan Beulich
  2013-03-28 16:25                                   ` Jan Beulich
  2 siblings, 2 replies; 68+ messages in thread
From: Andrew Cooper @ 2013-03-27 14:46 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel

On 27/03/2013 14:31, Marek Marczykowski wrote:
> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> So vector e9 doesn't appear to be programmed in anywhere.
>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>> really is why an IRQ appears on that vector in the first place. The
>> 8259A resume code _should_ leave all IRQs masked on a fully
>> IO-APIC system (see my question raised yesterday).
>>
>> And that's also why I suggested, for an experiment, to fiddle with
>> the loop exit condition to exclude legacy vectors (which wouldn't
>> be a final solution, but would at least tell us whether the direction
>> is the right one). In the end, besides understanding why an
>> interrupt on vector E9 gets raised at all, we may also need to
>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>> but that would need to happen earlier than in
>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>> apparently doesn't have this problem, we may need to go hunt for
>> a change that isn't directly connected to this, yet deals with the
>> problem as a side effect (at least I don't recall any particular fix
>> since 4.2). One aspect here is the double mapping of legacy IRQs
>> (once to their IO-APIC vector, and once to their legacy vector,
>> i.e. vector_irq[] having two entries pointing to the same IRQ).
> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>
> Messages from resume (different tries):
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>
> Also one time I've got fatal page fault error, earlier in resume (it isn't
> deterministic):
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>

This page fault is a NULL structure pointer dereference, likely of the
scheduling data.  At first glance, it looks related to the assertion
failures I have been seeing sporadically in testing but have been unable to
reproduce reliably.  There seems to be something quite dodgy in the
interaction of vcpu_wake and the scheduling loops.

The other logs indicate that dom0 appears to have a domain id of 1,
which is sure to cause problems.

As for locating the cause of the legacy vectors, it might be a good idea
to stick a printk at the top of do_IRQ() which indicates an interrupt
with vector between 0xe0 and 0xef.  This might at least indicate whether
legacy vectors are genuinely being delivered, or whether we have some
memory corruption causing these effects.
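
A rough user-space sketch of the check described above, assuming the legacy
range 0xe0-0xef quoted in this thread. The macro and function names are
illustrative only, and printf stands in for the hypervisor's printk so that
the fragment builds on its own. It is not the actual do_IRQ() change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Range quoted in the thread; these macro names are illustrative. */
#define LEGACY_VEC_FIRST 0xe0u
#define LEGACY_VEC_LAST  0xefu

/* True if the vector falls in the 8259A (legacy) vector range. */
static bool is_legacy_vector(unsigned int vector)
{
    return vector >= LEGACY_VEC_FIRST && vector <= LEGACY_VEC_LAST;
}

/* 8259A IRQ number for a legacy vector, e.g. 0xe9 -> IRQ 9. */
static unsigned int legacy_vector_to_irq(unsigned int vector)
{
    assert(is_legacy_vector(vector));
    return vector - LEGACY_VEC_FIRST;
}

/* Stand-in for the suggested printk at the top of do_IRQ(). */
static void report_if_legacy(unsigned int cpu, unsigned int vector)
{
    if (is_legacy_vector(vector))
        printf("CPU%u: legacy vector 0x%02x (8259A IRQ %u)\n",
               cpu, vector, legacy_vector_to_irq(vector));
}
```

Vector 233 from the dump is 0xe9, which this maps to IRQ 9, while the IO-APIC
vector for the same IRQ (0x64) falls outside the range and is not reported.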

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:46                                   ` Andrew Cooper
@ 2013-03-27 14:49                                     ` Marek Marczykowski
  2013-03-27 15:51                                       ` Marek Marczykowski
  2013-03-27 14:52                                     ` Andrew Cooper
  1 sibling, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27 14:49 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3139 bytes --]

On 27.03.2013 15:46, Andrew Cooper wrote:
> On 27/03/2013 14:31, Marek Marczykowski wrote:
>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>> really is why an IRQ appears on that vector in the first place. The
>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>> IO-APIC system (see my question raised yesterday).
>>>
>>> And that's also why I suggested, for an experiment, to fiddle with
>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>> be a final solution, but would at least tell us whether the direction
>>> is the right one). In the end, besides understanding why an
>>> interrupt on vector E9 gets raised at all, we may also need to
>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>> but that would need to happen earlier than in
>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>> apparently doesn't have this problem, we may need to go hunt for
>>> a change that isn't directly connected to this, yet deals with the
>>> problem as a side effect (at least I don't recall any particular fix
>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>> (once to their IO-APIC vector, and once to their legacy vector,
>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>
>> Messages from resume (different tries):
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>
>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>> deterministic):
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>
> 
> This pagefault is a Null structure pointer dereference, likely the
> scheduling data.  At a first glance, it looks related to the assertion
> failures I have been seeing sporadically in testing, but unable to
> reproduce reliably.  There seems to be something quite dodgy with
> interaction of vcpu_wake and scheduling loops.
> 
> The other logs indicate that dom0 appears to have a domain id of 1,
> which is sure to cause problems.

Perhaps not - domain 1 exists and has some PCI devices assigned (namely two
network adapters).

> As for locating the cause of the legacy vectors, it might be a good idea
> to stick a printk at the top of do_IRQ() which indicates an interrupt
> with vector between 0xe0 and 0xef.  This might at least indicate whether
> legacy vectors are genuinely being delivered, or whether we have some
> memory corruption causing these effects.

Ok, will try something like this.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:46                                   ` Andrew Cooper
  2013-03-27 14:49                                     ` Marek Marczykowski
@ 2013-03-27 14:52                                     ` Andrew Cooper
  2013-03-27 15:47                                       ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-27 14:52 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 27/03/2013 14:46, Andrew Cooper wrote:
> On 27/03/2013 14:31, Marek Marczykowski wrote:
>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>> really is why an IRQ appears on that vector in the first place. The
>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>> IO-APIC system (see my question raised yesterday).
>>>
>>> And that's also why I suggested, for an experiment, to fiddle with
>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>> be a final solution, but would at least tell us whether the direction
>>> is the right one). In the end, besides understanding why an
>>> interrupt on vector E9 gets raised at all, we may also need to
>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>> but that would need to happen earlier than in
>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>> apparently doesn't have this problem, we may need to go hunt for
>>> a change that isn't directly connected to this, yet deals with the
>>> problem as a side effect (at least I don't recall any particular fix
>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>> (once to their IO-APIC vector, and once to their legacy vector,
>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>
>> Messages from resume (different tries):
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>
>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>> deterministic):
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>
> This pagefault is a Null structure pointer dereference, likely the
> scheduling data.  At a first glance, it looks related to the assertion
> failures I have been seeing sporadically in testing, but unable to
> reproduce reliably.  There seems to be something quite dodgy with
> interaction of vcpu_wake and scheduling loops.
>
> The other logs indicate that dom0 appears to have a domain id of 1,
> which is sure to cause problems.

Actually - ignore this

From the log,

(XEN) physdev.c:153: dom0: can't create irq for msi!
[  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752 domain
(XEN) physdev.c:153: dom0: can't create irq for msi!
[  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752 domain

and later

(XEN) physdev.c:153: dom1: can't create irq for msi!
[  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
[  121.954080] error enable msi for guest 1 status ffffffea
(XEN) physdev.c:153: dom1: can't create irq for msi!
[  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
[  122.044421] error enable msi for guest 1 status ffffffea

I think that there is a separate bug where mapped irqs are not unmapped
on the suspend path.
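
As a side note on reading those numbers: -22 is -EINVAL on Linux, and the
"status ffffffea" in the pciback lines is the same value printed as an
unsigned 32-bit hex number. A small sketch of that conversion follows; the
function names are hypothetical and this is not code from Xen or the kernel.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Reinterpret a 32-bit status word as a signed two's-complement value.
 * The arithmetic goes through int64_t so it does not depend on an
 * implementation-defined unsigned-to-signed cast.
 */
static int32_t status_to_errno(uint32_t status)
{
    if (status & 0x80000000u)
        return (int32_t)((int64_t)status - 4294967296LL);
    return (int32_t)status;
}

/* Print a status word next to its signed interpretation. */
static void print_status(uint32_t status)
{
    int32_t err = status_to_errno(status);

    printf("status %08x = %d%s\n", (unsigned int)status, (int)err,
           err == -EINVAL ? " (-EINVAL)" : "");
}
```

So both the dom0 and dom1 failures report the same error code, just formatted
differently by the two callers.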

>
> As for locating the cause of the legacy vectors, it might be a good idea
> to stick a printk at the top of do_IRQ() which indicates an interrupt
> with vector between 0xe0 and 0xef.  This might at least indicate whether
> legacy vectors are genuinely being delivered, or whether we have some
> memory corruption causing these effects.
>
> ~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:52                                     ` Andrew Cooper
@ 2013-03-27 15:47                                       ` Konrad Rzeszutek Wilk
  2013-03-27 16:56                                         ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-27 15:47 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Marek Marczykowski, Jan Beulich, xen-devel

On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
> On 27/03/2013 14:46, Andrew Cooper wrote:
> > On 27/03/2013 14:31, Marek Marczykowski wrote:
> >> On 27.03.2013 09:52, Jan Beulich wrote:
> >>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> >>>> So vector e9 doesn't appear to be programmed in anywhere.
> >>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
> >>> really is why an IRQ appears on that vector in the first place. The
> >>> 8259A resume code _should_ leave all IRQs masked on a fully
> >>> IO-APIC system (see my question raised yesterday).
> >>>
> >>> And that's also why I suggested, for an experiment, to fiddle with
> >>> the loop exit condition to exclude legacy vectors (which wouldn't
> >>> be a final solution, but would at least tell us whether the direction
> >>> is the right one). In the end, besides understanding why an
> >>> interrupt on vector E9 gets raised at all, we may also need to
> >>> tweak the IRQ migration logic to not do anything on legacy IRQs,
> >>> but that would need to happen earlier than in
> >>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
> >>> apparently doesn't have this problem, we may need to go hunt for
> >>> a change that isn't directly connected to this, yet deals with the
> >>> problem as a side effect (at least I don't recall any particular fix
> >>> since 4.2). One aspect here is the double mapping of legacy IRQs
> >>> (once to their IO-APIC vector, and once to their legacy vector,
> >>> i.e. vector_irq[] having two entries pointing to the same IRQ).
> >> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
> >> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
> >> errors from dom0 kernel, and errors about PCI devices used by domU(1).
> >>
> >> Messages from resume (different tries):
> >> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
> >> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
> >>
> >> Also one time I've got fatal page fault error, earlier in resume (it isn't
> >> deterministic):
> >> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
> >>
> > This pagefault is a Null structure pointer dereference, likely the
> > scheduling data.  At a first glance, it looks related to the assertion
> > failures I have been seeing sporadically in testing, but unable to
> > reproduce reliably.  There seems to be something quite dodgy with
> > interaction of vcpu_wake and scheduling loops.
> >
> > The other logs indicate that dom0 appears to have a domain id of 1,
> > which is sure to cause problems.
> 
> Actually - ignore this
> 
> From the log,
> 
> (XEN) physdev.c:153: dom0: can't create irq for msi!
> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
> domain
> (XEN) physdev.c:153: dom0: can't create irq for msi!
> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
> domain
> 
> and later
> 
> (XEN) physdev.c:153: dom1: can't create irq for msi!
> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
> [  121.954080] error enable msi for guest 1 status ffffffea
> (XEN) physdev.c:153: dom1: can't create irq for msi!
> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
> [  122.044421] error enable msi for guest 1 status ffffffea
> 
> I think that there is a separate bug where mapped irqs are not unmapped
> on the suspend path.

You thinking this is a Linux (xen irq machinery) issue? Meaning it should
end up calling PHYSDEV_unmap_pirq as part of the suspend process?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:49                                     ` Marek Marczykowski
@ 2013-03-27 15:51                                       ` Marek Marczykowski
  2013-03-27 16:27                                         ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27 15:51 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 674 bytes --]

On 27.03.2013 15:49, Marek Marczykowski wrote:
> On 27.03.2013 15:46, Andrew Cooper wrote:
>> As for locating the cause of the legacy vectors, it might be a good idea
>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>> legacy vectors are genuinely being delivered, or whether we have some
>> memory corruption causing these effects.
> 
> Ok, will try something like this.

Nothing interesting here...
Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump information).
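
For reference, a minimal standalone sketch of the kind of range check such a
debug hook could use. This is not Xen code: trace_legacy_vector() is a
hypothetical stand-in for a printk at the top of do_IRQ(), and only the
0xe0..0xef range is taken from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Legacy (8259A) vector range as quoted in this thread: 0xe0..0xef. */
#define FIRST_LEGACY_VECTOR 0xe0u
#define LAST_LEGACY_VECTOR  0xefu

/* True when the vector falls inside the legacy PIC range. */
static bool is_legacy_vector(unsigned int vector)
{
    return vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR;
}

/*
 * Hypothetical stand-in for a debug hook at the top of do_IRQ(): formats a
 * message for legacy vectors only. Returns the number of characters that
 * snprintf() reports, or 0 when the vector is outside the legacy range.
 */
static int trace_legacy_vector(char *buf, size_t len,
                               unsigned int cpu, unsigned int vector)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if ( !is_legacy_vector(vector) )
        return 0;

    return snprintf(buf, len, "cpu%u: legacy vector %#x delivered\n",
                    cpu, vector);
}
```

With this check, vectors 0xf0 and 0xf1 seen above produce no message, while a
genuinely delivered 0xe9 would.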

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 15:51                                       ` Marek Marczykowski
@ 2013-03-27 16:27                                         ` Andrew Cooper
  2013-03-27 18:16                                           ` Marek Marczykowski
  2013-03-28 10:50                                           ` Jan Beulich
  0 siblings, 2 replies; 68+ messages in thread
From: Andrew Cooper @ 2013-03-27 16:27 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel

On 27/03/2013 15:51, Marek Marczykowski wrote:
> On 27.03.2013 15:49, Marek Marczykowski wrote:
>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>> As for locating the cause of the legacy vectors, it might be a good idea
>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>> legacy vectors are genuinely being delivered, or whether we have some
>>> memory corruption causing these effects.
>> Ok, will try something like this.
> Nothing interesting here...
> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump information).
>

Even in the case where we hit the original assertion?

If so, then all I can think is that the move_pending flag for that
specific GSI has been corrupted in memory somehow.

I wonder if hexdumping irq_desc[9] after setup, before sleep, on resume
and in the case of the assertion failure might give some hints.
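
A minimal sketch of a hexdump helper that could be used for this, written as
standalone C rather than against Xen's own printk machinery. The struct below
is a hypothetical, simplified stand-in and does not match the real irq_desc or
irq_cfg layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format `len` bytes starting at `p` as space-separated hex pairs into `out`.
 * Returns the number of characters written (excluding the terminating NUL),
 * or 0 if there is nothing to dump or the buffer is too small.
 */
static size_t hexdump_to_buf(char *out, size_t outlen,
                             const void *p, size_t len)
{
    const unsigned char *bytes = p;
    size_t used = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';

    /* Two hex digits per byte plus a separator or the final NUL. */
    if ( len == 0 || outlen < len * 3 )
        return 0;

    for ( size_t i = 0; i < len; i++ )
        used += (size_t)snprintf(out + used, outlen - used, "%s%02x",
                                 i ? " " : "", (unsigned int)bytes[i]);

    return used;
}

/* Hypothetical, simplified stand-in for the state worth dumping. */
struct irq_cfg_model {
    unsigned char vector;
    unsigned char move_cleanup_count;
    unsigned char move_in_progress;
};
```

Calling it at each of the four points (after setup, before sleep, on resume,
at the assertion) and comparing the strings would show which bytes changed.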

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 15:47                                       ` Konrad Rzeszutek Wilk
@ 2013-03-27 16:56                                         ` Andrew Cooper
  2013-03-27 17:15                                           ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-27 16:56 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Marek Marczykowski, Jan Beulich, xen-devel

On 27/03/2013 15:47, Konrad Rzeszutek Wilk wrote:
> On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
>> On 27/03/2013 14:46, Andrew Cooper wrote:
>>> On 27/03/2013 14:31, Marek Marczykowski wrote:
>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>> IO-APIC system (see my question raised yesterday).
>>>>>
>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>> be a final solution, but would at least tell us whether the direction
>>>>> is the right one). In the end, besides understanding why an
>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>> but that would need to happen earlier than in
>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>> a change that isn't directly connected to this, yet deals with the
>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>
>>>> Messages from resume (different tries):
>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>>>
>>>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>>>> deterministic):
>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>>>
>>> This pagefault is a Null structure pointer dereference, likely the
>>> scheduling data.  At a first glance, it looks related to the assertion
>>> failures I have been seeing sporadically in testing, but unable to
>>> reproduce reliably.  There seems to be something quite dodgy with
>>> interaction of vcpu_wake and scheduling loops.
>>>
>>> The other logs indicate that dom0 appears to have a domain id of 1,
>>> which is sure to cause problems.
>> Actually - ignore this
>>
>> From the log,
>>
>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>> domain
>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>> domain
>>
>> and later
>>
>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>> [  121.954080] error enable msi for guest 1 status ffffffea
>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>> [  122.044421] error enable msi for guest 1 status ffffffea
>>
>> I think that there is a separate bug where mapped irqs are not unmapped
>> on the suspend path.
> You thinking this is a Linux (xen irq machinery) issue? Meaning it should
> end up calling PHYSDEV_unmap_pirq as part of the suspend process?

I am not sure.  Without looking at the code, I am only speculating.

Beyond that, the main question is about the expected behaviour.  Do we
expect dom0/U to unmap its irqs and remap them after resume?  What do we
expect from domains which are unaware of the host sleep action?

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 16:56                                         ` Andrew Cooper
@ 2013-03-27 17:15                                           ` Marek Marczykowski
  2013-03-28 17:41                                             ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27 17:15 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 4496 bytes --]

On 27.03.2013 17:56, Andrew Cooper wrote:
> On 27/03/2013 15:47, Konrad Rzeszutek Wilk wrote:
>> On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
>>> On 27/03/2013 14:46, Andrew Cooper wrote:
>>>> On 27/03/2013 14:31, Marek Marczykowski wrote:
>>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>>> IO-APIC system (see my question raised yesterday).
>>>>>>
>>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>>> be a final solution, but would at least tell us whether the direction
>>>>>> is the right one). In the end, besides understanding why an
>>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>>> but that would need to happen earlier than in
>>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>>> a change that isn't directly connected to this, yet deals with the
>>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>>
>>>>> Messages from resume (different tries):
>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>>>>
>>>>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>>>>> deterministic):
>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>>>>
>>>> This pagefault is a Null structure pointer dereference, likely the
>>>> scheduling data.  At a first glance, it looks related to the assertion
>>>> failures I have been seeing sporadically in testing, but unable to
>>>> reproduce reliably.  There seems to be something quite dodgy with
>>>> interaction of vcpu_wake and scheduling loops.
>>>>
>>>> The other logs indicate that dom0 appears to have a domain id of 1,
>>>> which is sure to cause problems.
>>> Actually - ignore this
>>>
>>> From the log,
>>>
>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>> domain
>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>> domain
>>>
>>> and later
>>>
>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>> [  121.954080] error enable msi for guest 1 status ffffffea
>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>> [  122.044421] error enable msi for guest 1 status ffffffea
>>>
>>> I think that there is a separate bug where mapped irqs are not unmapped
>>> on the suspend path.
>> You thinking this is a Linux (xen irq machinery) issue? Meaning it should
>> end up calling PHYSDEV_unmap_pirq as part of the suspend process?
> 
> I am not sure.  Without looking at the code, I am only speculating.
> 
> Beyond that, the main question is about the expected behaviour.  Do we
> expect dom0/U to unmap its irqs and remap them after resume?  What do we
> expect from domains which are unaware of the host sleep action?

BTW this is the case here: domain 1 isn't fully aware of the sleep. It has some
PCI devices assigned. The only action taken there before suspend is shutting
down the network interfaces (without this, the system hung during suspend).

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 16:27                                         ` Andrew Cooper
@ 2013-03-27 18:16                                           ` Marek Marczykowski
  2013-03-27 18:56                                             ` Andrew Cooper
  2013-03-28 10:50                                           ` Jan Beulich
  1 sibling, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-27 18:16 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2348 bytes --]

On 27.03.2013 17:27, Andrew Cooper wrote:
> On 27/03/2013 15:51, Marek Marczykowski wrote:
>> On 27.03.2013 15:49, Marek Marczykowski wrote:
>>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>>> As for locating the cause of the legacy vectors, it might be a good idea
>>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>>> legacy vectors are genuinely being delivered, or whether we have some
>>>> memory corruption causing these effects.
>>> Ok, will try something like this.
>> Nothing interesting here...
>> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump information).
>>
> 
> Even in the case where we hit the original assertion?

Yes, even then.

> If so, then all I can think is that the move_pending flag for that
> specific GSI has been corrupted in memory somehow.

I guess this isn't the case, see below.

> I wonder if hexdumping irq_desc[9] after setup, before sleep, on resume
> and in the case of the assertion failure might give some hints.

I've tried something like this. Detailed log here:
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-suspend-irq9-dump.log

Some interesting parts:
after system startup:
(XEN) irq_cfg of IRQ 9:
(XEN)   vector: 138
(XEN)   move_cleanup_count: 0x0
(XEN)   move_in_progress: 0x0
(XEN) irq_desc of IRQ 9:
(XEN)   status: 80 (IRQ_GUEST | IRQ_PENDING)

Isn't this wrong (status vs move_in_progress)?

Then I ran pm-suspend, intentionally failing at the end to prevent the actual
suspend, but still running all its hooks. After that:
(XEN) irq_cfg of IRQ 9:
(XEN)   vector: 181
(XEN)   move_cleanup_count: 0x0
(XEN)   move_in_progress: 0x1
(XEN) irq_desc of IRQ 9:
(XEN)   status: 80

So now move_in_progress is consistent with status.
I waited a few seconds, and move_in_progress was still 0x1. Isn't it supposed
to be only a temporary state?

Then I suspended, and on resume it hit that bug. The state was:
(XEN) irq_cfg of IRQ 9:
(XEN)   vector: 60
(XEN)   move_cleanup_count: 0x0
(XEN)   move_in_progress: 0x0
(XEN) irq_desc of IRQ 9:
(XEN)   status: 16

move_in_progress==0, ok. But move_cleanup_count==0, even though
move_in_progress was 1 at least once. Isn't that wrong?

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 18:16                                           ` Marek Marczykowski
@ 2013-03-27 18:56                                             ` Andrew Cooper
  2013-03-28 14:43                                               ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-27 18:56 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel

On 27/03/2013 18:16, Marek Marczykowski wrote:
> On 27.03.2013 17:27, Andrew Cooper wrote:
>> On 27/03/2013 15:51, Marek Marczykowski wrote:
>>> On 27.03.2013 15:49, Marek Marczykowski wrote:
>>>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>>>> As for locating the cause of the legacy vectors, it might be a good idea
>>>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>>>> legacy vectors are genuinely being delivered, or whether we have some
>>>>> memory corruption causing these effects.
>>>> Ok, will try something like this.
>>> Nothing interesting here...
>>> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump information).
>>>
>> Even in the case where we hit the original assertion?
> Yes, even then.
>
>> If so, then all I can think is that the move_pending flag for that
>> specific GSI has been corrupted in memory somehow.
> I guess this isn't the case, see below.
>
>> I wonder if hexdumping irq_desc[9] after setup, before sleep, on resume
>> and in the case of the assertion failure might give some hints.
> I've tried something like this. Detailed log here:
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-suspend-irq9-dump.log

This is concerning, unless I am getting utterly confused.  Jan: Do you
mind double checking my reasoning?

irq 0 through 15 should be the PIC irqs, set up in init_IRQ() in
arch/x86/i8259.c

irq9 should be the irq for the PIC vector which is set up as 0xe9, and
its vector should never change.

Could you put in extra checks for the sanity of per_cpu(vector_irq,
cpu)[0xe0 thru 0xef] ?
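
Something along these lines, sketched as standalone C so it can be tried
outside the hypervisor. The plain int array models one CPU's vector_irq table,
and the expectation that vector 0xe0 + n maps back to IRQ n is my assumption as
stated above; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define NR_VECTORS          256
#define FIRST_LEGACY_VECTOR 0xe0
#define LAST_LEGACY_VECTOR  0xef

/* Fill a model table: everything unused (-1) except the 16 legacy slots. */
static void init_legacy_table(int vector_irq[NR_VECTORS])
{
    for ( int vector = 0; vector < NR_VECTORS; vector++ )
        vector_irq[vector] = -1;

    for ( int irq = 0; irq < 16; irq++ )
        vector_irq[FIRST_LEGACY_VECTOR + irq] = irq;
}

/*
 * Count the legacy-range entries that do not map vector 0xe0 + n back to
 * IRQ n, printing each mismatch. `vector_irq` models one CPU's table.
 */
static unsigned int count_bad_legacy_entries(const int vector_irq[NR_VECTORS])
{
    unsigned int bad = 0;

    assert(vector_irq != NULL);

    for ( int vector = FIRST_LEGACY_VECTOR; vector <= LAST_LEGACY_VECTOR;
          vector++ )
    {
        int expected_irq = vector - FIRST_LEGACY_VECTOR;

        if ( vector_irq[vector] != expected_irq )
        {
            printf("vector %#x maps to irq %d, expected %d\n",
                   vector, vector_irq[vector], expected_irq);
            bad++;
        }
    }

    return bad;
}
```

Running the check once per online CPU, before suspend and again on resume,
would show whether the legacy entries survive the cycle intact.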

>
> Some interesting parts:
> after system startup:
> (XEN) irq_cfg of IRQ 9:
> (XEN)   vector: 138
> (XEN)   move_cleanup_count: 0x0
> (XEN)   move_in_progress: 0x0
> (XEN) irq_desc of IRQ 9:
> (XEN)   status: 80 (IRQ_GUEST | IRQ_PENDING)
>
> Isn't this wrong (status vs move_in_progress)?

This here looks fine.  What do you think is wrong about it?

>
> Then I've run pm-suspend, intentionally failed at the end to prevent actual
> suspend, but run all its hooks. After that:
> (XEN) irq_cfg of IRQ 9:
> (XEN)   vector: 181
> (XEN)   move_cleanup_count: 0x0
> (XEN)   move_in_progress: 0x1
> (XEN) irq_desc of IRQ 9:
> (XEN)   status: 80
>
> So now move_in_progress consistent with status.
> Wait few second, and still move_in_progress was 0x1. Isn't it supposed to be
> only temporary state?

move_in_progress gets set by __assign_irq_vector() when the scheduler
decides to move the IRQ.  It can stay set for a long time.

On the next interrupt from this source, the move_in_progress bit being
set causes the IRQ source to be reprogrammed to the new destination.

>
> Then suspended, at resume hit that bug. There was:
> (XEN) irq_cfg of IRQ 9:
> (XEN)   vector: 60
> (XEN)   move_cleanup_count: 0x0
> (XEN)   move_in_progress: 0x0
> (XEN) irq_desc of IRQ 9:
> (XEN)   status: 16
>
> move_in_progress==0, ok. But move_cleanup_count==0, while at least once was
> move_in_progress==1. Isn't that wrong?
>

move_cleanup_count is only set in send_cleanup_vector, for the specific
vector which is being cleaned up.

However, as the IPI handler cleans up all vectors which are outstanding,
the move_cleanup_count can be 0 for most vectors which are actually
cleaned up.

This is in an attempt to reduce the number of IPIs required to clean up
all moving irqs.  As the scheduler currently has a habit of moving vcpus
at every scheduling opportunity, this means that irqs are constantly moving.

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 16:27                                         ` Andrew Cooper
  2013-03-27 18:16                                           ` Marek Marczykowski
@ 2013-03-28 10:50                                           ` Jan Beulich
  2013-03-28 11:53                                             ` Andrew Cooper
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-28 10:50 UTC (permalink / raw)
  To: Andrew Cooper, Marek Marczykowski; +Cc: Konrad Rzeszutek Wilk, xen-devel

[-- Attachment #1: Type: text/plain, Size: 2771 bytes --]

>>> On 27.03.13 at 17:27, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 27/03/2013 15:51, Marek Marczykowski wrote:
>> On 27.03.2013 15:49, Marek Marczykowski wrote:
>>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>>> As for locating the cause of the legacy vectors, it might be a good idea
>>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>>> legacy vectors are genuinely being delivered, or whether we have some
>>>> memory corruption causing these effects.
>>> Ok, will try something like this.
>> Nothing interesting here...
>> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump 
> information).
>>
> 
> Even in the case where we hit the original assertion?
> 
> If so, then all I can think is that the move_pending flag for that
> specific GSI has been corrupted in memory somehow.

No, I think the flag is legitimately set after resume, and gets
looked at after the first SCI got signaled (which would
trigger the pending affinity change to be carried out that was
initiated in the suspend path). The problem is a more
fundamental one: irq_move_cleanup_interrupt() (in unstable
terms) includes the legacy vectors, so if, upon encountering the
move_cleanup_count for IRQ 9 (or any legacy IRQ) execution
doesn't make it all the way through to carrying out the cleanup,
the loop, once in the legacy vector range, will re-encounter the
same IRQ, find move_cleanup_count non-zero again, and thus
tries to do something here.

Hence I think skipping the legacy vector range here is indeed
necessary, even outside the suspend/resume scenario (see
below). Another alternative would be to invalidate the
vector_irq[] entries for legacy vectors handled through the
IO-APIC.

Jan

x86: irq_move_cleanup_interrupt() must ignore legacy vectors

Since the main loop in the function includes legacy vectors, and since
vector_irq[] gets set up for legacy vectors regardless of whether those
get handled through the IO-APIC, it must not do anything on this vector
range. In fact, we should never get here for IRQs not handled through
the IO-APIC, so add a respective warning at once (could probably as
well be an ASSERT()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>

--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -625,6 +625,12 @@ void irq_move_cleanup_interrupt(struct c
         if ((int)irq < 0)
             continue;
 
+        if ( vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR )
+        {
+            WARN_ON(!IO_APIC_IRQ(irq));
+            continue;
+        }
+
         desc = irq_to_desc(irq);
         if (!desc)
             continue;
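
To illustrate the double mapping outside of Xen, here is a small standalone
model of the loop. It is a sketch under assumptions, not the real function:
the table is a plain int array, FIRST_DYNAMIC_VECTOR is assumed to be 0x20,
and vector 0x8a (138) for IRQ 9 is taken from Marek's dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_VECTORS           256
#define FIRST_DYNAMIC_VECTOR 0x20   /* assumed loop start */
#define FIRST_LEGACY_VECTOR  0xe0
#define LAST_LEGACY_VECTOR   0xef

/* Model table with IRQ 9 reachable through two vectors. */
static void init_double_mapped_irq9(int vector_irq[NR_VECTORS])
{
    for ( int vector = 0; vector < NR_VECTORS; vector++ )
        vector_irq[vector] = -1;

    vector_irq[0x8a] = 9;   /* IO-APIC vector 138 from the dump */
    vector_irq[0xe9] = 9;   /* legacy 8259A vector for IRQ 9 */
}

/*
 * Count how often a walk over one CPU's vector table would act on `irq`.
 * Without the skip, an IRQ mapped at both its IO-APIC vector and its
 * legacy vector is encountered twice.
 */
static unsigned int visits_for_irq(const int vector_irq[NR_VECTORS],
                                   int irq, bool skip_legacy)
{
    unsigned int visits = 0;

    assert(vector_irq != NULL);

    for ( int vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++ )
    {
        if ( vector_irq[vector] < 0 )
            continue;           /* unused slot */

        if ( skip_legacy && vector >= FIRST_LEGACY_VECTOR &&
             vector <= LAST_LEGACY_VECTOR )
            continue;           /* the check added by the patch */

        if ( vector_irq[vector] == irq )
            visits++;
    }

    return visits;
}
```

In this model IRQ 9 is visited twice without the skip and once with it, which
is the re-encounter described above.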



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 10:50                                           ` Jan Beulich
@ 2013-03-28 11:53                                             ` Andrew Cooper
  2013-03-28 12:54                                               ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-28 11:53 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Konrad Rzeszutek Wilk, Marek Marczykowski, xen-devel

On 28/03/2013 10:50, Jan Beulich wrote:
>>>> On 27.03.13 at 17:27, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 27/03/2013 15:51, Marek Marczykowski wrote:
>>> On 27.03.2013 15:49, Marek Marczykowski wrote:
>>>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>>>> As for locating the cause of the legacy vectors, it might be a good idea
>>>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>>>> legacy vectors are genuinely being delivered, or whether we have some
>>>>> memory corruption causing these effects.
>>>> Ok, will try something like this.
>>> Nothing interesting here...
>>> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump 
>> information).
>> Even in the case where we hit the original assertion?
>>
>> If so, then all I can think is that the move_pending flag for that
>> specific GSI has been corrupted in memory somehow.
> No, I think the flag is legitimately set after resume, and gets
> looked at after the first SCI got signaled (which would
> trigger the pending affinity change to be carried out that was
> initiated in the suspend path). The problem is a more
> fundamental one: irq_move_cleanup_interrupt() (in unstable
> terms) includes the legacy vectors, so if, upon encountering the
> move_cleanup_count for IRQ 9 (or any legacy IRQ) execution
> doesn't make it all the way through to carrying out the cleanup,
> the loop, once in the legacy vector range, will re-encounter the
> same IRQ, find move_cleanup_count non-zero again, and thus
> tries to do something here.
>
> Hence I think skipping the legacy vector range here is indeed
> necessary, even outside the suspend/resume scenario (see
> below). Another alternative would be to invalidate the
> vector_irq[] entries for legacy vectors handled through the
> IO-APIC.
>
> Jan
>
> x86: irq_move_cleanup_interrupt() must ignore legacy vectors
>
> Since the main loop in the function includes legacy vectors, and since
> vector_irq[] gets set up for legacy vectors regardless of whether those
> get handled through the IO-APIC, it must not do anything on this vector
> range. In fact, we should never get here for IRQs not handled through
> the IO-APIC, so add a respective warning at once (could probably as
> well be an ASSERT()).
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Under what circumstances would we have any vectors 0xe0-0xef programmed
into the IOAPIC?  I can't think of any offhand.

As far as I am aware, it is not valid for any PIC interrupts to ever be
up for moving, as they should only be delivered to the BSP.

In addition to the check you have, the scope of the loop should probably
be reduced.  We should never be considering to move any vector larger
than LAST_HIPRIORITY_VECTOR, which I believe are all LAPIC interrupts,
making 8 useless iterations of the loop.  I would also suggest that it
is an ASSERT rather than a WARN, but that leaves us not fixing the bug
at hand, as we have already verified that vector 0xe9 is not programmed
into the IOAPIC.
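
As a rough standalone sketch of the combined filter (reduced loop scope plus
the legacy skip). The value 0xf7 for LAST_HIPRIORITY_VECTOR is assumed here
only because it matches the 8 iterations mentioned above; the real value has
to come from the Xen headers, and FIRST_DYNAMIC_VECTOR = 0x20 is likewise an
assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VECTORS             256u
#define FIRST_DYNAMIC_VECTOR   0x20u  /* assumed */
#define FIRST_LEGACY_VECTOR    0xe0u
#define LAST_LEGACY_VECTOR     0xefu
#define LAST_HIPRIORITY_VECTOR 0xf7u  /* assumed, illustration only */

/* True if the cleanup loop should look at this vector at all. */
static bool cleanup_should_consider(unsigned int vector)
{
    assert(vector < NR_VECTORS);

    /* Outside the movable range: below dynamic or above high priority. */
    if ( vector < FIRST_DYNAMIC_VECTOR || vector > LAST_HIPRIORITY_VECTOR )
        return false;

    /* Legacy 8259A vectors are never up for moving. */
    if ( vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR )
        return false;

    return true;
}

/* Number of vectors the filtered loop would actually examine. */
static unsigned int count_considered_vectors(void)
{
    unsigned int n = 0;

    for ( unsigned int vector = 0; vector < NR_VECTORS; vector++ )
        if ( cleanup_should_consider(vector) )
            n++;

    return n;
}
```

Under these assumed constants the loop examines 200 vectors instead of 224.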

~Andrew

>
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -625,6 +625,12 @@ void irq_move_cleanup_interrupt(struct c
>          if ((int)irq < 0)
>              continue;
>  
> +        if ( vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR )
> +        {
> +            WARN_ON(!IO_APIC_IRQ(irq));
> +            continue;
> +        }
> +
>          desc = irq_to_desc(irq);
>          if (!desc)
>              continue;
>
>

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 11:53                                             ` Andrew Cooper
@ 2013-03-28 12:54                                               ` Jan Beulich
  2013-03-28 13:19                                                 ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-28 12:54 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Marek Marczykowski, xen-devel

>>> On 28.03.13 at 12:53, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 28/03/2013 10:50, Jan Beulich wrote:
>> x86: irq_move_cleanup_interrupt() must ignore legacy vectors
>>
>> Since the main loop in the function includes legacy vectors, and since
>> vector_irq[] gets set up for legacy vectors regardless of whether those
>> get handled through the IO-APIC, it must not do anything on this vector
>> range. In fact, we should never get here for IRQs not handled through
>> the IO-APIC, so add a respective warning at once (could probably as
>> well be an ASSERT()).
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Under what circumstances would we have any vectors 0xe0-0xef programmed
> into the IOAPIC?  I can't think of any offhand.

Never. And I didn't say it would.

> As far as I am aware, it is not valid for any PIC interrupts to ever be
> up for moving, as they should only be delivered to the BSP.

Hence the WARN_ON() (or ASSERT()).

> In addition to the check you have, the scope of the loop should probably
> be reduced.  We should never be considering to move any vector larger
> than LAST_HIPRIORITY_VECTOR, which I believe are all LAPIC interrupts,
> making 8 useless iterations of the loop.

Agreed. Will update the patch to also do that.

>  I would also suggest that it
> is an ASSERT rather than a WARN, but that leaves us not fixing the bug
> at hand, as we have already verified that vector 0xe9 is not programmed
> into the IOAPIC.

So with you repeating this I think I didn't explain well enough
what I think is happening. Hence I'll try again: We possibly (on at
least one CPU for sure) have two vector_irq[] entries referring to
any particular legacy IRQ - one for the vector that the IO-APIC is
using, and one for the corresponding legacy vector. Hence there'll
be two iterations of the loop here looking at the _same_ IRQ, the
second of which (wrongly) being the one pointed to by the entry in
the legacy vector range. It is this second instance that the change
is suppressing, with the WARN_ON() being there to ascertain that
we indeed never get here for an IRQ handled through the 8259A.

Jan
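
To make the double-visit scenario described above concrete, here is a
minimal standalone C sketch (not Xen code). It models a per-CPU
vector_irq[] table in which IRQ 9 is reachable both through an IO-APIC
vector (188, the value reported for IRQ 9 elsewhere in this thread) and
through its legacy alias 0xe9, and counts how often a cleanup-style loop
would look at that IRQ with and without the legacy-range skip. The helper
names count_visits() and build_example_table() are hypothetical, and
FIRST_DYNAMIC_VECTOR is an assumed value chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants; assumed for this sketch, not copied from the tree. */
#define NR_VECTORS           256
#define FIRST_DYNAMIC_VECTOR 0x20
#define FIRST_LEGACY_VECTOR  0xe0
#define LAST_LEGACY_VECTOR   0xef

/*
 * Count how many loop iterations would look at 'irq' while walking a
 * per-CPU vector_irq[] table. When skip_legacy is non-zero, entries in
 * the legacy (8259A) vector range are ignored, as in the proposed change.
 */
int count_visits(const int *vector_irq, int irq, int skip_legacy)
{
    int vector, visits = 0;

    assert(vector_irq != NULL);

    for (vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++) {
        if (vector_irq[vector] < 0)
            continue;                   /* unused slot */

        if (skip_legacy &&
            vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR)
            continue;                   /* legacy alias: do nothing */

        if (vector_irq[vector] == irq)
            visits++;
    }

    return visits;
}

/* Fill a table with the legacy aliases plus IRQ 9 on IO-APIC vector 188. */
void build_example_table(int *vector_irq)
{
    int i;

    assert(vector_irq != NULL);

    for (i = 0; i < NR_VECTORS; i++)
        vector_irq[i] = -1;

    for (i = 0; i < 16; i++)
        vector_irq[FIRST_LEGACY_VECTOR + i] = i;    /* 0xe0+n -> IRQ n */

    vector_irq[188] = 9;                            /* IO-APIC vector */
}
```

With this table, count_visits(table, 9, 0) returns 2 and
count_visits(table, 9, 1) returns 1; the difference is the second visit
that the change is meant to suppress.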

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 12:54                                               ` Jan Beulich
@ 2013-03-28 13:19                                                 ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2013-03-28 13:19 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Marek Marczykowski, xen-devel

>>> On 28.03.13 at 13:54, "Jan Beulich" <JBeulich@suse.com> wrote:
>>>> On 28.03.13 at 12:53, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 28/03/2013 10:50, Jan Beulich wrote:
>>> x86: irq_move_cleanup_interrupt() must ignore legacy vectors
>>>
>>> Since the main loop in the function includes legacy vectors, and since
>>> vector_irq[] gets set up for legacy vectors regardless of whether those
>>> get handled through the IO-APIC, it must not do anything on this vector
>>> range. In fact, we should never get here for IRQs not handled through
>>> the IO-APIC, so add a respective warning at once (could probably as
>>> well be an ASSERT()).
>>>
>>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> 
>> Under what circumstances would we have any vectors 0xe0-0xef programmed
>> into the IOAPIC?  I can't think of any offhand.
> 
> Never. And I didn't say it would.
> 
>> As far as I am aware, it is not valid for any PIC interrupts to ever be
>> up for moving, as they should only be delivered to the BSP.
> 
> Hence the WARN_ON() (or ASSERT()).

You know what - now that I actually tried this out, I see that this
triggers. For the moment I'm puzzled, will need to look into this in
more detail.

Jan

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 18:56                                             ` Andrew Cooper
@ 2013-03-28 14:43                                               ` Marek Marczykowski
  0 siblings, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-28 14:43 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: Konrad Rzeszutek Wilk, Jan Beulich, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 6040 bytes --]

On 27.03.2013 19:56, Andrew Cooper wrote:
> On 27/03/2013 18:16, Marek Marczykowski wrote:
>> On 27.03.2013 17:27, Andrew Cooper wrote:
>>> On 27/03/2013 15:51, Marek Marczykowski wrote:
>>>> On 27.03.2013 15:49, Marek Marczykowski wrote:
>>>>> On 27.03.2013 15:46, Andrew Cooper wrote:
>>>>>> As for locating the cause of the legacy vectors, it might be a good idea
>>>>>> to stick a printk at the top of do_IRQ() which indicates an interrupt
>>>>>> with vector between 0xe0 and 0xef.  This might at least indicate whether
>>>>>> legacy vectors are genuinely being delivered, or whether we have some
>>>>>> memory corruption causing these effects.
>>>>> Ok, will try something like this.
>>>> Nothing interesting here...
>>>> Only vector 0xf1 for irq 4 and 0xf0 for irq 0 (which match irq dump information).
>>>>
>>> Even in the case where we hit the original assertion?
>> Yes, even then.
>>
>>> If so, then all I can think is that the move_pending flag for that
>>> specific GSI has been corrupted in memory somehow.
>> I guess this isn't the case, see below.
>>
>>> I wonder if hexdumping irq_desc[9] after setup, before sleep, on resume
>>> and in the case of the assertion failure might give some hints.
>> I've tried something like this. Detailed log here:
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-suspend-irq9-dump.log
> 
> This is concerning, unless I am getting utterly confused.  Jan: Do you
> mind double checking my reasoning?
> 
> irq 0 through 15 should be the PIC irqs, set up in init_IRQ() in
> arch/x86/i8259.c
> 
> irq9 should be the irq for the PIC vector which is set up as 0xe9, and
> its vector should never change.
> 
> Could you put in extra checks for the sanity of per_cpu(vector_irq,
> cpu)[0xe0 thru 0xef] ?

Ok, got something here:
http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-suspend-irq9-dump2.log

Now the bug triggered some time after resume (about 15s). But only CPU0 was used
by the scheduler immediately after resume. Interesting part - note vector_irq(e1):
(XEN) irq_cfg of IRQ 9:
(XEN)   vector: 188
(XEN)   cpu_mask: 00000000,00000000,00000000,00000001
(XEN)   old_cpu_mask: 00000000,00000000,00000000,00000002
(XEN)   move_cleanup_count: 0x0
(XEN)   used_vectors:
49,64,72,74,80-81,88,98,112,120,144,148,152,156,160,164,168,172,178,188,192,196,200,207-208
(XEN)   move_in_progress: 0x0
(XEN) irq_desc of IRQ 9:
(XEN)   status: 16
(XEN)   handler: ffff82c480252660
(XEN)   msi_desc: 0000000000000000
(XEN)   action: ffff83041d9f1ed0
(XEN)   depth: 0
(XEN)   chip_data: ffff830421080250
(XEN)   irq: 9
(XEN)   affinity: 00000000,00000000,00000000,00000001
(XEN)   pending_mask: 00000000,00000000,00000000,00000000
(XEN)   (...)
(XEN) vector_irq(e0): 0
(XEN) vector_irq(e1): -1
(XEN) vector_irq(e2): 2
(XEN) vector_irq(e3): 3
(XEN) vector_irq(e4): 4
(XEN) vector_irq(e5): 5
(XEN) vector_irq(e6): 6
(XEN) vector_irq(e7): 7
(XEN) vector_irq(e8): 8
(XEN) vector_irq(e9): 9
(XEN) vector_irq(ea): 10
(XEN) vector_irq(eb): 11
(XEN) vector_irq(ec): 12
(XEN) vector_irq(ed): 13
(XEN) vector_irq(ee): 14
(XEN) vector_irq(ef): 15
(XEN) Xen WARN at io_apic.c:639
(XEN) ----[ Xen-4.1.5-rc1  x86_64  debug=y  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82c48015e5fb>] smp_irq_move_cleanup_interrupt+0x246/0x2c6
(XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
(XEN) rax: 0000000000000000   rbx: 00000000000000e1   rcx: 0000000000000000
(XEN) rdx: 0000000000000000   rsi: 000000000000000a   rdi: ffff82c4802592e0
(XEN) rbp: ffff82c48029fda8   rsp: ffff82c48029fd58   r8:  0000000000000004
(XEN) r9:  0000000000000001   r10: 000000000000000f   r11: 0000000000000002
(XEN) r12: ffff830421080050   r13: ffff830421060134   r14: ffff82c48029ff18
(XEN) r15: ffff82c4802dd9e0   cr0: 000000008005003b   cr4: 00000000000026f0
(XEN) cr3: 0000000273d3c000   cr2: ffff88000c360318
(XEN) ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen stack trace from rsp=ffff82c48029fd58:
(XEN)    0000000000000000 000000008029fd70 ffff82c48029ff18 ffff82c4802dd9e0
(XEN)    ffff82c480153f55 ffff830421043260 ffff830421043320 0000006f207ab134
(XEN)    0000006f207c3b14 ffff82c4802dd600 00007d3b7fd60227 ffff82c48014de60
(XEN)    ffff82c4802dd600 0000006f207c3b14 0000006f207ab134 ffff830421043320
(XEN)    ffff82c48029fef0 ffff830421043260 0000ffff0000ffff 0000006f416dab2e
(XEN)    ffff830007ef4060 0000006f1fad2570 0000000000003f40 0000000000000001
(XEN)    0000000000000000 ffff82c4802de200 0000000002048cac 0000002000000000
(XEN)    ffff82c480197940 000000000000e008 0000000000000246 ffff82c48029fe68
(XEN)    000000000000e010 ffff82c48029fef0 ffff82c4801987b7 ffff880402105d30
(XEN)    00000000ca9a4000 ffffffffffffffff aaaaaaaaaaaaaa00 aaaaaaaaaaaaaaaa
(XEN)    0000006f21136437 0000000000000000 0000000000000000 ffffffffffffffff
(XEN)    000004c200000542 0000000000000000 ffff82c48029ff18 ffff82c48029ff18
(XEN)    00000000ffffffff 0000000000000002 ffff82c4802dd600 ffff82c48029ff10
(XEN)    ffff82c4801549ce ffff8300ca9a4000 ffff8300ca666000 ffff82c48029fdc8
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000001
(XEN)    ffff880402105f00 ffff880402105fd8 0000000000000246 0000000000000001
(XEN)    0000000000000000 0000000000000000 0000000000000000 ffffffff810013aa
(XEN)    ffffffff81a2a858 00000000deadbeef 00000000deadbeef 0000010000000000
(XEN)    ffffffff810013aa 000000000000e033 0000000000000246 ffff880402105ee8
(XEN)    000000000000e02b 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<ffff82c48015e5fb>] smp_irq_move_cleanup_interrupt+0x246/0x2c6
(XEN)    [<ffff82c48014de60>] irq_move_cleanup_interrupt+0x30/0x40
(XEN)    [<ffff82c480197940>] lapic_timer_nop+0x0/0x6
(XEN)    [<ffff82c4801549ce>] idle_loop+0x4b/0x59



Ignore the rest of the comments from my previous mail - I clearly don't understand
the IRQ handling code.
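
The sanity check Andrew asked for on the legacy range can be sketched as a
small standalone C helper (hypothetical, not Xen code). It assumes that the
expected state is the identity mapping 0xe0+n -> IRQ n, which is what the
other fifteen entries in the dump above show, and returns a bitmask of the
slots that differ. The name legacy_mismatch_mask() is made up for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_LEGACY_VECTOR 0xe0
#define NR_LEGACY_IRQS      16

/*
 * Compare the legacy slice of a vector_irq[] table against the identity
 * mapping (vector 0xe0+n -> IRQ n). Bit n of the result is set when the
 * entry for vector 0xe0+n holds anything other than n.
 */
unsigned int legacy_mismatch_mask(const int *vector_irq)
{
    unsigned int mask = 0;
    int n;

    assert(vector_irq != NULL);

    for (n = 0; n < NR_LEGACY_IRQS; n++)
        if (vector_irq[FIRST_LEGACY_VECTOR + n] != n)
            mask |= 1u << n;

    return mask;
}
```

Fed with the values from the dump, only bit 1 (vector 0xe1, which reads -1)
would be set in the result.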

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:31                                 ` Marek Marczykowski
  2013-03-27 14:46                                   ` Andrew Cooper
@ 2013-03-28 16:13                                   ` Jan Beulich
  2013-03-28 19:03                                     ` Marek Marczykowski
  2013-03-28 16:25                                   ` Jan Beulich
  2 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-28 16:13 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 27.03.13 at 15:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
> Also one time I've got fatal page fault error, earlier in resume (it isn't
> deterministic):
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log 

This is mostly identical to
http://lists.xen.org/archives/html/xen-devel/2013-01/msg02175.html,
and hence I would assume that the patch Ben posted (v4 came
through yesterday) would be fixing this. Care to give this a try?

Jan

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 14:31                                 ` Marek Marczykowski
  2013-03-27 14:46                                   ` Andrew Cooper
  2013-03-28 16:13                                   ` Jan Beulich
@ 2013-03-28 16:25                                   ` Jan Beulich
  2013-03-28 16:31                                     ` Marek Marczykowski
  2 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-28 16:25 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 27.03.13 at 15:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>> So vector e9 doesn't appear to be programmed in anywhere.
>> 
>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>> really is why an IRQ appears on that vector in the first place. The
>> 8259A resume code _should_ leave all IRQs masked on a fully
>> IO-APIC system (see my question raised yesterday).
>> 
>> And that's also why I suggested, for an experiment, to fiddle with
>> the loop exit condition to exclude legacy vectors (which wouldn't
>> be a final solution, but would at least tell us whether the direction
>> is the right one). In the end, besides understanding why an
>> interrupt on vector E9 gets raised at all, we may also need to
>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>> but that would need to happen earlier than in
>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>> apparently doesn't have this problem, we may need to go hunt for
>> a change that isn't directly connected to this, yet deals with the
>> problem as a side effect (at least I don't recall any particular fix
>> since 4.2). One aspect here is the double mapping of legacy IRQs
>> (once to their IO-APIC vector, and once to their legacy vector,
>> i.e. vector_irq[] having two entries pointing to the same IRQ).
> 
> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
> errors from dom0 kernel, and errors about PCI devices used by domU(1).
> 
> Messages from resume (different tries):
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log 
> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log 

Is that a sensible usage scenario at all? I would think that a
prerequisite to host S3 is that all guests get suspended. If you
do that, do you still have these interrupt re-setup problems?

Jan

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 16:25                                   ` Jan Beulich
@ 2013-03-28 16:31                                     ` Marek Marczykowski
  2013-03-28 16:52                                       ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-28 16:31 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 2582 bytes --]

On 28.03.2013 17:25, Jan Beulich wrote:
>>>> On 27.03.13 at 15:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>
>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>> really is why an IRQ appears on that vector in the first place. The
>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>> IO-APIC system (see my question raised yesterday).
>>>
>>> And that's also why I suggested, for an experiment, to fiddle with
>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>> be a final solution, but would at least tell us whether the direction
>>> is the right one). In the end, besides understanding why an
>>> interrupt on vector E9 gets raised at all, we may also need to
>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>> but that would need to happen earlier than in
>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>> apparently doesn't have this problem, we may need to go hunt for
>>> a change that isn't directly connected to this, yet deals with the
>>> problem as a side effect (at least I don't recall any particular fix
>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>> (once to their IO-APIC vector, and once to their legacy vector,
>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>
>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>
>> Messages from resume (different tries):
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log 
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log 
> 
> Is that a sensible usage scenario at all? I would think that a
> prerequisite to host S3 is that all guests get suspended. 

What do you mean by "suspended"? I haven't found any sane method to do that
with xl (only some manual xenstore write to control/shutdown). For now I do:
 - shutdown all network adapters in VMs
 - pause all VMs

> If you
> do that, do you still have these interrupt re-setup problems?

Yes, even when no guest is running (which was the case on 4.2)...

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 16:31                                     ` Marek Marczykowski
@ 2013-03-28 16:52                                       ` Jan Beulich
  2013-03-28 17:09                                         ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-03-28 16:52 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 28.03.13 at 17:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
> On 28.03.2013 17:25, Jan Beulich wrote:
>>>>> On 27.03.13 at 15:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>
>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>> really is why an IRQ appears on that vector in the first place. The
>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>> IO-APIC system (see my question raised yesterday).
>>>>
>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>> be a final solution, but would at least tell us whether the direction
>>>> is the right one). In the end, besides understanding why an
>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>> but that would need to happen earlier than in
>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>> apparently doesn't have this problem, we may need to go hunt for
>>>> a change that isn't directly connected to this, yet deals with the
>>>> problem as a side effect (at least I don't recall any particular fix
>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>
>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>
>>> Messages from resume (different tries):
>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log 
>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log 
>> 
>> Is that a sensible usage scenario at all? I would think that a
>> prerequisite to host S3 is that all guests get suspended. 
> 
> What do you mean by "suspended"? I haven't found any sane method to do that
> with xl (only some manual xenstore write to control/shutdown). For now I do:
>  - shutdown all network adapters in VMs
>  - pause all VMs

Aren't there "xl save" and "xl restore"? And for HVM guests, I think
there's also a way to do virtual S3.

Jan

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 16:52                                       ` Jan Beulich
@ 2013-03-28 17:09                                         ` Marek Marczykowski
  0 siblings, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-28 17:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 3080 bytes --]

On 28.03.2013 17:52, Jan Beulich wrote:
>>>> On 28.03.13 at 17:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>> On 28.03.2013 17:25, Jan Beulich wrote:
>>>>>> On 27.03.13 at 15:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>>
>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>> IO-APIC system (see my question raised yesterday).
>>>>>
>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>> be a final solution, but would at least tell us whether the direction
>>>>> is the right one). In the end, besides understanding why an
>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>> but that would need to happen earlier than in
>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>> a change that isn't directly connected to this, yet deals with the
>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>>
>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>
>>>> Messages from resume (different tries):
>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log 
>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log 
>>>
>>> Is that a sensible usage scenario at all? I would think that a
>>> prerequisite to host S3 is that all guests get suspended. 
>>
>> What do you mean by "suspended"? I haven't found any sane method to do that
>> with xl (only some manual xenstore write to control/shutdown). For now I do:
>>  - shutdown all network adapters in VMs
>>  - pause all VMs
> 
> Aren't there "xl save" and "xl restore"? And for HVM guests, I think
> there's also a way to do virtual S3.

xl save/restore takes far too much time.

Some time ago I tried writing "suspend" to control/shutdown with xenstore-write
and then calling xc_domain_resume, but I had some problems with that
(unfortunately I don't remember the details...).
This is basically what xl save and restore do, but without the actual data dump.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 553 bytes --]

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-27 17:15                                           ` Marek Marczykowski
@ 2013-03-28 17:41                                             ` Andrew Cooper
  2013-03-28 17:44                                               ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-28 17:41 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 27/03/2013 17:15, Marek Marczykowski wrote:
> On 27.03.2013 17:56, Andrew Cooper wrote:
>> On 27/03/2013 15:47, Konrad Rzeszutek Wilk wrote:
>>> On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
>>>> On 27/03/2013 14:46, Andrew Cooper wrote:
>>>>> On 27/03/2013 14:31, Marek Marczykowski wrote:
>>>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>>>> IO-APIC system (see my question raised yesterday).
>>>>>>>
>>>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>>>> be a final solution, but would at least tell us whether the direction
>>>>>>> is the right one). In the end, besides understanding why an
>>>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>>>> but that would need to happen earlier than in
>>>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>>>> a change that isn't directly connected to this, yet deals with the
>>>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>>>
>>>>>> Messages from resume (different tries):
>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>>>>>
>>>>>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>>>>>> deterministic):
>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>>>>>
>>>>> This pagefault is a Null structure pointer dereference, likely the
>>>>> scheduling data.  At a first glance, it looks related to the assertion
>>>>> failures I have been seeing sporadically in testing, but unable to
>>>>> reproduce reliably.  There seems to be something quite dodgy with
>>>>> interaction of vcpu_wake and scheduling loops.
>>>>>
>>>>> The other logs indicate that dom0 appears to have a domain id of 1,
>>>>> which is sure to cause problems.
>>>> Actually - ignore this
>>>>
>>>> From the log,
>>>>
>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>> domain
>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>> domain
>>>>
>>>> and later
>>>>
>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>> [  121.954080] error enable msi for guest 1 status ffffffea
>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>> [  122.044421] error enable msi for guest 1 status ffffffea
>>>>
>>>> I think that there is a separate bug where mapped irqs are not unmapped
>>>> on the suspend path.
>>> You thinking this is a Linux (xen irq machinery) issue? Meaning it should
>>> end up calling PHYSDEV_unmap_pirq as part of the suspend process?
>> I am not sure.  Without looking at the code, I am only speculating.
>>
>> Beyond that, the main question is about the expected behaviour.  Do we
>> expect dom0/U to unmap its irqs and remap them after resume?  What do we
>> expect from domains which are unaware of the host sleep action?
> BTW this is the case: domain 1 isn't fully aware of sleep. It have some PCI
> devices assigned. The only action taken there before suspend is shutdown
> network interfaces (without this system hanged during suspend).
>

What do you mean here by shutting down the network interfaces? Are the
devices being assigned back to dom0?  If so, is dom0 assigning them back
to domU before the domU driver tries to set itself up?

~Andrew
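
As a side note on the quoted log: the "failed -22" and "status ffffffea"
values appear to be the same number, once printed as a signed decimal and
once as the raw 32-bit pattern. A minimal standalone C sketch of that
conversion follows; decode_status() is a hypothetical helper, not a
function from Xen or Linux. Error number 22 is EINVAL on Linux.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Interpret a raw 32-bit status word as a signed two's-complement value
 * without relying on implementation-defined unsigned-to-signed casts.
 */
int32_t decode_status(uint32_t raw)
{
    if (raw <= (uint32_t)INT32_MAX)
        return (int32_t)raw;            /* non-negative: value unchanged */

    /* Negative: -(~raw) - 1 equals raw - 2^32, computed without overflow. */
    return -(int32_t)(~raw) - 1;
}
```

For the value in the log, decode_status(0xffffffea) yields -22, matching
the "failed -22" lines printed next to it.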

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 17:41                                             ` Andrew Cooper
@ 2013-03-28 17:44                                               ` Marek Marczykowski
  2013-03-28 17:50                                                 ` Andrew Cooper
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-28 17:44 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 5076 bytes --]

On 28.03.2013 18:41, Andrew Cooper wrote:
> On 27/03/2013 17:15, Marek Marczykowski wrote:
>> On 27.03.2013 17:56, Andrew Cooper wrote:
>>> On 27/03/2013 15:47, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
>>>>> On 27/03/2013 14:46, Andrew Cooper wrote:
>>>>>> On 27/03/2013 14:31, Marek Marczykowski wrote:
>>>>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>>>>> IO-APIC system (see my question raised yesterday).
>>>>>>>>
>>>>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>>>>> be a final solution, but would at least tell us whether the direction
>>>>>>>> is the right one). In the end, besides understanding why an
>>>>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>>>>> but that would need to happen earlier than in
>>>>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>>>>> a change that isn't directly connected to this, yet deals with the
>>>>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>>>>
>>>>>>> Messages from resume (different tries):
>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>>>>>>
>>>>>>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>>>>>>> deterministic):
>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>>>>>>
>>>>>> This pagefault is a Null structure pointer dereference, likely the
>>>>>> scheduling data.  At a first glance, it looks related to the assertion
>>>>>> failures I have been seeing sporadically in testing, but unable to
>>>>>> reproduce reliably.  There seems to be something quite dodgy with
>>>>>> interaction of vcpu_wake and scheduling loops.
>>>>>>
>>>>>> The other logs indicate that dom0 appears to have a domain id of 1,
>>>>>> which is sure to cause problems.
>>>>> Actually - ignore this
>>>>>
>>>>> From the log,
>>>>>
>>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>>> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>>> domain
>>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>>> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>>> domain
>>>>>
>>>>> and later
>>>>>
>>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>>> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>>> [  121.954080] error enable msi for guest 1 status ffffffea
>>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>>> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>>> [  122.044421] error enable msi for guest 1 status ffffffea
>>>>>
>>>>> I think that there is a separate bug where mapped irqs are not unmapped
>>>>> on the suspend path.
>>>> You thinking this is a Linux (xen irq machinery) issue? Meaning it should
>>>> end up calling PHYSDEV_unmap_pirq as part of the suspend process?
>>> I am not sure.  Without looking at the code, I am only speculating.
>>>
>>> Beyond that, the main question is about the expected behaviour.  Do we
>>> expect dom0/U to unmap its irqs and remap them after resume?  What do we
>>> expect from domains which are unaware of the host sleep action?
>> BTW this is the case: domain 1 isn't fully aware of sleep. It have some PCI
>> devices assigned. The only action taken there before suspend is shutdown
>> network interfaces (without this system hanged during suspend).
>>
> 
> What do you mean here by shutting down the network interfaces? Are the
> devices being assigned back to dom0?  

No, just a simple "ip link set eth0 down". That seems to be enough for suspend
to succeed, at least on most hardware...

> Ifso, is dom0 assigning them back
> to domU before the domU driver tries to set itself up?

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 17:44                                               ` Marek Marczykowski
@ 2013-03-28 17:50                                                 ` Andrew Cooper
  2013-03-29  0:26                                                   ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Andrew Cooper @ 2013-03-28 17:50 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On 28/03/2013 17:44, Marek Marczykowski wrote:
> On 28.03.2013 18:41, Andrew Cooper wrote:
>> On 27/03/2013 17:15, Marek Marczykowski wrote:
>>> On 27.03.2013 17:56, Andrew Cooper wrote:
>>>> On 27/03/2013 15:47, Konrad Rzeszutek Wilk wrote:
>>>>> On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
>>>>>> On 27/03/2013 14:46, Andrew Cooper wrote:
>>>>>>> On 27/03/2013 14:31, Marek Marczykowski wrote:
>>>>>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>>>>>> IO-APIC system (see my question raised yesterday).
>>>>>>>>>
>>>>>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>>>>>> be a final solution, but would at least tell us whether the direction
>>>>>>>>> is the right one). In the end, besides understanding why an
>>>>>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>>>>>> but that would need to happen earlier than in
>>>>>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>>>>>> a change that isn't directly connected to this, yet deals with the
>>>>>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>>>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>>>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>>>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>>>>>
>>>>>>>> Messages from resume (different tries):
>>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>>>>>>>
>>>>>>>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>>>>>>>> deterministic):
>>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>>>>>>>
>>>>>>> This pagefault is a Null structure pointer dereference, likely the
>>>>>>> scheduling data.  At a first glance, it looks related to the assertion
>>>>>>> failures I have been seeing sporadically in testing, but unable to
>>>>>>> reproduce reliably.  There seems to be something quite dodgy with
>>>>>>> interaction of vcpu_wake and scheduling loops.
>>>>>>>
>>>>>>> The other logs indicate that dom0 appears to have a domain id of 1,
>>>>>>> which is sure to cause problems.
>>>>>> Actually - ignore this
>>>>>>
>>>>>> From the log,
>>>>>>
>>>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>>>> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>>>> domain
>>>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>>>> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>>>> domain
>>>>>>
>>>>>> and later
>>>>>>
>>>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>>>> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>>>> [  121.954080] error enable msi for guest 1 status ffffffea
>>>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>>>> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>>>> [  122.044421] error enable msi for guest 1 status ffffffea
>>>>>>
>>>>>> I think that there is a separate bug where mapped irqs are not unmapped
>>>>>> on the suspend path.
>>>>> You thinking this is a Linux (xen irq machinery) issue? Meaning it should
>>>>> end up calling PHYSDEV_unmap_pirq as part of the suspend process?
>>>> I am not sure.  Without looking at the code, I am only speculating.
>>>>
>>>> Beyond that, the main question is about the expected behaviour.  Do we
>>>> expect dom0/U to unmap its irqs and remap them after resume?  What do we
>>>> expect from domains which are unaware of the host sleep action?
>>> BTW this is the case: domain 1 isn't fully aware of sleep. It have some PCI
>>> devices assigned. The only action taken there before suspend is shutdown
>>> network interfaces (without this system hanged during suspend).
>>>
>> What do you mean here by shutting down the network interfaces? Are the
>> devices being assigned back to dom0?  
> No, just simple ip link set eth0 down. Seems to be enough to suspend succeed,
> at least on most hardware...

In which case repeat map_pirq hypercalls will fail with -EINVAL because
the pirq is already set up.  It is probably worth putting a printk in
map_pirq and unmap_pirq to see exactly what is happening across the
sleep/resume cycle.
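
To make that reasoning concrete, here is a minimal user-space model, not Xen
code. It assumes that a map request for a pirq which is still set up is
rejected with -EINVAL (which matches the -22 in the logs above), and it logs
every call the way a printk in map_pirq/unmap_pirq would. The names
model_map_pirq, model_unmap_pirq and NR_PIRQS are made up for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define NR_PIRQS 16                 /* hypothetical table size for this model */

static int pirq_mapped[NR_PIRQS];   /* 1 = a mapping currently exists */

/* Toy stand-in for the map path: refuses a pirq that is already set up. */
int model_map_pirq(int pirq)
{
    if (pirq < 0 || pirq >= NR_PIRQS)
        return -EINVAL;
    if (pirq_mapped[pirq]) {
        printf("map_pirq(%d): already mapped, returning %d\n", pirq, -EINVAL);
        return -EINVAL;
    }
    pirq_mapped[pirq] = 1;
    printf("map_pirq(%d): ok\n", pirq);
    return 0;
}

/* Toy stand-in for the unmap path: only an existing mapping can be removed. */
int model_unmap_pirq(int pirq)
{
    if (pirq < 0 || pirq >= NR_PIRQS || !pirq_mapped[pirq])
        return -EINVAL;
    pirq_mapped[pirq] = 0;
    printf("unmap_pirq(%d): ok\n", pirq);
    return 0;
}

/* Replays the suspected sequence: map, resume without unmap, map again. */
void model_resume_without_unmap(void)
{
    assert(model_map_pirq(5) == 0);        /* setup before suspend */
    assert(model_map_pirq(5) == -EINVAL);  /* repeated map after resume */
}
```

In this model the second map only succeeds if an unmap happened in between,
which is what the printk output should confirm or rule out. The "status
ffffffea" in the pciback lines is the same -22 printed as an unsigned 32-bit
value.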

~Andrew

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 16:13                                   ` Jan Beulich
@ 2013-03-28 19:03                                     ` Marek Marczykowski
  2013-04-01 13:53                                       ` Ben Guthro
  0 siblings, 1 reply; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-28 19:03 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel


[-- Attachment #1.1: Type: text/plain, Size: 1537 bytes --]

On 28.03.2013 17:13, Jan Beulich wrote:
>>>> On 27.03.13 at 15:31, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>> deterministic):
>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log 
> 
> This is mostly identical to
> http://lists.xen.org/archives/html/xen-devel/2013-01/msg02175.html,
> and hence I would assume that the patch Ben posted (v4 came
> through yesterday) would be fixing this. Care to give this a try?

With this, together with your previous patch ("x86:
irq_move_cleanup_interrupt() must ignore legacy vectors"), I can no longer hit
the previous IRQ setup problem (at least in a few tries).

But it still doesn't solve the original problem: after suspend the system
temperature goes high, and apparently only CPU0 is online.
If I pin some domain's vCPU to a non-0 CPU before suspend, I hit an ASSERT() on
resume:
(XEN) Finishing wakeup from ACPI S3 state.
(XEN) Enabling non-boot CPUs  ...
(XEN) Suppress EOI broadcast on CPU#1
(XEN) masked ExtINT on CPU#1
(XEN) Suppress EOI broadcast on CPU#2
(XEN) masked ExtINT on CPU#2
(XEN) Suppress EOI broadcast on CPU#3
(XEN) masked ExtINT on CPU#3
(XEN) Restoring affinity for d2v3
(XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
sched_credit.c:481

xl cpupool-list -c:
Name               CPU list
Pool-0             0
xl cpupool-cpu-add Pool-0 1
-> -EBUSY


-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 17:50                                                 ` Andrew Cooper
@ 2013-03-29  0:26                                                   ` Marek Marczykowski
  0 siblings, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-03-29  0:26 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 5834 bytes --]

On 28.03.2013 18:50, Andrew Cooper wrote:
> On 28/03/2013 17:44, Marek Marczykowski wrote:
>> On 28.03.2013 18:41, Andrew Cooper wrote:
>>> On 27/03/2013 17:15, Marek Marczykowski wrote:
>>>> On 27.03.2013 17:56, Andrew Cooper wrote:
>>>>> On 27/03/2013 15:47, Konrad Rzeszutek Wilk wrote:
>>>>>> On Wed, Mar 27, 2013 at 02:52:14PM +0000, Andrew Cooper wrote:
>>>>>>> On 27/03/2013 14:46, Andrew Cooper wrote:
>>>>>>>> On 27/03/2013 14:31, Marek Marczykowski wrote:
>>>>>>>>> On 27.03.2013 09:52, Jan Beulich wrote:
>>>>>>>>>>>>> On 26.03.13 at 19:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>>>>>>>>>>> So vector e9 doesn't appear to be programmed in anywhere.
>>>>>>>>>> Quite obviously, as it's the 8259A vector for IRQ 9. The question
>>>>>>>>>> really is why an IRQ appears on that vector in the first place. The
>>>>>>>>>> 8259A resume code _should_ leave all IRQs masked on a fully
>>>>>>>>>> IO-APIC system (see my question raised yesterday).
>>>>>>>>>>
>>>>>>>>>> And that's also why I suggested, for an experiment, to fiddle with
>>>>>>>>>> the loop exit condition to exclude legacy vectors (which wouldn't
>>>>>>>>>> be a final solution, but would at least tell us whether the direction
>>>>>>>>>> is the right one). In the end, besides understanding why an
>>>>>>>>>> interrupt on vector E9 gets raised at all, we may also need to
>>>>>>>>>> tweak the IRQ migration logic to not do anything on legacy IRQs,
>>>>>>>>>> but that would need to happen earlier than in
>>>>>>>>>> smp_irq_move_cleanup_interrupt(). Considering that 4.3
>>>>>>>>>> apparently doesn't have this problem, we may need to go hunt for
>>>>>>>>>> a change that isn't directly connected to this, yet deals with the
>>>>>>>>>> problem as a side effect (at least I don't recall any particular fix
>>>>>>>>>> since 4.2). One aspect here is the double mapping of legacy IRQs
>>>>>>>>>> (once to their IO-APIC vector, and once to their legacy vector,
>>>>>>>>>> i.e. vector_irq[] having two entries pointing to the same IRQ).
>>>>>>>>> So tried change loop condition to LAST_DYNAMIC_VECTOR and it doesn't hit that
>>>>>>>>> BUG/ASSERT. But still it doesn't work - only CPU0 used by scheduler, also some
>>>>>>>>> errors from dom0 kernel, and errors about PCI devices used by domU(1).
>>>>>>>>>
>>>>>>>>> Messages from resume (different tries):
>>>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector.log
>>>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-last-dynamic-vector2.log
>>>>>>>>>
>>>>>>>>> Also one time I've got fatal page fault error, earlier in resume (it isn't
>>>>>>>>> deterministic):
>>>>>>>>> http://duch.mimuw.edu.pl/~marmarek/qubes/xen-4.1-resume-page-fault.log
>>>>>>>>>
>>>>>>>> This pagefault is a Null structure pointer dereference, likely the
>>>>>>>> scheduling data.  At a first glance, it looks related to the assertion
>>>>>>>> failures I have been seeing sporadically in testing, but unable to
>>>>>>>> reproduce reliably.  There seems to be something quite dodgy with
>>>>>>>> interaction of vcpu_wake and scheduling loops.
>>>>>>>>
>>>>>>>> The other logs indicate that dom0 appears to have a domain id of 1,
>>>>>>>> which is sure to cause problems.
>>>>>>> Actually - ignore this
>>>>>>>
>>>>>>> From the log,
>>>>>>>
>>>>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>>>>> [  113.637037] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>>>>> domain
>>>>>>> (XEN) physdev.c:153: dom0: can't create irq for msi!
>>>>>>> [  113.657911] xhci_hcd 0000:03:00.0: xen map irq failed -22 for 32752
>>>>>>> domain
>>>>>>>
>>>>>>> and later
>>>>>>>
>>>>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>>>>> [  121.909814] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>>>>> [  121.954080] error enable msi for guest 1 status ffffffea
>>>>>>> (XEN) physdev.c:153: dom1: can't create irq for msi!
>>>>>>> [  122.035355] pciback 0000:00:19.0: xen map irq failed -22 for 1 domain
>>>>>>> [  122.044421] error enable msi for guest 1 status ffffffea
>>>>>>>
>>>>>>> I think that there is a separate bug where mapped irqs are not unmapped
>>>>>>> on the suspend path.
>>>>>> You thinking this is a Linux (xen irq machinery) issue? Meaning it should
>>>>>> end up calling PHYSDEV_unmap_pirq as part of the suspend process?
>>>>> I am not sure.  Without looking at the code, I am only speculating.
>>>>>
>>>>> Beyond that, the main question is about the expected behaviour.  Do we
>>>>> expect dom0/U to unmap its irqs and remap them after resume?  What do we
>>>>> expect from domains which are unaware of the host sleep action?
>>>> BTW this is the case: domain 1 isn't fully aware of sleep. It have some PCI
>>>> devices assigned. The only action taken there before suspend is shutdown
>>>> network interfaces (without this system hanged during suspend).
>>>>
>>> What do you mean here by shutting down the network interfaces? Are the
>>> devices being assigned back to dom0?  
>> No, just simple ip link set eth0 down. Seems to be enough to suspend succeed,
>> at least on most hardware...
> 
> In which case repeat map_pirq hypercalls will fail with -EINVAL because
> the pirq is already set up.  It is probably worth putting a printk in
> map_pirq and unmap_pirq to see exactly what is happening across the
> sleep/resume cycle.

No unmap/map is done for that domain during the sleep/resume cycle (it has two
mapped pirqs). Even for dom0 I see only one unmap/map during suspend/resume.
For most devices this doesn't break anything. A few exceptions need a module
reload after resume (e.g. sky2), but I'm not sure about the reason (no
additional logs, simply no link detected).

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-03-28 19:03                                     ` Marek Marczykowski
@ 2013-04-01 13:53                                       ` Ben Guthro
  2013-04-02  1:13                                         ` Marek Marczykowski
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Guthro @ 2013-04-01 13:53 UTC (permalink / raw)
  To: Marek Marczykowski
  Cc: Andrew Cooper, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
<marmarek@invisiblethingslab.com> wrote:
> (XEN) Restoring affinity for d2v3
> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
> sched_credit.c:481


I think the "fix-suspend-scheduler-*" patches posted at the link below apply here:
http://markmail.org/message/llj3oyhgjzvw3t23


Specifically, I think you need this bit:

diff --git a/xen/common/cpu.c b/xen/common/cpu.c
index 630881e..e20868c 100644
--- a/xen/common/cpu.c
+++ b/xen/common/cpu.c
@@ -5,6 +5,7 @@
 #include <xen/init.h>
 #include <xen/sched.h>
 #include <xen/stop_machine.h>
+#include <xen/sched-if.h>

 unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
 #ifndef nr_cpumask_bits
@@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
             BUG_ON(error == -EBUSY);
             printk("Error taking CPU%d up: %d\n", cpu, error);
         }
+        if (system_state == SYS_STATE_resume)
+            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
     }

     cpumask_clear(&frozen_cpus);
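
As a rough illustration of why that hunk matters, here is a small user-space
model, not the Xen implementation. It assumes the failed assertion checks that
the set (vcpu affinity AND the pool's valid CPUs) is non-empty and contains
the chosen CPU, and that without the hunk only CPU0 stays in
cpupool0->cpu_valid after resume. The type cpumask_model_t and the functions
pick_is_valid and resume_pool are invented for this sketch.

```c
#include <assert.h>

typedef unsigned int cpumask_model_t;   /* one bit per CPU; model only */

#define CPU_BIT(cpu) (1u << (cpu))

/*
 * Same shape as the failed check: the candidate set must be non-empty
 * and must contain the CPU that was picked.
 */
int pick_is_valid(cpumask_model_t affinity, cpumask_model_t pool_valid,
                  unsigned int cpu)
{
    cpumask_model_t cpus = affinity & pool_valid;

    return cpus != 0 && (cpus & CPU_BIT(cpu)) != 0;
}

/*
 * Model of bringing the non-boot CPUs back up. With readd_to_pool == 0
 * the pool keeps only CPU0; with readd_to_pool != 0 every CPU that comes
 * up is put back into the pool's valid mask, as the added lines do.
 */
cpumask_model_t resume_pool(unsigned int nr_cpus, int readd_to_pool)
{
    cpumask_model_t pool_valid = CPU_BIT(0);   /* boot CPU never left */
    unsigned int cpu;

    for (cpu = 1; cpu < nr_cpus; cpu++)
        if (readd_to_pool)
            pool_valid |= CPU_BIT(cpu);

    return pool_valid;
}

/* A vcpu pinned to CPU3 on a 4-CPU box, before and after the change. */
void model_pinned_vcpu_demo(void)
{
    assert(!pick_is_valid(CPU_BIT(3), resume_pool(4, 0), 3));
    assert(pick_is_valid(CPU_BIT(3), resume_pool(4, 1), 3));
}
```

In the model a vcpu pinned to CPU3 has an empty candidate set while the pool
holds only CPU0, which corresponds to the "Restoring affinity for d2v3"
failure quoted above.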

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-01 13:53                                       ` Ben Guthro
@ 2013-04-02  1:13                                         ` Marek Marczykowski
  2013-04-02 14:05                                           ` Konrad Rzeszutek Wilk
  2013-04-15 22:09                                           ` Marek Marczykowski
  0 siblings, 2 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-04-02  1:13 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 3381 bytes --]

On 01.04.2013 15:53, Ben Guthro wrote:
> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
> <marmarek@invisiblethingslab.com> wrote:
>> (XEN) Restoring affinity for d2v3
>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>> sched_credit.c:481
> 
> 
> I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
> http://markmail.org/message/llj3oyhgjzvw3t23
> 
> 
> Specifically, I think you need this bit:
> 
> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
> index 630881e..e20868c 100644
> --- a/xen/common/cpu.c
> +++ b/xen/common/cpu.c
> @@ -5,6 +5,7 @@
>  #include <xen/init.h>
>  #include <xen/sched.h>
>  #include <xen/stop_machine.h>
> +#include <xen/sched-if.h>
> 
>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>  #ifndef nr_cpumask_bits
> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>              BUG_ON(error == -EBUSY);
>              printk("Error taking CPU%d up: %d\n", cpu, error);
>          }
> +        if (system_state == SYS_STATE_resume)
> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>      }
> 
>      cpumask_clear(&frozen_cpus);
> 

Indeed, this makes things better, but it is still not ideal.
Now after resume all CPUs are in Pool-0, which is good. But CPU0 is strongly
preferred over the others (xl vcpu-list). For example, if I start 4 busy loops
in dom0, I get (even after some time):
[user@dom0 ~]$ xl vcpu-list
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
dom0                                 0     0    0   r--      98.5  any cpu
dom0                                 0     1    0   ---     181.3  any cpu
dom0                                 0     2    2   r--     262.4  any cpu
dom0                                 0     3    3   r--     230.8  any cpu
netvm                                1     0    0   -b-      18.4  any cpu
netvm                                1     1    0   -b-       9.1  any cpu
netvm                                1     2    0   -b-       7.1  any cpu
netvm                                1     3    0   -b-       5.4  any cpu
firewallvm                           2     0    0   -b-      10.7  any cpu
firewallvm                           2     1    0   -b-       3.0  any cpu
firewallvm                           2     2    0   -b-       2.5  any cpu
firewallvm                           2     3    3   -b-       3.6  any cpu

If I remove a CPU from Pool-0 and re-add it, things go back to normal for that
particular CPU (so I get two equally used CPUs). To fully restore the system I
must remove all CPUs but CPU0 from Pool-0 and add them again.

Also, still only CPU0 has all C-states (C0-C3); all the others have only C0-C1.
This could probably be fixed by your "xen: Re-upload processor PM data to
hypervisor after S3 resume" patch (reloading the xen-acpi-processor module
helps here). But I don't think it is the right way. It isn't necessary on other
systems (with somewhat older hardware). Something must be missing on the resume
path. The question is what...

Perhaps someone needs to go through enable_nonboot_cpus() (__cpu_up?) and check
whether it restores everything disabled in disable_nonboot_cpus()
(__cpu_disable?). Unfortunately I don't know the x86 details well enough to
follow that code...
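
The kind of audit meant here can be sketched as a toy model. This is only an
illustration: the state bits and function names below are hypothetical and do
not correspond to real Xen data structures. The idea is to record what a CPU
had before suspend and list what the bring-up path failed to restore.

```c
#include <assert.h>

/* Hypothetical per-CPU state bits; the names are illustrative, not Xen's. */
#define ST_ONLINE  (1u << 0)   /* CPU is running */
#define ST_IN_POOL (1u << 1)   /* CPU is a valid member of its cpupool */
#define ST_DEEP_C  (1u << 2)   /* deep C-state data is registered */

/* Bits that were set before suspend but are missing after resume. */
unsigned int missing_after_resume(unsigned int before, unsigned int after)
{
    return before & ~after;
}

/* Model of a bring-up path: the CPU comes online plus whatever is restored. */
unsigned int model_cpu_up(unsigned int restore_mask)
{
    return ST_ONLINE | restore_mask;
}

/* A bring-up that only puts the CPU online loses the other two bits. */
void model_audit_demo(void)
{
    unsigned int before = ST_ONLINE | ST_IN_POOL | ST_DEEP_C;

    assert(missing_after_resume(before, model_cpu_up(0u)) ==
           (ST_IN_POOL | ST_DEEP_C));
}
```

In terms of this thread, the two bits lost in the demo stand for the symptoms
seen so far: non-boot CPUs missing from Pool-0 and having only C0-C1.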

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-02  1:13                                         ` Marek Marczykowski
@ 2013-04-02 14:05                                           ` Konrad Rzeszutek Wilk
  2013-04-15 22:09                                           ` Marek Marczykowski
  1 sibling, 0 replies; 68+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-04-02 14:05 UTC (permalink / raw)
  To: Marek Marczykowski; +Cc: Andrew Cooper, Ben Guthro, Jan Beulich, xen-devel

On Tue, Apr 02, 2013 at 03:13:56AM +0200, Marek Marczykowski wrote:
> On 01.04.2013 15:53, Ben Guthro wrote:
> > On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
> > <marmarek@invisiblethingslab.com> wrote:
> >> (XEN) Restoring affinity for d2v3
> >> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
> >> sched_credit.c:481
> > 
> > 
> > I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
> > http://markmail.org/message/llj3oyhgjzvw3t23
> > 
> > 
> > Specifically, I think you need this bit:
> > 
> > diff --git a/xen/common/cpu.c b/xen/common/cpu.c
> > index 630881e..e20868c 100644
> > --- a/xen/common/cpu.c
> > +++ b/xen/common/cpu.c
> > @@ -5,6 +5,7 @@
> >  #include <xen/init.h>
> >  #include <xen/sched.h>
> >  #include <xen/stop_machine.h>
> > +#include <xen/sched-if.h>
> > 
> >  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
> >  #ifndef nr_cpumask_bits
> > @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
> >              BUG_ON(error == -EBUSY);
> >              printk("Error taking CPU%d up: %d\n", cpu, error);
> >          }
> > +        if (system_state == SYS_STATE_resume)
> > +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
> >      }
> > 
> >      cpumask_clear(&frozen_cpus);
> > 
> 
> Indeed, this makes things better, but still not ideal.
> Now after resume all CPUs are in Pool-0, which is good. But CPU0 is much more
> preferred than others (xl vcpu-list). For example if I start 4 busy loops in
> dom0, I got (even after some time):
> [user@dom0 ~]$ xl vcpu-list
> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> dom0                                 0     0    0   r--      98.5  any cpu
> dom0                                 0     1    0   ---     181.3  any cpu
> dom0                                 0     2    2   r--     262.4  any cpu
> dom0                                 0     3    3   r--     230.8  any cpu
> netvm                                1     0    0   -b-      18.4  any cpu
> netvm                                1     1    0   -b-       9.1  any cpu
> netvm                                1     2    0   -b-       7.1  any cpu
> netvm                                1     3    0   -b-       5.4  any cpu
> firewallvm                           2     0    0   -b-      10.7  any cpu
> firewallvm                           2     1    0   -b-       3.0  any cpu
> firewallvm                           2     2    0   -b-       2.5  any cpu
> firewallvm                           2     3    3   -b-       3.6  any cpu
> 
> If I remove some CPU from Pool-0 and re-add it, things back to normal for this
> particular CPU (so I got two equally used CPUs) - to fully restore system I
> must remove all but CPU0 from Pool-0 and add it again.
> 
> Also still only CPU0 have all C-states (C0-C3), all others have only C0-C1.
> This probably could be fixed by your "xen: Re-upload processor PM data to
> hypervisor after S3 resume" patch (reload of xen-acpi-processor module helps
> here). But I don't think it is a right way. It isn't necessary on other
> systems (with somehow older hardware). It must be something missing on resume
> path. The question is what...

The xen-acpi-processor should probably also have the cpu hotplug notification
in it to deal with this - so that you don't need to do the reload.

> 
> Perhaps someone need to go through enable_nonboot_cpus() (__cpu_up?) and check
> if it restore all things disabled in disable_nonboot_cpus() (__cpu_disable?).
> Unfortunately I don't know x86 details so good to follow that code...
> 
> -- 
> Best Regards / Pozdrawiam,
> Marek Marczykowski
> Invisible Things Lab
> 




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-02  1:13                                         ` Marek Marczykowski
  2013-04-02 14:05                                           ` Konrad Rzeszutek Wilk
@ 2013-04-15 22:09                                           ` Marek Marczykowski
  2013-04-15 23:36                                             ` Ben Guthro
  2013-04-16  8:47                                             ` Jan Beulich
  1 sibling, 2 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-04-15 22:09 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 4367 bytes --]

On 02.04.2013 03:13, Marek Marczykowski wrote:
> On 01.04.2013 15:53, Ben Guthro wrote:
>> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
>> <marmarek@invisiblethingslab.com> wrote:
>>> (XEN) Restoring affinity for d2v3
>>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>>> sched_credit.c:481
>>
>>
>> I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
>> http://markmail.org/message/llj3oyhgjzvw3t23
>>
>>
>> Specifically, I think you need this bit:
>>
>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>> index 630881e..e20868c 100644
>> --- a/xen/common/cpu.c
>> +++ b/xen/common/cpu.c
>> @@ -5,6 +5,7 @@
>>  #include <xen/init.h>
>>  #include <xen/sched.h>
>>  #include <xen/stop_machine.h>
>> +#include <xen/sched-if.h>
>>
>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>  #ifndef nr_cpumask_bits
>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>              BUG_ON(error == -EBUSY);
>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>          }
>> +        if (system_state == SYS_STATE_resume)
>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>      }
>>
>>      cpumask_clear(&frozen_cpus);
>>
> 
> Indeed, this makes things better, but still not ideal.
> Now after resume all CPUs are in Pool-0, which is good. But CPU0 is much more
> preferred than others (xl vcpu-list). For example if I start 4 busy loops in
> dom0, I got (even after some time):
> [user@dom0 ~]$ xl vcpu-list
> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> dom0                                 0     0    0   r--      98.5  any cpu
> dom0                                 0     1    0   ---     181.3  any cpu
> dom0                                 0     2    2   r--     262.4  any cpu
> dom0                                 0     3    3   r--     230.8  any cpu
> netvm                                1     0    0   -b-      18.4  any cpu
> netvm                                1     1    0   -b-       9.1  any cpu
> netvm                                1     2    0   -b-       7.1  any cpu
> netvm                                1     3    0   -b-       5.4  any cpu
> firewallvm                           2     0    0   -b-      10.7  any cpu
> firewallvm                           2     1    0   -b-       3.0  any cpu
> firewallvm                           2     2    0   -b-       2.5  any cpu
> firewallvm                           2     3    3   -b-       3.6  any cpu
> 
> If I remove some CPU from Pool-0 and re-add it, things back to normal for this
> particular CPU (so I got two equally used CPUs) - to fully restore system I
> must remove all but CPU0 from Pool-0 and add it again.
> 
> Also still only CPU0 have all C-states (C0-C3), all others have only C0-C1.
> This probably could be fixed by your "xen: Re-upload processor PM data to
> hypervisor after S3 resume" patch (reload of xen-acpi-processor module helps
> here). But I don't think it is a right way. It isn't necessary on other
> systems (with somehow older hardware). It must be something missing on resume
> path. The question is what...
> 
> Perhaps someone need to go through enable_nonboot_cpus() (__cpu_up?) and check
> if it restore all things disabled in disable_nonboot_cpus() (__cpu_disable?).
> Unfortunately I don't know x86 details so good to follow that code...

Summary of ACPI S3 issues:

I. Fixed issues:

1. IRQ problem fixed by "x86: irq_move_cleanup_interrupt() must ignore legacy
vectors" commit
2. Assertion failure on resume with vcpu affinity used, fixed by "x86/S3:
Restore broken vcpu affinity on resume" commit


II. Not (fully) fixed issues:

1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
issue, but it isn't applied to xen-unstable
2. After resume the scheduler chooses (almost) only CPU0 (see the listing
quoted above). Removing and re-adding all CPUs to Pool-0 solves the problem.
Perhaps some timers are not restarted after resume?
3. ACPI C-states are only present for CPU0 (after resume of course), fixed by
"xen: Re-upload processor PM data to hypervisor after S3" patch by Ben, but it
isn't in upstream linux (nor Konrad's acpi-s3 branches).

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab



[-- Attachment #2: Type: text/plain, Size: 126 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-15 22:09                                           ` Marek Marczykowski
@ 2013-04-15 23:36                                             ` Ben Guthro
  2013-04-15 23:51                                               ` konrad wilk
  2013-04-16  1:02                                               ` Marek Marczykowski
  2013-04-16  8:47                                             ` Jan Beulich
  1 sibling, 2 replies; 68+ messages in thread
From: Ben Guthro @ 2013-04-15 23:36 UTC (permalink / raw)
  To: Marek Marczykowski
  Cc: Andrew Cooper, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk

On Mon, Apr 15, 2013 at 11:09 PM, Marek Marczykowski
<marmarek@invisiblethingslab.com> wrote:
> On 02.04.2013 03:13, Marek Marczykowski wrote:
>> On 01.04.2013 15:53, Ben Guthro wrote:
>>> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
>>> <marmarek@invisiblethingslab.com> wrote:
>>>> (XEN) Restoring affinity for d2v3
>>>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>>>> sched_credit.c:481
>>>
>>>
>>> I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
>>> http://markmail.org/message/llj3oyhgjzvw3t23
>>>
>>>
>>> Specifically, I think you need this bit:
>>>
>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>> index 630881e..e20868c 100644
>>> --- a/xen/common/cpu.c
>>> +++ b/xen/common/cpu.c
>>> @@ -5,6 +5,7 @@
>>>  #include <xen/init.h>
>>>  #include <xen/sched.h>
>>>  #include <xen/stop_machine.h>
>>> +#include <xen/sched-if.h>
>>>
>>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>  #ifndef nr_cpumask_bits
>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>              BUG_ON(error == -EBUSY);
>>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>>          }
>>> +        if (system_state == SYS_STATE_resume)
>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>>      }
>>>
>>>      cpumask_clear(&frozen_cpus);
>>>
>>
>> Indeed, this makes things better, but still not ideal.
>> Now after resume all CPUs are in Pool-0, which is good. But CPU0 is much more
>> preferred than others (xl vcpu-list). For example if I start 4 busy loops in
>> dom0, I got (even after some time):
>> [user@dom0 ~]$ xl vcpu-list
>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>> dom0                                 0     0    0   r--      98.5  any cpu
>> dom0                                 0     1    0   ---     181.3  any cpu
>> dom0                                 0     2    2   r--     262.4  any cpu
>> dom0                                 0     3    3   r--     230.8  any cpu
>> netvm                                1     0    0   -b-      18.4  any cpu
>> netvm                                1     1    0   -b-       9.1  any cpu
>> netvm                                1     2    0   -b-       7.1  any cpu
>> netvm                                1     3    0   -b-       5.4  any cpu
>> firewallvm                           2     0    0   -b-      10.7  any cpu
>> firewallvm                           2     1    0   -b-       3.0  any cpu
>> firewallvm                           2     2    0   -b-       2.5  any cpu
>> firewallvm                           2     3    3   -b-       3.6  any cpu
>>
>> If I remove some CPU from Pool-0 and re-add it, things back to normal for this
>> particular CPU (so I got two equally used CPUs) - to fully restore system I
>> must remove all but CPU0 from Pool-0 and add it again.
>>
>> Also still only CPU0 have all C-states (C0-C3), all others have only C0-C1.
>> This probably could be fixed by your "xen: Re-upload processor PM data to
>> hypervisor after S3 resume" patch (reload of xen-acpi-processor module helps
>> here). But I don't think it is a right way. It isn't necessary on other
>> systems (with somehow older hardware). It must be something missing on resume
>> path. The question is what...
>>
>> Perhaps someone need to go through enable_nonboot_cpus() (__cpu_up?) and check
>> if it restore all things disabled in disable_nonboot_cpus() (__cpu_disable?).
>> Unfortunately I don't know x86 details so good to follow that code...
>
> Summarize ACPI S3 issues:
>
> I. Fixed issues:
>
> 1. IRQ problem fixed by "x86: irq_move_cleanup_interrupt() must ignore legacy
> vectors" commit
> 2. Assertion failure on resume with vcpu affinity used, fixes by "x86/S3:
> Restore broken vcpu affinity on resume" commit
>
>
> II. Not (fully) fixed issues:
>
> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
> issue, but it isn't applied to xen-unstable
> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
> timers are not restarted after resume?

Marek,
Please try the patch from this thread to see if it solves your 2 issues above:
http://markmail.org/thread/35ecqimv7bwq3k6d

This patch was NAK'ed due to cpupool breakage...but in my testing, it
solved both of these problems.

I don't know how to properly solve it in a cpupool compatible way...
but I also haven't put much additional effort into doing so.


> 3. ACPI C-states are only present for CPU0 (after resume of course), fixed by
> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben, but it
> isn't in upstream linux (nor Konrad's acpi-s3 branches).

I don't recall seeing any ACK / NAK from Konrad on this.

Original post:
https://patchwork.kernel.org/patch/2033981/

Konrad - do you have any thoughts about incorporating this into a
future merge window?

Ben


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-15 23:36                                             ` Ben Guthro
@ 2013-04-15 23:51                                               ` konrad wilk
  2013-04-16  0:19                                                 ` Ben Guthro
  2013-04-16  1:02                                               ` Marek Marczykowski
  1 sibling, 1 reply; 68+ messages in thread
From: konrad wilk @ 2013-04-15 23:51 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, Marek Marczykowski, Jan Beulich, xen-devel


>> 3. ACPI C-states are only present for CPU0 (after resume of course), fixed by
>> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben, but it
>> isn't in upstream linux (nor Konrad's acpi-s3 branches).
> I don't recall seeing any ACK / NAK from Konrad on this.
>
> Original post:
> https://patchwork.kernel.org/patch/2033981/
>
> Konrad - do you have any thoughts about incorporating this into a
> future merge window?

Hey Ben,
I seem to have missed it.
I think the patch is missing a change to pr_backup->acpi_id = i; otherwise
it would resend the C-states with the same APIC ID. Also, the upstream
version does kfree(pr_backup) at some point.

But more importantly, do you know why it is needed? Is the Xen hypervisor
"losing" this information because the CPUs go offline and are then
onlined again?


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-15 23:51                                               ` konrad wilk
@ 2013-04-16  0:19                                                 ` Ben Guthro
  2013-04-16  0:46                                                   ` Ben Guthro
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Guthro @ 2013-04-16  0:19 UTC (permalink / raw)
  To: konrad wilk; +Cc: Andrew Cooper, Marek Marczykowski, Jan Beulich, xen-devel

On Tue, Apr 16, 2013 at 12:51 AM, konrad wilk <konrad.wilk@oracle.com> wrote:
>
>>> 3. ACPI C-states are only present for CPU0 (after resume of course),
>>> fixed by
>>> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben,
>>> but it
>>> isn't in upstream linux (nor Konrad's acpi-s3 branches).
>>
>> I don't recall seeing any ACK / NAK from Konrad on this.
>>
>> Original post:
>> https://patchwork.kernel.org/patch/2033981/
>>
>> Konrad - do you have any thoughts about incorporating this into a
>> future merge window?
>
>
> Hey Ben,
> I seem to have missed it.
> I think the patch is missing a change to pr_backup->acpi_id = i, otherwise
> it would resend
> the C-states with the same APIC ID. Also the upstream version does
> kfree(pr_backup) at some point.

Hmm. I'll look into this, and re-submit.

>
> But more importantly, do you know why it is needed? Is Xen hypervisor
> "loosing" this information because they go offline and then they are onlined
> again?

It was a while ago. This was the first of a number of 4.2 S3-related
performance issues we were chasing, based on reports from users and
automated QA, where the end result was "slow performance on S3 in XP".

As it turns out - this didn't fix the performance problem...but it
also didn't seem right.

I'm not sure if it is because the non-boot cpus are offlined...but it
would seem to make logical sense.

Ben


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-16  0:19                                                 ` Ben Guthro
@ 2013-04-16  0:46                                                   ` Ben Guthro
  2013-04-16  3:20                                                     ` konrad wilk
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Guthro @ 2013-04-16  0:46 UTC (permalink / raw)
  To: konrad wilk; +Cc: Andrew Cooper, Marek Marczykowski, Jan Beulich, xen-devel

On Tue, Apr 16, 2013 at 1:19 AM, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Apr 16, 2013 at 12:51 AM, konrad wilk <konrad.wilk@oracle.com> wrote:
>>
>>>> 3. ACPI C-states are only present for CPU0 (after resume of course),
>>>> fixed by
>>>> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben,
>>>> but it
>>>> isn't in upstream linux (nor Konrad's acpi-s3 branches).
>>>
>>> I don't recall seeing any ACK / NAK from Konrad on this.
>>>
>>> Original post:
>>> https://patchwork.kernel.org/patch/2033981/
>>>
>>> Konrad - do you have any thoughts about incorporating this into a
>>> future merge window?
>>
>>
>> Hey Ben,
>> I seem to have missed it.
>> I think the patch is missing a change to pr_backup->acpi_id = i, otherwise
>> it would resend
>> the C-states with the same APIC ID. Also the upstream version does
>> kfree(pr_backup) at some point.
>
> Hmm. I'll look into this, and re-submit.

At the risk of seeming a bit dim, could you elaborate a bit here?
I'm looking at the function again, and perhaps I'm missing something.

Since xen_acpi_processor_resume() was a subset of what was done in
xen_acpi_processor_init() - I trimmed a number of things unused in the
functionality I was using. This included the pr_backup related things
(both alloc & free)

I'm not seeing exactly what you are suggesting I am missing, if I
don't even have a pr_backup. This usually means I overlooked something
embarrassingly obvious. If you would be so kind as to point this out
so I can slap my forehead, I'd appreciate it.

Thanks
Ben



>
>>
>> But more importantly, do you know why it is needed? Is Xen hypervisor
>> "loosing" this information because they go offline and then they are onlined
>> again?
>
> It was a while ago...the first of a number of 4.2 S3 related
> performance issues that we chasing reports from users / automated QA
> that the end result was "slow performance on S3 in XP"
>
> As it turns out - this didn't fix the performance problem...but it
> also didn't seem right.
>
> I'm not sure if it is because the non-boot cpus are offlined...but it
> would seem to make logical sense.
>
> Ben


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-15 23:36                                             ` Ben Guthro
  2013-04-15 23:51                                               ` konrad wilk
@ 2013-04-16  1:02                                               ` Marek Marczykowski
  1 sibling, 0 replies; 68+ messages in thread
From: Marek Marczykowski @ 2013-04-16  1:02 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, xen-devel, Jan Beulich, Konrad Rzeszutek Wilk


[-- Attachment #1.1: Type: text/plain, Size: 4911 bytes --]

On 16.04.2013 01:36, Ben Guthro wrote:
> On Mon, Apr 15, 2013 at 11:09 PM, Marek Marczykowski
> <marmarek@invisiblethingslab.com> wrote:
>> On 02.04.2013 03:13, Marek Marczykowski wrote:
>>> On 01.04.2013 15:53, Ben Guthro wrote:
>>>> On Thu, Mar 28, 2013 at 3:03 PM, Marek Marczykowski
>>>> <marmarek@invisiblethingslab.com> wrote:
>>>>> (XEN) Restoring affinity for d2v3
>>>>> (XEN) Assertion '!cpus_empty(cpus) && cpu_isset(cpu, cpus)' failed at
>>>>> sched_credit.c:481
>>>>
>>>>
>>>> I think the "fix-suspend-scheduler-*" patches posted here are applicable here:
>>>> http://markmail.org/message/llj3oyhgjzvw3t23
>>>>
>>>>
>>>> Specifically, I think you need this bit:
>>>>
>>>> diff --git a/xen/common/cpu.c b/xen/common/cpu.c
>>>> index 630881e..e20868c 100644
>>>> --- a/xen/common/cpu.c
>>>> +++ b/xen/common/cpu.c
>>>> @@ -5,6 +5,7 @@
>>>>  #include <xen/init.h>
>>>>  #include <xen/sched.h>
>>>>  #include <xen/stop_machine.h>
>>>> +#include <xen/sched-if.h>
>>>>
>>>>  unsigned int __read_mostly nr_cpu_ids = NR_CPUS;
>>>>  #ifndef nr_cpumask_bits
>>>> @@ -212,6 +213,8 @@ void enable_nonboot_cpus(void)
>>>>              BUG_ON(error == -EBUSY);
>>>>              printk("Error taking CPU%d up: %d\n", cpu, error);
>>>>          }
>>>> +        if (system_state == SYS_STATE_resume)
>>>> +            cpumask_set_cpu(cpu, cpupool0->cpu_valid);
>>>>      }
>>>>
>>>>      cpumask_clear(&frozen_cpus);
>>>>
>>>
>>> Indeed, this makes things better, but still not ideal.
>>> Now after resume all CPUs are in Pool-0, which is good. But CPU0 is much more
>>> preferred than others (xl vcpu-list). For example if I start 4 busy loops in
>>> dom0, I got (even after some time):
>>> [user@dom0 ~]$ xl vcpu-list
>>> Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
>>> dom0                                 0     0    0   r--      98.5  any cpu
>>> dom0                                 0     1    0   ---     181.3  any cpu
>>> dom0                                 0     2    2   r--     262.4  any cpu
>>> dom0                                 0     3    3   r--     230.8  any cpu
>>> netvm                                1     0    0   -b-      18.4  any cpu
>>> netvm                                1     1    0   -b-       9.1  any cpu
>>> netvm                                1     2    0   -b-       7.1  any cpu
>>> netvm                                1     3    0   -b-       5.4  any cpu
>>> firewallvm                           2     0    0   -b-      10.7  any cpu
>>> firewallvm                           2     1    0   -b-       3.0  any cpu
>>> firewallvm                           2     2    0   -b-       2.5  any cpu
>>> firewallvm                           2     3    3   -b-       3.6  any cpu
>>>
>>> If I remove some CPU from Pool-0 and re-add it, things back to normal for this
>>> particular CPU (so I got two equally used CPUs) - to fully restore system I
>>> must remove all but CPU0 from Pool-0 and add it again.
>>>
>>> Also still only CPU0 have all C-states (C0-C3), all others have only C0-C1.
>>> This probably could be fixed by your "xen: Re-upload processor PM data to
>>> hypervisor after S3 resume" patch (reload of xen-acpi-processor module helps
>>> here). But I don't think it is a right way. It isn't necessary on other
>>> systems (with somehow older hardware). It must be something missing on resume
>>> path. The question is what...
>>>
>>> Perhaps someone need to go through enable_nonboot_cpus() (__cpu_up?) and check
>>> if it restore all things disabled in disable_nonboot_cpus() (__cpu_disable?).
>>> Unfortunately I don't know x86 details so good to follow that code...
>>
>> Summarize ACPI S3 issues:
>>
>> I. Fixed issues:
>>
>> 1. IRQ problem fixed by "x86: irq_move_cleanup_interrupt() must ignore legacy
>> vectors" commit
>> 2. Assertion failure on resume with vcpu affinity used, fixes by "x86/S3:
>> Restore broken vcpu affinity on resume" commit
>>
>>
>> II. Not (fully) fixed issues:
>>
>> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
>> issue, but it isn't applied to xen-unstable
>> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
>> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
>> timers are not restarted after resume?
> 
> Marek,
> Please try the patch from this thread to see if it solves your 2 issues above:
> http://markmail.org/thread/35ecqimv7bwq3k6d
> 
> This patch was NAK'ed due to cpupool breakage...but in my testing, it
> solved both of these problems.
> 
> I don't know how to properly solve it in a cpupool compatible way...
> but I also haven't put much additional effort into doing so.

Indeed, this makes the problem disappear.

-- 
Best Regards / Pozdrawiam,
Marek Marczykowski
Invisible Things Lab




* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-16  0:46                                                   ` Ben Guthro
@ 2013-04-16  3:20                                                     ` konrad wilk
  0 siblings, 0 replies; 68+ messages in thread
From: konrad wilk @ 2013-04-16  3:20 UTC (permalink / raw)
  To: Ben Guthro; +Cc: Andrew Cooper, Marek Marczykowski, Jan Beulich, xen-devel


On 4/15/2013 8:46 PM, Ben Guthro wrote:
> On Tue, Apr 16, 2013 at 1:19 AM, Ben Guthro <ben@guthro.net> wrote:
>> On Tue, Apr 16, 2013 at 12:51 AM, konrad wilk <konrad.wilk@oracle.com> wrote:
>>>>> 3. ACPI C-states are only present for CPU0 (after resume of course),
>>>>> fixed by
>>>>> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben,
>>>>> but it
>>>>> isn't in upstream linux (nor Konrad's acpi-s3 branches).
>>>> I don't recall seeing any ACK / NAK from Konrad on this.
>>>>
>>>> Original post:
>>>> https://patchwork.kernel.org/patch/2033981/
>>>>
>>>> Konrad - do you have any thoughts about incorporating this into a
>>>> future merge window?
>>>
>>> Hey Ben,
>>> I seem to have missed it.
>>> I think the patch is missing a change to pr_backup->acpi_id = i, otherwise
>>> it would resend
>>> the C-states with the same APIC ID. Also the upstream version does
>>> kfree(pr_backup) at some point.
>> Hmm. I'll look into this, and re-submit.
> At the risk of seeming a bit dim, could you elaborate a bit here?
Part of what xen-acpi-processor has to deal with is the 'dom0_max_vcpus='
case. It means that when 'acpi_processor_get_performance_info' is called to
parse ACPI C-states, it limits itself to only the 'online' CPUs it sees.
All the other ones (which might be physically present) that Linux does not
see are skipped.

As such there is this:

        if (!pr_backup) {
                pr_backup = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
                if (pr_backup)
                        memcpy(pr_backup, _pr, sizeof(struct acpi_processor));
        }

And then later

        rc = check_acpi_ids(pr_backup);

which walks the ACPI namespace checking whether it has uploaded the
ACPI IDs for all the CPUs. If some are missing (b/c dom0_max_vcpus=X
was used), then it uploads the pr_backup with the ACPI ID altered.

What I think you ought to try is just to call check_acpi_ids with the
pr_backup after the for_cpu_online() loop.

Hm, you could actually make this even easier. Just move this code:

        for_each_possible_cpu(i) {
                struct acpi_processor *_pr;
                _pr = per_cpu(processors, i /* APIC ID */);
                if (!_pr)
                        continue;

                if (!pr_backup) {
                        pr_backup = kzalloc(sizeof(struct acpi_processor), GFP_KERNEL);
                        if (pr_backup)
                                memcpy(pr_backup, _pr, sizeof(struct acpi_processor));
                }
                (void)upload_pm_data(_pr);
        }
        rc = check_acpi_ids(pr_backup);

into its own function. Then make both the module loading _and_ the syscore
resume call said function.
Voilà!

Naturally the kfree(pr_backup) and pr_backup = NULL have to be
eliminated from the module_init function, and the module_exit needs the
pr_backup moved past the syscore_unregister.
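
To make the shape of that refactoring concrete, here is a minimal userspace
model of it. This is only a sketch under stated assumptions, not the actual
xen-acpi-processor code: every model_* name, NR_ACPI_IDS, and the
visible[]/uploaded[] arrays are hypothetical stand-ins, and the "hypervisor
keeps only CPU0's data across S3" behaviour is simply the symptom reported
earlier in this thread, hard-coded for illustration.

```c
/* Userspace model only; none of these names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NR_ACPI_IDS 4   /* hypothetical: CPUs present in the ACPI namespace */

struct model_processor {        /* stand-in for struct acpi_processor */
    unsigned int acpi_id;
    int cstate_count;
};

static struct model_processor *visible[NR_ACPI_IDS]; /* CPUs dom0 can see */
static bool uploaded[NR_ACPI_IDS];         /* what the "hypervisor" holds */
static struct model_processor *pr_backup;  /* now kept until module exit */

/* Stand-in for upload_pm_data(): record that this ACPI ID was uploaded. */
static void model_upload(const struct model_processor *pr)
{
    assert(pr->acpi_id < NR_ACPI_IDS);
    uploaded[pr->acpi_id] = true;
}

/* Stand-in for check_acpi_ids(): for every ACPI ID not yet uploaded,
 * upload the backup copy with its ACPI ID altered. */
static int model_check_acpi_ids(struct model_processor *backup)
{
    if (!backup)
        return -1;
    for (unsigned int id = 0; id < NR_ACPI_IDS; id++) {
        if (uploaded[id])
            continue;
        backup->acpi_id = id;
        model_upload(backup);
    }
    return 0;
}

/* The shared helper, called from both "module init" and "resume". */
static int model_upload_all(void)
{
    for (unsigned int i = 0; i < NR_ACPI_IDS; i++) {
        struct model_processor *pr = visible[i];

        if (!pr)
            continue;
        if (!pr_backup) {
            pr_backup = malloc(sizeof(*pr_backup));
            if (pr_backup)
                memcpy(pr_backup, pr, sizeof(*pr_backup));
        }
        model_upload(pr);
    }
    return model_check_acpi_ids(pr_backup);
}

/* Model of S3 as reported in this thread: only CPU0 keeps its data. */
static void model_suspend(void)
{
    for (unsigned int id = 1; id < NR_ACPI_IDS; id++)
        uploaded[id] = false;
}

/* Model of module exit: the only place where the backup is released. */
static void model_exit(void)
{
    free(pr_backup);
    pr_backup = NULL;
}
```

In this model, "module init" and "resume" are both just a call to
model_upload_all(); the backup survives in between because nothing but
model_exit() frees it.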


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-15 22:09                                           ` Marek Marczykowski
  2013-04-15 23:36                                             ` Ben Guthro
@ 2013-04-16  8:47                                             ` Jan Beulich
  2013-04-16 11:49                                               ` Ben Guthro
  1 sibling, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-04-16  8:47 UTC (permalink / raw)
  To: Ben Guthro, Marek Marczykowski
  Cc: Andrew Cooper, Konrad Rzeszutek Wilk, xen-devel

>>> On 16.04.13 at 00:09, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
> II. Not (fully) fixed issues:
> 
> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
> issue, but it isn't applied to xen-unstable
> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
> timers are not restarted after resume?

So I understand there is a patch dealing with this, but I'm not clear
whether that's known to break CPU pools?

> 3. ACPI C-states are only present for CPU0 (after resume of course), fixed by
> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben, but 
> it isn't in upstream linux (nor Konrad's acpi-s3 branches).

Perhaps this rather ought to be fixed in the hypervisor (to not
forget the respective information; perhaps also for P-states)?
After all that's another case where S3 is different from soft or hard
offlining an individual CPU (in particular we can expect the same
CPU to come back up during resume, whereas namely a hot-
unplugged one could get replaced by a [slightly] different one).

Jan


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-16  8:47                                             ` Jan Beulich
@ 2013-04-16 11:49                                               ` Ben Guthro
  2013-04-16 11:57                                                 ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Guthro @ 2013-04-16 11:49 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Marek Marczykowski, xen-devel

On Tue, Apr 16, 2013 at 4:47 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 16.04.13 at 00:09, Marek Marczykowski <marmarek@invisiblethingslab.com> wrote:
>> II. Not (fully) fixed issues:
>>
>> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
>> issue, but it isn't applied to xen-unstable
>> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
>> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
>> timers are not restarted after resume?
>
> So I understand there is a patch dealing with this, but I'm not clear
> whether that's known to break CPU pools?

All cpus will end up in cpu pool 0 after S3.
I'm not sure that is "broken" - but it probably isn't ideal either.

IMO - it is better than the alternative state...but Juergen seems to disagree.



>
>> 3. ACPI C-states are only present for CPU0 (after resume of course), fixed by
>> "xen: Re-upload processor PM data to hypervisor after S3" patch by Ben, but
>> it isn't in upstream linux (nor Konrad's acpi-s3 branches).
>
> Perhaps this rather ought to be fixed in the hypervisor (to not
> forget the respective information; perhaps also for P-states)?
> After all that's another case where S3 is different from soft or hard
> offlining an individual CPU (in particular we can expect the same
> CPU to come back up during resume, whereas namely a hot-
> unplugged one could get replaced by a [slightly] different one).
>
> Jan
>


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-16 11:49                                               ` Ben Guthro
@ 2013-04-16 11:57                                                 ` Jan Beulich
  2013-04-16 12:09                                                   ` Ben Guthro
  0 siblings, 1 reply; 68+ messages in thread
From: Jan Beulich @ 2013-04-16 11:57 UTC (permalink / raw)
  To: Ben Guthro
  Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Marek Marczykowski, xen-devel

>>> On 16.04.13 at 13:49, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Apr 16, 2013 at 4:47 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 16.04.13 at 00:09, Marek Marczykowski <marmarek@invisiblethingslab.com> 
> wrote:
>>> II. Not (fully) fixed issues:
>>>
>>> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
>>> issue, but it isn't applied to xen-unstable
>>> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
>>> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
>>> timers are not restarted after resume?
>>
>> So I understand there is a patch dealing with this, but I'm not clear
>> whether that's known to break CPU pools?
> 
> All cpus will end up in cpu pool 0 after S3.
> I'm not sure that is "broken" - but it probably isn't ideal either.
> 
> IMO - it is better than the alternative state...but Juergen seems to 
> disagree.

But it can't be that difficult to save/restore pool association on top
of said patch?

Jan


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-16 11:57                                                 ` Jan Beulich
@ 2013-04-16 12:09                                                   ` Ben Guthro
  2013-04-16 12:51                                                     ` Jan Beulich
  0 siblings, 1 reply; 68+ messages in thread
From: Ben Guthro @ 2013-04-16 12:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Andrew Cooper, Konrad Rzeszutek Wilk, Marek Marczykowski, xen-devel

On Tue, Apr 16, 2013 at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 16.04.13 at 13:49, Ben Guthro <ben@guthro.net> wrote:
>> On Tue, Apr 16, 2013 at 4:47 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>> On 16.04.13 at 00:09, Marek Marczykowski <marmarek@invisiblethingslab.com>
>> wrote:
>>>> II. Not (fully) fixed issues:
>>>>
>>>> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
>>>> issue, but it isn't applied to xen-unstable
>>>> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
>>>> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
>>>> timers are not restarted after resume?
>>>
>>> So I understand there is a patch dealing with this, but I'm not clear
>>> whether that's known to break CPU pools?
>>
>> All cpus will end up in cpu pool 0 after S3.
>> I'm not sure that is "broken" - but it probably isn't ideal either.
>>
>> IMO - it is better than the alternative state...but Juergen seems to
>> disagree.
>
> But it can't be that difficult to save/restore pool association on top
> of said patch?

I took a brief look, in the hopes of taking a similar tack as with the
vcpu affinity restoration.
However, it seems to be a slightly more difficult problem.
In the vcpu affinity case, there was an existing structure to stash away
the information we needed after resume.

For a pcpu, there is no such associated metadata...the SMP processor id
is just an integer.
So - where would we store the pool information temporarily across the
S3 process?

Ben


* Re: High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x
  2013-04-16 12:09                                                   ` Ben Guthro
@ 2013-04-16 12:51                                                     ` Jan Beulich
  0 siblings, 0 replies; 68+ messages in thread
From: Jan Beulich @ 2013-04-16 12:51 UTC (permalink / raw)
  To: Ben Guthro
  Cc: Andrew Cooper, Juergen Gross, Konrad Rzeszutek Wilk,
	Marek Marczykowski, xen-devel

>>> On 16.04.13 at 14:09, Ben Guthro <ben@guthro.net> wrote:
> On Tue, Apr 16, 2013 at 7:57 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>> On 16.04.13 at 13:49, Ben Guthro <ben@guthro.net> wrote:
>>> On Tue, Apr 16, 2013 at 4:47 AM, Jan Beulich <JBeulich@suse.com> wrote:
>>>>>>> On 16.04.13 at 00:09, Marek Marczykowski <marmarek@invisiblethingslab.com>
>>> wrote:
>>>>> II. Not (fully) fixed issues:
>>>>>
>>>>> 1. CPU Pool-0 contains only CPU0 after resume - patch quoted above fixes the
>>>>> issue, but it isn't applied to xen-unstable
>>>>> 2. After resume scheduler chooses (almost) only CPU0 (above quoted listing).
>>>>> Removing and re-adding all CPUs to Pool-0 solves the problem. Perhaps some
>>>>> timers are not restarted after resume?
>>>>
>>>> So I understand there is a patch dealing with this, but I'm not clear
>>>> whether that's known to break CPU pools?
>>>
>>> All cpus will end up in cpu pool 0 after S3.
>>> I'm not sure that is "broken" - but it probably isn't ideal either.
>>>
>>> IMO - it is better than the alternative state...but Juergen seems to
>>> disagree.
>>
>> But it can't be that difficult to save/restore pool association on top
>> of said patch?
> 
> I took a brief look, in the hopes of taking a similar tack as with the
> vcpu affinity restoration.
> However, it seems to be a slightly more difficult problem.
> In the vcpu affinity, there was an existing structure to stash away
> the information we needed after resume.
> 
> In a pcpu, there is no such associated metadata...the SMP processor id
> is just an integer.
> So - where would we store the pool information temporarily across the
> S3 process?

Do it the other way around - the CPU pools have a mask of valid
CPUs. You could latch those pre-suspend for each of the pools (e.g.
by again introducing a second mask hanging off the same structure).
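
A minimal sketch of that idea, as an illustration rather than actual Xen code:
the structure and function names below (pool_sketch, cpu_suspended,
pools_latch, pools_cpu_down, pools_cpu_up) are hypothetical, and a plain
64-bit integer stands in for a real CPU mask type. Each pool keeps a second
mask next to its valid-CPU mask; the second mask is filled before suspend and
consulted as each CPU comes back online.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU pool; not Xen's real cpupool structure. */
struct pool_sketch {
    int      id;
    uint64_t cpu_valid;      /* CPUs currently assigned to this pool */
    uint64_t cpu_suspended;  /* copy of cpu_valid latched before S3 */
};

/* Pre-suspend: remember each pool's membership in the second mask. */
static void pools_latch(struct pool_sketch *pools, int n)
{
    for (int i = 0; i < n; i++)
        pools[i].cpu_suspended = pools[i].cpu_valid;
}

/* Models a CPU going offline for S3: it is dropped from every pool. */
static void pools_cpu_down(struct pool_sketch *pools, int n, unsigned cpu)
{
    assert(cpu < 64);
    for (int i = 0; i < n; i++)
        pools[i].cpu_valid &= ~(UINT64_C(1) << cpu);
}

/*
 * Post-resume: put a CPU back into the pool it was latched in.
 * Returns that pool's id, or -1 if no pool had latched this CPU.
 */
static int pools_cpu_up(struct pool_sketch *pools, int n, unsigned cpu)
{
    assert(cpu < 64);
    uint64_t bit = UINT64_C(1) << cpu;

    for (int i = 0; i < n; i++) {
        if (pools[i].cpu_suspended & bit) {
            pools[i].cpu_valid |= bit;
            pools[i].cpu_suspended &= ~bit;  /* consume the latched entry */
            return pools[i].id;
        }
    }
    return -1;
}
```

The point of keeping the saved mask on the pool rather than on the CPU is that
no per-pcpu metadata is needed: the pool structure survives S3, so the
membership can be looked up by CPU number on the way back up.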

(Also adding Juergen to Cc in case he has other thoughts.)

Jan

end of thread, other threads:[~2013-04-16 12:51 UTC | newest]

Thread overview: 68+ messages
2013-03-13 20:50 High CPU temp, suspend problem - xen 4.1.5-pre, linux 3.7.x Marek Marczykowski
2013-03-15  3:00 ` Dario Faggioli
2013-03-15  3:22   ` Marek Marczykowski
2013-03-15 13:02 ` Konrad Rzeszutek Wilk
2013-03-22 15:34   ` Marek Marczykowski
2013-03-22 16:56     ` Konrad Rzeszutek Wilk
2013-03-25 11:36       ` Marek Marczykowski
2013-03-25 14:17         ` Konrad Rzeszutek Wilk
2013-03-25 14:56           ` Marek Marczykowski
2013-03-26 12:17           ` Marek Marczykowski
2013-03-26 13:11             ` Jan Beulich
2013-03-26 13:50               ` Marek Marczykowski
2013-03-26 15:47                 ` Andrew Cooper
2013-03-26 16:12                   ` Andrew Cooper
2013-03-26 16:47                     ` Marek Marczykowski
2013-03-26 16:03                 ` Jan Beulich
2013-03-26 16:45                   ` Marek Marczykowski
2013-03-26 17:02                     ` Andrew Cooper
2013-03-26 17:42                       ` Marek Marczykowski
2013-03-26 17:54                         ` Andrew Cooper
2013-03-26 18:21                           ` Marek Marczykowski
2013-03-26 18:50                             ` Andrew Cooper
2013-03-27  8:50                               ` Marek Marczykowski
2013-03-27  8:58                                 ` Jan Beulich
2013-03-27  8:52                               ` Jan Beulich
2013-03-27  9:03                                 ` Jan Beulich
2013-03-27 14:01                                   ` Marek Marczykowski
2013-03-27 14:31                                 ` Marek Marczykowski
2013-03-27 14:46                                   ` Andrew Cooper
2013-03-27 14:49                                     ` Marek Marczykowski
2013-03-27 15:51                                       ` Marek Marczykowski
2013-03-27 16:27                                         ` Andrew Cooper
2013-03-27 18:16                                           ` Marek Marczykowski
2013-03-27 18:56                                             ` Andrew Cooper
2013-03-28 14:43                                               ` Marek Marczykowski
2013-03-28 10:50                                           ` Jan Beulich
2013-03-28 11:53                                             ` Andrew Cooper
2013-03-28 12:54                                               ` Jan Beulich
2013-03-28 13:19                                                 ` Jan Beulich
2013-03-27 14:52                                     ` Andrew Cooper
2013-03-27 15:47                                       ` Konrad Rzeszutek Wilk
2013-03-27 16:56                                         ` Andrew Cooper
2013-03-27 17:15                                           ` Marek Marczykowski
2013-03-28 17:41                                             ` Andrew Cooper
2013-03-28 17:44                                               ` Marek Marczykowski
2013-03-28 17:50                                                 ` Andrew Cooper
2013-03-29  0:26                                                   ` Marek Marczykowski
2013-03-28 16:13                                   ` Jan Beulich
2013-03-28 19:03                                     ` Marek Marczykowski
2013-04-01 13:53                                       ` Ben Guthro
2013-04-02  1:13                                         ` Marek Marczykowski
2013-04-02 14:05                                           ` Konrad Rzeszutek Wilk
2013-04-15 22:09                                           ` Marek Marczykowski
2013-04-15 23:36                                             ` Ben Guthro
2013-04-15 23:51                                               ` konrad wilk
2013-04-16  0:19                                                 ` Ben Guthro
2013-04-16  0:46                                                   ` Ben Guthro
2013-04-16  3:20                                                     ` konrad wilk
2013-04-16  1:02                                               ` Marek Marczykowski
2013-04-16  8:47                                             ` Jan Beulich
2013-04-16 11:49                                               ` Ben Guthro
2013-04-16 11:57                                                 ` Jan Beulich
2013-04-16 12:09                                                   ` Ben Guthro
2013-04-16 12:51                                                     ` Jan Beulich
2013-03-28 16:25                                   ` Jan Beulich
2013-03-28 16:31                                     ` Marek Marczykowski
2013-03-28 16:52                                       ` Jan Beulich
2013-03-28 17:09                                         ` Marek Marczykowski
