Thanks for all of this!

On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote:
> On Wed, 20 Dec 2017, Alexandru Chirvasitu wrote:
> > On Wed, Dec 20, 2017 at 11:58:57AM +0800, Dou Liyang wrote:
> > > At 12/20/2017 08:31 AM, Thomas Gleixner wrote:
> > > > > I had never heard of 'bisect' before this casual mention (you might tell
> > > > > I am a bit out of my depth). I've since applied it to Linus' tree between
> > > >
> > > > > bebc608 Linux 4.14 (good)
> > > > >
> > > > > and
> > > > >
> > > > > 4fbd8d1 Linux 4.15-rc1 (bad)
> > > >
> > > > Is Linus current head 4.15-rc4 bad as well?
> > > >
> > > [...]
> >
> > Yes. Exactly the same symptoms on
> >
> > 1291a0d5 Linux 4.15-rc4
> >
> > compiled just now from Linus' tree.
>
> Ok, lets take a step back. The bisect/kexec attempts led us away from the
> initial problem which is the machine locking up after login, right?
>

Yes; sorry about that..

> Could you try the patch below on top of Linus tree (rc5+)?
>
> Thanks,
>
> 	tglx
>
> 8<---------------
> --- a/arch/x86/kernel/apic/apic_flat_64.c
> +++ b/arch/x86/kernel/apic/apic_flat_64.c
> @@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_
>  	.apic_id_valid			= default_apic_id_valid,
>  	.apic_id_registered		= flat_apic_id_registered,
>
> -	.irq_delivery_mode		= dest_LowestPrio,
> +	.irq_delivery_mode		= dest_Fixed,
>  	.irq_dest_mode			= 1, /* logical */
>
>  	.disable_esr			= 0,
> --- a/arch/x86/kernel/apic/probe_32.c
> +++ b/arch/x86/kernel/apic/probe_32.c
> @@ -105,7 +105,7 @@ static struct apic apic_default __ro_aft
>  	.apic_id_valid			= default_apic_id_valid,
>  	.apic_id_registered		= default_apic_id_registered,
>
> -	.irq_delivery_mode		= dest_LowestPrio,
> +	.irq_delivery_mode		= dest_Fixed,
>  	/* logical delivery broadcast to all CPUs: */
>  	.irq_dest_mode			= 1,
>
> --- a/arch/x86/kernel/apic/x2apic_cluster.c
> +++ b/arch/x86/kernel/apic/x2apic_cluster.c
> @@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster _
>  	.apic_id_valid			= x2apic_apic_id_valid,
>  	.apic_id_registered		= x2apic_apic_id_registered,
>
> -	.irq_delivery_mode		= dest_LowestPrio,
> +	.irq_delivery_mode		= dest_Fixed,
>  	.irq_dest_mode			= 1, /* logical */
>
>  	.disable_esr			= 0,
>

I tried both patches that you guys sent in the last couple of messages. I
applied them separately to the last 4.15-rc5 kernel I had (the one for
which I sent Dou the journalctl output). The diffs are both to that
version.

Results follow.

(1) Dou's patch:

------------------------------------------------------------

    x86/vector: Replace the raw_spin_lock() with

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 7504491..e5bab02 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
 			     const struct cpumask *dest, bool force)
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
+	unsigned long flags;
 	int err;
 
 	/*
@@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
 	    (apicd->is_managed || apicd->can_reserve))
 		return IRQ_SET_MASK_OK;
 
-	raw_spin_lock(&vector_lock);
+	raw_spin_lock_irqsave(&vector_lock, flags);
 	cpumask_and(vector_searchmask, dest, cpu_online_mask);
 	if (irqd_affinity_is_managed(irqd))
 		err = assign_managed_vector(irqd, vector_searchmask);
 	else
 		err = assign_vector_locked(irqd, vector_searchmask);
-	raw_spin_unlock(&vector_lock);
+	raw_spin_unlock_irqrestore(&vector_lock, flags);
 	return err ? err : IRQ_SET_MASK_OK;
 }
 
------------------------------------------------------------

With this, I still get the lockup messages after login, but not the
freezes!

The lockups register in the log, which I am attaching (see below for
attachment naming conventions).

The computer's still clearly impaired (ethernet won't connect again for
instance, and the CPU distress messages happen periodically throughout
the tty session), but at least it's logged now.
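
For context on what the irqsave variant in patch (1) changes:
raw_spin_lock_irqsave() also disables interrupts on the local CPU while
the lock is held, so an interrupt handler that wants the same lock cannot
run on top of the lock holder. Below is a minimal user-space analogy in C,
not kernel code: a POSIX signal stands in for the interrupt and a plain
flag stands in for vector_lock. All names in it (lock_held, handler_ran,
handler_saw_lock, on_sigusr1, run_critical_section) are made up for this
sketch, and whether this is the mechanism behind the lockups on this
machine is not established here.

```c
/* User-space analogy only: a signal plays the interrupt, a flag plays the lock. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t lock_held;        /* stand-in for the spinlock */
static volatile sig_atomic_t handler_ran;      /* set once the handler has run */
static volatile sig_atomic_t handler_saw_lock; /* handler ran inside the critical section */

static void on_sigusr1(int signo)
{
	(void)signo;
	handler_ran = 1;
	if (lock_held)
		handler_saw_lock = 1; /* a handler spinning on the lock would hang here */
}

/*
 * Run a critical section during which a signal arrives.
 * block_signals != 0 mimics the irqsave/irqrestore pairing: the signal is
 * held pending until the saved mask is restored.
 * Returns 1 if the handler ran while the "lock" was held, 0 if it did not,
 * -1 on setup failure.
 */
static int run_critical_section(int block_signals)
{
	struct sigaction sa;
	sigset_t block, saved;

	sa.sa_handler = on_sigusr1;
	sa.sa_flags = 0;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGUSR1, &sa, NULL) != 0)
		return -1;

	handler_ran = 0;
	handler_saw_lock = 0;

	sigemptyset(&block);
	sigaddset(&block, SIGUSR1);
	if (block_signals && sigprocmask(SIG_BLOCK, &block, &saved) != 0)
		return -1;

	lock_held = 1;
	raise(SIGUSR1); /* the "interrupt" fires inside the critical section */
	lock_held = 0;

	/* Restoring the mask lets the pending signal through, after the unlock. */
	if (block_signals && sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
		return -1;

	return handler_saw_lock;
}
```

Calling run_critical_section(0) corresponds to the plain raw_spin_lock()
case, where the handler runs while the flag is still set;
run_critical_section(1) corresponds to the irqsave case, where the handler
is deferred until after the flag is cleared.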

---

(2) Thomas' patch:

------------------------------------------------------------

    apic patch from tglx

diff --git a/arch/x86/kernel/apic/apic_flat_64.c b/arch/x86/kernel/apic/apic_flat_64.c
index aa85690..1f734d4 100644
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= flat_apic_id_registered,
 
-	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_delivery_mode		= dest_Fixed,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index fa22017..765cded 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -105,7 +105,7 @@ static struct apic apic_default __ro_after_init = {
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= default_apic_id_registered,
 
-	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_delivery_mode		= dest_Fixed,
 	/* logical delivery broadcast to all CPUs: */
 	.irq_dest_mode			= 1,
 
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 622f13c..39568bd 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster __ro_after_init = {
 	.apic_id_valid			= x2apic_apic_id_valid,
 	.apic_id_registered		= x2apic_apic_id_registered,
 
-	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_delivery_mode		= dest_Fixed,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,

------------------------------------------------------------

This gives me the same disabling lockups as before, i.e. I have to
reboot. Correspondingly, the log I'm attaching for this kernel won't be
of much use, because it's called with `journalctl --boot=-1` after the
fact. Might still be of some use..
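
To make the one-line change in patch (2) concrete: as far as I understand
it, the delivery mode is a 3-bit field in the low word of an I/O APIC
redirection entry (vector in bits 0-7, delivery mode in bits 8-10,
destination mode in bit 11), and the kernel's dest_Fixed and
dest_LowestPrio are 0 and 1 respectively. The sketch below only packs
those bits; rte_low() and the DEST_* names are hypothetical helpers for
illustration, and the vector 0x31 in the usage note is an arbitrary
example, not a value taken from the logs.

```c
#include <stdint.h>

/* Mirrors two values of the kernel's enum ioapic_irq_destination_types. */
enum delivery_mode {
	DEST_FIXED       = 0, /* deliver to every CPU named in the destination */
	DEST_LOWEST_PRIO = 1, /* hardware picks the lowest-priority CPU among them */
};

/*
 * Low 32 bits of a redirection entry, restricted to the three fields
 * relevant here; all other bits (polarity, trigger, mask, ...) are left 0.
 */
static uint32_t rte_low(uint8_t vector, enum delivery_mode mode, int logical)
{
	return (uint32_t)vector                         /* bits 0-7: vector */
	     | ((uint32_t)mode << 8)                    /* bits 8-10: delivery mode */
	     | ((uint32_t)(logical ? 1 : 0) << 11);     /* bit 11: 1 = logical */
}
```

With logical destination mode, rte_low(0x31, DEST_LOWEST_PRIO, 1) and
rte_low(0x31, DEST_FIXED, 1) differ only in bit 8, which is all that
patch (2) flips per APIC driver.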

---

The log files I'm attaching comply with the following naming pattern:

'dou' means the log comes from the kernel patched with Dou's patch;
'thms' refers to Thomas' patch;
'jrnl' means it came from journalctl with various boot=? options;
'dmesg' means it came from calling dmesg;
'noparams' means the kernel was booted with no additional parameters;
'debug' means it was booted with 'apic=debug'.

---

P.S.

It was very considerate of you to send the attachment, Dou, but that
shouldn't be necessary anymore; the issues I had with 'git apply' were
whitespace-related, and I've managed to resolve them. So anything
copy-pastable directly in the message body should do if I try more
patches.

Thank you!