Kernel 5.3.x, 5.2.2+: VMware player suspend on 64/32 bit guests

* Kernel 5.3.x, 5.2.2+: VMware player suspend on 64/32 bit guests
       [not found]           ` <alpine.DEB.2.21.1908100811160.7324@nanos.tec.linutronix.de>
@ 2019-08-10 11:24             ` Woody Suwalski
  2019-08-12 14:42               ` Woody Suwalski
  0 siblings, 1 reply; 10+ messages in thread
From: Woody Suwalski @ 2019-08-10 11:24 UTC (permalink / raw)
  To: LKML; +Cc: Thomas Gleixner, Rafael J. Wysocki

Moving the thread to LKML, as suggested by Thomas...
>
>> ---------- Forwarded message ---------
>> From: Woody Suwalski <terraluna977@gmail.com>
>> Date: Thu, Aug 1, 2019 at 3:45 PM
>> Subject: Intermittent suspend on 5.3 / 5.2
>> To: Rafael J. Wysocki <rjw@rjwysocki.net>
>>
>>
>> Hi Rafał,
>> I know that you are investigating some issues between these 2 kernels,
>> however I see probably an unrelated problem with suspend on 5.3 and
>> 5.2.4. I think it has creeped in to 5.1.21 as well, but not sure (it is
>> intermittent). So far 4.20.17 works OK, and  I think 5.2.0 works OK.
>> The problem I see is on both 32 and 64 bit VMs, in VMware workstation
>> 15. The VM is trying to suspend when no activity. It leaves out a black
>> box with cursor in top-left position. Upon wakeup from VMware it goes to
>> vmware pre-bios screen, and then expands the black box to the run-size
>> and switches to X.
>> The problem with new kernels is that (I think) the suspend fails - the
>> black box with cursor is there, but seems bigger, and of course is not
>> wake'able (have to reset). In kern.log suspend seems be running OK, and
>> then new dmesg lines kick in, and no obvious culprit.
>> So looking for a free advice .
>> a. You already know what it is
>> b. You may have suggestions as to which upstream patch could be to blame
>> c. I should boot with some debug params (console_off=0, or some other?)
>> and get some real info?
>>
>> BTW. For suspend to work I had to override mem_sleep to [shallow], or
>> maybe later to [s2idle] (the actual VMs are at work, referring from
>> memory...)
>>
>> If you have any ideas, all are welcomed
>> Thanks, Woody

On 8/6/2019 3:18 PM, Woody Suwalski wrote:
> Rafal, the patch (in 5.3-rc3)
>
> Fixes: f850a48a0799 ("ACPI: PM: Allow transitions to D0 to occur in
> special cases")
>
> does not fix the issue - it must be something else...

Sorry for the late response.

There are known issues in 5.3-rc related to power management which 
should be fixed in -rc4.  Please try that one when it is out.

Cheers!

Thomas Gleixner wrote:
> Woody,
>
> On Fri, 9 Aug 2019, Woody Suwalski wrote:
>
> For future things like this, please CC LKML. There is nothing secrit here
> and CC'ing the mailing list allows other people to find this and spare
> themself the whole bisection pain. Asided of that private mail does not
> scale. On the list other people can look at it and give input eventually.
>
>> After bisecting I have found the potential culprit:
>> dfe0cf8b  x86/ioapic: Implement irq_get irqchip_state() callback
>>
>> I am repeating the bisection from start to re-confirm.
>>
>> Reverse-patch on 5.3-rc3 (64bit) is fixing the problem for me.
>> What is unclear - just adding the patch to 5.2.1 does not seem to
>> break it. So there is some more magic involved.
> Of course it does not do anything because 5.2.1 is not having
>
> f4999a2a3a48 ("genirq: Add optional hardware synchronization for shutdown")
>   
>> Thomas, any suggestions?
> What that means is that there is an interrupt shutdown which hits the
> condition where an interrupt _IS_ marked in the IOAPIC as delivered to a
> CPU, but not serviced yet.
>
> Now the question is why it is not serviced. suspend_device_irqs() is
> calling into synchronize_irq(), which is probably the place where that
> it hangs. But that's called with CPUs online and interrupts enabled.
>
>> The reproduce methodology: use VMware player 15, either 32 or 64 bit build.
>> reboot and run "systemctl suspend". The first suspend works OK. The
>> second usually locks on kernels 5.2.2 and up. Maybe try 4 times to
>> confirm good (it is intermittent).
> -ENOVMWAREPLAYER and I'm traveling so I don't have a machine handy to
> install it. So if you can't debug it deeper down, I'm not going to have a
> chance to look at it before the end of next week.
>
> That said, can we please move this to LKML?
>
> Thanks,
>
> 	tglx
>
>
I can add some printk's into synchronize_irq(), however no idea if they 
will be survive in the kmsg log after a next power-reset. I can wait for 
a week :-)

Thanks, Woody

^ permalink raw reply	[flat|nested] 10+ messages in thread