From: "Rafael J. Wysocki" <rjw@rjwysocki.net>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Takashi Iwai <tiwai@suse.de>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	James Wang <jnwang@suse.com>, Borislav Petkov <bpetkov@suse.de>,
	linux-kernel@vger.kernel.org, Pingfan Liu <kernelfans@gmail.com>
Subject: Re: [REGRESSION] Errors at reboot after 722e5f2b1eec
Date: Tue, 11 Sep 2018 12:51:32 +0200
Message-ID: <2580518.z31CQjiopR@aspire.rjw.lan>
In-Reply-To: <20180911093324.GB10436@kroah.com>

On Tuesday, September 11, 2018 11:33:24 AM CEST Greg Kroah-Hartman wrote:
> On Tue, Sep 11, 2018 at 10:17:44AM +0200, Takashi Iwai wrote:
> > [ seems like my previous post didn't go out properly; if you have
> >   already received it, please discard this one ]
> 
> Sorry, I got it, it's just in my large queue :(
> 
> > Hi Rafael, Greg,
> > 
> > James Wang reported on SUSE bugzilla that his machine spews many
> > AMD-Vi errors at reboot like:
> > 
> > [  154.907879] systemd-shutdown[1]: Detaching loop devices.
> > [  154.954583] kvm: exiting hardware virtualization
> > [  154.999953] usb 5-2: USB disconnect, device number 2
> > [  155.025278] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.081360] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.136778] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.191772] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.247055] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.302614] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.358996] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.392155] usb 4-2: new full-speed USB device number 2 using ohci-pci
> > [  155.413752] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.413762] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.560307] ohci-pci 0000:00:12.1: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.616039] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.667843] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.719497] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.772697] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.823919] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.875490] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.927258] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  155.979318] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  156.031813] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  156.084293] AMD-Vi: Event logged [IO_PAGE_FAULT device=00:12.1 domain=0x0006 address=0x0000000000000080 flags=0x0020]
> > [  156.272157] reboot: Restarting system
> > [  156.290316] reboot: machine restart
> > 
> > And, James bisected and spotted that it's introduced by the commit
> > 722e5f2b1eec ("driver core: Partially revert "driver core: correct
> > device's shutdown order"").  Reverting the commit fixes the problem.

Well, has anyone tried to understand why this is so?

It looks like the probe-time reordering of the devices_kset list worked around
some init-time dependency issue, but we can't reorder devices_kset at probe
time, because that breaks parent-child ordering in general.
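
To make the ordering concern concrete, here is a toy model of it, not kernel
code.  It assumes only that shutdown walks the device list from tail to head
and that devices_kset_move_last() moves one entry to the tail.  The device
names and the helper functions are hypothetical.

```python
def shutdown_order(devices):
    """Shutdown visits the registration list from tail to head."""
    return list(reversed(devices))


def move_last(devices, name):
    """Toy analogue of devices_kset_move_last(): move one entry to the tail."""
    devices.remove(name)
    devices.append(name)


# Registration order: each parent is added before its child.
devices = ["pci-bridge", "usb-controller", "usb-device"]
normal = shutdown_order(devices)

# Emulate probe-time reordering: the controller is (re)probed after its
# child already exists, and gets moved to the tail of the list.
reordered = list(devices)
move_last(reordered, "usb-controller")
broken = shutdown_order(reordered)

print("registration order:", normal)
print("after move_last:   ", broken)
```

In the first case the child is shut down before its parent.  In the second
the controller goes first while its child is still active, which is the kind
of parent-child inversion meant above.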

> > He mentioned about Uncorrectable Machine Check Exception seen at
> > shutdown, too, where it doesn't appear after the revert.  (Though,
> > it's not sure whether it's really relevant.)
> > 
> > The errors are clearly related with the USB device (a KVM device,
> > IIRC), and the errors are not seen if the USB device is disconnected.
> > 
> > We experienced this at first with SLE15 kernel (4.12 with backports),
> > but later the same issue was confirmed on 4.18.y and 4.19-rc2.  Also,
> > it's confirmed that revert works on the upstream kernels, too.
> > 
> > Does this hit your radar?
> 
> Ugh, no, I haven't heard of this before, Rafael?
> 
> So the need for the revert fixes some machines, but others need the
> patch, this isn't going to be fun :(

We need to understand what's going on on the machines that stopped working
and fix them.

Calling devices_kset_move_last() from really_probe() is clearly incorrect
and restoring it would be a mistake IMO.

BTW, there is a series of patches from Pingfan Liu:

https://patchwork.kernel.org/project/linux-pm/list/?series=9535

that may help in principle, so any chance to try them on the affected
systems?

Thanks,
Rafael

