Linux-PCI Archive on lore.kernel.org
From: Andreas Hartmann <andihartmann@freenet.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	linux-pci <linux-pci@vger.kernel.org>
Subject: Re: Hard and silent lock up since linux 3.14 with PCIe pass through (vfio)
Date: Sat, 25 Oct 2014 08:03:00 +0200
Message-ID: <544B3D14.70907@maya.org>
In-Reply-To: <1414093023.27420.40.camel@ul30vt.home>

Alex Williamson wrote:
> On Thu, 2014-10-23 at 19:33 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>> [...]
>>> If you use Bjorn's previous patch to disable VC save/restore and my
>>> patch to reorder the reset mechanisms, does echo 1 > reset for the sysfs
>>> entry for the device also still cause a hang?
>>
>> Yes - it's hanging too (w/ vfio bound to the device - didn't test other
>> possibilities).
> 
> Does it happen regardless of the slot the card is plugged into?  Thanks,

As I already wrote, it's not possible to plug the device into another
port. Besides that, let me stress some "findings" I made over the
past few weeks since I've known about this problem. Maybe they give you
an idea of what's going on:


- I did all of the tests in text mode on the console. Normally, there is
a blinking cursor. When doing the echo 1 > reset, the shell doesn't come
back and the cursor immediately blinks more slowly, i.e. each on / off
cycle takes longer. It "blinks" this way no more than 2 more times until
it's finally dead.
It looks as if the machine suddenly had extremely high load (there
are 8 cores!) - but this doesn't seem to be true, because the CPU fan
stays silent - the rpm doesn't change at all.
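
For reference, the "echo 1 > reset" above is a write of 1 to the device's
sysfs reset attribute. A minimal Python sketch of the same step, assuming
the standard /sys/bus/pci/devices/<address>/reset layout. The address
0000:04:00.0 and the helper names are placeholders for illustration only,
not my device's real address or any existing tool:

```python
from pathlib import Path

# Standard sysfs location of PCI devices (assumed; adjust if sysfs is
# mounted elsewhere).
SYSFS_PCI_DEVICES = "/sys/bus/pci/devices"


def reset_attribute(bdf, sysfs_root=SYSFS_PCI_DEVICES):
    """Return the path of the sysfs 'reset' attribute for a PCI device.

    bdf is the domain:bus:device.function address, e.g. the placeholder
    "0000:04:00.0".
    """
    return Path(sysfs_root) / bdf / "reset"


def trigger_reset(bdf, sysfs_root=SYSFS_PCI_DEVICES):
    """Equivalent of `echo 1 > reset`; needs root on a real system."""
    attr = reset_attribute(bdf, sysfs_root)
    # The attribute only exists if the kernel found a reset method.
    if not attr.exists():
        raise FileNotFoundError(f"{attr}: no reset attribute for this device")
    attr.write_text("1\n")
    return attr
```

Calling trigger_reset("0000:04:00.0") as root would perform the write; on
the affected kernels this is the step after which the shell never returns.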


- Most of the time when I run tests that fail, I have problems with USB
after the hang (it's the Etron device). Problem means: the initrd
isn't able to communicate with the device (but BIOS and grub2 didn't have
any problem, because the keyboard, which is connected via USB 3, worked
fine). At this point, it is necessary to disconnect the mains completely
and wait half a minute until the problem disappears.

Seldom, I had this problem even at the BIOS stage: the keyboard couldn't
be seen even by the BIOS any more.


- Sometimes (really seldom - it has happened about 3 times now), it gets
extremely hard to return to normal operation after that hang. This
means: For a few weeks, I've been running kernel 3.12.28-3-desktop out of
the box (= as provided by openSUSE). Sometimes now, I get (apparently)
the same problems (= PCIe passthrough hangs the complete machine) w/
3.12.28 as I have with stock >= 3.14 after testing. Even reconnecting
the mains is useless then (I experienced this 2 times in a row after
one hang yesterday). At this point, I have to run kernel 3.10.x (which
runs fine as usual) and only after that does 3.12 work again as
expected (this happened once yesterday during tests w/ the USB 3
devices disabled in the BIOS).


- I think there is a relationship between how long the hang is active
and the subsequent problems. If the hang is ended immediately (after
about 1s at most) w/ the reset button, it is possible that there is no
USB problem after reboot and the machine works completely fine with
3.12.x again.


Conclusion (from my point of view):
The broken reset seems to do something really _extremely ugly_ w/ the
hardware, which has the potential to break the hardware "lastingly", or
the software running afterwards isn't able to correctly reconfigure the
system at all - even after reconnecting the mains.
Fortunately I have an old kernel version (3.10.x), which seems to be
able to "repair" the hardware again. But I have to emphasize that the
situation is highly questionable, and I'm meanwhile afraid of finally
breaking my board, which otherwise works really _extremely_ stably.



Out of interest:
Bjorn's patch disables vc save/restore support - and the machine works
fine again. Why is it needed at all if it seems to work perfectly w/o
it? What's the additional benefit? Or in other words: What have I been
missing until today :-) ? What would be better? What more could I do?



Thanks,
kind regards,
Andreas
