From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 107784] [AMD tahiti XT] displayport broken
Date: Sun, 02 Sep 2018 17:22:15 +0000
Message-ID:
Bug ID: 107784
Summary: [AMD tahiti XT] displayport broken
Product: DRI
Version: DRI git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: blocker
Priority: highest
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: sylvain.bertrand@gmail.com
linux 4.19-rc1+ branch: amd-staging-drm-next
commit:d0a96214993c5dad9c2a54b888209f0f5cafd060
amdgpu is unable to program my displayport monitor anymore (was working in
4.18-rc1+)
see the error messages in the kernel log
Created attachment 141416 [details]
kernel log
Can you bisect? P.S. Please enable CONFIG_UNWINDER_ORC in your kernel build, to make the backtraces in dmesg more useful.
On Mon, Sep 03, 2018 at 09:17:30AM +0000, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=107784
>
> --- Comment #2 from Michel Dänzer <michel@daenzer.net> ---
> Can you bisect?

I don't think that is reasonable: something bad happened in the jump from 4.18 to 4.19, so I would have to bisect over thousands of commits.

Additionally, I had to remove my git repo of 4.18 and clone 4.19 again because git was unable to perform the pull, even with reset and force (I probably don't know some git magic here), so I lost the last working 4.18 commit.

I would need to know which drm/amdgpu related commits were added from 4.18 to 4.19; the person who did this has that knowledge.

I'll try to fool around with git to see if I can manage something in the meantime (with dates and targeting the drm/amdgpu directories).

> P.S. Please enable CONFIG_UNWINDER_ORC in your kernel build, to make the
> backtraces in dmesg more useful.

ORC breaks the Linux build. Only the guess unwinder compiles (actually kbuild and kconfig are quite broken in this 4.19), but it's an optimized build without symbols.
I did a manual bisection and got lucky:

faulty commit: 019cddc88f9e4ae0de2c76802f7137210c2101aa on amd-staging-drm-next

This commit is i2c related, right before the commit 5e8704ac1cfa9fceef94fcc457e18613b1589b34, which is the drm-next commit.
(In reply to Sylvain BERTRAND from comment #4)
> faulty commit: 019cddc88f9e4ae0de2c76802f7137210c2101aa on
> amd-staging-drm-next

That's a pure merge commit, so it's unlikely to be the one that actually caused the problem.

Without a plausible bisection result or at least usable backtraces in dmesg, no progress can be made here.
I tested several times that:
- this commit is failing.
- the commit right before this one was working.

Then, I can reasonably say, this commit, whatever its type, is the one breaking displayport programming.

Then, the code breaking displayport would be here: how hard is it to figure out what code related to displayport was modified?

I can test again several times that this is the faulty commit.
(In reply to Sylvain BERTRAND from comment #6)
> I tested several times that:
> - this commit is failing.
> - the commit right before this one was working.
>
> Then, I can reasonably say, this commit, whatever its type, is the one
> breaking displayport programming.

A merge commit has two parents, so you need to test both. It is possible that a merge commit did indeed introduce the bug, but *highly* unlikely. (Apologies if you knew all of this.)

Sylvain: Are you familiar with "git bisect"? "git bisect" automatically takes care of merge commits. Also, it minimizes the number of tests you need to do.
Bugzilla: The tool which preserves all my stupid errors for eternity. :-/

Sorry, I just read the previous comments and of course you used git bisect already...

Still, you should test both parents of a merge commit to be sure. When my git bisect ends at a merge commit, I have usually messed up the build during bisection (e.g. running a previous build by mistake).
Ok, I got it. Since my git knowledge is quite limited, this "merge" commit is opening a vast sea of new commits to test.

I'll dive into bisection using git bisect (which actually deals with those merge commits). I am a bit scared of the number of commits; it may take hours/days.

Please, in the foreseeable future, do not make amd-staging-drm-next lag that much.

Coming back once I get the faulty _simple_ commit, wish me luck.
(In reply to Sylvain BERTRAND from comment #9)
> Ok, I got it. Since my git knowledge is quite limited, this "merge" commit is
> opening a vast sea of new commits to test.
>
> I'll dive into bisection using bisect (which actually deals with those merge
> commits). I am a bit scared of the amount of commits, may take hours/days.

Don't worry: if your problem is immediately obvious after booting, this should not be too bad. You can also use the versions you built manually to "seed" the git bisection process. That way you reduce the number of bisection steps a bit.

If you encounter non-bootable kernels, please use "git bisect skip" - this is important! DO NOT mark them as "good"/"bad" in these cases.

> Please, in the foreseeable future, do not make amd-staging-drm-next lag that
> much.

I think the "drm-next" thing was mandated by Linus, so there is not much that can be done at this point. But anyway, if you have an easily reproducible problem, bisection is not too bad.
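
The seeding and skipping advice above can be sketched as a small helper. This is only an illustration: the function names and the placeholder revision strings are hypothetical, and the only git subcommands assumed are `git bisect start`, `git bisect good`, `git bisect bad` and `git bisect skip`.

```python
from typing import List


def seed_commands(known_good: List[str], known_bad: List[str]) -> List[List[str]]:
    """Build the git commands that start a bisection from already-tested revisions.

    The revision names are whatever the tester has already built and booted;
    nothing here is specific to this bug.
    """
    cmds = [["git", "bisect", "start"]]
    cmds += [["git", "bisect", "bad", rev] for rev in known_bad]
    cmds += [["git", "bisect", "good", rev] for rev in known_good]
    return cmds


def verdict(booted: bool, display_ok: bool) -> str:
    """Map one test boot to the git bisect subcommand to run.

    A kernel that does not boot says nothing about the display bug,
    so it is skipped rather than marked good or bad.
    """
    if not booted:
        return "skip"
    return "good" if display_ok else "bad"


if __name__ == "__main__":
    # Placeholder revisions, to be replaced by the commits tested by hand.
    for cmd in seed_commands(["<known-good-rev>"], ["<known-bad-rev>"]):
        print(" ".join(cmd))
    print("git bisect", verdict(booted=False, display_ok=False))
```

The point of keeping `verdict` separate is that the "did it boot at all?" question is answered before the "is displayport working?" question, so a broken build can never be recorded as a good or bad data point.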
bisected: e2a9ca29b5edc89da2fddeae30e1070b272395c5

This commit is one in a series related to new TSC code. I tried to switch the clocksource to hpet early in the boot process; it did not change anything.

Any ideas before I post an issue on kernel bugzilla?
What | Removed | Added |
---|---|---|
Attachment #141416 is obsolete | | 1 |
Created attachment 141464 [details]
kernel log from the bad commit
Created attachment 141465 [details]
kernel log from a good commit
(In reply to Sylvain BERTRAND from comment #11)
> bisected: e2a9ca29b5edc89da2fddeae30e1070b272395c5
>
> This commit is one in a series related to new TSC code

That's not consistent with the merge commit you identified earlier. So I'm afraid it's likely that you incorrectly classified some commits as good or bad. Maybe the problem doesn't occur 100% consistently even with bad commits, so try testing longer / more times before declaring a commit as good.
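
One rough way to put a number on "more times" is sketched below. It rests on assumptions the thread does not establish: that each boot of a bad kernel fails independently with a fixed probability `p_fail`, and that the tester accepts a small chance `max_miss` of wrongly calling a bad commit good. The function name and both parameters are hypothetical.

```python
import math


def boots_needed(p_fail: float, max_miss: float = 0.05) -> int:
    """Smallest number of clean boots n such that (1 - p_fail)**n <= max_miss.

    If a bad kernel fails on each boot with probability p_fail, then the
    chance that n boots in a row all look fine is (1 - p_fail)**n.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(f"failure rate {p:.0%}: {boots_needed(p)} clean boots before calling it good")
```

With these assumptions, a bug that shows up on half of the boots needs 5 clean boots in a row before "good" is wrong less than 5% of the time, and a bug that shows up on one boot in ten needs 29. The real failure rate is unknown here, so this only gives an order of magnitude.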
On Thu, Sep 06, 2018 at 10:04:53AM +0000, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=107784
>
> --- Comment #14 from Michel Dänzer <michel@daenzer.net> ---
> That's not consistent with the merge commit you identified earlier. So I'm
> afraid it's likely that you incorrectly classified some commits as good or bad.
> Maybe the problem doesn't occur 100% consistently even with bad commits, so try
> testing longer / more times before declaring a commit as good.

Not consistent? Could you be more specific? Some git magic I forgot again?

This time I used git bisect to go through "merge" commits properly.

I did test those commits countless times: that would mean this TSC code would "side-effect" an ultra-rare bad condition into a nearly-all-the-time bad condition. That amount of bad luck?

Whatever, I'll update amd-staging-drm-next and go through a full bisection again. I'll probably need days (lucky: I don't work).
(In reply to Sylvain BERTRAND from comment #15)
> Not consistent? Could you be more specific?

You wrote:

(In reply to Sylvain BERTRAND from comment #6)
> - this commit is failing.
> - the commit right before this one was working.

"this commit" being 019cddc88f9e4ae0de2c76802f7137210c2101aa (the I2C merge), which has two parents. Both of those parent commits contain commit e2a9ca29b5edc89da2fddeae30e1070b272395c5 (a TSC commit) as part of their history. So you previously considered commit e2a9ca29b5edc89da2fddeae30e1070b272395c5 as both bad and good. That's the inconsistency.

This most likely means that you're not yet able to reliably determine that a given commit is bad, e.g. due to not testing (long) enough.
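
The inconsistency can be modelled on a toy commit graph. This is a sketch under the usual bisection assumption that a regression, once introduced, is present in every descendant commit. The node names `parent_a`, `parent_b` and `base` are hypothetical, the hashes are abbreviated from the thread, and the real history has many more commits between the TSC commit and the merge parents.

```python
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """Return every commit reachable from `commit` through parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(commit, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def contradicting_goods(parents: Dict[str, List[str]],
                        verdicts: Dict[str, str],
                        first_bad: str) -> List[str]:
    """List commits marked good although they contain `first_bad` in their history."""
    return sorted(
        commit
        for commit, mark in verdicts.items()
        if mark == "good"
        and (commit == first_bad or first_bad in ancestors(parents, commit))
    )


# Toy history: the merge has two parents, and both descend from the TSC commit.
PARENTS = {
    "merge_019cddc": ["parent_a", "parent_b"],
    "parent_a": ["tsc_e2a9ca2"],
    "parent_b": ["tsc_e2a9ca2"],
    "tsc_e2a9ca2": ["base"],
    "base": [],
}

# Verdicts from the manual test: the merge fails, "the commit right before" works.
VERDICTS = {"merge_019cddc": "bad", "parent_a": "good"}

if __name__ == "__main__":
    print(contradicting_goods(PARENTS, VERDICTS, "tsc_e2a9ca2"))
```

If the TSC commit really is the first bad one, then `parent_a` contains it and cannot be good, so one of the two verdicts must be wrong. Blaming the merge itself raises no such conflict in this model, which is why the two results cannot both hold.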
On Thu, Sep 06, 2018 at 02:22:18PM +0000, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=107784
>
> --- Comment #16 from Michel Dänzer <michel@daenzer.net> ---
> "this commit" being 019cddc88f9e4ae0de2c76802f7137210c2101aa (the I2C merge),
> which has two parents. Both of those parent commits contain commit
> e2a9ca29b5edc89da2fddeae30e1070b272395c5 (a TSC commit) as part of their
> history. So you previously considered commit
> e2a9ca29b5edc89da2fddeae30e1070b272395c5 as both bad and good. That's the
> inconsistency.
>
> This most likely means that you're not yet able to reliably determine that a
> given commit is bad, e.g. due to not testing (long) enough.

Wow! Then it is even worse than I thought. Due to the violent leap from 4.18 to 4.19, there are zillions of commits, and even a log(n) bisect will give me ten_s_ of commits to test.

Please, could you refine your "long enough" for a blocker problem which happens at boot, once xorg tries to program my displayport screen? That would be based on your experience, something to give me the order of magnitude of the "long enough".

That said, I have a hunch. I am going to set up a much cleaner test env: it's a custom distro which boots in _really_ a few seconds (not in the range of most mainstream distros' boot time), so I am going to slow it down on purpose (certainly in more than one spot). Then, I have an efi framebuffer and I saw some issues about this, so I am going to get rid of it. Then, I am not confident in my monitor (see my other bugs); I may use the previous artificial slowdown to power cycle the monitor before xorg tries to detect and program it. Well, I'll try to figure out a way to put my monitor in a "probably" cleaner state (with respect to displayport hotplug support).

Oh, and just in case, I'll stick to the performance cpu governor.

If you have any advice about this based on your experience and knowledge, which I cannot match, I'm all eyes and ears.
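
For a sense of scale on the number of kernels to build, a bisection roughly halves the candidate range at every step, so the step count grows with log2 of the number of commits. The sketch below assumes an idealised linear history; the commit counts are illustrative guesses, not figures from this thread, and real histories with merges or skipped commits can need a few more steps.

```python
def bisect_steps(n_commits: int) -> int:
    """Upper bound on halving steps needed to isolate one commit out of n_commits.

    Equals ceil(log2(n_commits)), computed with integer arithmetic.
    """
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    return (n_commits - 1).bit_length()


if __name__ == "__main__":
    # Illustrative range sizes only.
    for n in (1_000, 15_000, 100_000):
        print(f"{n} commits: about {bisect_steps(n)} builds to test")
```

Under this model 15,000 candidate commits come down to about 14 test builds, so the cost is dominated by how long each build-and-boot cycle takes and how many boots each verdict needs.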