linux-kernel.vger.kernel.org archive mirror
* Re: 4.1 EOL
From: alexander.levin @ 2017-11-13 21:00 UTC
  To: Tuncer Ayaz
  Cc: daniel.vetter, jani.nikula, seanpaul, airlied, dri-devel,
	linux-kernel, stable

I've cc'ed some folks in the hope of getting this resolved upstream.

Either way, 4.1's EoL was previously moved to about 6 months from now,
so hopefully we'll have more than enough time to get this resolved.

On Sat, Nov 11, 2017 at 10:13:55PM +0000, Tuncer Ayaz wrote:
>The predicament I'm in on my machines is that ever since drm-intel has
>implemented atomic modesetting, there's a list of regressions caused by
>those fundamental architecture changes and the code churn it implied.
>This means 4.1 is (from what I can tell) the last kernel before atomic
>modesetting was added and the only kernel free of all those issues
>which necessitate trying out various combinations of flags on the
>kernel cmdline.
>
>For instance, right now I'm trying 4.13.12 with these flags:
>video=SVIDEO-1:d
>i915.semaphores=1
>i915.enable_rc6=0
>i915.enable_psr=0
>intel_iommu=igfx_off
>
>PS: I'm kinda confused how anyone uses DMAR with VT-d when it's known
>to be buggy.
>
>The flags seem to decrease the chances of provoking the bugs, but after a
>day of running Xorg, it's possible to still hit the RCS0 GPU hangs.
>
>If you don't pass video=SVIDEO-1:d, then atomic's flip_done times out
>on boot or exit to VT console. It's good that other people have the same
>issues and have been following the bugzilla tickets, and can confirm
>the results.
>
>I'm kinda glad I don't have a machine that's newer than Sandybridge
>since that means I can use 4.1, though it's not a long-term solution,
>and the plan is for the reported bugzilla tickets to be resolved at
>some point, or me switching away from Intel GPUs, which might be
>doable if I save money and get an AMD APU laptop next summer and
>switch my desktop to a discrete GPU.
>
>For example:
>https://bugs.freedesktop.org/show_bug.cgi?id=101237
>https://bugs.freedesktop.org/show_bug.cgi?id=103076
>https://bbs.archlinux.org/viewtopic.php?id=218581&p=3
>https://bugs.archlinux.org/task/51703
>
>So, since 4.4, 4.9, 4.12 and drm-tip are all still regressive,
>I wanted to ask if you considered pushing back 4.1's EOL.
>
>Given a look at bugzilla, I have the impression that those issues will
>need at least another year before they're fixed, since most of them
>have been sitting there for many, many months. I suspect the Intel DRM
>team doesn't have the bandwidth to address the issues in a timely
>fashion while still adding bring-up for new GPUs and features
>(fences, etc.).
>
>The generic modesetting DDX and Wayland are less susceptible to the
>GPU hangs, but can be made to provoke it if tried long enough.
>However, the modesetting DDX tears heavily and is about to gain atomic
>modesetting in the next Xorg release, so will suffer from the same
>easy GPU hang likelihood.
>
>Prior to SandyBridge there was zero tearing but beginning with
>SandyBridge xf86-video-intel's TearFree=TRUE is the only reliable way
>to fix Xorg tearing.
>
>I do appreciate you maintaining 4.1 so far and hate to admit that I'm
>reliant on it on more than two machines, before and after Sandybridge,
>excluding those machines which need a newer kernel. I also understand
>how much work this is and since I'm not using Linux professionally for
>a product, I can't offer compensation for your time. I can only offer
>to collect and point you at a list of DRM bugs for validation of my
>claims.

-- 

Thanks,
Sasha

* Re: 4.1 EOL
From: Jani Nikula @ 2017-11-14  8:46 UTC
  To: alexander.levin, Tuncer Ayaz
  Cc: daniel.vetter, seanpaul, airlied, dri-devel, linux-kernel, stable


Tuncer, where's your bug report? Can't find one. Please file your bug at
the fdo bugzilla.

Thanks,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center

* Re: 4.1 EOL
From: Tuncer Ayaz @ 2017-11-14 21:41 UTC
  To: Jani Nikula
  Cc: alexander.levin, daniel.vetter, seanpaul, airlied, dri-devel,
	linux-kernel, stable

On 11/14/17, Jani Nikula <jani.nikula@linux.intel.com> wrote:
>
> Tuncer, where's your bug report? Can't find one. Please file your
> bug at the fdo bugzilla.

I'm sorry if this wasn't clear.

I didn't file a bug report since others have already done so,
reporting the same symptoms. I did sign up yesterday to confirm this
in the most recent bug report. And I don't think it makes sense to
re-file the exact same report.

The way I arrived there is via a post in a forum thread related to
x220 regressions, but it doesn't look exclusive to Sandybridge GPUs.

* Re: 4.1 EOL
From: Jani Nikula @ 2017-11-15 10:04 UTC
  To: Tuncer Ayaz
  Cc: alexander.levin, daniel.vetter, seanpaul, airlied, dri-devel,
	linux-kernel, stable

On Tue, 14 Nov 2017, Tuncer Ayaz <tuncer.ayaz@gmail.com> wrote:
> On 11/14/17, Jani Nikula <jani.nikula@linux.intel.com> wrote:
>>
>> Tuncer, where's your bug report? Can't find one. Please file your
>> bug at the fdo bugzilla.
>
> I'm sorry if this wasn't clear.
>
> I didn't file a bug report since others have already done so,
> reporting the same symptoms. I did sign up yesterday to confirm this
> in the most recent bug report. And I don't think it makes sense to
> re-file the exact same report.
>
> The way I arrived there is via a post in a forum thread related to
> x220 regressions, but it doesn't look exclusive to Sandybridge GPUs.

The freedesktop.org bugs you reference are for rather different
platforms than yours. There's nothing there to indicate v4.1 being the
last known good kernel like for you. There is no exact same report.

Please file the bug. Please run v4.14 or drm-tip branch from [1]. Please
remove all other module parameters, but add drm.debug=14, and attach the
dmesg from boot to the problem. Please attach the GPU error state if you
get a GPU hang. Please let us decide if we've seen the bug before or
not.
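
A minimal sketch of that command-line cleanup, assuming the parameters
live on a single kernel command line string. The helper name
`debug_cmdline` is hypothetical, and treating only tokens prefixed with
`i915.` or `drm.` as the module parameters to drop is an assumption;
whether `video=` or `intel_iommu=` should also go is not stated here.
The value 14 is 0x0e, i.e. the DRIVER (0x02), KMS (0x04) and PRIME
(0x08) debug categories.

```python
# Hypothetical helper for illustration; not part of any kernel tooling.

# 14 == 0x0e == DRIVER (0x02) | KMS (0x04) | PRIME (0x08)
DEBUG_PARAM = "drm.debug=14"

# Assumption: only these prefixes count as module parameters to drop.
MODULE_PREFIXES = ("i915.", "drm.")


def debug_cmdline(cmdline: str) -> str:
    """Return cmdline without i915/drm module parameters, plus drm.debug=14."""
    kept = [tok for tok in cmdline.split()
            if not tok.startswith(MODULE_PREFIXES)]
    kept.append(DEBUG_PARAM)
    return " ".join(kept)


if __name__ == "__main__":
    # Example input built from flags mentioned in this thread;
    # the root= and ro tokens are made up for the illustration.
    current = ("root=/dev/sda1 ro video=SVIDEO-1:d i915.semaphores=1 "
               "i915.enable_rc6=0 i915.enable_psr=0 intel_iommu=igfx_off")
    print(debug_cmdline(current))
```

With the flags listed earlier in the thread, this keeps
`video=SVIDEO-1:d` and `intel_iommu=igfx_off`, drops the three `i915.*`
entries, and appends `drm.debug=14`. After a hang, dmesg on i915
usually names the file holding the GPU error state (commonly
/sys/class/drm/card0/error), which is the file to attach.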

We've been continuously improving our CI and test assets and expanding
the hardware pool we run the tests on for years now. Even so, bugs
obviously slip through. And it's really *really* hard to revert anything
or fix regressions when we get the reports about two years or a dozen
kernel releases after we've broken stuff. :(


BR,
Jani.

[1] https://cgit.freedesktop.org/drm/drm-tip

-- 
Jani Nikula, Intel Open Source Technology Center

* Re: 4.1 EOL
From: Tuncer Ayaz @ 2017-11-15 17:00 UTC
  To: Jani Nikula
  Cc: alexander.levin, daniel.vetter, seanpaul, airlied, dri-devel,
	linux-kernel, stable

On 11/15/17, Jani Nikula <jani.nikula@linux.intel.com> wrote:

> The freedesktop.org bugs you reference are for rather different
> platforms than yours. There's nothing there to indicate v4.1 being
> the last known good kernel like for you. There is no exact same
> report.

I don't follow why you think it's a different platform and how I might
have "more" definitely shown v4.1 to be good, but I'll trust your
judgement as a drm dev and not argue :).

> Please file the bug. Please run v4.14 or drm-tip branch from [1].
> Please remove all other module parameters, but add drm.debug=14, and
> attach the dmesg from boot to the problem. Please attach the GPU
> error state if you get a GPU hang. Please let us decide if we've
> seen the bug before or not.

Is the flip_done timeout on exit from Xorg a separate bug? That's one
of the symptoms.

The other symptom is GEM errors in dmesg followed by rcs0 gpu hangs
some time later.

In both cases the machine will be temporarily unresponsive or even
hang indefinitely.

I can't say when the bugs will be filed. Hopefully soon.

> We've been continuously improving our CI and test assets and
> expanding the hardware pool we run the tests on for years now. Even
> so, bugs obviously slip through. And it's really *really* hard to
> revert anything or fix regressions when we get the reports about two
> years or a dozen kernel releases after we've broken stuff. :(

Sure, but it's important to note that the rcs0 hangs have been very
visible in 4.13 and, if present at all, better hidden in older kernels.
Meaning, they didn't show up readily enough in older kernels for me to
notice and report them.

* Re: 4.1 EOL
From: Jani Nikula @ 2017-11-16  9:13 UTC
  To: Tuncer Ayaz
  Cc: alexander.levin, daniel.vetter, seanpaul, airlied, dri-devel,
	linux-kernel, stable

On Wed, 15 Nov 2017, Tuncer Ayaz <tuncer.ayaz@gmail.com> wrote:
> I don't follow why you think it's a different platform and how I might
> have "more" definitely shown v4.1 to be good, but I'll trust your
> judgement as a drm dev and not argue :).

You apparently have Sandy Bridge, the referenced reports are about
Broadwell and Skylake. Even if the symptoms you see are the same, the
root causes might be wildly different, needing a different fix.

I've learned the hard way not to make assumptions without detailed
information, which in this case I don't have. As in, I don't even know
for sure if you have Sandy Bridge or not, although it's alluded to in
your message.

From my point of view, you're shouting regression while giving us
nothing to work with. You need to help us to help you.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center

* Re: 4.1 EOL
From: Tuncer Ayaz @ 2017-11-17  7:42 UTC
  To: Jani Nikula
  Cc: alexander.levin, daniel.vetter, seanpaul, airlied, dri-devel,
	linux-kernel, stable

On 11/16/17, Jani Nikula <jani.nikula@linux.intel.com> wrote:
> On Wed, 15 Nov 2017, Tuncer Ayaz <tuncer.ayaz@gmail.com> wrote:
> > I don't follow why you think it's a different platform and how I
> > might have "more" definitely shown v4.1 to be good, but I'll trust
> > your judgement as a drm dev and not argue :).
>
> You apparently have Sandy Bridge, the referenced reports are about
> Broadwell and Skylake. Even if the symptoms you see are the same,
> the root causes might be wildly different, needing a different fix.

Thanks for taking the time to explain and clear up my confusion :).

I checked the comments of the other reporter with a Sandy Bridge
system, and they haven't provided a proper trace. Hence, you're
absolutely right.

> I've learned the hard way not to make assumptions without detailed
> information, which in this case I don't have. As in, I don't even
> know for sure if you have Sandy Bridge or not, although it's alluded
> to in your message.

I do (Sandy Bridge), sorry for not being clearer about that.

> From my point of view, you're shouting regression while giving us
> nothing to work with. You need to help us to help you.

Like I said, will do.
