Comment # 29 on bug 100567 from
(In reply to Roy from comment #28)
> Somewhat surprised that this particular report hasn't received any attention
> from a core dev. Sadly, I'm afraid my response will not be hugely satisfying
> either.
> 
> The message "fifo: SCHED_ERROR 0a [CTXSW_TIMEOUT]" is not the result of *a
> bug* and does not point in the direction of one. Rather it is a symptom of a
> wide variety of potential problems with nouveau, (all of) which result in a
> hang of the rendering part of the GPU. By itself it doesn't give meaningful
> information to any developer as to what may be the culprit. It's like
> measuring a fever, it could be the result of many conditions.
> 
> Developers would be helped if there was a reliably reproducible situation in
> which this event can be triggered. If there is a list of steps that can be
> followed that would result in this message and a hang, that somehow doesn't
> involve words like "random" or "wait for possibly a few hours", and that can
> be traced with tools such as APITrace, we could get a step further in
> analysing what goes wrong. However, it seems unlikely this is the case,
> especially since we are also aware of multithreading-related issues that
> make isolating such problems extremely difficult. Unfortunately, post-mortem
> syslogs and dmesgs are unlikely to add any useful information to this or
> similar bug reports.

Hi Roy,

First of all, thank you very much for taking the time to answer in this thread.
Thank you also for the insights you gave.

As there are "tons" of threads out there, from the Ubuntu and Fedora bug
trackers to several threads here on freedesktop and the Linux bug tracker, some
of them created as long as 2 years ago and still open, I guess it's time to get
hands-on and get rid of this bug. You cannot imagine how annoying it is to work
on a machine that can completely "stall" when you open the wrong menu item or
scroll a Twitter page at just the wrong time.

Anyway, I'm still offering $100 for fixing this, so that I and a good bunch of
other people get this fixed upstream and it no longer shows up, regardless of
the distro or kernel they run. This will definitely need a good amount of
backporting though, as it has been bugging us for a long time. Maybe one or
another person is willing to throw in another $5, so that this benefits not
only the users but also the developer. If someone is familiar with
crowd-funding or something similar, please speak up: I am not.

I'm willing to participate in testing patches or whatever else I can do as a
non-graphics-driver developer. I am certainly familiar with the process of
testing patches, however.

I don't know what to suggest as a start to get this thing done. I have had
nouveau "debug" logs on my machine for a long time, but never actually saw
anything relevant in them besides the "message of death". Just the blink of an
eye after this happens, the machine is completely unusable, with no chance to
interact in any way besides SysRq+REISSSUB.

I would very much like to make progress here.

Maybe we could start by gathering the already opened bugs and the people who
participated in them, just to have a group of people really "hit" by the
problem who are willing to help and, moreover, are able to gather info on
their machines.

I'm pretty happy to run my machines without the "BLOB" and would happily keep
it this way. But this is a showstopper. A showstopper that has not gotten the
right attention for a long time now.

Thanks in advance - no matter what we will achieve.

Marc

