* Help needed for bug 58556
From: pierre.morrow-GANU6spQydw @ 2014-01-31 13:09 UTC (permalink / raw)
  To: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



Hello List, 

I am trying to solve bug 58556 [1], but I will need some help as I don't understand all that is going on. 

The system is composed of an NV96 (9600 GT) and an NVAC (9400 M) card; acceleration is disabled, otherwise the system hangs at boot after initialising the NV96 card. 
The problem consists of a complete garbage screen in console mode, and a one-third garbage screen in the GUI. 

It appeared in commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e, which initialised some structures only when acceleration was on, even though they were later used in both cases. These structures changed in commit ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69, which solved the initial issue, but the problem still remains. 

After a few tests, it seems nouveau_channel_new is key to get a correct screen, though it is only called when acceleration is on; I didn't find which structures initialised by nouveau_channel_new are needed to get a clean screen, nor did I find any clues in debug messages. 

Does anyone have some clues about how it (should) work, or could anyone give me some pointers? 

Thanks in advance for your help, 

Pierre Moreau 


[1]: https://bugs.freedesktop.org/show_bug.cgi?id=58556 

_______________________________________________
Nouveau mailing list
Nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

* Re: Help needed for bug 58556
From: Ilia Mirkin @ 2014-01-31 20:16 UTC (permalink / raw)
  To: pierre.morrow-GANU6spQydw; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Jan 31, 2014 at 8:09 AM,  <pierre.morrow-GANU6spQydw@public.gmane.org> wrote:
> Hello List,
>
> I am trying to solve bug 58556 [1], but I will need some help as I don't
> understand all that is going on.

Unfortunately this is a *massive* bug... and confused by the "other"
very similar but apparently not identical bug in the system.

>
> The system is composed of an NV96 (9600 GT) and an NVAC (9400 M) card;
> acceleration is disabled, otherwise the system hangs at boot after
> initialising the NV96 card.
> The problem consists of a complete garbage screen in console mode, and a
> one-third garbage screen in the GUI.

What happens if you only enable acceleration on the NVAC card? (e.g.
by hacking up nouveau to ignore the other one entirely). Wasn't there
something where the NV96 card was effectively disabled but still
appearing in PCI space? Or I might be thinking of a different Mac
situation...

>
> It appeared in commit 20abd1634a6e2eedb84ca977adea56b8aa06cc3e, as it was
> initialising some structures only when acceleration was on, even if it was
> later used in both cases. These structures changed in commit
> ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69, solving the initial issue but the
> problem still remains.

As you probably saw, this is a MASSIVE commit. What exactly was the
problem with 20abd1634a?

>
> After a few tests, it seems nouveau_channel_new is key to get a correct
> screen, though it is only called when acceleration is on; I didn't find
> which structures initialised by nouveau_channel_new are needed to get a
> clean screen, nor did I find any clues in debug messages.

Can you go into some detail on what these tests were that yielded a
successful outcome? IIRC nouveau_channel_new is called to create a
new... channel, which is used by drm clients. If you don't have
acceleration, that whole api is disabled, so it shouldn't come up. I
guess accel_init also initializes drm->channel which is the kernel
channel for doing stuff. [Although TBH I'm not entirely sure how
things work without acceleration enabled...  but I think there's a
non-fifo way to show images on the screen.]

  -ilia


* Re: Help needed for bug 58556
From: pierre.morrow-GANU6spQydw @ 2014-01-31 22:39 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



----- Original message -----

> From: "Ilia Mirkin" <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
> To: "pierre morrow" <pierre.morrow-GANU6spQydw@public.gmane.org>
> Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> Sent: Friday 31 January 2014 21:16:40
> Subject: Re: [Nouveau] Help needed for bug 58556

> Unfortunately this is a *massive* bug... and confused by the "other"
> very similar but apparently not identical bug in the system.

> What happens if you only enable acceleration on the NVAC card? (e.g.
> by hacking up nouveau to ignore the other one entirely). Wasn't there
> some thing where the NV96 card was effectively disabled but still
> appearing in PCI space? Or I might be thinking of a different mac
> situation...

Well, if I disable acceleration for the NV96 card, it doesn't hang after initialising it, but I later get spammed with errors (I think they are PAGE_NOT_PRESENT errors, like [1], but my screen turns to garbage at that point, so I can't read anything), and I never get to the login prompt. 
BTW, what could I do to get boot logs even if the system does not make it through (apart from recording with my phone...)? 

> As you probably saw, this is a MASSIVE commit. What exactly was the
> problem with 20abd1634a?

The vblank structure was a little bit modified, and psw->vblank would be initialised only when acceleration is on (it was always initialised before), though it would be used inside functions called even when acceleration is off. You can see it in comments 18 [2] and 20 [3]. 
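
A minimal sketch of the ordering problem described above, using hypothetical names (`sketch_dev`, `sketch_vblank`, `sketch_init_buggy`, `sketch_init_fixed` and `sketch_vblank_handler` are all made up for illustration and are not nouveau identifiers). It only shows the general pattern: state that is set up on the accelerated path alone, yet read by code that runs in both modes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names exist in nouveau. */
struct sketch_vblank {
    bool ready;          /* set once the structure has been initialised */
    unsigned int events; /* number of vblank events handled so far */
};

struct sketch_dev {
    bool noaccel;                /* models booting with acceleration off */
    struct sketch_vblank vblank; /* read by the display path in both modes */
};

/* Buggy ordering: vblank state is only set up on the accelerated path. */
static void sketch_init_buggy(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->vblank.ready = false;
    dev->vblank.events = 0;
    if (!dev->noaccel)
        dev->vblank.ready = true;
}

/* Fixed ordering: shared state is set up before the acceleration check. */
static void sketch_init_fixed(struct sketch_dev *dev)
{
    assert(dev != NULL);
    dev->vblank.ready = true;
    dev->vblank.events = 0;
    /* accel-only setup would follow here, guarded by !dev->noaccel */
}

/* Runs on every vblank, with or without acceleration.
 * Returns 0 on success, -1 if the state was never initialised. */
static int sketch_vblank_handler(struct sketch_dev *dev)
{
    assert(dev != NULL);
    if (!dev->vblank.ready)
        return -1;
    dev->vblank.events++;
    return 0;
}
```

With `noaccel` set, the handler fails after `sketch_init_buggy` and succeeds after `sketch_init_fixed`; in a real driver the failure would more likely show up as an oops or corrupted state than as a clean error code.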

> Can you go into some detail on what these tests were that yielded a
> successful outcome? IIRC nouveau_channel_new is called to create a
> new... channel, which is used by drm clients. If you don't have
> acceleration, that whole api is disabled, so it shouldn't come up. I
> guess accel_init also initializes drm->channel which is the kernel
> channel for doing stuff. [Although TBH I'm not entirely sure how
> things work without acceleration enabled... but I think there's a
> non-fifo way to show images on the screen.]

My tests were pretty brute-force ones: 
* comment out all of nouveau_accel_init's content, and uncomment it block by block until it works; 
* then comment out all of nouveau_channel_new's content, and uncomment it function by function until it works; 
* and finally, do the same inside nouveau_channel_init (for this function, the vram creation, gart creation and dma variables initialisation alone were enough to get a clean screen). 

To sum up what pieces of nouveau_accel_init were needed to get a clean screen: 
* return if card is an NV96 one; 
* init fence; 
* run nouveau_channel_new: 
    * nouveau_channel_ind 
    * nouveau_channel_init, precisely these parts: 
        * vram creation; 
        * gart creation; 
        * dma variables initialisation. 
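
The comment-and-uncomment procedure above amounts to a one-at-a-time reduction, which could be sketched as follows. Everything here is an assumption made for illustration: the `STEP_*` flags are labels for the items in the list, `sketch_reduce` and `sketch_probe_reported` are hypothetical helpers, and the probe simply encodes the outcome reported in this mail instead of touching hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the steps listed above; these are bit flags
 * for the sketch, not nouveau identifiers. */
enum sketch_step {
    STEP_FENCE       = 1u << 0,
    STEP_CHANNEL_IND = 1u << 1,
    STEP_VRAM        = 1u << 2,
    STEP_GART        = 1u << 3,
    STEP_DMA_VARS    = 1u << 4,
    STEP_OTHER       = 1u << 5, /* stands in for everything else in accel init */
    STEP_ALL         = (1u << 6) - 1
};

#define SKETCH_NSTEPS 6

/* Supplied by the tester: is the screen clean when only the steps in
 * `mask` are left enabled? */
typedef bool (*sketch_probe)(unsigned int mask);

/* Try disabling each step in turn, keeping it disabled only if the
 * probe still succeeds. The result is minimal in the sense that no
 * single remaining step can be dropped. */
static unsigned int sketch_reduce(unsigned int mask, sketch_probe probe)
{
    assert(probe != NULL);
    for (unsigned int i = 0; i < SKETCH_NSTEPS; i++) {
        unsigned int bit = 1u << i;
        if ((mask & bit) && probe(mask & ~bit))
            mask &= ~bit;
    }
    return mask;
}

/* Stand-in probe encoding the outcome reported above; on real hardware
 * each probe would be a rebuild, a reboot and a look at the screen. */
static bool sketch_probe_reported(unsigned int mask)
{
    const unsigned int needed = STEP_FENCE | STEP_CHANNEL_IND |
                                STEP_VRAM | STEP_GART | STEP_DMA_VARS;
    return (mask & needed) == needed;
}
```

Calling `sketch_reduce(STEP_ALL, sketch_probe_reported)` leaves exactly the five steps named in the list. A greedy pass like this can miss interactions between steps, so the surviving set is a starting point for investigation and not proof that each step is individually required.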
> -ilia

Pierre Moreau 

[1]: nouveau E[PFB][0000:03:00:0] trapped write at 0x0000546000 on channel 0x0000fee0 [unknown] BAR/PFIFO_WRITE/FB reason: PAGE_NOT_PRESENT 
[2]: https://bugs.freedesktop.org/show_bug.cgi?id=58556#c18 
[3]: https://bugs.freedesktop.org/show_bug.cgi?id=58556#c20
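
Once logs can be captured, lines like [1] could be filtered mechanically. The following is a rough sketch that assumes the line layout matches the single sample in [1]; `sketch_trap` and `sketch_parse_trap` are hypothetical names, and other kernel versions may format the message differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Fields pulled out of one "trapped ..." log line (hypothetical struct). */
struct sketch_trap {
    char op[16];             /* "write" in the sample line */
    unsigned long long addr; /* faulting address */
    unsigned int channel;    /* channel identifier as printed */
    bool page_not_present;   /* true if the reason is PAGE_NOT_PRESENT */
};

/* Parse a line shaped like the sample in [1].
 * Returns 0 on success, -1 if the line does not match. */
static int sketch_parse_trap(const char *line, struct sketch_trap *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "trapped ");
    if (p == NULL)
        return -1;
    if (sscanf(p, "trapped %15s at 0x%llx on channel 0x%x",
               out->op, &out->addr, &out->channel) != 3)
        return -1;
    out->page_not_present = strstr(p, "reason: PAGE_NOT_PRESENT") != NULL;
    return 0;
}
```

Counting how many distinct addresses and channels appear in such lines would show whether the errors come from a single stray write or from many.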


* Re: Help needed for bug 58556
From: Ilia Mirkin @ 2014-01-31 22:58 UTC (permalink / raw)
  To: pierre.morrow-GANU6spQydw; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Jan 31, 2014 at 5:39 PM,  <pierre.morrow-GANU6spQydw@public.gmane.org> wrote:
> From: "Ilia Mirkin" <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
> > Unfortunately this is a *massive* bug... and confused by the "other"
> > very similar but apparently not identical bug in the system.
> >
> > What happens if you only enable acceleration on the NVAC card? (e.g.
> > by hacking up nouveau to ignore the other one entirely). Wasn't there
> > some thing where the NV96 card was effectively disabled but still
> > appearing in PCI space? Or I might be thinking of a different mac
> > situation...
>
> Well, if I disable acceleration for the NV96 card, it doesn't hang after
> initialising it, but I get spammed (I think it's PAGE_NOT_PRESENT errors,
> like [1], but my screen goes garbage at that point, so I can't read
> anything) later on, and I don't get to login.

I meant disable it much harder -- like tell nouveau to just ignore it
as though modeset=0 was passed in for it. Also I seem to recall you
can do an outb (even from grub) that will just turn off the nv96 card
entirely.

> BTW, what could I do to get boot logs even if the system does not make it
> through (apart from recording with my phone...)?

pstore if you have efi, netconsole, blockconsole. And phone isn't so
bad either :)

>
>
>
> > As you probably saw, this is a MASSIVE commit. What exactly was the
> > problem with 20abd1634a?
>
> The vblank structure was a little bit modified, and psw->vblank would be
> initialised only when acceleration is on (it was always initialised before),
> though it would be used inside functions called even when acceleration is
> off. You can see it in comments 18 [2] and 20 [3].
>
>
> > Can you go into some detail on what these tests were that yielded a
> > successful outcome? IIRC nouveau_channel_new is called to create a
> > new... channel, which is used by drm clients. If you don't have
> > acceleration, that whole api is disabled, so it shouldn't come up. I
> > guess accel_init also initializes drm->channel which is the kernel
> > channel for doing stuff. [Although TBH I'm not entirely sure how
> > things work without acceleration enabled...  but I think there's a
> > non-fifo way to show images on the screen.]
>
> My tests were pretty bruteforcing ones:
> *   comment all nouveau_accel_init content, and uncomment block by block
> until it works;
> *   then comment all nouveau_channel_new content, and uncomment function by
> function until it works;
> *   and finally, I did the same inside nouveau_channel_init (for this
> function, only the vram creation, gart creation and dma variables
> initialisation were enough to get a clean screen).
>
> To sum up what pieces of nouveau_accel_init were needed to get a clean
> screen:
> *   return if card is an NV96 one;
> *   init fence;
> *   run nouveau_channel_new:
>     *   nouveau_channel_ind
>     *   nouveau_channel_init, precisely these parts:
>         *   vram creation;
>         *   gart creation;
>         *   dma variables initialisation.

Yeah, so all these things should only be necessary if you have
acceleration enabled. I wonder if the card comes up in a funny "I'm
still executing stuff" state and nouveau fails to "shut it down" when
noaccel is passed in.

  -ilia


* Re: Help needed for bug 58556
From: pierre.morrow-GANU6spQydw @ 2014-02-01 15:25 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



----- Original message -----

> From: "Ilia Mirkin" <imirkin-FrUbXkNCsVf2fBVCVOL8/A@public.gmane.org>
> To: "pierre morrow" <pierre.morrow-GANU6spQydw@public.gmane.org>
> Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> Sent: Friday 31 January 2014 23:58:58
> Subject: Re: [Nouveau] Help needed for bug 58556

> I meant disable it much harder -- like tell nouveau to just ignore it
> as though modeset=0 was passed in for it. Also I seem to recall you
> can do an outb (even from grub) that will just turn off the nv96 card
> entirely.

So I prevented PCI from enabling the card, but nothing changed: with noaccel, the screen is still corrupted, and without it, I still can't boot and get lots of errors. 
Oh, I forgot to mention: the screen gets scrambled just after the handover from efifb to nouveaufb. 

> pstore if you have efi, netconsole, blockconsole. And phone isn't so
> bad either :)

> Yeah, so all these things should only be necessary if you have
> acceleration enabled. I wonder if the card comes up in a funny "I'm
> still executing stuff" state and nouveau fails to "shut it down" when
> noaccel is passed in.

Well, from what I saw in nouveau_accel_init, it just enables things if noaccel=0, and does nothing otherwise; noaccel isn't used apart from accel_init. 

> -ilia

Pierre 


* Re: Help needed for bug 58556
From: Pierre Moreau @ 2014-02-04 15:12 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I think we had the wrong culprit: I just tried PCI-disabling the NVAC card (and keeping the NV96 one), and it just works: no garbage screen, and moreover, I get no hangs nor errors when enabling acceleration!

I'll spend some time comparing both outputs (without NVAC, and without NV96) to find out what the NVAC is doing wrong. 

Pierre



* Re: Help needed for bug 58556
From: pierre.morrow-GANU6spQydw @ 2014-02-13 13:00 UTC (permalink / raw)
  To: Ilia Mirkin; +Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



Hello, 

One thing that differs between the NV96 and the NVAC is the efifb handling: 
for the NVAC, efifb conflicts with nouveaufb and is therefore removed, 
whereas the NV96 has no such conflict and keeps efifb alongside nouveaufb. 

To sum up what happens with the different framebuffers on the two cards: 

NVAC: with efifb+nouveaufb: corruption; with nouveaufb only: still corrupted 
NV96: with efifb+nouveaufb: ok; with nouveaufb only: screen freezes 

So the NV96 does not seem to use nouveaufb for output, as removing efifb 
freezes the screen (i.e. the machine continues to run, but the screen isn't 
updated anymore) and removing nouveau_fbcon_init has no effect. I am not sure 
why the NV96 does not use nouveaufb... 
The NVAC, on the other hand, does use nouveaufb (removing efifb has no effect, 
and removing nouveau_fbcon_init causes a screen freeze). I need to do some 
further tests, as removing nouveau_fbcon_init and nouveau_accel_init does not 
seem to produce a garbage screen before the freeze. 

Pierre

