* Ingenic JZ4730 - illegal instruction
@ 2009-03-06 16:36 Nils Faerber
  2009-03-08 14:53 ` Markus Gothe
  2009-03-09  8:39 ` Kevin D. Kissell
  0 siblings, 2 replies; 10+ messages in thread
From: Nils Faerber @ 2009-03-06 16:36 UTC (permalink / raw)
  To: linux-mips

Hello!
I am playing with, rather than really working on, an Ingenic JZ4730 based
device. The JZ4730 is a MIPS32 SoC found in many types of devices,
such as media players and the like, but also in small, power-efficient
subnotebooks (this is the device I am trying to support, based on the
Ingenic Linux kernel patch).

The current kernel patch from Ingenic is available at

http://www.ingenic.cn/eng/productServ/App/JZ4730/pfCustomPage.aspx
or
ftp://ftp.ingenic.cn/3sw/01linux/02kernel/linux-2.6.24/linux-2.6.24.3-jz-20090218.patch.gz

(I used an even older patch to start my board support, but the later
patches basically only added newer CPU types).

The support for my board is almost in place, but from time to time I see
applications failing with "illegal instruction" faults. Most shell
applications work fine; it is mainly the more complex GUI applications,
such as a web browser, that fail.
I also tested this with different GCC and glibc versions, which makes me
pretty sure that I am seeing a kernel problem here rather than a
userspace problem.

I am pretty clueless about how to debug this. Speaking of debugging, another
hint: some applications work if I start them in GDB but fail outside it.

Any hint how to start debugging this would be greatly appreciated! And a
fix would be like a dream ;)

Many thanks!

Cheers
  nils faerber

-- 
kernel concepts GbR        Tel: +49-271-771091-12
Sieghuetter Hauptweg 48    Fax: +49-271-771091-19
D-57072 Siegen             Mob: +49-176-21024535
http://www.kernelconcepts.de


* Re: Ingenic JZ4730 - illegal instruction
  2009-03-06 16:36 Ingenic JZ4730 - illegal instruction Nils Faerber
@ 2009-03-08 14:53 ` Markus Gothe
  2009-03-08 16:03   ` ard
  2009-03-09  8:39 ` Kevin D. Kissell
  1 sibling, 1 reply; 10+ messages in thread
From: Markus Gothe @ 2009-03-08 14:53 UTC (permalink / raw)
  To: Nils Faerber; +Cc: linux-mips

Well, the Xburst-arch seems to be pretty fucked up beyond all repair.

//Markus




* Re: Ingenic JZ4730 - illegal instruction
  2009-03-08 14:53 ` Markus Gothe
@ 2009-03-08 16:03   ` ard
  2009-03-10 17:12     ` Markus Gothe
  0 siblings, 1 reply; 10+ messages in thread
From: ard @ 2009-03-08 16:03 UTC (permalink / raw)
  To: linux-mips

Hello,

On Sun, Mar 08, 2009 at 03:53:49PM +0100, Markus Gothe wrote:
> Well, the Xburst-arch seems to be pretty fucked up beyond all repair.

Hmmm, that doesn't sound promising. But do you have references
for that?
Maybe Google has more info now than when I first started
searching.
Anyway: for now it happily runs Debian, and from what I can see,
the kernel patches contain no real changes apart from extra drivers
and extra board and power management drivers.
So if you have hints about how the XBurst deviates from
"standard" MIPS, it could help us a lot.
In the meantime I am going to subscribe to the Ingenic forum
( http://www.ingenic.cn/eng/forum/vvFrmDefault.aspx ),
which contains more characters that I can't read than characters
that I can read :-(.

Regards,
Ard


-- 
.signature not found


* Re: Ingenic JZ4730 - illegal instruction
  2009-03-06 16:36 Ingenic JZ4730 - illegal instruction Nils Faerber
  2009-03-08 14:53 ` Markus Gothe
@ 2009-03-09  8:39 ` Kevin D. Kissell
  2009-03-09 10:00   ` Nils Faerber
  1 sibling, 1 reply; 10+ messages in thread
From: Kevin D. Kissell @ 2009-03-09  8:39 UTC (permalink / raw)
  To: Nils Faerber; +Cc: linux-mips

The only thing that you've mentioned below that really makes me think 
that you're looking at a kernel bug is the comment about things not 
failing under GDB.  But if *any* of the programs that are failing fail 
under gdb, I'd want to know just what instruction is at the place where 
they're taking a SIGILL. If gdb heisenbergs things too much, then the 
basic brute force thing to do would be to instrument the kernel itself 
to report on what happened, and what it sees at the "bad instruction" 
address, using printk.  If the memory value actually looks like a legit 
instruction, it would confirm the hypothesis that you've got an icache 
maintenance problem.  I note that the Ingenic patch has a "flushcaches" 
routine that has hardwired assumptions about the cache organization.  
Could those be incorrect on the chip you're using?
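
[Editor's sketch, not part of the original message.] The check described above can also be approximated from user space, without touching the kernel: a SIGILL handler that prints the faulting address and the word an ordinary data load sees there. If that word looks like a legitimate instruction, the CPU most likely fetched something else through the I-cache. This is a minimal sketch under assumptions: it relies on the kernel filling in si_addr for SIGILL (worth verifying on a 2.6.24-era MIPS kernel), it assumes fixed 32-bit instructions, and the names install_sigill_reporter, sigill_report, put_str and put_hex are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Read the 32-bit word at addr with an ordinary data load (D-cache path). */
static uint32_t read_insn_word(const void *addr)
{
    uint32_t word;

    assert(addr != NULL);   /* abort cleanly if si_addr was not filled in */
    memcpy(&word, addr, sizeof word);
    return word;
}

/* Write s to stderr using only write(2), which is async-signal-safe. */
static void put_str(const char *s)
{
    ssize_t ignored = write(STDERR_FILENO, s, strlen(s));
    (void)ignored;
}

/* Write v to stderr as a fixed-width hexadecimal number. */
static void put_hex(uintptr_t v)
{
    static const char hex[] = "0123456789abcdef";
    char buf[2 * sizeof(uintptr_t) + 1];
    int digits = (int)(2 * sizeof(uintptr_t));

    for (int i = digits - 1; i >= 0; i--) {
        buf[i] = hex[v & 0xf];
        v >>= 4;
    }
    buf[digits] = '\0';
    put_str(buf);
}

/* SIGILL handler: report where the fault hit and what memory holds there. */
static void sigill_report(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    put_str("SIGILL at 0x");
    put_hex((uintptr_t)info->si_addr);
    put_str(", data load sees 0x");
    put_hex(read_insn_word(info->si_addr));
    put_str("\n");
    _exit(1);
}

/* Install the reporter; returns 0 on success, -1 on failure. */
int install_sigill_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigill_report;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGILL, &sa, NULL);
}
```

One way to use it would be to build it into a small shared object whose constructor calls install_sigill_reporter() and preload that into the failing application. The printed word can then be compared against the disassembly of the binary at the same address.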

          Regards, and happy hunting,

          Kevin K.



* Re: Ingenic JZ4730 - illegal instruction
  2009-03-09  8:39 ` Kevin D. Kissell
@ 2009-03-09 10:00   ` Nils Faerber
  2009-03-09 14:12     ` Kevin D. Kissell
  0 siblings, 1 reply; 10+ messages in thread
From: Nils Faerber @ 2009-03-09 10:00 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

Hi Kevin!

Kevin D. Kissell wrote:
> The only thing that you've mentioned below that really makes me think
> that you're looking at a kernel bug is the comment about things not
> failing under GDB.  But if *any* of the programs that are failing fail
> under gdb, I'd want to know just what instruction is at the place where
> they're taking a SIGILL. If gdb heisenbergs things too much, then the
> basic brute force thing to do would be to instrument the kernel itself
> to report on what happened, and what it sees at the "bad instruction"
> address, using printk.  If the memory value actually looks like a legit
> instruction, it would confirm the hypothesis that you've got an icache
> maintenance problem.  I note that the Ingenic patch has a "flushcaches"
> routine that has hardwired assumptions about the cache organization. 
> Could those be incorrect on the chip you're using?

Thanks for having a thought about the issue!

I regret to admit that my GDB assumption was not quite
correct :( After a lot more tries I found an application that also
fails inside GDB. And with some more tries I can now confirm that
applications fail at random points: it is not a single instruction that
causes the fault.
So I think your memory/cache theory sounds pretty interesting...
I just had a look at the JZ4730 code (in arch/mips/jz4730/) and the only
mention of a cache flush is in pm.c, which is only executed when
going to sleep (i.e. CPU deep sleep, aka s2ram).
arch/mips/mm/c-r4k.c also contains a JZ_RISC section for setting up
cache options, and arch/mips/mm/tlbex.c has a special TLB case for the JZ.

Those look promising!
I could very well think of cases where a wrong cache flush could cause
such or similar problems.

>          Regards, and happy hunting,

Happy? Maybe once I have found it. The annoying thing about this is that
Ingenic is not very helpful. I have already emailed them several times asking
for the full datasheet of the CPU, with no reply at all so far. The
datasheet they have on their webpage is just the brief, with about 60
pages, and not very helpful when you are looking for details like cache
handling etc.

So I will have to resort to experiments: trial and error.

Thank you very much for your thoughts and idea!

>          Kevin K.
Cheers
  nils faerber

-- 
kernel concepts GbR        Tel: +49-271-771091-12
Sieghuetter Hauptweg 48    Fax: +49-271-771091-19
D-57072 Siegen             Mob: +49-176-21024535
http://www.kernelconcepts.de


* Re: Ingenic JZ4730 - illegal instruction
  2009-03-09 10:00   ` Nils Faerber
@ 2009-03-09 14:12     ` Kevin D. Kissell
  2009-03-09 15:05       ` Nils Faerber
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin D. Kissell @ 2009-03-09 14:12 UTC (permalink / raw)
  To: Nils Faerber; +Cc: linux-mips

I don't have time to go chasing this stuff any further on your behalf,
but it *does* smell to me like an icache management problem.  Remember,
MIPS processors almost universally have split I/D caches and no
coherence support between them, so if you either (a) forget to do an
explicit D-cache write-back operation after copying to a page mapped
write-back that's going to be used as instructions/text, or (b) forget
to do an explicit I-cache invalidate when you re-use a page for
instructions that has been previously used for a different instruction
page, you will have problems, even without going into DMA I/O coherence
issues.  If your problem were (b), though, you'd be seeing bad answers,
segmentation violations, bus errors, etc., at least as often as you'd be
seeing illegal instruction exceptions.  So my money would be on (a).
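
[Editor's sketch, not part of the original message.] Seen from user space, the two obligations above (write back the D-cache, then invalidate the I-cache) are what a program must request whenever it writes code and then runs it. The sketch below uses the GCC/Clang builtin __builtin___clear_cache for that; on targets with incoherent split caches it performs the required maintenance for the given range, and on coherent targets it may compile to nothing. The helper name publish_code is made up for illustration. Inside the kernel, the corresponding work is done by the cache primitives in c-r4k.c, not by this builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes of machine code to dst and make them safe to execute.
 * The memcpy stores go through the D-cache; the builtin then asks for the
 * range to be written back and for stale I-cache contents to be discarded.
 */
static void *publish_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    return dst;
}
```

Skipping the builtin after the memcpy is the user-space equivalent of case (a): the new instructions may still sit only in the D-cache while the CPU fetches whatever RAM held before.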

The need for cache management is so fundamental to Linux for MIPS that
all the necessary general hooks have been there for years.  If I were
you, I'd focus on the definitions of the primitives that you spotted in
c-r4k.c.  Does the stuff in the JZ_RISC section correspond to the
assembly language flush sequence done in the Ingenic patch to head.S? 
Are you sure that the JZ_RISC section is in fact the version of those
functions that's being built into your kernel?

          Regards,

          Kevin K.



* Re: Ingenic JZ4730 - illegal instruction
  2009-03-09 14:12     ` Kevin D. Kissell
@ 2009-03-09 15:05       ` Nils Faerber
  2009-03-09 15:45         ` Kevin D. Kissell
  0 siblings, 1 reply; 10+ messages in thread
From: Nils Faerber @ 2009-03-09 15:05 UTC (permalink / raw)
  To: Kevin D. Kissell; +Cc: linux-mips

Kevin D. Kissell wrote:
> I don't have time to go chasing this stuff any further on your behalf,

I do not expect that either ;)
I am already quite happy that you shared your experience with me. It
has already helped me a lot to find some points in the code that could be the
culprit, and I can dig further from here.

> but it *does* smell to me like an icache management problem.  Remember,
> MIPS processors almost universally have split I/D caches and no
> coherence support between them, so if you either (a) forget to do an
> explicit D-cache write-back operation after copying to a page mapped
> write-back that's going to be used as instructions/text, or (b) forget
> to do an explicit I-cache invalidate when you re-use a page for
> instructions that has been previously used for a different instruction
> page, you will have problems, even without going into DMA I/O coherence
> issues.  If your problem were (b), though, you'd be seeing bad answers,
> segmentation violations, bus errors, etc., at least as often as you'd be
> seeing illegal instruction exceptions.  So my money would be on (a).

Yes, it is only illegal instructions, no other faults.

> The need for cache management is so fundamental to Linux for MIPS that
> all the necessary general hooks have been there for years.  If I were
> you, I'd focus on the definitions of the primitives that you spotted in
> c-r4k.c.  Does the stuff in the JZ_RISC section correspond to the

OK.

> assembly language flush sequence done in the Ingenic patch to head.S? 
> Are you sure that the JZ_RISC section is in fact the version of those
> functions that's being built into your kernel?

Well, there is CONFIG_JZRISC=y in the kernel .config and a
switch(current_cpu_type) { case CPU_JZRISC: ...} so I would assume it is
being used. But I will verify that the CONFIG_JZRISC=y is correctly
translated into a current_cpu_type.
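
[Editor's sketch, not part of the original message.] One cheap cross-check from user space is to look at what the kernel reports as the probed CPU in /proc/cpuinfo, where MIPS kernels print a "cpu model" line. The helper below extracts a field from cpuinfo-style text. It is only a sketch: the function name cpuinfo_field is made up, and the string the Ingenic kernel prints for this CPU is not known here. It also only shows what cpu_probe decided, not that the JZRISC branch in c-r4k.c actually ran.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find a "key<whitespace>: value" line in cpuinfo-style text and copy the
 * value into out (truncated to fit). Returns 1 if found, 0 otherwise.
 */
static int cpuinfo_field(const char *text, const char *key,
                         char *out, size_t outlen)
{
    size_t keylen;

    assert(text != NULL && key != NULL && out != NULL && outlen > 0);
    keylen = strlen(key);

    for (const char *line = text; line != NULL && *line != '\0'; ) {
        const char *eol = strchr(line, '\n');
        size_t linelen = eol ? (size_t)(eol - line) : strlen(line);

        if (linelen > keylen && strncmp(line, key, keylen) == 0) {
            const char *p = line + keylen;
            const char *end = line + linelen;

            while (p < end && (*p == ' ' || *p == '\t'))
                p++;                    /* skip padding before the colon */
            if (p < end && *p == ':') {
                size_t n;

                p++;
                while (p < end && *p == ' ')
                    p++;                /* skip padding after the colon */
                n = (size_t)(end - p);
                if (n >= outlen)
                    n = outlen - 1;
                memcpy(out, p, n);
                out[n] = '\0';
                return 1;
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return 0;
}
```

To use it, read /proc/cpuinfo into a buffer and call cpuinfo_field(buffer, "cpu model", out, sizeof out). A stronger check would be a printk inside the case CPU_JZRISC: branch itself.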

Oh, one last question: in order to rule out the cache as the source of the
bug, would the kernel option "run uncached" "solve" the issue (and be darn slow)?

>           Regards,
>           Kevin K.
Thanks a lot so far!
It helped me a great deal to start to understand what is going on here...

Cheers
  nils faerber

-- 
kernel concepts GbR        Tel: +49-271-771091-12
Sieghuetter Hauptweg 48    Fax: +49-271-771091-19
D-57072 Siegen             Mob: +49-176-21024535
http://www.kernelconcepts.de


* Re: Ingenic JZ4730 - illegal instruction
  2009-03-09 15:05       ` Nils Faerber
@ 2009-03-09 15:45         ` Kevin D. Kissell
  2009-03-09 16:26           ` Nils Faerber
  0 siblings, 1 reply; 10+ messages in thread
From: Kevin D. Kissell @ 2009-03-09 15:45 UTC (permalink / raw)
  To: Nils Faerber; +Cc: linux-mips

Nils Faerber wrote:
> Kevin D. Kissell wrote:
>   
>> Are you sure that the JZ_RISC section is in fact the version of those
>> functions that's being built into your kernel?
>>     
>
> Well, there is CONFIG_JZRISC=y in the kernel .config and a
> switch(current_cpu_type) { case CPU_JZRISC: ...} so I would assume it is
> being used. But I will verify that the CONFIG_JZRISC=y is correctly
> translated into a current_cpu_type.
>   
Your assumption is reasonable.  But given that things aren't working, 
yes, it's good to verify.
> Oh, one last question, in order to rule out the cache as bug-spot would
> the kernel option "run uncached" "solve" the issue (and be darn slow)?
>   
It would certainly solve the issue, and would *probably* result in a 
system that would be fully functional but slow.  Very high end and very 
low end systems can be rendered unusable by forcing uncached operation, 
but it's certainly worth a try.  Also, if your cache control logic 
supports both write-back and write-through operation, if you set the 
default cache "attribute" for kernel and page tables (which is 
essentially what you're doing under-the-hood when you configure for 
uncached operation) to write-through, that should cure the problems with 
copying text pages, but *not* those with re-using them, with less 
performance impact.  I'd be a little surprised if the Ingenic part 
offered both modes, though.

          Regards,

          Kevin K.


* Re: Ingenic JZ4730 - illegal instruction
  2009-03-09 15:45         ` Kevin D. Kissell
@ 2009-03-09 16:26           ` Nils Faerber
  0 siblings, 0 replies; 10+ messages in thread
From: Nils Faerber @ 2009-03-09 16:26 UTC (permalink / raw)
  To: linux-mips

Kevin D. Kissell wrote:
> Nils Faerber wrote:
>> Kevin D. Kissell wrote:
>>> Are you sure that the JZ_RISC section is in fact the version of those
>>> functions that's being built into your kernel?
>> Well, there is CONFIG_JZRISC=y in the kernel .config and a
>> switch(current_cpu_type) { case CPU_JZRISC: ...} so I would assume it is
>> being used. But I will verify that the CONFIG_JZRISC=y is correctly
>> translated into a current_cpu_type.
> Your assumption is reasonable.  But given that things aren't working,
> yes, it's good to verify.

It should be correct: as far as I can see, it is set by cpu_probe.

>> Oh, one last question, in order to rule out the cache as bug-spot would
>> the kernel option "run uncached" "solve" the issue (and be darn slow)?
> It would certainly solve the issue, and would *probably* result in a
> system that would be fully functional but slow.  Very high end and very

It is *very* slow - you can almost watch every single instruction ;)

> low end systems can be rendered unusable by forcing uncached operation,
> but it's certainly worth a try.  Also, if your cache control logic

It seems to run. I am still stuck at the GUI login screen; everything
is so darn slow now. Testing for the fault could take ages, a game of
patience it seems ;)

> supports both write-back and write-through operation, if you set the
> default cache "attribute" for kernel and page tables (which is
> essentially what you're doing under-the-hood when you configure for
> uncached operation) to write-through, that should cure the problems with
> copying text pages, but *not* those with re-using them, with less
> performance impact.  I'd be a little surprised if the Ingenic part
> offered both modes, though.

The really bad thing is that I do not have the full datasheet for the CPU,
so I basically have no idea what this thing really supports. So I
can only try and test. Luckily this is just a toy project and not
commercial contract work (which I would not have accepted without proper
documentation).

PS: Login is done now and I suddenly see apps initialising that
obviously silently failed before, so I am pretty sure now that uncached
does work, which means that the cache handling has the bug I am looking for.

>           Regards,
>           Kevin K.
Cheers
  nils faerber

-- 
kernel concepts GbR        Tel: +49-271-771091-12
Sieghuetter Hauptweg 48    Fax: +49-271-771091-19
D-57072 Siegen             Mob: +49-176-21024535
http://www.kernelconcepts.de


* Re: Ingenic JZ4730 - illegal instruction
  2009-03-08 16:03   ` ard
@ 2009-03-10 17:12     ` Markus Gothe
  0 siblings, 0 replies; 10+ messages in thread
From: Markus Gothe @ 2009-03-10 17:12 UTC (permalink / raw)
  To: ard; +Cc: linux-mips

Afaik it's running Xiptech uCLinux/Stuff.
When I dug into the Xburst-subarch it seemed to miss lots of stuff  
that you'd expect in a MIPS Corp. CPU core.

//Markus



