All of lore.kernel.org
* Bug in v7_coherent_kern_range() ?
@ 2012-04-01  3:21 Huang Shijie
  2012-04-01  6:10 ` Dirk Behme
  2012-04-02 11:12 ` Will Deacon
  0 siblings, 2 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-01  3:21 UTC (permalink / raw)
  To: linux-arm-kernel

[1] Platform:
Freescale i.MX6Q (4 cores), ARM Cortex-A9

[2] kernel:
3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
is the same code as in the latest kernel, v3.4-rc1),
with SMP enabled and a VIPT cache.

[3] application:
I use our own application, which clones many threads.
Two threads (call them A and B) may do the same thing at the same time.

Most of the time it works fine.
But in some unknown situation, cacheflush() fails and one of the threads
(say A) hangs at the following point:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8
read(8, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"..., 512) = 512
fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0
mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 8, 0) = 0x2ff0a000
mprotect(0x2ff18000, 28672, PROT_NONE) = 0
mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000
close(8) = 0
mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0
mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0
cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // the system hangs here!!!
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
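For anyone trying to reproduce this without the original player, the user-space side of the trace above can be approximated with a short C sketch. It assumes the hanging thread maps a library, switches the mapping to RW and back to RX, and then synchronises the instruction cache over pages it has not touched yet. `map_then_flush()` is a hypothetical helper. GCC's `__builtin___clear_cache()` stands in for the direct `cacheflush()` call in the trace; on ARM Linux, libgcc typically implements it with that same system call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper mirroring the strace above: map `len` bytes of
 * `path` read/execute, flip the mapping to RW and back to RX, then sync
 * the I-cache over pages that have not been touched yet.
 * Returns 0 on success, -1 if any step fails. */
static int map_then_flush(const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    char *base = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (base == MAP_FAILED)
        return -1;

    int rc = -1;
    if (mprotect(base, len, PROT_READ | PROT_WRITE) == 0 &&
        mprotect(base, len, PROT_READ | PROT_EXEC) == 0) {
        /* On ARM Linux this typically ends up in the cacheflush syscall;
         * on architectures with a coherent I-cache it may be a no-op. */
        __builtin___clear_cache(base, base + len);
        rc = 0;
    }
    munmap(base, len);
    return rc;
}
```

Calling `map_then_flush("/proc/self/exe", 4096)` runs the sequence against the running binary. On the target, you would pass the real library path instead, such as the decoder library opened in the trace.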


[4] kernel log
I used "echo t > /proc/sysrq-trigger" to dump the task information:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

multiqueue0:src D 804cd678 0 7328 5963 0x00000001
[<804cd678>] (__schedule+0x228/0x760) from [<804d0564>] (__down_read+0xa8/0xe0)
[<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>] (do_page_fault+0xbc/0x480)
[<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>] (do_DataAbort+0x34/0x98)
[<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>] (__dabt_svc+0x70/0xa0)
Exception stack(0xbae37ea8 to 0xbae37ef0)
7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000 bae37ef0
7ee0: 800424a8 8004a1fc 800f0013 ffffffff
[<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80)
[<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>] (arm_syscall+0x2a0/0x2c4)
[<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>] (ret_fast_syscall+0x0/0x3c)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


do_cache_op() already holds mm->mmap_sem, but v7_coherent_kern_range()
takes a page fault while it is flushing the cache, and do_page_fault()
then tries to take mmap_sem again. Deadlock! So the task hangs in
do_page_fault().
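The hang is consistent with a recursive read acquisition of mmap_sem racing a writer. Linux rwsems do not admit a new reader while a writer is queued. Suppose thread B calls mprotect() or mmap(), both of which take mmap_sem for write, after A's do_cache_op() has taken the read lock. Then the second down_read() from do_page_fault() waits for B, while B waits for A to release its first read hold. The single-threaded C model below replays that interleaving. It is a sketch that assumes B is the queued writer. `toy_rwsem` and its helpers are hypothetical and do not reproduce the kernel's rwsem code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a writer-fair reader/writer semaphore.  Hypothetical and
 * heavily simplified: it only encodes the policy that matters here, namely
 * that a new reader must wait while any writer is queued or active. */
struct toy_rwsem {
    int active_readers;  /* readers currently holding the lock */
    int queued_writers;  /* writers waiting for the readers to drain */
    bool writer_active;  /* a writer currently holds the lock */
};

/* Returns true if the read lock is granted, false if the caller would sleep. */
static bool toy_down_read(struct toy_rwsem *s)
{
    if (s->writer_active || s->queued_writers > 0)
        return false;  /* queue behind the writer instead of overtaking it */
    s->active_readers++;
    return true;
}

/* Returns true if the write lock is granted, false if the writer queues. */
static bool toy_down_write(struct toy_rwsem *s)
{
    if (s->writer_active || s->active_readers > 0) {
        s->queued_writers++;  /* must wait until every reader has left */
        return false;
    }
    s->writer_active = true;
    return true;
}

/* Replays the suspected interleaving.  Returns true if it ends in a cycle:
 * B waits for A's read hold, and A's nested read waits behind B. */
static bool replay_cacheflush_vs_mprotect(void)
{
    struct toy_rwsem mmap_sem = { 0, 0, false };

    /* A: cacheflush() -> do_cache_op() takes mmap_sem for read. */
    bool a_outer = toy_down_read(&mmap_sem);
    assert(a_outer);

    /* B: mprotect() or mmap() asks for mmap_sem for write and queues. */
    bool b_write = toy_down_write(&mmap_sem);

    /* A: the cache operation faults; do_page_fault() asks for read again. */
    bool a_nested = toy_down_read(&mmap_sem);

    return !b_write && !a_nested;
}
```

If B's write request is left out, the nested toy_down_read() succeeds. That would be consistent with the hang appearing only occasionally.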

[5] questions:
Why can v7_coherent_kern_range() cause a data abort?
Is there something wrong with v7_coherent_kern_range()?


thanks
Huang Shijie


* Bug in v7_coherent_kern_range() ?
  2012-04-01  3:21 Bug in v7_coherent_kern_range() ? Huang Shijie
@ 2012-04-01  6:10 ` Dirk Behme
  2012-04-01  7:09   ` Huang Shijie
  2012-04-02 11:12 ` Will Deacon
  1 sibling, 1 reply; 20+ messages in thread
From: Dirk Behme @ 2012-04-01  6:10 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Huang Shijie,

On 01.04.2012 05:21, Huang Shijie wrote:
> [1] Platform:
> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>
> [2] kernel:
> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
> is the same code as in the latest kernel, v3.4-rc1),
> with SMP enabled and a VIPT cache.

Could you try an unpatched, clean v3.4-rc1 instead?

What about your 2.6.38?

What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.

> [3] application:

Could you share a (simple) test case?

Best regards

Dirk

> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>


* Bug in v7_coherent_kern_range() ?
  2012-04-01  6:10 ` Dirk Behme
@ 2012-04-01  7:09   ` Huang Shijie
  2012-04-01  8:01     ` Dirk Behme
  2012-04-01  8:57     ` Dirk Behme
  0 siblings, 2 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-01  7:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dirk:
> Hi Huang Shijie,
>
> On 01.04.2012 05:21, Huang Shijie wrote:
>> [1] Platform:
>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>
>> [2] kernel:
>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>> is the same code as in the latest kernel, v3.4-rc1),
>> with SMP enabled and a VIPT cache.
>
> Could you try an unpatched, clean v3.4-rc1 instead?
Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT.


>
> What about your 2.6.38?
2.6.38 is not a good version for running the i.MX6Q. It lacks many of
our drivers' patches.
>
> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>
Our BSP release is based on 3.0.15, so we cannot test it on 3.0.26 either.

>> [3] application:
>
> Could you share a (simple) test case?
The test case is like this:
   #gplay xx.avi

gplay is our own player, similar to mplayer.
I just created a script that plays the video files one by one.
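The looping script itself is not shown in the thread. The C sketch below is a rough equivalent, assuming the script simply runs the player once per file and waits for each run to finish. `play_all()` is hypothetical. Since gplay is only available with the BSP, the player command is a parameter.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical stand-in for the "play the files one by one" script:
 * runs `player <file>` for each file in turn and waits for each run to
 * finish.  Returns how many runs failed to start or exited non-zero. */
static int play_all(const char *player, char *const files[], int nfiles)
{
    int failures = 0;
    for (int i = 0; i < nfiles; i++) {
        char *argv[] = { (char *)player, files[i], NULL };
        pid_t pid;
        int status;

        if (posix_spawnp(&pid, player, NULL, NULL, argv, environ) != 0) {
            failures++;
            continue;
        }
        if (waitpid(pid, &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    }
    return failures;
}
```

Because each run is waited for, a hung player stops the loop. That leaves the stuck task in place for inspection with sysrq-t.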

BR
Huang Shijie



* Bug in v7_coherent_kern_range() ?
  2012-04-01  7:09   ` Huang Shijie
@ 2012-04-01  8:01     ` Dirk Behme
  2012-04-01  8:16       ` Huang Shijie
  2012-04-01  8:57     ` Dirk Behme
  1 sibling, 1 reply; 20+ messages in thread
From: Dirk Behme @ 2012-04-01  8:01 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Huang Shijie,

On 01.04.2012 09:09, Huang Shijie wrote:
> Hi Dirk:
>> Hi Huang Shijie,
>>
>> On 01.04.2012 05:21, Huang Shijie wrote:
>>> [1] Platform:
>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>
>>> [2] kernel:
>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>> is the same code as in the latest kernel, v3.4-rc1),
>>> with SMP enabled and a VIPT cache.
>>
>> Could you try an unpatched, clean v3.4-rc1 instead?
> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT.

I think we are not talking about drivers here, but about core kernel
code, like cache handling? To test v7_coherent_kern_range() you might
not need many BSP drivers?

>> What about your 2.6.38?
> 2.6.38 is not a good version for running the i.MX6Q. It lacks many of
> our drivers' patches.
>>
>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>
> Our BSP release is based on 3.0.15, so we cannot test it on 3.0.26
> either.

You can. Just give git rebase a try.

>>> [3] application:
>>
>> Could you share a (simple) test case?
> The test case is like this:
> #gplay xx.avi
>
> gplay is our own player, similar to mplayer.

Could you share a (simple) test case? E.g. share 'gplay'? Or try to
reproduce your issue with another test case, e.g. mplayer? Or, better,
anything simpler that the community can use to try to reproduce your
issue?
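The kind of stand-alone test case requested here might look like the sketch below, which tries to recreate the suspected race without gplay or the VPU driver. It is a guess, not a confirmed reproducer. It assumes the trigger is one thread syncing the I-cache over unpopulated pages while another thread keeps taking mmap_sem for write, here via mprotect() on a scratch page. `stress_cacheflush()` and `writer_thread()` are hypothetical names. `__builtin___clear_cache()` replaces a direct cacheflush() call.

```c
#define _DEFAULT_SOURCE 1
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static atomic_bool stop_writer;

/* Thread "B": keeps taking mmap_sem for write by toggling the protection
 * of a scratch page, as a concurrent mprotect()/mmap() caller would. */
static void *writer_thread(void *arg)
{
    char *page = arg;
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    while (!atomic_load(&stop_writer)) {
        mprotect(page, pagesz, PROT_READ);
        mprotect(page, pagesz, PROT_READ | PROT_WRITE);
    }
    return NULL;
}

/* Thread "A" (the caller): maps fresh, never-touched RX pages and syncs
 * the I-cache over them, so the kernel's cache maintenance may have to
 * fault them in.  Returns the number of completed iterations, or -1 if
 * setup fails.  On an affected kernel this call may never return. */
static long stress_cacheflush(long iterations)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 16 * pagesz;
    char *scratch = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (scratch == MAP_FAILED)
        return -1;

    pthread_t writer;
    atomic_store(&stop_writer, false);
    if (pthread_create(&writer, NULL, writer_thread, scratch) != 0) {
        munmap(scratch, pagesz);
        return -1;
    }

    long done = 0;
    while (done < iterations) {
        char *code = mmap(NULL, len, PROT_READ | PROT_EXEC,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (code == MAP_FAILED)
            break;
        __builtin___clear_cache(code, code + len);
        munmap(code, len);
        done++;
    }

    atomic_store(&stop_writer, true);
    pthread_join(writer, NULL);
    munmap(scratch, pagesz);
    return done;
}
```

On a kernel without the problem, `stress_cacheflush(200)` should simply return 200. On x86, the builtin typically compiles to nothing, so there the sketch only exercises the mmap/mprotect traffic.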

Best regards

Dirk



* Bug in v7_coherent_kern_range() ?
  2012-04-01  8:01     ` Dirk Behme
@ 2012-04-01  8:16       ` Huang Shijie
  2012-04-01  8:50         ` Dirk Behme
  0 siblings, 1 reply; 20+ messages in thread
From: Huang Shijie @ 2012-04-01  8:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dirk:
> Hi Huang Shijie,
>
> On 01.04.2012 09:09, Huang Shijie wrote:
>> Hi Dirk:
>>> Hi Huang Shijie,
>>>
>>> On 01.04.2012 05:21, Huang Shijie wrote:
>>>> [1] Platform:
>>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>>
>>>> [2] kernel:
>>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>>> is the same code as in the latest kernel, v3.4-rc1),
>>>> with SMP enabled and a VIPT cache.
>>>
>>> Could you try an unpatched, clean v3.4-rc1 instead?
>> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT.
>
> I think we are not talking about drivers here, but about core kernel
> code, like cache handling? To test v7_coherent_kern_range() you might
> not need many BSP drivers?
Yes, but gplay uses the VPU driver, and the VPU driver is not in the
kernel. Without the VPU driver, gplay cannot work.
>
>>> What about your 2.6.38?
>> 2.6.38 is not a good version for running the i.MX6Q. It lacks many of
>> our drivers' patches.
>>>
>>> What about 3.0.26? 3.0.15 seems to be missing some possibly relevant patches.
>>>
>> Our BSP release is based on 3.0.15, so we cannot test it on 3.0.26
>> either.
>
> You can. Just give git rebase a try.
It would be a nightmare for me. We have nearly 1000 patches; it would
take me a lot of time to resolve the conflicts.

>
>>>> [3] application:
>>>
>>> Could you share a (simple) test case?
>> The test case is like this:
>> #gplay xx.avi
>>
>> gplay is our own player, similar to mplayer.
>
> Could you share a (simple) test case? E.g. share 'gplay'? Or try to
> reproduce your issue with another test case, e.g. mplayer? Or, better,
> anything simpler that the community can use to try to reproduce your
> issue?
I can email gplay to you; if you have an i.MX6Q board, you can test it.
I just hope someone can give me some advice about this issue.

I found that arch/arm/include/asm/assembler.h is out of date, so I will
update it and test again.

Thanks a lot, Dirk.

Huang Shijie


* Bug in v7_coherent_kern_range() ?
  2012-04-01  8:16       ` Huang Shijie
@ 2012-04-01  8:50         ` Dirk Behme
  2012-04-01  9:14           ` Huang Shijie
  0 siblings, 1 reply; 20+ messages in thread
From: Dirk Behme @ 2012-04-01  8:50 UTC (permalink / raw)
  To: linux-arm-kernel

On 01.04.2012 10:16, Huang Shijie wrote:
> Hi Dirk:
>> Hi Huang Shijie,
>>
>> On 01.04.2012 09:09, Huang Shijie wrote:
>>> Hi Dirk:
>>>> Hi Huang Shijie,
>>>>
>>>> On 01.04.2012 05:21, Huang Shijie wrote:
>>>>> [1] Platform:
>>>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>>>
>>>>> [2] kernel:
>>>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>>>> is the same code as in the latest kernel, v3.4-rc1),
>>>>> with SMP enabled and a VIPT cache.
>>>>
>>>> Could you try an unpatched, clean v3.4-rc1 instead?
>>> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT.
>>
>> I think we are not talking about drivers here, but about core kernel
>> code, like cache handling? To test v7_coherent_kern_range() you might
>> not need many BSP drivers?
> Yes, but gplay uses the VPU driver, and the VPU driver is not in the
> kernel. Without the VPU driver, gplay cannot work.

Then you could try disabling the VPU driver and check whether the issue
is still there.

>>>> What about your 2.6.38?
>>> 2.6.38 is not a good version for running the i.MX6Q. It lacks many of
>>> our drivers' patches.
>>>>
>>>> What about 3.0.26? 3.0.15 seems to be missing some possibly
>>>> relevant patches.
>>>>
>>> Our BSP release is based on 3.0.15, so we cannot test it on 3.0.26
>>> either.
>>
>> You can. Just give git rebase a try.
> It would be a nightmare for me. We have nearly 1000 patches; it would
> take me a lot of time to resolve the conflicts.

IMHO you will get one easy-to-solve merge conflict, so it should take
you < 10 min to rebase to 3.0.26. Just try it ;)

>>
>>>>> [3] application:
>>>>
>>>> Could you share a (simple) test case?
>>> The test case is like this:
>>> #gplay xx.avi
>>>
>>> gplay is our own player, similar to mplayer.
>>
>> Could you share a (simple) test case? E.g. share 'gplay'? Or try to
>> reproduce your issue with another test case, e.g. mplayer? Or, better,
>> anything simpler that the community can use to try to reproduce your
>> issue?
> I can email gplay to you; if you have an i.MX6Q board, you can test
> it.
> I just hope someone can give me some advice about this issue.

It would help to use a kernel version and a test case that the community
can use to reproduce the issue.

Best regards

Dirk



* Bug in v7_coherent_kern_range() ?
  2012-04-01  7:09   ` Huang Shijie
  2012-04-01  8:01     ` Dirk Behme
@ 2012-04-01  8:57     ` Dirk Behme
  2012-04-01  9:19       ` Huang Shijie
  2012-04-01  9:19       ` Huang Shijie
  1 sibling, 2 replies; 20+ messages in thread
From: Dirk Behme @ 2012-04-01  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

On 01.04.2012 09:09, Huang Shijie wrote:
> Hi Dirk:
>> Hi Huang Shijie,
>>
>> On 01.04.2012 05:21, Huang Shijie wrote:
>>> [1] Platform:
>>> Freescale i.MX6Q (4 cores), ARM Cortex-A9
>>>
>>> [2] kernel:
>>> 3.0.15 (I have cherry-picked many patches, and arch/arm/mm/cache-v7.S
>>> is the same code as in the latest kernel, v3.4-rc1),
>>> with SMP enabled and a VIPT cache.
>>
>> Could you try an unpatched, clean v3.4-rc1 instead?
> Sorry, I cannot try v3.4-rc1. Some of our BSP drivers do not support DT.

Have you tried the 3.2-based Linaro kernel? It's DT-based.

Best regards

Dirk

>> What's about your 2.6.38?
> 2.6.38 is not a good version to run the imx6q. It losts many our
> drivers's patches.
>>
>> What's about 3.0.26? 3.0.15 seems to miss some maybe relevant patches.
>>
> Our bsp release are based on 3.0.15. so we could not test it on 3.0.26
> too.
>
>>> [3] application:
>>
>> Could you share a (simple) test case?
> The test case is like this:
> #gplay xx.avi
>
> gplay is our own player, such as mplayer.
> I just created a script which will play the video files one by one.
>
> BR
> Huang Shijie
>
>>
>> Best regards
>>
>> Dirk
>>
>>> I use our our application which will clone many threads,
>>> two threads (assume as A and B) may do the same thing at the same time
>>> as the following code:
>>>
>>> In most of the time, it's ok.
>>> But in some unknown situation, cacheflush() failed and one threads
>>> (assume A) may hung up in the following code:
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>> open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8
>>> read(8,
>>> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"...,
>>>
>>> 512) = 512
>>> fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0
>>> mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
>>> 8, 0)
>>> = 0x2ff0a000
>>> mprotect(0x2ff18000, 28672, PROT_NONE) = 0
>>> mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE,
>>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000
>>> close(8) = 0
>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0
>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0
>>> cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // System
>>> hung up here!!!
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>> [4] kernel log
>>> I use "echo t> /proc/sysrq-trigger" to show the tasks's information:
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> multiqueue0:src D 804cd678 0 7328 5963 0x00000001
>>> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>]
>>> (__down_read+0xa8/0xe0)
>>> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>]
>>> (do_page_fault+0xbc/0x480)
>>> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>]
>>> (do_DataAbort+0x34/0x98)
>>> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>]
>>> (__dabt_svc+0x70/0xa0)
>>> Exception stack(0xbae37ea8 to 0xbae37ef0)
>>> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
>>> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000
>>> bae37ef0
>>> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff
>>> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>]
>>> (v7_coherent_kern_range+0x20/0x80)
>>> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>]
>>> (arm_syscall+0x2a0/0x2c4)
>>> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>]
>>> (ret_fast_syscall+0x0/0x3c)
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>> The do_cache_op() has already held the mm->mmap_sem, but
>>> v7_coherent_kern_range()
>>> cause one page fault during it flush the cache. deadlock! So it
>>> hung up
>>> in the do_page_fault().
>>>
>>> [5] questions:
>>> Why the v7_coherent_kern_range() can caused the data abort?
>>> Is there something wrong about the v7_coherent_kern_range()?
>>>
>>>
>>> thanks
>>> Huang Shijie
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel at lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>
>>
>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread
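
Editorial note on the deadlock described above. Taking mm->mmap_sem for reading twice in the same thread is normally harmless. It becomes a hang when a writer queues in between, because a fair read/write semaphore refuses new readers once a writer is waiting. The generic rwsem used on ARM at the time (the __down_read frame in the backtrace) appears to follow that rule. The sketch below is a simplified, single-threaded model of such a semaphore, not the kernel's rwsem code. It assumes, hypothetically, that another thread of the same process (for example one doing mmap2() or mprotect()) calls down_write() after thread A enters do_cache_op() and before A's page fault. The struct and function names are made up for illustration.

```c
#include <stdbool.h>

/*
 * Simplified, single-threaded model of a fair read/write semaphore.
 * Not the kernel's rwsem code: it only tracks who holds or waits.
 */
struct rwsem_model {
    int  active_readers;  /* readers currently holding the lock */
    bool writer_active;   /* a writer currently holds the lock */
    int  queued_writers;  /* writers sleeping in down_write() */
};

/* A new reader is admitted only if nobody holds the lock for writing and
 * nobody is already queued, so a waiting writer cannot be starved. */
static bool model_down_read(struct rwsem_model *s)
{
    if (s->writer_active || s->queued_writers > 0)
        return false;  /* the caller would sleep here */
    s->active_readers++;
    return true;
}

/* A writer gets the lock only when it is completely free. Otherwise it
 * queues and sleeps until every current holder has released it. */
static bool model_down_write(struct rwsem_model *s)
{
    if (!s->writer_active && s->active_readers == 0) {
        s->writer_active = true;
        return true;
    }
    s->queued_writers++;
    return false;  /* the caller would sleep here */
}
```

With no writer around, two back-to-back model_down_read() calls both succeed. In the assumed interleaving, the sequence is model_down_read() for A (do_cache_op), model_down_write() for B, then model_down_read() for A's fault, which returns true, false, false. A now sleeps behind B, and B sleeps until A drops the read lock it already holds.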

* Bug in v7_coherent_kern_range() ?
  2012-04-01  8:50         ` Dirk Behme
@ 2012-04-01  9:14           ` Huang Shijie
  0 siblings, 0 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-01  9:14 UTC (permalink / raw)
  To: linux-arm-kernel

On 2012-04-01 16:50, Dirk Behme wrote:
> On 01.04.2012 10:16, Huang Shijie wrote:
>> Hi Dirk:
>>> Hi Huang Shijie,
>>>
>>> On 01.04.2012 09:09, Huang Shijie wrote:
>>>> Hi Dirk:
>>>>> Hi Huang Shijie,
>>>>>
>>>>> On 01.04.2012 05:21, Huang Shijie wrote:
>>>>>> [1] Platform:
>>>>>> freescale's IMX6Q(4 cores) , ARM CORTEX-A9
>>>>>>
>>>>>> [2] kernel:
>>>>>> 3.0.15(I have cherry-picked many patches, and the
>>>>>> arch/arm/mm/cache-v7.S
>>>>>> is same code with the latest kernel v3.4-rc1)
>>>>>> enable SMP, VIPT,
>>>>>
>>>>> Could you try an unpatched, clean v3.4-rc1 instead?
>>>> Sorry, I could not try the v3.4-rc1. Some our bsp drivers are not DT
>>>> supported.
>>>
>>> I think we are not talking about drivers, we are talking about some
>>> kernel core code, like cache handling? To test
>>> v7_coherent_kern_range() you might not need to many bsp drivers?
>> Yes , the gplay will use the vpu driver. But the VPU driver is not in
>> the kernel. Without the vpu driver, the gplay can not works.
>
> You could try to disable the vpu driver and check if the issue is 
> still there, then.
>
:(
I have no idea how to reproduce this issue if I disable the VPU driver.
>>>>> What's about your 2.6.38?
>>>> 2.6.38 is not a good version to run the imx6q. It losts many our
>>>> drivers's patches.
>>>>>
>>>>> What's about 3.0.26? 3.0.15 seems to miss some maybe relevant
>>>>> patches.
>>>>>
>>>> Our bsp release are based on 3.0.15. so we could not test it on 3.0.26
>>>> too.
>>>
>>> You can. Just give git rebase a try.
>> It will be a nightmare to me. We have nearly 1000 patches. I will cost
>> me much time to handle the conflicts.
>
> IMHO you will get one easy to solve merge conflict. So it should you 
> take < 10min to rebase to 3.0.26. Just try it ;)
>
>>>
>>>>>> [3] application:
>>>>>
>>>>> Could you share a (simple) test case?
>>>> The test case is like this:
>>>> #gplay xx.avi
>>>>
>>>> gplay is our own player, such as mplayer.
>>>
>>> Could you share a (simple) test case? E.g. share 'gplay'? Or try to
>>> reproduce your issue with an other test case? E.g. mplayer? Or
>>> better anything simpler the community can use to try to reproduce
>>> your issue?
>> I can email to you the gplay, if you have an imx6q board. you can test
>> it.
>> I just wish someone give me some advice about this issue.
>
> It would help to use a kernel version and a test case the community 
> can use to reproduce.
>
I know.

thanks
Huang Shijie


> Best regards
>
> Dirk
>
>> I find the arch/arm/include/asm/assembler.h is out of date. So I will
>> update it and test it again.
>>
>> thanks a lot , Dirk.
>>
>> Huang Shijie
>>>
>>> Best regards
>>>
>>> Dirk
>>>
>>>> I just created a script which will play the video files one by one.
>>>>
>>>> BR
>>>> Huang Shijie
>>>>
>>>>>
>>>>> Best regards
>>>>>
>>>>> Dirk
>>>>>
>>>>>> I use our our application which will clone many threads,
>>>>>> two threads (assume as A and B) may do the same thing at the same
>>>>>> time
>>>>>> as the following code:
>>>>>>
>>>>>> In most of the time, it's ok.
>>>>>> But in some unknown situation, cacheflush() failed and one threads
>>>>>> (assume A) may hung up in the following code:
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
>>>>>>
>>>>>> open("/usr/lib/lib_mp3_dec_arm12_elinux.so.2.10.0", O_RDONLY) = 8
>>>>>> read(8,
>>>>>> "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\20\35\0\0004\0\0\0"..., 
>>>>>> 512) = 512
>>>>>> fstat64(8, {st_mode=S_IFREG|0644, st_size=56232, ...}) = 0
>>>>>> mmap2(NULL, 88032, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE,
>>>>>> 8, 0)
>>>>>> = 0x2ff0a000
>>>>>> mprotect(0x2ff18000, 28672, PROT_NONE) = 0
>>>>>> mmap2(0x2ff1f000, 4096, PROT_READ|PROT_WRITE,
>>>>>> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 8, 0xd) = 0x2ff1f000
>>>>>> close(8) = 0
>>>>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_WRITE) = 0
>>>>>> mprotect(0x2ff0a000, 57344, PROT_READ|PROT_EXEC) = 0
>>>>>> cacheflush(0x2ff0a000, 0x2ff18000, 0, 0x6, 0x2cd03420) = 0 // System
>>>>>> hung up here!!!
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
>>>>>>
>>>>>> [4] kernel log
>>>>>> I use "echo t> /proc/sysrq-trigger" to show the tasks's information:
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
>>>>>>
>>>>>> multiqueue0:src D 804cd678 0 7328 5963 0x00000001
>>>>>> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>]
>>>>>> (__down_read+0xa8/0xe0)
>>>>>> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>]
>>>>>> (do_page_fault+0xbc/0x480)
>>>>>> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>]
>>>>>> (do_DataAbort+0x34/0x98)
>>>>>> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>]
>>>>>> (__dabt_svc+0x70/0xa0)
>>>>>> Exception stack(0xbae37ea8 to 0xbae37ef0)
>>>>>> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
>>>>>> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000
>>>>>> bae37ef0
>>>>>> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff
>>>>>> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>]
>>>>>> (v7_coherent_kern_range+0x20/0x80)
>>>>>> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>]
>>>>>> (arm_syscall+0x2a0/0x2c4)
>>>>>> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>]
>>>>>> (ret_fast_syscall+0x0/0x3c)
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 
>>>>>>
>>>>>> The do_cache_op() has already held the mm->mmap_sem, but
>>>>>> v7_coherent_kern_range()
>>>>>> cause one page fault during it flush the cache. deadlock! So it
>>>>>> hung up
>>>>>> in the do_page_fault().
>>>>>>
>>>>>> [5] questions:
>>>>>> Why the v7_coherent_kern_range() can caused the data abort?
>>>>>> Is there something wrong about the v7_coherent_kern_range()?
>>>>>>
>>>>>>
>>>>>> thanks
>>>>>> Huang Shijie
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 20+ messages in thread
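
Editorial note on the call at the end of the strace. The sequence "make the region writable, modify it, switch it back to PROT_READ|PROT_EXEC, then request cache maintenance" is what any code that writes instructions has to do on ARM, because the I-cache is not kept coherent with data writes. A minimal user-space sketch follows, assuming GCC or Clang. __builtin___clear_cache() is the compiler-provided way to request this maintenance; on ARM Linux it is expected to end up in the ARM-private cacheflush syscall seen in the trace, and on x86 it typically compiles to nothing. Unlike the trace, which operates on a file-backed library mapping, the sketch uses a fresh anonymous mapping and never executes the bytes it copies. install_code() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy `len` bytes of machine code into a new
 * mapping, flip it from writable to executable, then ask for the cache
 * maintenance ARM needs before the bytes may be executed.
 * Returns NULL on failure. The bytes are never executed here.
 */
static void *install_code(const void *code, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    /* Round up to a whole number of pages (page size is a power of two). */
    size_t size = (len + (size_t)page - 1) & ~((size_t)page - 1);
    unsigned char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* On a write-back D-cache the new bytes may not have reached memory yet. */
    memcpy(buf, code, len);

    if (mprotect(buf, size, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, size);
        return NULL;
    }

    /* Clean D-cache / invalidate I-cache for the range. On ARM Linux this
     * is expected to use the cacheflush syscall; on x86 it is usually a no-op. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
    return buf;
}
```

Requesting the maintenance after the final mprotect() mirrors the order in the trace. The range passed in still has to be mapped when the kernel walks it, which is exactly where the reported hang happened.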

* Bug in v7_coherent_kern_range() ?
  2012-04-01  8:57     ` Dirk Behme
@ 2012-04-01  9:19       ` Huang Shijie
  2012-04-01  9:19       ` Huang Shijie
  1 sibling, 0 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-01  9:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dirk:
>
> Have you tried the 3.2 based Linaro kernel? It's DT based.
>
Not yet.

I will test it.

BR
Huang Shijie

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-01  8:57     ` Dirk Behme
  2012-04-01  9:19       ` Huang Shijie
@ 2012-04-01  9:19       ` Huang Shijie
  1 sibling, 0 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-01  9:19 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dirk:
>
> Have you tried the 3.2 based Linaro kernel? It's DT based.
>
Not yet.

I will test it.

BR
Huang Shijie

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-01  3:21 Bug in v7_coherent_kern_range() ? Huang Shijie
  2012-04-01  6:10 ` Dirk Behme
@ 2012-04-02 11:12 ` Will Deacon
  2012-04-06  3:35   ` Huang Shijie
  1 sibling, 1 reply; 20+ messages in thread
From: Will Deacon @ 2012-04-02 11:12 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Apr 01, 2012 at 04:21:10AM +0100, Huang Shijie wrote:
> But in some unknown situation, cacheflush() failed and one threads
> (assume A) may hung up in the following code:

[...]

> multiqueue0:src D 804cd678 0 7328 5963 0x00000001
> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>]
> (__down_read+0xa8/0xe0)
> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>]
> (do_page_fault+0xbc/0x480)
> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>]
> (do_DataAbort+0x34/0x98)
> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>]
> (__dabt_svc+0x70/0xa0)
> Exception stack(0xbae37ea8 to 0xbae37ef0)
> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000
> bae37ef0
> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff
> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>]
> (v7_coherent_kern_range+0x20/0x80)
> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>]
> (arm_syscall+0x2a0/0x2c4)
> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>]
> (ret_fast_syscall+0x0/0x3c)

Please can you try the patch posted here?

http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html

If it fixes your problem, please consider giving a tested-by.

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread
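
Editorial note on reading the exception stack above. Assuming the usual ARM pt_regs order (r0 to r12, then sp, lr, pc, cpsr, orig_r0), r0/r1 hold the start and end of the range (0x31e05000 to 0x31e1d000). r2/r3 look like a 32-byte D-cache line size and its mask (0x20 and 0x1f), and r12 is 0x31e08000, a page boundary. That is consistent with a loop that cleans the range one cache line at a time and takes the data abort on the first line of a page that is not currently mapped. The C model below only imitates that loop's address arithmetic (the real code is assembly in arch/arm/mm/cache-v7.S). The present[] page map, the 4 KiB page size and the function names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u  /* assumption: 4 KiB pages */

/* Number of lines the clean loop visits for [start, end): the start is
 * rounded down to a line boundary and the end is exclusive. */
static size_t model_line_count(uintptr_t start, uintptr_t end, uintptr_t line)
{
    size_t n = 0;
    for (uintptr_t addr = start & ~(line - 1); addr < end; addr += line)
        n++;
    return n;
}

/*
 * Walk [start, end) line by line and return the first line address whose
 * page is marked absent in present[] (index 0 = page containing start),
 * or 0 if every page is present. In the real loop, the cache operation on
 * that address is where the data abort would be taken.
 */
static uintptr_t model_first_fault(uintptr_t start, uintptr_t end, uintptr_t line,
                                   const unsigned char *present)
{
    uintptr_t first_page = start >> MODEL_PAGE_SHIFT;
    for (uintptr_t addr = start & ~(line - 1); addr < end; addr += line) {
        if (!present[(addr >> MODEL_PAGE_SHIFT) - first_page])
            return addr;
    }
    return 0;
}
```

With a 32-byte line, the 0x18000-byte range is 3072 clean operations spread over 24 pages. If only the first three pages are mapped, the model stops at 0x31e08000, the value seen in r12.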

* Bug in v7_coherent_kern_range() ?
  2012-04-02 11:12 ` Will Deacon
@ 2012-04-06  3:35   ` Huang Shijie
  2012-04-10  9:22     ` Will Deacon
  0 siblings, 1 reply; 20+ messages in thread
From: Huang Shijie @ 2012-04-06  3:35 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will:
> On Sun, Apr 01, 2012 at 04:21:10AM +0100, Huang Shijie wrote:
>> But in some unknown situation, cacheflush() failed and one threads
>> (assume A) may hung up in the following code:
> [...]
>
>> multiqueue0:src D 804cd678 0 7328 5963 0x00000001
>> [<804cd678>] (__schedule+0x228/0x760) from [<804d0564>]
>> (__down_read+0xa8/0xe0)
>> [<804d0564>] (__down_read+0xa8/0xe0) from [<800478c4>]
>> (do_page_fault+0xbc/0x480)
>> [<800478c4>] (do_page_fault+0xbc/0x480) from [<8003841c>]
>> (do_DataAbort+0x34/0x98)
>> [<8003841c>] (do_DataAbort+0x34/0x98) from [<8003df10>]
>> (__dabt_svc+0x70/0xa0)
>> Exception stack(0xbae37ea8 to 0xbae37ef0)
>> 7ea0: 31e05000 31e1d000 00000020 0000001f 31e05000 31e1d000
>> 7ec0: bfac86b8 31e05000 31e1d000 bae36000 08100075 31e056fc 31e08000
>> bae37ef0
>> 7ee0: 800424a8 8004a1fc 800f0013 ffffffff
>> [<8003df10>] (__dabt_svc+0x70/0xa0) from [<8004a1fc>]
>> (v7_coherent_kern_range+0x20/0x80)
>> [<8004a1fc>] (v7_coherent_kern_range+0x20/0x80) from [<800424a8>]
>> (arm_syscall+0x2a0/0x2c4)
>> [<800424a8>] (arm_syscall+0x2a0/0x2c4) from [<8003e500>]
>> (ret_fast_syscall+0x0/0x3c)
> Please can you try the patch posted here:?
>
> http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html
I tested this patch and it fixes this bug: the hang no longer occurs.
But my system still hangs in futex. I think the futex issue is a separate
bug (could this patch affect the futex code?).

So :
Tested-by: Huang Shijie <b32955@freescale.com>

BR
Huang Shijie

> If it fixes your problem, please consider giving a tested-by.
>
> Will
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-06  3:35   ` Huang Shijie
@ 2012-04-10  9:22     ` Will Deacon
  2012-04-10 10:30       ` Huang Shijie
  0 siblings, 1 reply; 20+ messages in thread
From: Will Deacon @ 2012-04-10  9:22 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 06, 2012 at 04:35:09AM +0100, Huang Shijie wrote:
> >
> > http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html
> I tested this patch.  It fixed this bug. This bug did not occur any more.
> But my system still hung at futex. I think the futex issue is another 
> bug.(will this patch affect the futex?)

If you're on an SMP system, can you check that you have df77abca ("ARM:
7099/1: futex: preserve oldval in SMP __futex_atomic_op") applied?

> So :
> Tested-by: Huang Shijie <b32955@freescale.com>

Ok, thanks. It looks like that has at least briefly revived the discussion
over there.

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread
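
Editorial note on why the old value matters for the futex patch mentioned above. FUTEX_WAKE_OP atomically applies an operation to a second futex word. It then decides, from the value that word held before the operation, whether to wake waiters on it. The sketch below is a non-atomic C model of that step only. It does not reproduce the ARM __futex_atomic_op assembly or the specific bug fixed by 7099/1. The enum values mirror the FUTEX_OP_* and FUTEX_OP_CMP_* constants in <linux/futex.h> but are redefined locally, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Values mirror FUTEX_OP_* and FUTEX_OP_CMP_* in <linux/futex.h>;
 * redefined here so the sketch builds without kernel headers. */
enum model_op  { MOP_SET = 0, MOP_ADD = 1, MOP_OR = 2, MOP_ANDN = 3, MOP_XOR = 4 };
enum model_cmp { MCMP_EQ = 0, MCMP_NE = 1, MCMP_LT = 2, MCMP_LE = 3,
                 MCMP_GT = 4, MCMP_GE = 5 };

/*
 * Non-atomic model of the FUTEX_WAKE_OP update on the second futex word:
 * apply `op` to *uaddr, report the value it held before, and return
 * whether waiters on that word should be woken. The decision is made on
 * the OLD value, which is why an implementation must not lose it.
 */
static bool model_wake_op(int *uaddr, enum model_op op, int oparg,
                          enum model_cmp cmp, int cmparg, int *oldval_out)
{
    int oldval = *uaddr;

    switch (op) {
    case MOP_SET:  *uaddr = oparg;           break;
    case MOP_ADD:  *uaddr = oldval + oparg;  break;
    case MOP_OR:   *uaddr = oldval | oparg;  break;
    case MOP_ANDN: *uaddr = oldval & ~oparg; break;
    case MOP_XOR:  *uaddr = oldval ^ oparg;  break;
    }
    *oldval_out = oldval;

    switch (cmp) {
    case MCMP_EQ: return oldval == cmparg;
    case MCMP_NE: return oldval != cmparg;
    case MCMP_LT: return oldval <  cmparg;
    case MCMP_LE: return oldval <= cmparg;
    case MCMP_GT: return oldval >  cmparg;
    case MCMP_GE: return oldval >= cmparg;
    }
    return false;
}
```

For a "clear the word and wake if it was greater than 1" request on a word holding 2, the model returns true. If an implementation compared the new value (0) instead, the answer would be false and the waiters would never be woken. Symptoms like that look like threads stuck in futex waits.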

* Bug in v7_coherent_kern_range() ?
  2012-04-10  9:22     ` Will Deacon
@ 2012-04-10 10:30       ` Huang Shijie
  2012-04-10 10:35         ` Will Deacon
       [not found]         ` <4F854992.9080601@freescale.com>
  0 siblings, 2 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-10 10:30 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will:
> On Fri, Apr 06, 2012 at 04:35:09AM +0100, Huang Shijie wrote:
>>> http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html
>> I tested this patch.  It fixed this bug. This bug did not occur any more.
>> But my system still hung at futex. I think the futex issue is another
>> bug.(will this patch affect the futex?)
> If you're on an SMP system, can you check that you have df77abca ("ARM:
> 7099/1: futex: preserve oldval in SMP __futex_atomic_op") applied?
>

It is already applied, thanks.

The futex code (kernel/futex.c and arch/arm/include/asm/futex.h) is
the latest.
I suspect there is a bug in the futex code on SMP systems.

Best Regards
Huang Shijie


>> So :
>> Tested-by: Huang Shijie<b32955@freescale.com>
> Ok, thanks. It looks that has briefly revived the discussion over there at
> least.
>
> Will
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-10 10:30       ` Huang Shijie
@ 2012-04-10 10:35         ` Will Deacon
       [not found]         ` <4F854992.9080601@freescale.com>
  1 sibling, 0 replies; 20+ messages in thread
From: Will Deacon @ 2012-04-10 10:35 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 10, 2012 at 11:30:52AM +0100, Huang Shijie wrote:
> > On Fri, Apr 06, 2012 at 04:35:09AM +0100, Huang Shijie wrote:
> >>> http://lists.arm.linux.org.uk/lurker/message/20111107.173344.f738392e.en.html
> >> I tested this patch.  It fixed this bug. This bug did not occur any more.
> >> But my system still hung at futex. I think the futex issue is another
> >> bug.(will this patch affect the futex?)
> > If you're on an SMP system, can you check that you have df77abca ("ARM:
> > 7099/1: futex: preserve oldval in SMP __futex_atomic_op") applied?
> >
> 
> already applied.
> thanks.
> 
> The futex codes (/kernel/futex.c and arch/arm/include/asm/futex.h) are 
> the latest.
> I guess there is a bug in the futex code in SMP system.

Ok. Can you please:

(a) Make your test case available somewhere?

(b) Try a more recent mainline kernel (3.3)?

Also - which libc are you using? Some older library implementations
incorrectly use swp for atomicity. If you have swp emulation enabled, this
could cause a lock-up. Do you have CONFIG_SWP_EMULATE=y?

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread
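
Editorial note on the swp point above. ARMv6 and later provide LDREX/STREX for atomic updates, and SWP is deprecated. On ARMv7 systems where SWP is disabled, the kernel can only emulate it when CONFIG_SWP_EMULATE is set. Code built with C11 atomics does not need SWP at all: compilers targeting ARMv7 typically lower the exchange below to an LDREX/STREX retry loop. This is a minimal test-and-set spinlock sketch, not glibc's implementation, and the type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct {
    atomic_int locked;
} demo_spinlock;

#define DEMO_SPINLOCK_INIT { 0 }

/* Atomically store 1 and look at the previous value:
 * 0 means this caller took the lock. */
static bool demo_trylock(demo_spinlock *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l))
        ;  /* busy-wait; real code would back off or sleep on a futex */
}

static void demo_unlock(demo_spinlock *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

The acquire/release orderings are what keep accesses inside the critical section from leaking out of it on a weakly ordered SMP CPU such as the Cortex-A9.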

* Bug in v7_coherent_kern_range() ?
       [not found]         ` <4F854992.9080601@freescale.com>
@ 2012-04-11 10:24           ` Will Deacon
  2012-04-11 11:02             ` Fabio Estevam
  2012-05-10  2:51             ` Huang Shijie
  0 siblings, 2 replies; 20+ messages in thread
From: Will Deacon @ 2012-04-11 10:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 11, 2012 at 10:06:26AM +0100, Huang Shijie wrote:
> Ok. Can you please:
> 
> (a) Make your test case available somewhere?
> 
> 
> I wish i could find a more common test case to reproduce this bug. But, i can't.
> The only test case now is to run the gplay on our IMX6Q platform.

Ok, that makes it tricky since I don't have gplay or an IMX6Q platform.

> (b) Try a more recent mainline kernel (3.3)?
> 
> 
> yes, I will try to test the linaro kernel.

Can you not try vanilla mainline instead? Either way, let me know how you
get on.

> The info of the libc:
>      GNU libc version: 2.13
>      GNU libc release: stable

That looks new enough for swp not to be an issue.

If you still have this problem with a newer kernel, we can try using the
fallback SMP futex implementation and see if that works.

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-11 10:24           ` Will Deacon
@ 2012-04-11 11:02             ` Fabio Estevam
  2012-04-16  5:48               ` Huang Shijie
  2012-05-10  2:51             ` Huang Shijie
  1 sibling, 1 reply; 20+ messages in thread
From: Fabio Estevam @ 2012-04-11 11:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 11, 2012 at 7:24 AM, Will Deacon <will.deacon@arm.com> wrote:

>> I wish i could find a more common test case to reproduce this bug. But, i can't.
>> The only test case now is to run the gplay on our IMX6Q platform.
>
> Ok, that makes it tricky since I don't have gplay or an IMX6Q platform.

gplay is a C application that does the same thing as launching a
simple GStreamer pipeline like:

gst-launch playbin2 uri=file:///home/file.mp4

Huang,

Does the problem also occur if you don't use the VPU driver? I mean,
does it also happen if you decode the file using software codecs? I
would like to know if the issue you see is related to the VPU driver
or not.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-11 11:02             ` Fabio Estevam
@ 2012-04-16  5:48               ` Huang Shijie
  0 siblings, 0 replies; 20+ messages in thread
From: Huang Shijie @ 2012-04-16  5:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 2012-04-11 19:02, Fabio Estevam wrote:
> On Wed, Apr 11, 2012 at 7:24 AM, Will Deacon<will.deacon@arm.com>  wrote:
>
>>> I wish i could find a more common test case to reproduce this bug. But, i can't.
>>> The only test case now is to run the gplay on our IMX6Q platform.
>> Ok, that makes it tricky since I don't have gplay or an IMX6Q platform.
> gplay is a C application that does the same thing as launching a
> simple Gstreamer pipeline like:
>
> gst-launch playbin2 uri=file:///home/file.mp4
>
> Huang,
>
> Does the problem also occur if you don't use the VPU driver? I mean,
I have not tested the case with the VPU disabled.
> does it also happen if you decode the file using software codecs. I
> would like to know if the issue you see is related to the VPU driver
Can the VPU affect the futex? I am debugging a UART bug now.
I will continue debugging this issue once I have finished with the UART bug.

Best Regards
Huang Shijie
> or not.
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-04-11 10:24           ` Will Deacon
  2012-04-11 11:02             ` Fabio Estevam
@ 2012-05-10  2:51             ` Huang Shijie
  2012-05-10  8:38               ` Will Deacon
  1 sibling, 1 reply; 20+ messages in thread
From: Huang Shijie @ 2012-05-10  2:51 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will:
> If you still have this problem with a newer kernel, we can try using the
> fallback SMP futex implementation and see if that works.
>
After we updated our application (GStreamer), the futex issue went away.
So this is not a kernel bug but an application bug.

Thanks for your help.

Huang Shijie

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Bug in v7_coherent_kern_range() ?
  2012-05-10  2:51             ` Huang Shijie
@ 2012-05-10  8:38               ` Will Deacon
  0 siblings, 0 replies; 20+ messages in thread
From: Will Deacon @ 2012-05-10  8:38 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, May 10, 2012 at 03:51:20AM +0100, Huang Shijie wrote:
> Hi Will:
> > If you still have this problem with a newer kernel, we can try using the
> > fallback SMP futex implementation and see if that works.
> >
> After we update our application(gstreamer), the futex issue gone.
> So this is not a kernel bug, but an application bug.

That's good to hear, thanks for reporting back.

Will

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-05-10  8:38 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2012-04-01  3:21 Bug in v7_coherent_kern_range() ? Huang Shijie
2012-04-01  6:10 ` Dirk Behme
2012-04-01  7:09   ` Huang Shijie
2012-04-01  8:01     ` Dirk Behme
2012-04-01  8:16       ` Huang Shijie
2012-04-01  8:50         ` Dirk Behme
2012-04-01  9:14           ` Huang Shijie
2012-04-01  8:57     ` Dirk Behme
2012-04-01  9:19       ` Huang Shijie
2012-04-01  9:19       ` Huang Shijie
2012-04-02 11:12 ` Will Deacon
2012-04-06  3:35   ` Huang Shijie
2012-04-10  9:22     ` Will Deacon
2012-04-10 10:30       ` Huang Shijie
2012-04-10 10:35         ` Will Deacon
     [not found]         ` <4F854992.9080601@freescale.com>
2012-04-11 10:24           ` Will Deacon
2012-04-11 11:02             ` Fabio Estevam
2012-04-16  5:48               ` Huang Shijie
2012-05-10  2:51             ` Huang Shijie
2012-05-10  8:38               ` Will Deacon
