* Kernel crash while copying big files since kernel 3.18
From: François Valenduc @ 2015-01-05  7:12 UTC
  To: linux-wireless

Hello everybody,

Since kernel 3.18, I get a kernel crash every time I copy a big file
(around 12 GB) from an external USB drive to the hard drive of my
laptop.
I bisected between kernels 3.17 and 3.18 and was surprised to find that
the problem is related to the driver for the wireless card (rtl8188ee).
Indeed, I have no problem if I copy the file while the rtl8188ee module
is not loaded. Unfortunately, the results of git-bisect are not totally
conclusive, because some of the kernels built during the bisection crash
during boot when the wireless connection is established. Here are the
last steps of the bisection:

# bad: [c151aed6aa146e9587590051aba9da68b9370f9b] rtlwifi: rtl8188ee: Update driver to match Realtek release of 06282014
git bisect bad c151aed6aa146e9587590051aba9da68b9370f9b
# good: [fd09ff958777cf583d7541f180991c0fc50bd2f7] rtlwifi: Remove extra workqueue for enter/leave power state
git bisect good fd09ff958777cf583d7541f180991c0fc50bd2f7
# skip: [9afa2e44f4d8f9d031f815c32bb8f225f0f6746b] rtlwifi: Modify base.{c,h} for new drivers
git bisect skip 9afa2e44f4d8f9d031f815c32bb8f225f0f6746b
# skip: [3c67b8f9f3b5bb1207c9bb198e5ef04ff56921dd] rtlwifi: Modify cam.{c,h} and efuse.{c,h} for new drivers
git bisect skip 3c67b8f9f3b5bb1207c9bb198e5ef04ff56921dd
# skip: [f7953b2ad66cc5fc66e13d5c0a40e61b45cdfca8] rtlwifi: Modify core.c for new drivers
git bisect skip f7953b2ad66cc5fc66e13d5c0a40e61b45cdfca8
# skip: [d3feae41a3473a0f7b431d6af4e092865d586e52] rtlwifi: Update power-save routines for 062814 driver
git bisect skip d3feae41a3473a0f7b431d6af4e092865d586e52
# skip: [38506ecefab911785d5e1aa5889f6eeb462e0954] rtlwifi: rtl_pci: Start modification for new drivers
git bisect skip 38506ecefab911785d5e1aa5889f6eeb462e0954
# skip: [f3a97e93814aeac3f13e857a0071726acc9bd626] rtlwifi: Finish modifying core routines for new drivers
git bisect skip f3a97e93814aeac3f13e857a0071726acc9bd626
# only skipped commits left to test
# possible first bad commit: [c151aed6aa146e9587590051aba9da68b9370f9b] rtlwifi: rtl8188ee: Update driver to match Realtek release of 06282014
# possible first bad commit: [f3a97e93814aeac3f13e857a0071726acc9bd626] rtlwifi: Finish modifying core routines for new drivers
# possible first bad commit: [d3feae41a3473a0f7b431d6af4e092865d586e52] rtlwifi: Update power-save routines for 062814 driver
# possible first bad commit: [3c67b8f9f3b5bb1207c9bb198e5ef04ff56921dd] rtlwifi: Modify cam.{c,h} and efuse.{c,h} for new drivers
# possible first bad commit: [9afa2e44f4d8f9d031f815c32bb8f225f0f6746b] rtlwifi: Modify base.{c,h} for new drivers
# possible first bad commit: [f7953b2ad66cc5fc66e13d5c0a40e61b45cdfca8] rtlwifi: Modify core.c for new drivers
# possible first bad commit: [38506ecefab911785d5e1aa5889f6eeb462e0954] rtlwifi: rtl_pci: Start modification for new drivers

Can somebody explain what's happening? I do the copy with Dolphin in KDE;
the screen goes black and the computer becomes totally unresponsive, so
I cannot get at the logs to see a trace of the problem.

Thanks in advance for your help,

François Valenduc

PS: please CC me on replies, because I am not subscribed to the
mailing list.

* Re: Kernel crash while copying big files since kernel 3.18
From: François Valenduc @ 2015-01-05 16:13 UTC
  To: linux-wireless

I should add that the problem doesn't occur if I use "cp" to copy the
file. However, it also occurs if the file is copied via rsync.

François Valenduc

* Re: Kernel crash while copying big files since kernel 3.18
From: Larry Finger @ 2015-01-05 17:25 UTC
  To: François Valenduc, linux-wireless

There is a bug in 3.18 that is triggered when an order-3 memory allocation
fails. A patch that fixes it is at
http://marc.info/?l=linux-netdev&m=141999680927473&w=2 and has been merged into
wireless-drivers as commit e9538cf4f90713eca71b1d6a74b4eae1d445c664. It will be
applied to 3.18.X once it reaches mainline 3.19-rcY, but that has not yet
happened.

You could manually apply that patch to your kernel source, or you could pull the 
git repo at http://github.com/lwfinger/rtlwifi_new.git. That code has this patch 
already applied.

If this patch does not fix the problem, you might be able to capture at least
part of the backtrace by starting the transfer and then switching to the logging
console. When a crash happens, photograph the screen. On my system, I reach the
logging console with CTRL-ALT-F10 and return to the normal graphical console
with CTRL-ALT-F7, but your distro may use different virtual consoles.

Larry



* Re: Kernel crash while copying big files since kernel 3.18
From: François Valenduc @ 2015-01-05 18:46 UTC
  To: Larry Finger, linux-wireless

Thanks for your help; your patch seems to solve the problem. The system
no longer crashes when copying the same large file as yesterday. I also
see this message in the log, which is added by your patch:
rtl_pci: Allocation of new skb failed in _rtl_pci_rx_interrupt
Should I worry about this failure, or is it expected?

François Valenduc


* Re: Kernel crash while copying big files since kernel 3.18
From: Larry Finger @ 2015-01-05 19:25 UTC
  To: François Valenduc, linux-wireless

That is the positive proof that the new patch worked. Getting to that condition 
without the patch would have crashed the system. That printk is there to see if 
we were actually getting the condition and recovering. As the code is obviously 
working now, that line will be removed soon.
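
For readers following along, the recovery path described above can be sketched in user-space C. This is only an illustration under assumptions, not the actual patch: `rx_slot`, `alloc_rx_buf`, `rx_one` and the `fail_next_alloc` hook are hypothetical names, and `malloc` stands in for the kernel's skb allocation. The idea shown is to allocate the replacement buffer first and, if that fails, drop the frame and keep the old buffer in the ring instead of continuing with a missing one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define RX_BUF_SIZE 9100        /* buffer size quoted later in this thread */

/* Hypothetical stand-in for one slot of a receive ring. */
struct rx_slot {
    unsigned char *buf;         /* buffer the device would write into */
};

/* Hypothetical hook: set to 1 to make the next allocation fail. */
int fail_next_alloc;

unsigned char *alloc_rx_buf(void)
{
    if (fail_next_alloc) {
        fail_next_alloc = 0;
        return NULL;            /* simulated allocation failure */
    }
    return malloc(RX_BUF_SIZE);
}

/*
 * Handle one received frame. Returns the filled buffer for the upper
 * layer (the caller frees it), or NULL if the frame had to be dropped.
 */
unsigned char *rx_one(struct rx_slot *slot)
{
    unsigned char *fresh;
    unsigned char *full;

    assert(slot && slot->buf);

    /* Allocate the replacement before giving the old buffer away. */
    fresh = alloc_rx_buf();
    if (!fresh) {
        /* No replacement: drop the frame, keep the old buffer in place. */
        fprintf(stderr, "allocation of new rx buffer failed, frame dropped\n");
        return NULL;
    }

    full = slot->buf;
    slot->buf = fresh;          /* the ring slot is never left empty */
    return full;
}
```

Calling `rx_one` with `fail_next_alloc` set returns NULL and leaves `slot->buf` untouched, which corresponds to the "failed but recovered" case that the log message reports.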

Thanks,

Larry


* Re: Kernel crash while copying big files since kernel 3.18
From: François Valenduc @ 2015-01-11 14:35 UTC
  To: Larry Finger, linux-wireless

Do you still intend to remove the line about the allocation failure from
the log? I made a backup of my root partition compressed with pixz, and
that line appeared 1350 times, so I removed the code that adds it. Is it
really expected to occur so often? pixz uses multithreading to compress
files, so at least 3 of the 4 CPUs are busy for around 20 minutes, but
are you sure there is no other problem?

Thanks for your help,

François Valenduc

* Re: Kernel crash while copying big files since kernel 3.18
From: Larry Finger @ 2015-01-11 17:00 UTC
  To: François Valenduc, linux-wireless

On 01/11/2015 08:35 AM, François Valenduc wrote:
> Do you still intend to remove the line about the allocation failure from
> the log? I made a backup of my root partition compressed with pixz, and
> that line appeared 1350 times, so I removed the code that adds it. Is it
> really expected to occur so often? pixz uses multithreading to compress
> files, so at least 3 of the 4 CPUs are busy for around 20 minutes, but
> are you sure there is no other problem?

Yes, I do intend to remove that line; however, I want to keep it for a while 
just in case there are other crashes. If this message never appears in that 
case, then there is another bug.

You are, of course, free to remove it from your system. BTW, how much memory do 
you have?

Larry



* Re: Kernel crash while copying big files since kernel 3.18
From: François Valenduc @ 2015-01-25 19:38 UTC
  To: Larry Finger, linux-wireless

Sorry, I forgot to answer. I have 4 GB of RAM. Backing up a DVD with
k9copy also produces a lot of these messages (1479, with a patch that
uses rate_limit).
Is it really expected that skb allocation fails so often? Could there
be another problem?

François

* Re: Kernel crash while copying big files since kernel 3.18
From: Larry Finger @ 2015-01-25 20:24 UTC
  To: François Valenduc, linux-wireless

It is a matter of memory fragmentation. The driver uses a 9100-byte buffer, thus 
the allocation is of order 3. After a system has been running for a while, the 
number of memory blocks of that size may be small. I have not looked at the 
source of k9copy, but I suspect it also allocates large buffers. On a 4 GB system,
both DMA and regular allocations come from the same pool of memory.

I have submitted a patch to remove the printout. You should drop it from your 
system.

I am considering a slightly different approach to skb allocation that would 
pre-allocate a number of buffers in a storage pool when the driver was started. 
When the interrupt routine needed one, it would extract it from the pool, which 
would be kept refilled by a work queue routine. If and when I prepare that 
patch, your workload would be a good test. When you used k9copy, was the DVD
drive local and the destination remote, or were both local? If the latter, you
are just suffering from memory starvation.
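
The pre-allocation idea in the paragraph above might look roughly like the following user-space C sketch. It is not the patch under consideration: `buf_pool`, `pool_refill`, `pool_take`, `pool_destroy` and the pool size of 8 are all hypothetical, `malloc` stands in for the kernel allocator, and the locking a real driver would need between the interrupt routine and the work queue is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_SLOTS  8           /* assumed pool depth, chosen arbitrarily */
#define RX_BUF_SIZE 9100        /* buffer size mentioned in this message */

/* Hypothetical pool of pre-allocated receive buffers. */
struct buf_pool {
    void  *slot[POOL_SLOTS];
    size_t count;               /* number of buffers currently available */
};

/*
 * Top the pool up and return how many buffers were added. In the scheme
 * described above this part would run from a work queue, outside the
 * interrupt path.
 */
size_t pool_refill(struct buf_pool *p)
{
    size_t added = 0;

    assert(p);
    while (p->count < POOL_SLOTS) {
        void *b = malloc(RX_BUF_SIZE);

        if (!b)
            break;              /* try again on the next refill */
        p->slot[p->count++] = b;
        added++;
    }
    return added;
}

/*
 * Take one buffer without allocating anything. This is the part the
 * interrupt routine would call; it returns NULL when the pool is empty.
 */
void *pool_take(struct buf_pool *p)
{
    assert(p);
    if (p->count == 0)
        return NULL;
    return p->slot[--p->count];
}

/* Free whatever is still in the pool. */
void pool_destroy(struct buf_pool *p)
{
    assert(p);
    while (p->count > 0)
        free(p->slot[--p->count]);
}
```

The point of the split is that `pool_take` never allocates, so a fragmented memory map only delays the refill rather than failing inside the receive path. The caller still has to cope with an empty pool if the refill falls behind.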

Larry


