linux-kernel.vger.kernel.org archive mirror
* RCU bug with v3.17-rc3 ?
@ 2014-09-04 18:40 Felipe Balbi
  2014-09-04 19:16 ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-09-04 18:40 UTC
  To: Linux USB Mailing List, Alan Stern, paulmck, josh,
	Linux Kernel Mailing List

Hi,

I keep triggering the following Oops with -rc3 when writing to the mass
storage gadget driver:

| # modprobe g_mass_storage stall=0 removable=1 file=/dev/sda
| [   44.883554] Number of LUNs=8
| [   44.886709] Mass Storage Function, version: 2009/09/11
| [   44.892303] LUN: removable file: (no medium)
| [   44.896916] Number of LUNs=1
| [   44.901198] LUN: removable file: /dev/sda
| [   44.905410] Number of LUNs=1
| [   44.917706] g_mass_storage gadget: Mass Storage Gadget, version: 2009/09/11
| [   44.925018] g_mass_storage gadget: userspace failed to provide iSerialNumber
| [   44.932489] g_mass_storage gadget: g_mass_storage ready
| [   52.583773] g_mass_storage gadget: high-speed config #1: Linux File-Backed Storage
| # [   98.270585] Unable to handle kernel paging request at virtual address ffffffff
| [   98.278198] pgd = c0004000
| [   98.281027] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
| [   98.287648] Internal error: Oops: 17 [#1] SMP ARM
| [   98.292559] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs usb_storage xhci_hcd dwc3 udc_core matrix_keypad lis3lv02d_i2c dwc3_omap lis3lv02d input_polldev
| [   98.309721] CPU: 0 PID: 1820 Comm: file-storage Not tainted 3.17.0-rc3-00013-gc6b1a7d #806
| [   98.318346] task: ec356040 ti: ec378000 task.ti: ec378000
| [   98.324000] PC is at find_get_entry+0x7c/0x128
| [   98.328640] LR is at 0xfffffffa
| [   98.331912] pc : [<c011394c>]    lr : [<fffffffa>]    psr: a0000013
| [   98.331912] sp : ec379b50  ip : 00000000  fp : ec379b84
| [   98.343888] r10: c0c81243  r9 : 00000001  r8 : ea123d28
| [   98.349352] r7 : ec378010  r6 : 00000001  r5 : 00000000  r4 : 0000000f
| [   98.356181] r3 : ec379b3c  r2 : 00000000  r1 : 00000001  r0 : ffffffff
| [   98.363006] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
| [   98.370646] Control: 10c5387d  Table: ac2b0059  DAC: 00000015
| [   98.376641] Process file-storage (pid: 1820, stack limit = 0xec378248)
| [   98.383454] Stack: (0xec379b50 to 0xec37a000)
| [   98.388003] 9b40:                                     00000000 00000000 c01138d0 c002aa3c
| [   98.396560] 9b60: 0000000f 00000000 ea123d24 000200d0 00000001 000000d0 ec379bbc ec379b88
| [   98.405100] 9b80: c0114360 c01138dc c1486a00 60000013 ec379bc4 00001400 00000000 ea123d24
| [   98.413635] 9ba0: 00000c00 00000400 ec378010 c06dea0c ec379bdc ec379bc0 c011478c c0114330
| [   98.422183] 9bc0: 000000d0 c00904f8 c1486a00 00001400 ec379c04 ec379be0 c019cd68 c0114760
| [   98.430732] 9be0: c0090808 c0090590 ec379c34 00000001 00000c00 ea123d24 ec379c2c ec379c08
| [   98.439300] 9c00: c019ecbc c019cd44 00000c00 00000001 ec379c58 c019eb9c 00000c00 ec379d54
| [   98.447860] 9c20: ec379c8c ec379c30 c0113f14 c019ec8c 00000c00 00000001 ec379c58 ec379c5c
| [   98.456414] 9c40: ec378030 00000001 ec250cc0 00000000 00001400 00000000 c018195c c00acd08
| [   98.464974] 9c60: 5408b05a 00001000 ec250cc0 00000000 ec379d68 ea123d24 ec378010 00000000
| [   98.473533] 9c80: ec379cf4 ec379c90 c0115ed4 c0113e6c 00000001 00000000 c019f2b0 c0090590
| [   98.482071] 9ca0: ec379cc4 ec378010 c06c3df4 00001000 ea123c64 c019f2b0 ec379d54 ec379cc8
| [   98.490607] 9cc0: 00001400 00000000 00000001 ec379d68 ec379d54 ec379e30 ec250cc0 ec356040
| [   98.499178] 9ce0: ed7ab800 ec30d800 ec379d3c ec379cf8 c019f2b0 c0115c8c c06be3b8 c006dcec
| [   98.507741] 9d00: ec1b0010 ec30d800 ec379d08 ec379d08 ec379d10 ec379d10 ec379d18 ec379d18
| [   98.516288] 9d20: 00001400 00000000 ec379e30 ec250cc0 ec379dc4 ec379d40 c016618c c019f284
| [   98.524833] 9d40: 00001000 c0317b78 ec379d7c ec394000 00001000 00000003 00000000 00001000
| [   98.533385] 9d60: ec379d4c 00000001 ec250cc0 00000000 00000000 00000000 ec356040 00000000
| [   98.541946] 9d80: 00000000 00000000 00001400 00000000 00001000 00000000 00000000 00000000
| [   98.550482] 9da0: ec394000 ec250cc0 ec394000 ec379e30 00001000 00001000 ec379df4 ec379dc8
| [   98.559023] 9dc0: c0166a3c c01660f4 00000002 ec0ace20 00001000 0000000e ec0ace00 00000000
| [   98.567567] 9de0: 00001000 ed7ab800 ec379e64 ec379df8 bf0bc3b4 c0166994 0000006f 00001000
| [   98.576112] 9e00: bf0bc7a4 60000013 e8156000 0000000e 3930343d 00000000 bf0bc7a4 ec0ace00
| [   98.584660] 9e20: 00002400 00000000 00001400 00000000 00001400 00000000 ec379e64 00000000
| [   98.593193] 9e40: ed36ddc0 ec378018 ec30d894 ec0ace00 ec30d800 ec30d840 ec379ed4 ec379e68
| [   98.601754] 9e60: bf0bd1c8 bf0bc08c bf0bf6ec ec378010 c06c3df4 ec356040 00000001 00000000
| [   98.610305] 9e80: ec379eac ec379e90 c00906b0 c00904f8 ec30d894 ed36ddc0 ec378018 ec30d894
| [   98.618857] 9ea0: ec379ebc ec379eb0 c0090808 ec30d800 ed36ddc0 ec378018 ec30d894 00000000
| [   98.627405] 9ec0: 00000200 ec0ace00 ec379f14 ec379ed8 bf0bdbe8 bf0bc74c c06c3d94 ec0acc80
| [   98.635942] 9ee0: ec394000 ec30d800 bf0bd8cc ec0acc80 00000000 ec30d800 bf0bd8cc 00000000
| [   98.644465] 9f00: 00000000 00000000 ec379fac ec379f18 c0066ac4 bf0bd8d8 ed1d1040 00000000
| [   98.652990] 9f20: ec379f3c ec30d800 00000000 00000000 dead4ead ffffffff ffffffff c0c86138
| [   98.661526] 9f40: 00000000 00000000 c08998e0 00000000 c006dd7c ec379f54 ec379f54 00000000
| [   98.670077] 9f60: 00000000 dead4ead ffffffff ffffffff c0c86138 00000000 00000000 c08998e0
| [   98.678612] 9f80: 00000000 ec379f90 ec379f88 ec379f88 ec0acc80 c00669e0 00000000 00000000
| [   98.687148] 9fa0: 00000000 ec379fb0 c000eea8 c00669ec 00000000 00000000 00000000 00000000
| [   98.695699] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
| [   98.704249] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
| [   98.712805] [<c011394c>] (find_get_entry) from [<c0114360>] (pagecache_get_page+0x3c/0x1f0)
| [   98.721529] [<c0114360>] (pagecache_get_page) from [<c011478c>] (grab_cache_page_write_begin+0x38/0x50)
| [   98.731345] [<c011478c>] (grab_cache_page_write_begin) from [<c019cd68>] (block_write_begin+0x30/0x90)
| [   98.741067] [<c019cd68>] (block_write_begin) from [<c019ecbc>] (blkdev_write_begin+0x3c/0x48)
| [   98.749974] [<c019ecbc>] (blkdev_write_begin) from [<c0113f14>] (generic_perform_write+0xb4/0x1e4)
| [   98.759335] [<c0113f14>] (generic_perform_write) from [<c0115ed4>] (__generic_file_write_iter+0x254/0x51c)
| [   98.769424] [<c0115ed4>] (__generic_file_write_iter) from [<c019f2b0>] (blkdev_write_iter+0x38/0xc0)
| [   98.778978] [<c019f2b0>] (blkdev_write_iter) from [<c016618c>] (new_sync_write+0xa4/0xcc)
| [   98.787526] [<c016618c>] (new_sync_write) from [<c0166a3c>] (vfs_write+0xb4/0x1c0)
| [   98.795462] [<c0166a3c>] (vfs_write) from [<bf0bc3b4>] (do_write+0x334/0x53c [usb_f_mass_storage])
| [   98.804858] [<bf0bc3b4>] (do_write [usb_f_mass_storage]) from [<bf0bd1c8>] (do_scsi_command+0xa88/0x118c [usb_f_mass_storage])
| [   98.816782] [<bf0bd1c8>] (do_scsi_command [usb_f_mass_storage]) from [<bf0bdbe8>] (fsg_main_thread+0x31c/0x72c [usb_f_mass_storage])
| [   98.829249] [<bf0bdbe8>] (fsg_main_thread [usb_f_mass_storage]) from [<c0066ac4>] (kthread+0xe4/0x100)
| [   98.838993] [<c0066ac4>] (kthread) from [<c000eea8>] (ret_from_fork+0x14/0x20)
| [   98.846554] Code: e1a01009 eb0905d4 e3500000 0a00001f (e5904000) 
| [   98.853110] ---[ end trace 8bdf31522b942652 ]---


The setup is a bit "odd": I have a USB stick attached to the host port
on my platform, and the peripheral port uses that stick as backing file.
The peripheral port is connected to a laptop which I'm using to
read/write to that backing file. The problem doesn't seem to trigger if
I run the exact same test straight to the USB stick which is attached to
the host port.

My test application is rather basic [1] which I run with a script [2] to
pass sensible arguments. I haven't found another way of reproducing this
yet, so it could very well be that g_mass_storage is at fault here, as I
also managed to trigger this when using a tmpfs as backing file.

Anyway, looking at PC:

| (gdb) list *(find_get_entry+0x7c)
| 0xc011394c is in find_get_entry (include/linux/radix-tree.h:196).
| 191      * radix_tree_deref_retry must be used to confirm validity of the pointer if
| 192      * only the read lock is held.
| 193      */
| 194     static inline void *radix_tree_deref_slot(void **pslot)
| 195     {
| 196             return rcu_dereference(*pslot);
| 197     }
| 198
| 199     /**
| 200      * radix_tree_deref_slot_protected      - dereference a slot without RCU lock but with tree lock held
| (gdb) 

And looking at the arguments for that function, we're passing r0 as
0xffffffff and r1 as 1, which clearly is bogus, but I don't know, at
least not yet, where those came from. I'll see if I can reproduce
the same problem with dummy_hcd to rule out a bug in my dwc3 driver :-)
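
As a cross-check on the register dump, the parenthesised word in the
"Code:" line, (e5904000), is the instruction at the faulting PC. The
sketch below decodes it by hand, assuming it is a classic A32 single data
transfer (LDR/STR) with an immediate offset. The helper names
decode_ldr_str_imm, to_text and effective_address are made up for this
illustration and are not part of any kernel or binutils tool.

```python
def decode_ldr_str_imm(word):
    """Split an A32 LDR/STR (immediate offset) word into its bit fields.

    Assumption: bits 27..26 are 0b01 and bit 25 (register offset) is clear.
    """
    if ((word >> 26) & 0b11) != 0b01 or ((word >> 25) & 1):
        raise ValueError("not an immediate-offset LDR/STR encoding")
    return {
        "pre_indexed": bool((word >> 24) & 1),  # P bit
        "up": bool((word >> 23) & 1),           # U bit: add (1) or subtract (0)
        "byte": bool((word >> 22) & 1),         # B bit: byte or word access
        "writeback": bool((word >> 21) & 1),    # W bit
        "load": bool((word >> 20) & 1),         # L bit: load (1) or store (0)
        "rn": (word >> 16) & 0xF,               # base register
        "rd": (word >> 12) & 0xF,               # destination/source register
        "imm": word & 0xFFF,                    # 12-bit unsigned offset
    }


def to_text(f):
    """Render the decoded fields in conventional assembler syntax."""
    mnemonic = ("ldr" if f["load"] else "str") + ("b" if f["byte"] else "")
    offset = "#%s%d" % ("" if f["up"] else "-", f["imm"])
    if not f["pre_indexed"]:
        operand = "[r%d], %s" % (f["rn"], offset)
    elif f["imm"] == 0 and not f["writeback"]:
        operand = "[r%d]" % f["rn"]
    else:
        operand = "[r%d, %s]%s" % (f["rn"], offset, "!" if f["writeback"] else "")
    return "%s r%d, %s" % (mnemonic, f["rd"], operand)


def effective_address(f, rn_value):
    """Address the instruction accesses, given the base register value."""
    if not f["pre_indexed"]:
        return rn_value & 0xFFFFFFFF  # post-indexed: base is used unmodified
    delta = f["imm"] if f["up"] else -f["imm"]
    return (rn_value + delta) & 0xFFFFFFFF


if __name__ == "__main__":
    fields = decode_ldr_str_imm(0xE5904000)  # word in parentheses in "Code:"
    print(to_text(fields))
    print(hex(effective_address(fields, 0xFFFFFFFF)))  # r0 from the dump
```

Under those assumptions the word decodes to a load into r4 from the
address held in r0, so with r0 = ffffffff the access lands exactly on the
virtual address reported at the top of the Oops.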

cheers

-- 
balbi



* Re: RCU bug with v3.17-rc3 ?
  2014-09-04 18:40 RCU bug with v3.17-rc3 ? Felipe Balbi
@ 2014-09-04 19:16 ` Paul E. McKenney
  2014-09-04 19:25   ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2014-09-04 19:16 UTC
  To: Felipe Balbi
  Cc: Linux USB Mailing List, Alan Stern, josh, Linux Kernel Mailing List

On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> Hi,
> 
> I keep triggering the following Oops with -rc3 when writing to the mass
> storage gadget driver:

v3.17-rc3, correct?

I take it that the test passes on some earlier version?

							Thanx, Paul





* Re: RCU bug with v3.17-rc3 ?
  2014-09-04 19:16 ` Paul E. McKenney
@ 2014-09-04 19:25   ` Felipe Balbi
  2014-09-04 20:04     ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-09-04 19:25 UTC
  To: Paul E. McKenney
  Cc: Felipe Balbi, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List

On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > Hi,
> > 
> > I keep triggering the following Oops with -rc3 when writing to the mass
> > storage gadget driver:
> 
> v3.17-rc3, correct?

yup, as in subject ;-)

> I take it that the test passes on some earlier version?

about to test v3.14.17.

-- 
balbi



* Re: RCU bug with v3.17-rc3 ?
  2014-09-04 19:25   ` Felipe Balbi
@ 2014-09-04 20:04     ` Felipe Balbi
  2014-09-05 21:32       ` Paul E. McKenney
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-09-04 20:04 UTC
  To: Felipe Balbi
  Cc: Paul E. McKenney, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List

Hi,

On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > storage gadget driver:
> > 
> > v3.17-rc3, correct?
> 
> yup, as in subject ;-)
> 
> > I take it that the test passes on some earlier version?
> 
> about to test v3.14.17.

couldn't get v3.14 working on this board, but at least v3.16 is also
affected, except that now it happened during boot; I didn't even need
to run my test:

[   17.438195] Unable to handle kernel paging request at virtual address ffffffff
[   17.446109] pgd = ec360000
[   17.448947] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
[   17.455639] Internal error: Oops: 17 [#1] SMP ARM
[   17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
[   17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W     3.16.0-00005-g8a6cdb4 #811
[   17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
[   17.486405] PC is at find_get_entry+0x7c/0x128
[   17.491070] LR is at 0xfffffffa
[   17.494364] pc : [<c0110b4c>]    lr : [<fffffffa>]    psr: a0000013
[   17.494364] sp : ec027dc8  ip : 00000000  fp : ec027dfc
[   17.506384] r10: c0c6f6bc  r9 : 00000005  r8 : ecdf22f8
[   17.511860] r7 : ec026008  r6 : 00000001  r5 : 00000000  r4 : 00000000
[   17.518705] r3 : ec027db4  r2 : 00000000  r1 : 00000005  r0 : ffffffff
[   17.525526] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
[   17.533007] Control: 10c5387d  Table: ac360059  DAC: 00000015
[   17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
[   17.546151] Stack: (0xec027dc8 to 0xec028000)
[   17.550710] 7dc0:                   00000000 00000000 c0110ad0 ecdf0b80 00000000 ecdf22f4
[   17.559259] 7de0: ecdf22f4 00000000 00000005 00000000 ec027e34 ec027e00 c0111874 c0110adc
[   17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 00000005 ec3ddd00 00000001
[   17.576385] 7e20: ecdf21a0 00000000 ec027ebc ec027e38 c0112978 c0111844 00000000 c06af938
[   17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 00000005 00000006 00000b80 ecdf0b70
[   17.593514] 7e60: 00000000 c0163264 ec3dddf0 ec027ee8 ec027ed4 00000b80 ec027eac ec027e88
[   17.602087] 7e80: c0178d98 c0356590 00000000 00000000 00020000 00005b80 00000000 ec027f78
[   17.610653] 7ea0: ec3ddd00 ed716040 b6cab018 00000000 ec027f44 ec027ec0 c0163264 c0112780
[   17.619202] 7ec0: 00000180 00000180 ec027efc b6cab018 00000180 00000000 00000000 00000180
[   17.627772] 7ee0: ec027ecc 00000001 ec3ddd00 00000000 00000000 00000000 ed716040 00000000
[   17.636371] 7f00: 00000000 00000000 00005b80 00000000 00000180 00000000 00000000 00000000
[   17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 00000180 ec027f74 ec027f48
[   17.653524] 7f40: c0163a6c c01631cc b6cab018 00000000 00005b80 00000000 ec3ddd03 ec3ddd00
[   17.662085] 7f60: 00000180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 00005b80 00000000
[   17.670658] 7f80: be91badc be91ba50 00044a00 00000003 c000f044 ec026000 00000000 ec027fa8
[   17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 00000008 b6cab018 00000180 be91ba38
[   17.687794] 7fc0: be91badc be91ba50 00044a00 00000003 be91bbac b6cab008 00000000 00000000
[   17.696370] 7fe0: 00000020 be91ba40 b6c78e8c b6c78ea8 60000010 00000008 ae7f6821 ae7f6c21
[   17.704956] [<c0110b4c>] (find_get_entry) from [<c0111874>] (pagecache_get_page+0x3c/0x1f4)
[   17.713687] [<c0111874>] (pagecache_get_page) from [<c0112978>] (generic_file_read_iter+0x204/0x794)
[   17.723259] [<c0112978>] (generic_file_read_iter) from [<c0163264>] (new_sync_read+0xa4/0xcc)
[   17.732185] [<c0163264>] (new_sync_read) from [<c0163a6c>] (vfs_read+0x98/0x158)
[   17.739945] [<c0163a6c>] (vfs_read) from [<c0164198>] (SyS_read+0x4c/0xa0)
[   17.747149] [<c0164198>] (SyS_read) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
[   17.754994] Code: e1a01009 eb08ffa9 e3500000 0a00001f (e5904000) 
[   17.761476] ---[ end trace 49c4ed35a1c01157 ]---

It seems to be a difficult-to-reproduce race though. On a second boot it
didn't die during boot, but died with my USB test case. Unfortunately,
the platform I'm using is pretty new and only goes as far back as v3.16
(to which I had to backport 11 patches to get it to boot well enough for
this test).

I wonder if a corrupt file system could cause such problems... I keep
seeing EXT4 errors every now and again; considering that this dies in a
path through VFS, I wonder...

cheers

-- 
balbi



* Re: RCU bug with v3.17-rc3 ?
  2014-09-04 20:04     ` Felipe Balbi
@ 2014-09-05 21:32       ` Paul E. McKenney
  2014-10-08 17:13         ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Paul E. McKenney @ 2014-09-05 21:32 UTC
  To: Felipe Balbi
  Cc: Linux USB Mailing List, Alan Stern, josh, Linux Kernel Mailing List

On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > Hi,
> > > > 
> > > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > > storage gadget driver:
> > > 
> > > v3.17-rc3, correct?
> > 
> > yup, as in subject ;-)
> > 
> > > I take it that the test passes on some earlier version?
> > 
> > about to test v3.14.17.
> 
> couldn't get v3.14 working on this board, but at least v3.16 is also
> affected, except that now it happened during boot; I didn't even need
> to run my test:
> 
> [   17.438195] Unable to handle kernel paging request at virtual address ffffffff
> [   17.446109] pgd = ec360000
> [   17.448947] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
> [   17.455639] Internal error: Oops: 17 [#1] SMP ARM
> [   17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
> [   17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W     3.16.0-00005-g8a6cdb4 #811
> [   17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> [   17.486405] PC is at find_get_entry+0x7c/0x128
> [   17.491070] LR is at 0xfffffffa
> [   17.494364] pc : [<c0110b4c>]    lr : [<fffffffa>]    psr: a0000013
> [   17.494364] sp : ec027dc8  ip : 00000000  fp : ec027dfc
> [   17.506384] r10: c0c6f6bc  r9 : 00000005  r8 : ecdf22f8
> [   17.511860] r7 : ec026008  r6 : 00000001  r5 : 00000000  r4 : 00000000
> [   17.518705] r3 : ec027db4  r2 : 00000000  r1 : 00000005  r0 : ffffffff
> [   17.525526] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
> [   17.533007] Control: 10c5387d  Table: ac360059  DAC: 00000015
> [   17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> [   17.546151] Stack: (0xec027dc8 to 0xec028000)
> [   17.550710] 7dc0:                   00000000 00000000 c0110ad0 ecdf0b80 00000000 ecdf22f4
> [   17.559259] 7de0: ecdf22f4 00000000 00000005 00000000 ec027e34 ec027e00 c0111874 c0110adc
> [   17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 00000005 ec3ddd00 00000001
> [   17.576385] 7e20: ecdf21a0 00000000 ec027ebc ec027e38 c0112978 c0111844 00000000 c06af938
> [   17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 00000005 00000006 00000b80 ecdf0b70
> [   17.593514] 7e60: 00000000 c0163264 ec3dddf0 ec027ee8 ec027ed4 00000b80 ec027eac ec027e88
> [   17.602087] 7e80: c0178d98 c0356590 00000000 00000000 00020000 00005b80 00000000 ec027f78
> [   17.610653] 7ea0: ec3ddd00 ed716040 b6cab018 00000000 ec027f44 ec027ec0 c0163264 c0112780
> [   17.619202] 7ec0: 00000180 00000180 ec027efc b6cab018 00000180 00000000 00000000 00000180
> [   17.627772] 7ee0: ec027ecc 00000001 ec3ddd00 00000000 00000000 00000000 ed716040 00000000
> [   17.636371] 7f00: 00000000 00000000 00005b80 00000000 00000180 00000000 00000000 00000000
> [   17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 00000180 ec027f74 ec027f48
> [   17.653524] 7f40: c0163a6c c01631cc b6cab018 00000000 00005b80 00000000 ec3ddd03 ec3ddd00
> [   17.662085] 7f60: 00000180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 00005b80 00000000
> [   17.670658] 7f80: be91badc be91ba50 00044a00 00000003 c000f044 ec026000 00000000 ec027fa8
> [   17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 00000008 b6cab018 00000180 be91ba38
> [   17.687794] 7fc0: be91badc be91ba50 00044a00 00000003 be91bbac b6cab008 00000000 00000000
> [   17.696370] 7fe0: 00000020 be91ba40 b6c78e8c b6c78ea8 60000010 00000008 ae7f6821 ae7f6c21
> [   17.704956] [<c0110b4c>] (find_get_entry) from [<c0111874>] (pagecache_get_page+0x3c/0x1f4)
> [   17.713687] [<c0111874>] (pagecache_get_page) from [<c0112978>] (generic_file_read_iter+0x204/0x794)
> [   17.723259] [<c0112978>] (generic_file_read_iter) from [<c0163264>] (new_sync_read+0xa4/0xcc)
> [   17.732185] [<c0163264>] (new_sync_read) from [<c0163a6c>] (vfs_read+0x98/0x158)
> [   17.739945] [<c0163a6c>] (vfs_read) from [<c0164198>] (SyS_read+0x4c/0xa0)
> [   17.747149] [<c0164198>] (SyS_read) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> [   17.754994] Code: e1a01009 eb08ffa9 e3500000 0a00001f (e5904000) 
> [   17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> 
> It seems to be a difficult-to-reproduce race though. On a second boot it
> didn't die during boot, but died with my USB test case. Unfortunately,
> the platform I'm using is pretty new and only goes as far back as v3.16
> (to which I had to backport 11 patches to get it to boot well enough for
> this test).
> 
> I wonder if a corrupt file system could cause such problems... I keep
> seeing EXT4 errors every now and again; considering that this dies in a
> path through VFS, I wonder...

I recall hearing of similar things in the past, but must defer to the
FS/VFS experts on this one.

							Thanx, Paul



* Re: RCU bug with v3.17-rc3 ?
  2014-09-05 21:32       ` Paul E. McKenney
@ 2014-10-08 17:13         ` Felipe Balbi
  2014-10-08 17:57           ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-08 17:13 UTC
  To: Paul E. McKenney
  Cc: Felipe Balbi, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List

Hi,

On Fri, Sep 05, 2014 at 02:32:16PM -0700, Paul E. McKenney wrote:
> On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> > Hi,
> > 
> > On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > > Hi,
> > > > > 
> > > > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > > > storage gadget driver:
> > > > 
> > > > v3.17-rc3, correct?
> > > 
> > > yup, as in subject ;-)
> > > 
> > > > I take it that the test passes on some earlier version?
> > > 
> > > about to test v3.14.17.
> > 
> > couldn't get v3.14 working on this board, but at least v3.16 is also
> > affected, except that now it happened during boot; I didn't even need
> > to run my test:
> > 
> > [   17.438195] Unable to handle kernel paging request at virtual address ffffffff
> > [   17.446109] pgd = ec360000
> > [   17.448947] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
> > [   17.455639] Internal error: Oops: 17 [#1] SMP ARM
> > [   17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
> > [   17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W     3.16.0-00005-g8a6cdb4 #811
> > [   17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> > [   17.486405] PC is at find_get_entry+0x7c/0x128
> > [   17.491070] LR is at 0xfffffffa
> > [   17.494364] pc : [<c0110b4c>]    lr : [<fffffffa>]    psr: a0000013
> > [   17.494364] sp : ec027dc8  ip : 00000000  fp : ec027dfc
> > [   17.506384] r10: c0c6f6bc  r9 : 00000005  r8 : ecdf22f8
> > [   17.511860] r7 : ec026008  r6 : 00000001  r5 : 00000000  r4 : 00000000
> > [   17.518705] r3 : ec027db4  r2 : 00000000  r1 : 00000005  r0 : ffffffff
> > [   17.525526] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
> > [   17.533007] Control: 10c5387d  Table: ac360059  DAC: 00000015
> > [   17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> > [   17.546151] Stack: (0xec027dc8 to 0xec028000)
> > [   17.550710] 7dc0:                   00000000 00000000 c0110ad0 ecdf0b80 00000000 ecdf22f4
> > [   17.559259] 7de0: ecdf22f4 00000000 00000005 00000000 ec027e34 ec027e00 c0111874 c0110adc
> > [   17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 00000005 ec3ddd00 00000001
> > [   17.576385] 7e20: ecdf21a0 00000000 ec027ebc ec027e38 c0112978 c0111844 00000000 c06af938
> > [   17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 00000005 00000006 00000b80 ecdf0b70
> > [   17.593514] 7e60: 00000000 c0163264 ec3dddf0 ec027ee8 ec027ed4 00000b80 ec027eac ec027e88
> > [   17.602087] 7e80: c0178d98 c0356590 00000000 00000000 00020000 00005b80 00000000 ec027f78
> > [   17.610653] 7ea0: ec3ddd00 ed716040 b6cab018 00000000 ec027f44 ec027ec0 c0163264 c0112780
> > [   17.619202] 7ec0: 00000180 00000180 ec027efc b6cab018 00000180 00000000 00000000 00000180
> > [   17.627772] 7ee0: ec027ecc 00000001 ec3ddd00 00000000 00000000 00000000 ed716040 00000000
> > [   17.636371] 7f00: 00000000 00000000 00005b80 00000000 00000180 00000000 00000000 00000000
> > [   17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 00000180 ec027f74 ec027f48
> > [   17.653524] 7f40: c0163a6c c01631cc b6cab018 00000000 00005b80 00000000 ec3ddd03 ec3ddd00
> > [   17.662085] 7f60: 00000180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 00005b80 00000000
> > [   17.670658] 7f80: be91badc be91ba50 00044a00 00000003 c000f044 ec026000 00000000 ec027fa8
> > [   17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 00000008 b6cab018 00000180 be91ba38
> > [   17.687794] 7fc0: be91badc be91ba50 00044a00 00000003 be91bbac b6cab008 00000000 00000000
> > [   17.696370] 7fe0: 00000020 be91ba40 b6c78e8c b6c78ea8 60000010 00000008 ae7f6821 ae7f6c21
> > [   17.704956] [<c0110b4c>] (find_get_entry) from [<c0111874>] (pagecache_get_page+0x3c/0x1f4)
> > [   17.713687] [<c0111874>] (pagecache_get_page) from [<c0112978>] (generic_file_read_iter+0x204/0x794)
> > [   17.723259] [<c0112978>] (generic_file_read_iter) from [<c0163264>] (new_sync_read+0xa4/0xcc)
> > [   17.732185] [<c0163264>] (new_sync_read) from [<c0163a6c>] (vfs_read+0x98/0x158)
> > [   17.739945] [<c0163a6c>] (vfs_read) from [<c0164198>] (SyS_read+0x4c/0xa0)
> > [   17.747149] [<c0164198>] (SyS_read) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> > [   17.754994] Code: e1a01009 eb08ffa9 e3500000 0a00001f (e5904000) 
> > [   17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> > 
> > It seems to be a difficult-to-reproduce race though. On a second boot it
> > didn't die during boot, but died with my USB test case. Unfortunately,
> > the platform I'm using is pretty new and only goes as far back as v3.16
> > (which I had to backport 11 patches to get it to boot good enough for
> > this test).
> > 
> > I wonder if a corrupt file system could cause such problems... I keep
> > seeing EXT4 errors every now and again; considering that this dies in a
> > path through VFS, I wonder...
> 
> I recall hearing of similar things in the past, but must defer to the
> FS/VFS experts on this one.

Resurrecting this thread. I'm now seeing the same issue with a brand-new
filesystem mounted through NFS. The way to reproduce it is unchanged:
use g_mass_storage with either tmpfs or MMC as the backing store.

However, it seems to die much more frequently than before; I can
reproduce it every time. It's definitely not a problem with my board, as I
have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
and two different USB peripheral controllers (MUSB and DWC3), using the
same rootfs, and they die in exactly the same way whether I use tmpfs or
MMC as the backing store.

Adding a few more folks here.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-08 17:13         ` Felipe Balbi
@ 2014-10-08 17:57           ` Felipe Balbi
  2014-10-08 21:29             ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-08 17:57 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Paul E. McKenney, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List

Hi,

On Wed, Oct 08, 2014 at 12:13:22PM -0500, Felipe Balbi wrote:
> On Fri, Sep 05, 2014 at 02:32:16PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 04, 2014 at 03:04:03PM -0500, Felipe Balbi wrote:
> > > Hi,
> > > 
> > > On Thu, Sep 04, 2014 at 02:25:35PM -0500, Felipe Balbi wrote:
> > > > On Thu, Sep 04, 2014 at 12:16:42PM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Sep 04, 2014 at 01:40:21PM -0500, Felipe Balbi wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > I keep triggering the following Oops with -rc3 when writing to the mass
> > > > > > storage gadget driver:
> > > > > 
> > > > > v3.17-rc3, correct?
> > > > 
> > > > yup, as in subject ;-)
> > > > 
> > > > > I take it that the test passes on some earlier version?
> > > > 
> > > > about to test v3.14.17.
> > > 
> > > couldn't get v3.14 working on this board, but at least v3.16 is also
> > > affected, except that now it happened during boot; I didn't even need
> > > to run my test:
> > > 
> > > [   17.438195] Unable to handle kernel paging request at virtual address ffffffff
> > > [   17.446109] pgd = ec360000
> > > [   17.448947] [ffffffff] *pgd=ae7f6821, *pte=00000000, *ppte=00000000
> > > [   17.455639] Internal error: Oops: 17 [#1] SMP ARM
> > > [   17.460578] Modules linked in: dwc3(+) udc_core lis3lv02d_i2c lis3lv02d input_polldev dwc3_omap matrix_keypad
> > > [   17.471060] CPU: 0 PID: 1381 Comm: accounts-daemon Tainted: G W     3.16.0-00005-g8a6cdb4 #811
> > > [   17.480735] task: ed716040 ti: ec026000 task.ti: ec026000
> > > [   17.486405] PC is at find_get_entry+0x7c/0x128
> > > [   17.491070] LR is at 0xfffffffa
> > > [   17.494364] pc : [<c0110b4c>]    lr : [<fffffffa>]    psr: a0000013
> > > [   17.494364] sp : ec027dc8  ip : 00000000  fp : ec027dfc
> > > [   17.506384] r10: c0c6f6bc  r9 : 00000005  r8 : ecdf22f8
> > > [   17.511860] r7 : ec026008  r6 : 00000001  r5 : 00000000  r4 : 00000000
> > > [   17.518705] r3 : ec027db4  r2 : 00000000  r1 : 00000005  r0 : ffffffff
> > > [   17.525526] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM Segment user
> > > [   17.533007] Control: 10c5387d  Table: ac360059  DAC: 00000015
> > > [   17.539020] Process accounts-daemon (pid: 1381, stack limit = 0xec026248)
> > > [   17.546151] Stack: (0xec027dc8 to 0xec028000)
> > > [   17.550710] 7dc0:                   00000000 00000000 c0110ad0 ecdf0b80 00000000 ecdf22f4
> > > [   17.559259] 7de0: ecdf22f4 00000000 00000005 00000000 ec027e34 ec027e00 c0111874 c0110adc
> > > [   17.567824] 7e00: ecdf0b80 c03565b4 ed7165f8 ec3dddf0 ecdf22f4 00000005 ec3ddd00 00000001
> > > [   17.576385] 7e20: ecdf21a0 00000000 ec027ebc ec027e38 c0112978 c0111844 00000000 c06af938
> > > [   17.584950] 7e40: ecdf0b70 ecdf0b70 ec027e6c ec027e58 00000005 00000006 00000b80 ecdf0b70
> > > [   17.593514] 7e60: 00000000 c0163264 ec3dddf0 ec027ee8 ec027ed4 00000b80 ec027eac ec027e88
> > > [   17.602087] 7e80: c0178d98 c0356590 00000000 00000000 00020000 00005b80 00000000 ec027f78
> > > [   17.610653] 7ea0: ec3ddd00 ed716040 b6cab018 00000000 ec027f44 ec027ec0 c0163264 c0112780
> > > [   17.619202] 7ec0: 00000180 00000180 ec027efc b6cab018 00000180 00000000 00000000 00000180
> > > [   17.627772] 7ee0: ec027ecc 00000001 ec3ddd00 00000000 00000000 00000000 ed716040 00000000
> > > [   17.636371] 7f00: 00000000 00000000 00005b80 00000000 00000180 00000000 00000000 00000000
> > > [   17.644946] 7f20: b6cab018 ec3ddd00 b6cab018 ec027f78 ec3ddd00 00000180 ec027f74 ec027f48
> > > [   17.653524] 7f40: c0163a6c c01631cc b6cab018 00000000 00005b80 00000000 ec3ddd03 ec3ddd00
> > > [   17.662085] 7f60: 00000180 b6cab018 ec027fa4 ec027f78 c0164198 c01639e0 00005b80 00000000
> > > [   17.670658] 7f80: be91badc be91ba50 00044a00 00000003 c000f044 ec026000 00000000 ec027fa8
> > > [   17.679222] 7fa0: c000edc0 c0164158 be91badc be91ba50 00000008 b6cab018 00000180 be91ba38
> > > [   17.687794] 7fc0: be91badc be91ba50 00044a00 00000003 be91bbac b6cab008 00000000 00000000
> > > [   17.696370] 7fe0: 00000020 be91ba40 b6c78e8c b6c78ea8 60000010 00000008 ae7f6821 ae7f6c21
> > > [   17.704956] [<c0110b4c>] (find_get_entry) from [<c0111874>] (pagecache_get_page+0x3c/0x1f4)
> > > [   17.713687] [<c0111874>] (pagecache_get_page) from [<c0112978>] (generic_file_read_iter+0x204/0x794)
> > > [   17.723259] [<c0112978>] (generic_file_read_iter) from [<c0163264>] (new_sync_read+0xa4/0xcc)
> > > [   17.732185] [<c0163264>] (new_sync_read) from [<c0163a6c>] (vfs_read+0x98/0x158)
> > > [   17.739945] [<c0163a6c>] (vfs_read) from [<c0164198>] (SyS_read+0x4c/0xa0)
> > > [   17.747149] [<c0164198>] (SyS_read) from [<c000edc0>] (ret_fast_syscall+0x0/0x48)
> > > [   17.754994] Code: e1a01009 eb08ffa9 e3500000 0a00001f (e5904000) 
> > > [   17.761476] ---[ end trace 49c4ed35a1c01157 ]---
> > > 
> > > It seems to be a difficult-to-reproduce race though. On a second boot it
> > > didn't die during boot, but died with my USB test case. Unfortunately,
> > > the platform I'm using is pretty new and only goes as far back as v3.16
> > > (which I had to backport 11 patches to get it to boot good enough for
> > > this test).
> > > 
> > > I wonder if a corrupt file system could cause such problems... I keep
> > > seeing EXT4 errors every now and again; considering that this dies in a
> > > path through VFS, I wonder...
> > 
> > I recall hearing of similar things in the past, but must defer to the
> > FS/VFS experts on this one.
> 
> resurrecting this thread. I'm facing the same issues with a brand new
> filesystem mounted through NFS. The way to reproduce is the same though:
> using g_mass_storage with either tmpfs or mmc as backing store.
> 
> However it seems to die much more frequently than before. I can
> reproduce all the time. It's definitely not a problem with my board as I
> have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
> with two different USB peripheral controllers (MUSB and DWC3), using the
> same rootfs and they die the exact same way no matter if I use tmpfs or
> MMC as backing store.
> 
> Adding a few more folks here.

Alright, the first kernel that is stable on the Cortex A8 board is v3.14;
everything from v3.15 through today's Linus tree dies. I'll start
bisecting now.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-08 17:57           ` Felipe Balbi
@ 2014-10-08 21:29             ` Felipe Balbi
  2014-10-09 16:01               ` Johannes Weiner
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-08 21:29 UTC (permalink / raw)
  To: Felipe Balbi, hannes, Andrew Morton, Linus Torvalds, Sasha Levin
  Cc: Paul E. McKenney, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi,

On Wed, Oct 08, 2014 at 12:57:07PM -0500, Felipe Balbi wrote:

[ snip ]

> > > > It seems to be a difficult-to-reproduce race though. On a second boot it
> > > > didn't die during boot, but died with my USB test case. Unfortunately,
> > > > the platform I'm using is pretty new and only goes as far back as v3.16
> > > > (which I had to backport 11 patches to get it to boot good enough for
> > > > this test).
> > > > 
> > > > I wonder if a corrupt file system could cause such problems... I keep
> > > > seeing EXT4 errors every now and again; considering that this dies in a
> > > > path through VFS, I wonder...
> > > 
> > > I recall hearing of similar things in the past, but must defer to the
> > > FS/VFS experts on this one.
> > 
> > resurrecting this thread. I'm facing the same issues with a brand new
> > filesystem mounted through NFS. The way to reproduce is the same though:
> > using g_mass_storage with either tmpfs or mmc as backing store.
> > 
> > However it seems to die much more frequently than before. I can
> > reproduce all the time. It's definitely not a problem with my board as I
> > have two boards with different SoCs (ARM Cortex A8 and ARM Cortex A9)
> > with two different USB peripheral controllers (MUSB and DWC3), using the
> > same rootfs and they die the exact same way no matter if I use tmpfs or
> > MMC as backing store.
> > 
> > Adding a few more folks here.
> 
> alright, first stable kernel with Cortex A8 was v3.14. All other kernel
> versions die starting with v3.15 to today's Linus. I'll start bisecting
> now.

Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
(lib: radix_tree: tree node interface). Here's the full bisect log:

git bisect start
# good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
# bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
# bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
git bisect bad 74a475acea49459721ae4b062d3da68c74259009
# good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
# good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
# good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
# good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
# good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
# good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
# good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
# bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
# good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
git bisect good 4e35f483850ba46b838adfd312b3052416e15204
# good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
# good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
# bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
# good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
# first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
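
A log in this shape can be fed back to git with `git bisect replay <file>`.
To sanity-check one without a kernel tree at hand, a minimal Python sketch
along these lines may help. It only assumes the comment-line format visible
in the log above; the function name `summarize_bisect_log` and the `EXCERPT`
constant are made up for illustration.

```python
import re

# Matches the comment lines seen in the bisect log above, e.g.
# "# good: [<40-hex sha>] subject" or "# first bad commit: [<sha>] subject".
LINE_RE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")

# Last steps of the log above, copied verbatim.
EXCERPT = """\
# bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
# good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
# first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
"""


def summarize_bisect_log(text):
    """Count good/bad verdicts and pick out the first bad commit, if any."""
    summary = {"good": 0, "bad": 0, "first_bad": None}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip the "git bisect ..." command lines
        verdict, sha, subject = match.groups()
        if verdict == "first bad commit":
            summary["first_bad"] = (sha, subject)
        else:
            summary[verdict] += 1
    return summary


if __name__ == "__main__":
    print(summarize_bisect_log(EXCERPT))
```

The sketch deliberately ignores the `git bisect good|bad` command lines and
reads only the `#` comments, since those carry both the hash and the subject.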

I tried reverting that commit on v3.15, but it's non-trivial; I'll leave
that for tomorrow. Meanwhile, I'm adding the folks involved with that commit
to the Cc list, along with another backtrace for reference:

[  113.696647] Unable to handle kernel paging request at virtual address ffffffff
[  113.704370] pgd = c0004000
[  113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
[  113.713998] Internal error: Oops: 17 [#1] SMP ARM
[  113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
[  113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
[  113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
[  113.744060] PC is at find_get_entry+0x64/0x100
[  113.748700] LR is at 0xfffffffa
[  113.751978] pc : [<c01065b4>]    lr : [<fffffffa>]    psr: a00f0013
[  113.751978] sp : dd0bbba0  ip : 00000000  fp : dd0bbbd4
[  113.763962] r10: c0665100  r9 : 00001000  r8 : 0000001a
[  113.769415] r7 : dd0ee9b8  r6 : 00000001  r5 : 00000000  r4 : dd0ee880
[  113.776228] r3 : dd0bbb8c  r2 : 00000000  r1 : 0000001a  r0 : ffffffff
[  113.783044] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[  113.790674] Control: 10c5387d  Table: 9e210019  DAC: 00000015
[  113.796672] Process file-storage (pid: 1368, stack limit = 0xdd0ba248)
[  113.803486] Stack: (0xdd0bbba0 to 0xdd0bc000)
[  113.808038] bba0: 00000000 00000000 c0106550 00017508 00000002 dd0ee880 dd0ee9b4 0000001a
[  113.816578] bbc0: 00001000 00000000 dd0bbbf4 dd0bbbd8 c010716c c010655c 00013ef0 dd0ee880
[  113.825118] bbe0: dd0bbda4 00000003 dd0bbc6c dd0bbbf8 c011df94 c0107150 dd0bbc2c c0106b9c
[  113.833657] bc00: c0089a3c c0089328 00000001 c0107080 00000002 dd0bbcc0 000000d0 00000000
[  113.842197] bc20: 0001a000 00000000 00000000 dd0ee9b4 0000001a c011e74c dd0bbc94 dd0bbc48
[  113.850736] bc40: c011beec 00001000 dd0bbda4 dd0ee9b4 00001000 00000000 00001000 c0665100
[  113.859276] bc60: dd0bbc94 dd0bbc70 c011e74c c011df08 000200da 00000000 00001000 dd0bbda4
[  113.867816] bc80: dd0ee9b4 00001000 dd0bbcf4 dd0bbc98 c0106b10 c011e700 00001000 00000001
[  113.876356] bca0: dd0bbcc0 dd0bbcc4 dd0ba000 00000001 de60ee40 00002000 0001a000 00000000
[  113.884896] bcc0: dfe71ac0 c00a3b60 54355ca1 00004000 de60ee40 00000000 dd0bbdb8 dd0ee9b4
[  113.893436] bce0: dd0ee880 ffffffff dd0bbd5c dd0bbcf8 c0108c6c c0106a68 dd0bbd5c dd0bbd08
[  113.901975] bd00: c064b790 c0089c48 00000001 dd0ba038 c0108f70 c0089328 00000001 c0108f7c
[  113.910515] bd20: dd0bbda4 de606e00 00018000 00000000 dd0bbd5c dd0bbdb8 dd0ee920 dd0bbda4
[  113.919055] bd40: de60ee40 de606e00 dd0e5000 de664a00 dd0bbd8c dd0bbd60 c0108f7c c0108a24
[  113.927595] bd60: c008c410 c0089fd0 00000001 00000000 00018000 00000000 dd0bbe80 de60ee40
[  113.936134] bd80: dd0bbe14 dd0bbd90 c014c920 c0108f40 00004000 00000001 00000001 de274000
[  113.944674] bda0: 00004000 00000003 00002000 00002000 dd0bbd9c 00000001 de60ee40 00000000
[  113.953214] bdc0: 00000000 00000000 de606e00 00000000 00000000 00000000 00018000 00000000
[  113.961753] bde0: 00004000 00000000 00000000 00000000 de274000 de60ee40 de274000 dd0bbe80
[  113.970293] be00: 00004000 de6ce9c0 dd0bbe44 dd0bbe18 c014d1c8 c014c888 00000002 de6ce9c0
[  113.978833] be20: 00004000 00000000 00000000 00008000 de6ce9c0 dd0e5000 dd0bbeb4 dd0bbe48
[  113.987373] be40: bf059cc4 c014d120 00000000 dd0bbe9c dd0bbe68 bf05a04c 19000000 00000000
[  113.995912] be60: dd0ba000 00000000 00000000 6f48202c 00018000 00000000 00020000 00000000
[  114.004452] be80: 00018000 00000000 00000000 de664a00 de6ce9c0 00000000 de664a38 de664a00
[  114.012992] bea0: dd0ba038 de664a7c dd0bbf24 dd0bbeb8 bf05a938 bf059980 00000001 c00899dc
[  114.021531] bec0: a00f0013 de2e3bd4 00000000 00052000 00000000 dd0bbee0 c0089c50 c0089a70
[  114.030071] bee0: dd0bbf04 dd0bbef0 c064f3a4 de6ce840 00000000 de664a00 bf05a244 de6ce840
[  114.038611] bf00: 00000000 de664a00 bf05a244 00000000 00000000 00000000 dd0bbfac dd0bbf28
[  114.047151] bf20: c0065bdc bf05a250 c0089c50 00000000 dd0bbf54 de664a00 00000000 00000000
[  114.055690] bf40: dead4ead ffffffff ffffffff c0a8a238 00000000 00000000 c08070f8 dd0bbf5c
[  114.064230] bf60: dd0bbf5c 00000000 00000000 dead4ead ffffffff ffffffff c0a8a238 00000000
[  114.072770] bf80: 00000000 c08070f8 dd0bbf88 dd0bbf88 de6ce840 c0065af8 00000000 00000000
[  114.081310] bfa0: 00000000 dd0bbfb0 c000eea8 c0065b04 00000000 00000000 00000000 00000000
[  114.089850] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  114.098389] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 0001086e 00001a02
[  114.106944] [<c01065b4>] (find_get_entry) from [<c010716c>] (find_lock_entry+0x28/0x7c)
[  114.115316] [<c010716c>] (find_lock_entry) from [<c011df94>] (shmem_getpage_gfp+0x98/0x7f8)
[  114.124042] [<c011df94>] (shmem_getpage_gfp) from [<c011e74c>] (shmem_write_begin+0x58/0x94)
[  114.132856] [<c011e74c>] (shmem_write_begin) from [<c0106b10>] (generic_perform_write+0xb4/0x1c8)
[  114.142124] [<c0106b10>] (generic_perform_write) from [<c0108c6c>] (__generic_file_write_iter+0x254/0x51c)
[  114.152208] [<c0108c6c>] (__generic_file_write_iter) from [<c0108f7c>] (generic_file_write_iter+0x48/0xdc)
[  114.162298] [<c0108f7c>] (generic_file_write_iter) from [<c014c920>] (new_sync_write+0xa4/0xcc)
[  114.171386] [<c014c920>] (new_sync_write) from [<c014d1c8>] (vfs_write+0xb4/0x1c0)
[  114.179334] [<c014d1c8>] (vfs_write) from [<bf059cc4>] (do_write+0x350/0x4b8 [usb_f_mass_storage])
[  114.188719] [<bf059cc4>] (do_write [usb_f_mass_storage]) from [<bf05a938>] (fsg_main_thread+0x6f4/0x13f8 [usb_f_mass_storage])
[  114.200636] [<bf05a938>] (fsg_main_thread [usb_f_mass_storage]) from [<c0065bdc>] (kthread+0xe4/0x100)
[  114.210368] [<c0065bdc>] (kthread) from [<c000eea8>] (ret_from_fork+0x14/0x20)
[  114.217914] Code: e1a01008 eb08abbe e3500000 0a00001b (e5904000) 
[  114.224529] ---[ end trace afb7e71d4b71be98 ]---

For those arriving late: the problem happens when I use g_mass_storage
on either a Cortex A8 or a Cortex A9 board (each with a different USB
peripheral controller), with either tmpfs or MMC as the backing store.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-08 21:29             ` Felipe Balbi
@ 2014-10-09 16:01               ` Johannes Weiner
  2014-10-09 16:26                 ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Johannes Weiner @ 2014-10-09 16:01 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Andrew Morton, Linus Torvalds, Sasha Levin, Paul E. McKenney,
	Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi Felipe,

On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> (lib: radix_tree: tree node interface). Here's full bisect log:
> 
> git bisect start
> # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
> git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
> git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
> git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
> git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
> git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
> git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
> git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
> git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> 
> I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> that for tomorrow. Meanwhile, adding folks involved with that commit to
> Cc list and another backtrace for reference:
> 
> [  113.696647] Unable to handle kernel paging request at virtual address ffffffff
> [  113.704370] pgd = c0004000
> [  113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
> [  113.713998] Internal error: Oops: 17 [#1] SMP ARM
> [  113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
> [  113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
> [  113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> [  113.744060] PC is at find_get_entry+0x64/0x100

Could you please provide the disassembly of that function?

I'm thinking it's not the slot pointer itself that's bad, because
__radix_tree_lookup() dereferences that to test if it's populated
before returning it, and slot lifetime is guaranteed by RCU.

That would only leave garbage in the slot itself, crashing during
page_cache_get_speculative().

I'll keep staring at this change, but nothing stands out to me yet.
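
Until the full disassembly is available, the word in parentheses on the
"Code:" line of the trace (the faulting instruction) can be decoded by
hand. Below is a minimal Python sketch of that, assuming the standard A32
encoding for a load/store with a 12-bit immediate offset; the helper names
`faulting_word` and `decode_ldr_str_imm` are made up for illustration, and
the condition field and the P/W addressing bits are ignored.

```python
import re


def faulting_word(code_line):
    """Return the parenthesised word from an Oops 'Code:' line as an int."""
    match = re.search(r"\(([0-9a-fA-F]{8})\)", code_line)
    if match is None:
        raise ValueError("no parenthesised instruction word found")
    return int(match.group(1), 16)


def decode_ldr_str_imm(word):
    """Decode an A32 LDR/STR (immediate offset) word into assembly text."""
    # Bits 27..25 are 0b010 for the immediate-offset load/store form.
    if (word >> 25) & 0b111 != 0b010:
        raise ValueError("not an immediate-offset LDR/STR encoding")
    load = (word >> 20) & 1          # L bit: 1 = load, 0 = store
    byte = (word >> 22) & 1          # B bit: 1 = byte access
    up = (word >> 23) & 1            # U bit: 1 = add offset, 0 = subtract
    rn = (word >> 16) & 0xF          # base register
    rd = (word >> 12) & 0xF          # destination/source register
    imm = word & 0xFFF               # 12-bit immediate offset
    mnemonic = ("ldr" if load else "str") + ("b" if byte else "")
    offset = imm if up else -imm
    return f"{mnemonic} r{rd}, [r{rn}, #{offset}]"


if __name__ == "__main__":
    line = "Code: e1a01008 eb08abbe e3500000 0a00001b (e5904000) "
    print(decode_ldr_str_imm(faulting_word(line)))  # ldr r4, [r0, #0]
```

Read that way, the faulting instruction is a load through r0, and the
register dump shows r0 : ffffffff, which matches the reported fault address.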

Thanks,
Johannes



* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 16:01               ` Johannes Weiner
@ 2014-10-09 16:26                 ` Felipe Balbi
  2014-10-09 20:35                   ` Felipe Balbi
  2014-10-09 20:41                   ` Rabin Vincent
  0 siblings, 2 replies; 36+ messages in thread
From: Felipe Balbi @ 2014-10-09 16:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Felipe Balbi, Andrew Morton, Linus Torvalds, Sasha Levin,
	Paul E. McKenney, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi Johannes,

On Thu, Oct 09, 2014 at 12:01:38PM -0400, Johannes Weiner wrote:
> On Wed, Oct 08, 2014 at 04:29:38PM -0500, Felipe Balbi wrote:
> > Finally bisected it down to commit 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > (lib: radix_tree: tree node interface). Here's full bisect log:
> > 
> > git bisect start
> > # good: [455c6fdbd219161bd09b1165f11699d6d73de11c] Linux 3.14
> > git bisect good 455c6fdbd219161bd09b1165f11699d6d73de11c
> > # bad: [1860e379875dfe7271c649058aeddffe5afd9d0d] Linux 3.15
> > git bisect bad 1860e379875dfe7271c649058aeddffe5afd9d0d
> > # bad: [74a475acea49459721ae4b062d3da68c74259009] SubmittingPatches: add style recommendation to use imperative descriptions
> > git bisect bad 74a475acea49459721ae4b062d3da68c74259009
> > # good: [c12e69c6aaf785fd307d05cb6f36ca0e7577ead7] Merge tag 'staging-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
> > git bisect good c12e69c6aaf785fd307d05cb6f36ca0e7577ead7
> > # good: [0fc31966035d7a540c011b6c967ce8eae1db121b] Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
> > git bisect good 0fc31966035d7a540c011b6c967ce8eae1db121b
> > # good: [bdfc7cbdeef8cadba0e5793079ac0130b8e2220c] Merge branch 'mips-for-linux-next' of git://git.linux-mips.org/pub/scm/ralf/upstream-sfr
> > git bisect good bdfc7cbdeef8cadba0e5793079ac0130b8e2220c
> > # good: [0f1b1e6d73cb989ce2c071edc57deade3b084dfe] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
> > git bisect good 0f1b1e6d73cb989ce2c071edc57deade3b084dfe
> > # good: [181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da] ixgbe: remove redundant if clause from PTP work
> > git bisect good 181e7d5d7bd7747e882e3ca89ecbf0fc3e72d0da
> > # good: [59ecc26004e77e100c700b1d0da7502b0fdadb46] Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
> > git bisect good 59ecc26004e77e100c700b1d0da7502b0fdadb46
> > # good: [2b665e276c15ba7d9fc8cdd16931883a51ed13e4] fs/direct-io.c: remove redundant comparison
> > git bisect good 2b665e276c15ba7d9fc8cdd16931883a51ed13e4
> > # bad: [f412c97abef71026d8192ca8efca231f1e3906b3] mm, hugetlb: mark some bootstrap functions as __init
> > git bisect bad f412c97abef71026d8192ca8efca231f1e3906b3
> > # good: [4e35f483850ba46b838adfd312b3052416e15204] mm, hugetlb: use vma_resv_map() map types
> > git bisect good 4e35f483850ba46b838adfd312b3052416e15204
> > # good: [6dbaf22ce1f1dfba33313198eb5bd989ae76dd87] mm: shmem: save one radix tree lookup when truncating swapped pages
> > git bisect good 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87
> > # good: [91b0abe36a7b2b3b02d7500925a5f8455334f0e5] mm + fs: store shadow entries in page cache
> > git bisect good 91b0abe36a7b2b3b02d7500925a5f8455334f0e5
> > # bad: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> > git bisect bad 139e561660fe11e0fc35e142a800df3dd7d03e9d
> > # good: [a528910e12ec7ee203095eb1711468a66b9b60b0] mm: thrash detection-based file cache sizing
> > git bisect good a528910e12ec7ee203095eb1711468a66b9b60b0
> > # first bad commit: [139e561660fe11e0fc35e142a800df3dd7d03e9d] lib: radix_tree: tree node interface
> > 
> > I tried reverting that commit on v3.15 but it's non-trivial; I'll leave
> > that for tomorrow. Meanwhile, adding folks involved with that commit to
> > Cc list and another backtrace for reference:
> > 
> > [  113.696647] Unable to handle kernel paging request at virtual address ffffffff
> > [  113.704370] pgd = c0004000
> > [  113.707276] [ffffffff] *pgd=9fef6821, *pte=00000000, *ppte=00000000
> > [  113.713998] Internal error: Oops: 17 [#1] SMP ARM
> > [  113.718912] Modules linked in: g_mass_storage usb_f_mass_storage libcomposite configfs musb_dsps musb_hdrc musb_am335x
> > [  113.730144] CPU: 0 PID: 1368 Comm: file-storage Not tainted 3.17.0-02899-g748eb79 #239
> > [  113.738410] task: de606e00 ti: dd0ba000 task.ti: dd0ba000
> > [  113.744060] PC is at find_get_entry+0x64/0x100
> 
> Could you please provide the disassembly of that function?

here you go. It's ARM assembly however:

Dump of assembler code for function find_get_entry:
   0xc011da48 <+0>:	mov	r12, sp
   0xc011da4c <+4>:	push	{r4, r5, r6, r7, r8, r9, r11, r12, lr, pc}
   0xc011da50 <+8>:	sub	r11, r12, #4
   0xc011da54 <+12>:	sub	sp, sp, #16
   0xc011da58 <+16>:	push	{lr}		; (str lr, [sp, #-4]!)
   0xc011da5c <+20>:	bl	0xc000ef00 <__gnu_mcount_nc>
   0xc011da60 <+24>:	mov	r6, r0
   0xc011da64 <+28>:	mov	r7, r1
   0xc011da68 <+32>:	ldr	r2, [pc, #520]	; 0xc011dc78 <find_get_entry+560>
   0xc011da6c <+36>:	mov	r3, #0
   0xc011da70 <+40>:	mov	r1, r3
   0xc011da74 <+44>:	str	r2, [sp, #8]
   0xc011da78 <+48>:	str	r3, [sp]
   0xc011da7c <+52>:	mov	r2, r3
   0xc011da80 <+56>:	str	r3, [sp, #4]
   0xc011da84 <+60>:	ldr	r0, [pc, #496]	; 0xc011dc7c <find_get_entry+564>
   0xc011da88 <+64>:	mov	r3, #2
   0xc011da8c <+68>:	bl	0xc0095f88 <lock_acquire>
   0xc011da90 <+72>:	bl	0xc00a7b50 <debug_lockdep_rcu_enabled>
   0xc011da94 <+76>:	cmp	r0, #0
   0xc011da98 <+80>:	beq	0xc011daac <find_get_entry+100>
   0xc011da9c <+84>:	ldr	r4, [pc, #476]	; 0xc011dc80 <find_get_entry+568>
   0xc011daa0 <+88>:	ldrb	r3, [r4, #1]
   0xc011daa4 <+92>:	cmp	r3, #0
   0xc011daa8 <+96>:	beq	0xc011dbfc <find_get_entry+436>
   0xc011daac <+100>:	ldr	r8, [pc, #460]	; 0xc011dc80 <find_get_entry+568>
   0xc011dab0 <+104>:	add	r6, r6, #4
   0xc011dab4 <+108>:	mov	r5, #1
   0xc011dab8 <+112>:	mov	r0, r6
   0xc011dabc <+116>:	mov	r1, r7
   0xc011dac0 <+120>:	bl	0xc0364660 <radix_tree_lookup_slot>
   0xc011dac4 <+124>:	subs	r9, r0, #0
   0xc011dac8 <+128>:	beq	0xc011dc24 <find_get_entry+476>
   0xc011dacc <+132>:	ldr	r4, [r9]
   0xc011dad0 <+136>:	bl	0xc00a7b50 <debug_lockdep_rcu_enabled>
   0xc011dad4 <+140>:	cmp	r0, #0
   0xc011dad8 <+144>:	beq	0xc011dae8 <find_get_entry+160>
   0xc011dadc <+148>:	ldrb	r3, [r8, #2]
   0xc011dae0 <+152>:	cmp	r3, #0
   0xc011dae4 <+156>:	beq	0xc011dbcc <find_get_entry+388>
   0xc011dae8 <+160>:	cmp	r4, #0
   0xc011daec <+164>:	beq	0xc011dc24 <find_get_entry+476>
   0xc011daf0 <+168>:	tst	r4, #3
   0xc011daf4 <+172>:	bne	0xc011dc4c <find_get_entry+516>
   0xc011daf8 <+176>:	mov	r2, sp
   0xc011dafc <+180>:	bic	r3, r2, #8128	; 0x1fc0
   0xc011db00 <+184>:	bic	r3, r3, #63	; 0x3f
   0xc011db04 <+188>:	ldr	r2, [pc, #376]	; 0xc011dc84 <find_get_entry+572>
   0xc011db08 <+192>:	ldr	r3, [r3, #4]
   0xc011db0c <+196>:	and	r2, r2, r3
   0xc011db10 <+200>:	cmp	r2, #0
   0xc011db14 <+204>:	bne	0xc011dc68 <find_get_entry+544>
   0xc011db18 <+208>:	add	r3, r4, #16
   0xc011db1c <+212>:	mcr	15, 0, r2, cr7, cr10, {5}
   0xc011db20 <+216>:	mov	r2, #0
   0xc011db24 <+220>:	pld	[r3]
   0xc011db28 <+224>:	ldrex	r1, [r3]
   0xc011db2c <+228>:	teq	r1, r2
   0xc011db30 <+232>:	beq	0xc011db44 <find_get_entry+252>
   0xc011db34 <+236>:	add	r0, r1, r5
   0xc011db38 <+240>:	strex	r12, r0, [r3]
   0xc011db3c <+244>:	teq	r12, #0
   0xc011db40 <+248>:	bne	0xc011db28 <find_get_entry+224>
   0xc011db44 <+252>:	cmp	r1, #0
   0xc011db48 <+256>:	beq	0xc011dab8 <find_get_entry+112>
   0xc011db4c <+260>:	mov	r3, #0
   0xc011db50 <+264>:	mcr	15, 0, r3, cr7, cr10, {5}
   0xc011db54 <+268>:	ldr	r3, [r4]
   0xc011db58 <+272>:	tst	r3, #32768	; 0x8000
   0xc011db5c <+276>:	bne	0xc011dc58 <find_get_entry+528>
   0xc011db60 <+280>:	ldr	r3, [r9]
   0xc011db64 <+284>:	cmp	r3, r4
   0xc011db68 <+288>:	bne	0xc011dc6c <find_get_entry+548>
   0xc011db6c <+292>:	bl	0xc00a7b50 <debug_lockdep_rcu_enabled>
   0xc011db70 <+296>:	cmp	r0, #0
   0xc011db74 <+300>:	beq	0xc011db88 <find_get_entry+320>
   0xc011db78 <+304>:	ldr	r5, [pc, #256]	; 0xc011dc80 <find_get_entry+568>
   0xc011db7c <+308>:	ldrb	r3, [r5, #3]
   0xc011db80 <+312>:	cmp	r3, #0
   0xc011db84 <+316>:	beq	0xc011dba4 <find_get_entry+348>
   0xc011db88 <+320>:	ldr	r0, [pc, #236]	; 0xc011dc7c <find_get_entry+564>
   0xc011db8c <+324>:	mov	r1, #1
   0xc011db90 <+328>:	ldr	r2, [pc, #240]	; 0xc011dc88 <find_get_entry+576>
   0xc011db94 <+332>:	bl	0xc0096380 <lock_release>
   0xc011db98 <+336>:	sub	sp, r11, #36	; 0x24
   0xc011db9c <+340>:	mov	r0, r4
   0xc011dba0 <+344>:	ldm	sp, {r4, r5, r6, r7, r8, r9, r11, sp, pc}
   0xc011dba4 <+348>:	bl	0xc00aadc4 <rcu_is_watching>
   0xc011dba8 <+352>:	cmp	r0, #0
   0xc011dbac <+356>:	bne	0xc011db88 <find_get_entry+320>
   0xc011dbb0 <+360>:	mov	r3, #1
   0xc011dbb4 <+364>:	ldr	r0, [pc, #208]	; 0xc011dc8c <find_get_entry+580>
   0xc011dbb8 <+368>:	ldr	r1, [pc, #208]	; 0xc011dc90 <find_get_entry+584>
   0xc011dbbc <+372>:	ldr	r2, [pc, #208]	; 0xc011dc94 <find_get_entry+588>
   0xc011dbc0 <+376>:	strb	r3, [r5, #3]
   0xc011dbc4 <+380>:	bl	0xc00920cc <lockdep_rcu_suspicious>
   0xc011dbc8 <+384>:	b	0xc011db88 <find_get_entry+320>
   0xc011dbcc <+388>:	bl	0xc00a7b50 <debug_lockdep_rcu_enabled>
   0xc011dbd0 <+392>:	cmp	r0, #0
   0xc011dbd4 <+396>:	beq	0xc011dae8 <find_get_entry+160>
   0xc011dbd8 <+400>:	bl	0xc00aadc4 <rcu_is_watching>
   0xc011dbdc <+404>:	cmp	r0, #0
   0xc011dbe0 <+408>:	bne	0xc011dc2c <find_get_entry+484>
   0xc011dbe4 <+412>:	ldr	r0, [pc, #172]	; 0xc011dc98 <find_get_entry+592>
   0xc011dbe8 <+416>:	mov	r1, #196	; 0xc4
   0xc011dbec <+420>:	ldr	r2, [pc, #168]	; 0xc011dc9c <find_get_entry+596>
   0xc011dbf0 <+424>:	strb	r5, [r8, #2]
   0xc011dbf4 <+428>:	bl	0xc00920cc <lockdep_rcu_suspicious>
   0xc011dbf8 <+432>:	b	0xc011dae8 <find_get_entry+160>
   0xc011dbfc <+436>:	bl	0xc00aadc4 <rcu_is_watching>
   0xc011dc00 <+440>:	cmp	r0, #0
   0xc011dc04 <+444>:	bne	0xc011daac <find_get_entry+100>
   0xc011dc08 <+448>:	mov	r3, #1
   0xc011dc0c <+452>:	ldr	r0, [pc, #120]	; 0xc011dc8c <find_get_entry+580>
   0xc011dc10 <+456>:	mov	r1, #844	; 0x34c
   0xc011dc14 <+460>:	ldr	r2, [pc, #132]	; 0xc011dca0 <find_get_entry+600>
   0xc011dc18 <+464>:	strb	r3, [r4, #1]
   0xc011dc1c <+468>:	bl	0xc00920cc <lockdep_rcu_suspicious>
   0xc011dc20 <+472>:	b	0xc011daac <find_get_entry+100>
   0xc011dc24 <+476>:	mov	r4, #0
   0xc011dc28 <+480>:	b	0xc011db6c <find_get_entry+292>
   0xc011dc2c <+484>:	bl	0xc00ac38c <rcu_lockdep_current_cpu_online>
   0xc011dc30 <+488>:	cmp	r0, #0
   0xc011dc34 <+492>:	beq	0xc011dbe4 <find_get_entry+412>
   0xc011dc38 <+496>:	ldr	r0, [pc, #60]	; 0xc011dc7c <find_get_entry+564>
   0xc011dc3c <+500>:	bl	0xc0091264 <lock_is_held>
   0xc011dc40 <+504>:	cmp	r0, #0
   0xc011dc44 <+508>:	beq	0xc011dbe4 <find_get_entry+412>
   0xc011dc48 <+512>:	b	0xc011dae8 <find_get_entry+160>
   0xc011dc4c <+516>:	tst	r4, #1
   0xc011dc50 <+520>:	beq	0xc011db6c <find_get_entry+292>
   0xc011dc54 <+524>:	b	0xc011dab8 <find_get_entry+112>
   0xc011dc58 <+528>:	mov	r0, r4
   0xc011dc5c <+532>:	ldr	r1, [pc, #64]	; 0xc011dca4 <find_get_entry+604>
   0xc011dc60 <+536>:	bl	0xc01254d4 <dump_page>
   0xc011dc64 <+540>:			; <UNDEFINED> instruction: 0xe7f001f2
   0xc011dc68 <+544>:			; <UNDEFINED> instruction: 0xe7f001f2
   0xc011dc6c <+548>:	mov	r0, r4
   0xc011dc70 <+552>:	bl	0xc012db6c <put_page>
   0xc011dc74 <+556>:	b	0xc011dab8 <find_get_entry+112>
   0xc011dc78 <+560>:	andsgt	sp, r1, r8, asr #20
   0xc011dc7c <+564>:	adcgt	r2, r11, r8, lsl r2
   0xc011dc80 <+568>:	ldrhtgt	r0, [r0], r1
   0xc011dc84 <+572>:	andseq	pc, pc, r0, lsl #30
   0xc011dc88 <+576>:	andsgt	sp, r1, r8, lsl #23
   0xc011dc8c <+580>:	addgt	sp, r5, r8, lsl #5
   0xc011dc90 <+584>:	andeq	r0, r0, sp, ror r3
   0xc011dc94 <+588>:	ldrdgt	sp, [r5], r0
   0xc011dc98 <+592>:	addgt	sp, r7, r8, asr #7
   0xc011dc9c <+596>:	addgt	lr, r6, r8, lsl #17
   0xc011dca0 <+600>:	addgt	sp, r5, r4, lsr #5
   0xc011dca4 <+604>:	addgt	sp, r7, r4, ror #7
End of assembler dump.

> I'm thinking it's not the slot pointer itself that's bad, because
> __radix_tree_lookup() dereferences that to test if it's populated
> before returning it, and slot life-time is guaranteed by RCU.
> 
> That would only leave garbage in the slot itself, crashing during
> page_cache_get_speculative().
> 
> I'll keep staring at this change, but nothing stands out to me yet.

alright, it's pretty deterministic however. Always on the same test, no
matter which USB controller, no matter if backing store is RAM or MMC.

Those two undefined instructions on the disassembly caught my attention,
perhaps I'm facing a GCC bug ?

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 16:26                 ` Felipe Balbi
@ 2014-10-09 20:35                   ` Felipe Balbi
  2014-10-09 20:41                   ` Rabin Vincent
  1 sibling, 0 replies; 36+ messages in thread
From: Felipe Balbi @ 2014-10-09 20:35 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Johannes Weiner, Andrew Morton, Linus Torvalds, Sasha Levin,
	Paul E. McKenney, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi,

On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > I'm thinking it's not the slot pointer itself that's bad, because
> > __radix_tree_lookup() dereferences that to test if it's populated
> > before returning it, and slot life-time is guaranteed by RCU.
> > 
> > That would only leave garbage in the slot itself, crashing during
> > page_cache_get_speculative().
> > 
> > I'll keep staring at this change, but nothing stands out to me yet.
> 
> alright, it's pretty deterministic however. Always on the same test, no
> matter which USB controller, no matter if backing store is RAM or MMC.
> 
> Those two undefined instructions on the disassembly caught my attention,
> perhaps I'm facing a GCC bug ?

no, probably not a GCC bug. Looking at your commit, however: man, it
does quite a lot of things at once. It moves code around, adds new
functions by refactoring (and changing) code, renames things, and changes
int offsets into unsigned ints. It would not be too difficult to miss a
bug in there.

I'll continue digging here.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 16:26                 ` Felipe Balbi
  2014-10-09 20:35                   ` Felipe Balbi
@ 2014-10-09 20:41                   ` Rabin Vincent
  2014-10-09 20:46                     ` Felipe Balbi
  2014-10-09 21:47                     ` Aaro Koskinen
  1 sibling, 2 replies; 36+ messages in thread
From: Rabin Vincent @ 2014-10-09 20:41 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Johannes Weiner, Andrew Morton, Linus Torvalds, Sasha Levin,
	Paul E. McKenney, Linux USB Mailing List, Alan Stern, josh,
	Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> alright, it's pretty deterministic however. Always on the same test, no
> matter which USB controller, no matter if backing store is RAM or MMC.
> 
> Those two undefined instructions on the disassembly caught my attention,
> perhaps I'm facing a GCC bug ?

The undefined instructions are just ARM's BUG() implementation.
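For reference, the word that appears twice in the disassembly,
0xe7f001f2, lies in the ARM-mode permanently-undefined (UDF) encoding
space, which is why the disassembler prints <UNDEFINED> for it. A minimal
C sketch of recognising such a word follows; the macro and helper names
are made up for illustration and this is not kernel code.

```c
#include <stdint.h>

/* ARM (A32) UDF encoding: cond=1110, 0111 1111, imm12, 1111, imm4. */
#define ARM_UDF_MASK  0xfff000f0u
#define ARM_UDF_VALUE 0xe7f000f0u

/* Hypothetical helper: nonzero if `insn` is an A32 UDF instruction. */
static int is_arm_udf(uint32_t insn)
{
	return (insn & ARM_UDF_MASK) == ARM_UDF_VALUE;
}

/* Hypothetical helper: extract the 16-bit immediate (imm12:imm4). */
static unsigned int arm_udf_imm16(uint32_t insn)
{
	return ((insn >> 4) & 0xfff0u) | (insn & 0xfu);
}
```

Applied to 0xe7f001f2 the first helper reports a match, while the
faulting load from the Code: line, 0xe5904000, does not match.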

But did you see the question I asked you yesterday in your other thread?
http://www.spinics.net/lists/arm-kernel/msg368634.html

Here it is again:

  What GCC version are you using?
  
  4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
  find_get_entry() crashes with 0xffffffff involved smell a lot like the
  earlier reports from kernels built with those compilers:
  
  https://lkml.org/lkml/2014/6/25/456
  https://lkml.org/lkml/2014/6/30/375
  https://lkml.org/lkml/2014/6/30/660
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
  https://lkml.org/lkml/2014/5/9/330

Also, I didn't see any public email making a definitive link between GCC
PR 58854 that Nathan pointed out in https://lkml.org/lkml/2014/6/30/660
and the earlier find_get_entry() crashes, but I just built GCC 4.8.1 and
an ARM kernel with that, and the GCC bug is clearly seen in
radix_tree_lookup_slot() which returns the pointer which
find_get_entry() is dereferencing:

  <radix_tree_lookup_slot>:
   e1a0c00d  mov     ip, sp
   e92dd800  push    {fp, ip, lr, pc}
   e24cb004  sub     fp, ip, #4
   e24dd008  sub     sp, sp, #8
   e3a02000  mov     r2, #0
   e24b3010  sub     r3, fp, #16
   ebffffc5  bl      c0176ab8 <__radix_tree_lookup>
   e24bd00c  sub     sp, fp, #12		<--- sp moved up
   e3500000  cmp     r0, #0
   151b0010  ldrne   r0, [fp, #-16]		<--- load from under sp 
   e89da800  ldm     sp, {fp, sp, pc}

Please check your compiler to make sure it's not the same problem.
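
To make the annotated listing concrete: the function passes the address
of a stack local to its callee and reads that local back after the call.
In the listing the local sits at [fp, #-16], and sp is moved above it
before the load, so anything that uses the stack in between (an
interrupt, for instance) could overwrite it. A reduced C sketch of the
same source shape, with hypothetical names standing in for the kernel
functions:

```c
#include <stddef.h>

/* Hypothetical stand-in for __radix_tree_lookup(): reports the slot
 * through an out-parameter and returns nonzero on success. */
static int lookup(void **table, size_t n, size_t index, void ***slotp)
{
	if (index >= n || table[index] == NULL)
		return 0;
	*slotp = &table[index];
	return 1;
}

/* Same shape as radix_tree_lookup_slot(): `slot` is a stack local whose
 * address escapes to the callee and which is read back after the call.
 * A correct compiler must load it before releasing the stack frame. */
static void **lookup_slot(void **table, size_t n, size_t index)
{
	void **slot;

	if (!lookup(table, n, index, &slot))
		return NULL;
	return slot;
}
```

Nothing in the C source is wrong here; the sketch only shows which
pattern to look for when inspecting the generated code.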


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 20:41                   ` Rabin Vincent
@ 2014-10-09 20:46                     ` Felipe Balbi
  2014-10-09 21:07                       ` Felipe Balbi
  2014-10-09 21:47                     ` Aaro Koskinen
  1 sibling, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-09 20:46 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: Felipe Balbi, Johannes Weiner, Andrew Morton, Linus Torvalds,
	Sasha Levin, Paul E. McKenney, Linux USB Mailing List,
	Alan Stern, josh, Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi,

On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > alright, it's pretty deterministic however. Always on the same test, no
> > matter which USB controller, no matter if backing store is RAM or MMC.
> > 
> > Those two undefined instructions on the disassembly caught my attention,
> > perhaps I'm facing a GCC bug ?
> 
> The undefined instructions are just ARM's BUG() implementation.
> 
> But did you see the question I asked you yesterday in your other thread?
> http://www.spinics.net/lists/arm-kernel/msg368634.html

hmm, completely missed that, sorry. I'm using 4.8.2, will try something
else.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 20:46                     ` Felipe Balbi
@ 2014-10-09 21:07                       ` Felipe Balbi
  2014-10-10 13:57                         ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-09 21:07 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Rabin Vincent, Johannes Weiner, Andrew Morton, Linus Torvalds,
	Sasha Levin, Paul E. McKenney, Linux USB Mailing List,
	Alan Stern, josh, Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi,

On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > alright, it's pretty deterministic however. Always on the same test, no
> > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > 
> > > Those two undefined instructions on the disassembly caught my attention,
> > > perhaps I'm facing a GCC bug ?
> > 
> > The undefined instructions are just ARM's BUG() implementation.
> > 
> > But did you see the question I asked you yesterday in your other thread?
> > http://www.spinics.net/lists/arm-kernel/msg368634.html
> 
> hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> else.

seems to be working fine now, thanks. I'll leave test running overnight
just in case.

thanks again, and sorry for the noise.

PS: I wonder if we should add a warning message to the build system if
we're building with known broken versions of GCC.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 20:41                   ` Rabin Vincent
  2014-10-09 20:46                     ` Felipe Balbi
@ 2014-10-09 21:47                     ` Aaro Koskinen
  2014-10-10 16:18                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 36+ messages in thread
From: Aaro Koskinen @ 2014-10-09 21:47 UTC (permalink / raw)
  To: Rabin Vincent
  Cc: Felipe Balbi, Johannes Weiner, Andrew Morton, Linus Torvalds,
	Sasha Levin, Paul E. McKenney, Linux USB Mailing List,
	Alan Stern, josh, Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

Hi,

On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
>   What GCC version are you using?
>   
>   4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
>   find_get_entry() crashes with 0xffffffff involved smell a lot like the
>   earlier reports from kernels built with those compilers:
>   
>   https://lkml.org/lkml/2014/6/25/456
>   https://lkml.org/lkml/2014/6/30/375
>   https://lkml.org/lkml/2014/6/30/660
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
>   https://lkml.org/lkml/2014/5/9/330

Is it possible to blacklist those GCC versions on ARM somehow as it
seems people are still using them?

This bug also ruined a file system on one of my boxes last year
(see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).

A.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 21:07                       ` Felipe Balbi
@ 2014-10-10 13:57                         ` Felipe Balbi
  2014-10-10 16:25                           ` Russell King - ARM Linux
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-10 13:57 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Rabin Vincent, Johannes Weiner, Andrew Morton, Linus Torvalds,
	Sasha Levin, Paul E. McKenney, Linux USB Mailing List,
	Alan Stern, josh, Linux Kernel Mailing List, Tony Lindgren,
	Linux OMAP Mailing List, Linux ARM Kernel Mailing List,
	Rik van Riel

On Thu, Oct 09, 2014 at 04:07:15PM -0500, Felipe Balbi wrote:
> Hi,
> 
> On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > > alright, it's pretty deterministic however. Always on the same test, no
> > > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > > 
> > > > Those two undefined instructions on the disassembly caught my attention,
> > > > perhaps I'm facing a GCC bug ?
> > > 
> > > The undefined instructions are just ARM's BUG() implementation.
> > > 
> > > But did you see the question I asked you yesterday in your other thread?
> > > http://www.spinics.net/lists/arm-kernel/msg368634.html
> > 
> > hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> > else.
> 
> seems to be working fine now, thanks. I'll leave test running overnight
> just in case.

yup, ran overnight without any problems.

-- 
balbi


* Re: RCU bug with v3.17-rc3 ?
  2014-10-09 21:47                     ` Aaro Koskinen
@ 2014-10-10 16:18                       ` Russell King - ARM Linux
  2014-10-10 20:52                         ` Aaro Koskinen
  0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-10 16:18 UTC (permalink / raw)
  To: Aaro Koskinen
  Cc: Rabin Vincent, Rik van Riel, Linux OMAP Mailing List,
	Tony Lindgren, Linux USB Mailing List, josh, Felipe Balbi,
	Linux Kernel Mailing List, Alan Stern, Johannes Weiner,
	Sasha Levin, Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Fri, Oct 10, 2014 at 12:47:06AM +0300, Aaro Koskinen wrote:
> Hi,
> 
> On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> >   What GCC version are you using?
> >   
> >   4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> >   find_get_entry() crashes with 0xffffffff involved smell a lot like the
> >   earlier reports from kernels built with those compilers:
> >   
> >   https://lkml.org/lkml/2014/6/25/456
> >   https://lkml.org/lkml/2014/6/30/375
> >   https://lkml.org/lkml/2014/6/30/660
> >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> >   https://lkml.org/lkml/2014/5/9/330
> 
> Is it possible to blacklist those GCC versions on ARM somehow as it
> seems people are still using them?
> 
> This bug also ruined a file system on one of my boxes last year
> (see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).

Given that, why the fsck (pun intended) did you not shout a little louder
about getting it blacklisted?  Looking at your marc.info URL, there's
very little information there which hints at filesystem corruption, and
it's a thread of only *one* message according to marc.info.

Even _if_ I did read the message you point to above, that on its own did
not hint at filesystem corruption.

So, would you please mind passing on further details about this,
specifically which function in the ext4 code is affected, so it can
be properly written up.

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-10 13:57                         ` Felipe Balbi
@ 2014-10-10 16:25                           ` Russell King - ARM Linux
  2014-10-11  1:44                             ` Nathan Lynch
  0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-10 16:25 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Rik van Riel, Linux OMAP Mailing List, Tony Lindgren,
	Linux USB Mailing List, josh, Linux Kernel Mailing List,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Fri, Oct 10, 2014 at 08:57:43AM -0500, Felipe Balbi wrote:
> On Thu, Oct 09, 2014 at 04:07:15PM -0500, Felipe Balbi wrote:
> > Hi,
> > 
> > On Thu, Oct 09, 2014 at 03:46:37PM -0500, Felipe Balbi wrote:
> > > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > > > On Thu, Oct 09, 2014 at 11:26:56AM -0500, Felipe Balbi wrote:
> > > > > alright, it's pretty deterministic however. Always on the same test, no
> > > > > matter which USB controller, no matter if backing store is RAM or MMC.
> > > > > 
> > > > > Those two undefined instructions on the disassembly caught my attention,
> > > > > perhaps I'm facing a GCC bug ?
> > > > 
> > > > The undefined instructions are just ARM's BUG() implementation.
> > > > 
> > > > But did you see the question I asked you yesterday in your other thread?
> > > > http://www.spinics.net/lists/arm-kernel/msg368634.html
> > > 
> > > hmm, completely missed that, sorry. I'm using 4.8.2, will try something
> > > else.
> > 
> > seems to be working fine now, thanks. I'll leave test running overnight
> > just in case.
> 
> yup, ran overnight without any problems.

Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
it seems that this has been known about for some time.)

We can blacklist these GCC versions quite easily.  We already have GCC
3.3 blacklisted, and it's trivial to add others.  I would want to include
some proper details about the bug, just like the other existing entries
we already have in asm-offsets.c, where we name the functions that the
compiler is known to break where appropriate.
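
A compile-time check of the kind described could look roughly like the
sketch below. It is only an illustration: the macro and function names
are invented here, the affected range is simply the two versions named in
this thread (4.8.1 and 4.8.2), and the __arm__ guard is an assumption
about how one would restrict the check to ARM builds.

```c
/* Encode a GCC version as one comparable integer, e.g. 4.8.2 -> 40802. */
#define VERSION_CODE(major, minor, patch) \
	((major) * 10000 + (minor) * 100 + (patch))

/* Hypothetical predicate: is this one of the versions named in the thread? */
static int gcc_version_is_affected(int major, int minor, int patch)
{
	int v = VERSION_CODE(major, minor, patch);

	return v >= VERSION_CODE(4, 8, 1) && v <= VERSION_CODE(4, 8, 2);
}

/* Compile-time form: refuse to build on ARM with an affected compiler.
 * __GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__ are predefined by GCC. */
#if defined(__arm__) && defined(__GNUC__) && !defined(__clang__)
# if VERSION_CODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__) >= VERSION_CODE(4, 8, 1) && \
     VERSION_CODE(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__) <= VERSION_CODE(4, 8, 2)
#  error "GCC 4.8.1/4.8.2 are reported to miscompile ARM kernels (GCC PR 58854)"
# endif
#endif
```

Failing the build outright, rather than printing a warning, matches the
idea of a blacklist: a warning is easy to miss in a long build log.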

However, I'm rather annoyed that there are people here who have known
for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
corruption, and have sat on their backsides doing nothing about getting
it blacklisted for something like a year.

When people talk about the ARM community being dysfunctional... well,
this kind of irresponsible behaviour just gives them more fodder to
throw at us.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-10 16:18                       ` Russell King - ARM Linux
@ 2014-10-10 20:52                         ` Aaro Koskinen
  0 siblings, 0 replies; 36+ messages in thread
From: Aaro Koskinen @ 2014-10-10 20:52 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rabin Vincent, Rik van Riel, Linux OMAP Mailing List,
	Tony Lindgren, Linux USB Mailing List, josh, Felipe Balbi,
	Linux Kernel Mailing List, Alan Stern, Johannes Weiner,
	Sasha Levin, Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Fri, Oct 10, 2014 at 05:18:35PM +0100, Russell King - ARM Linux wrote:
> On Fri, Oct 10, 2014 at 12:47:06AM +0300, Aaro Koskinen wrote:
> > On Thu, Oct 09, 2014 at 10:41:01PM +0200, Rabin Vincent wrote:
> > >   What GCC version are you using?
> > >   
> > >   4.8.1 and 4.8.2 are known to miscompile the ARM kernel and these
> > >   find_get_entry() crashes with 0xffffffff involved smell a lot like the
> > >   earlier reports from kernels built with those compilers:
> > >   
> > >   https://lkml.org/lkml/2014/6/25/456
> > >   https://lkml.org/lkml/2014/6/30/375
> > >   https://lkml.org/lkml/2014/6/30/660
> > >   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58854
> > >   https://lkml.org/lkml/2014/5/9/330
> > 
> > Is it possible to blacklist those GCC versions on ARM somehow as it
> > seems people are still using them?
> > 
> > This bug also ruined a file system on one of my boxes last year
> > (see e.g. http://marc.info/?l=linux-arm-kernel&m=139033442527244&w=2).
> 
> Given that, why the fsck (pun intended) did you not shout a little louder
> about getting it blacklisted?  Looking at your marc.info URL, there's
> very little information there which hints at filesystem corruption, and
> it's a thread of only *one* message according to marc.info.
> 
> Even _if_ I did read the message you point to above, that on its own did
> not hint at filesystem corruption.
> 
> So, would you please mind passing on further details about this,
> specifically which function in the ext4 code is affected, so it can
> be properly written up.

I have not done any proper deeper analysis. After I first mailed about
the issue I just downgraded GCC and pretty much forgot about it until
an engineer from some commercial Linux vendor replied privately months
later and kindly pointed me to the needed GCC fix (which I then shared
in the reply). Then I just moved on using a newer GCC with no issues.
Obviously this was not a widespread problem since no one else
reported the same.

Today I again booted a kernel compiled with GCC 4.8.2 and was still able
to reproduce the issue, and I think the log below shows that at least ext3
can easily end up in an inconsistent state using these compiler versions:

0) Run the bad kernel:

~ # dmesg|grep GCC
[    0.000000] Linux version 3.17.0-mvebu-los_9755+ (aaro@cooljazz) (gcc version 4.8.2 (GCC) ) #1 Fri Oct 10 21:05:20 EEST 2014

1) Start with a small ext3 (writeback) fs containing the gcc tarball:

/mnt/test # ls -l
total 84092
-rw-r--r--    1 root     root      85999682 Apr 24 21:52 gcc-4.8.2.tar.bz2
drwx------    2 root     root         16384 Oct 10 10:33 lost+found
/mnt/test # df -h .
Filesystem                Size      Used Available Use% Mounted on
/dev/sdc1                 3.8G     90.2M      3.5G   2% /mnt/test

2) Extract, delete & crash:

/mnt/test # tar xjf gcc-4.8.2.tar.bz2
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/libgfortran/generated': Directory not empty
rm: can't remove 'gcc-4.8.2/libgfortran': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/compat/struct-by-value-18a_y.c': No such file or directory
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/compat': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg/result_default_init_1.f90': No such file or directory
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
[  960.864433] Unable to handle kernel paging request at virtual address ffffffff
[  960.930597] pgd = df6e0000
[  960.990849] [ffffffff] *pgd=1fffd831, *pte=00000000, *ppte=00000000
[  961.056512] Internal error: Oops: 1 [#1] ARM
[  961.120063] Modules linked in:
[  961.180974] CPU: 0 PID: 684 Comm: rm Not tainted 3.17.0-mvebu-los_9755+ #1
[  961.247146] task: df447b00 ti: df4de000 task.ti: df4de000
[  961.311524] PC is at find_get_entry+0x28/0x84
[  961.375037] LR is at radix_tree_lookup_slot+0x1c/0x2c
[  961.439061] pc : [<c006e418>]    lr : [<c018392c>]    psr: a0000013
[  961.439061] sp : df4dfc68  ip : 00000000  fp : df4dfc7c
[  961.570018] r10: 00000001  r9 : c04e3253  r8 : df020b60
[  961.634596] r7 : 0009001a  r6 : 00000000  r5 : 0009001a  r4 : df020c90
[  961.700070] r3 : ffffffff  r2 : 00000000  r1 : 0009001a  r0 : ffffffff
[  961.764437] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[  961.830518] Control: 0005317f  Table: 1f6e0000  DAC: 00000015
[  961.895866] Process rm (pid: 684, stack limit = 0xdf4de1c0)
[  961.960597] Stack: (0xdf4dfc68 to 0xdf4e0000)
[  962.022968] fc60:                   00000001 df020c8c df4dfcb4 df4dfc80 c006ef68 c006e400
[  962.091214] fc80: c00d4e80 c00d4764 00001000 0009001a 00000000 00000000 df020b60 df020b60
[  962.159490] fca0: df020bd8 df04e4d8 df4dfd04 df4dfcb8 c00d34c0 c006ef44 00000000 df4dfcc8
[  962.226940] fcc0: c00d4e80 c00d4764 00001000 00000001 df4dfd84 dd1c73f0 00090306 00000000
[  962.295558] fce0: 00090068 00000000 00000000 df020b60 df04e4d8 00000181 df4dfd4c df4dfd08
[  962.364710] fd00: c00d4828 c00d347c 00000000 00000001 df4dfdc4 dd1c73f0 00000000 00000000
[  962.433394] fd20: 00000000 00000000 df4dfd84 00090002 00001000 dbaa2200 df020b60 df04e4d8
[  962.501810] fd40: df4dfdbc df4dfd50 c00d4e80 c00d4764 00001000 df4dfd60 c0141284 c0148708
[  962.569685] fd60: 0009001a 00000000 c0ebc7c0 df041180 00000002 00000000 df4dfd9c df4dfd88
[  962.639143] fd80: c003813c c0038084 df041180 df0b7320 df4dfdac 00090002 00000000 dbaa2200
[  962.708562] fda0: df4dfe4c df04e4d8 00000181 df04e4d8 df4dfe24 df4dfdc0 c01087c0 c00d4e6c
[  962.778108] fdc0: 00001000 c038caf8 0000128f 00000000 00000000 00011000 00000001 c9c59740
[  962.846670] fde0: 0009001a 00000000 00000a26 c824f240 00000010 00000000 df4dfe1c df04e4d8
[  962.913956] fe00: df04e4d8 df4dfe4c de53cf40 de53cf40 00000000 df04e4d8 df4dfe44 df4dfe28
[  962.980679] fe20: c010c5a8 c01086c4 df04e4d8 dee12000 dbaa2200 df04e4b4 df4dfe84 df4dfe48
[  963.046696] fe40: c0115dc4 c010c584 dd1c73f0 00000000 00000100 00000012 00000000 c0fbfe00
[  963.112648] fe60: df04e4d8 dd1c73f0 de53cf40 00000000 df4dff04 df04e4d8 df4dfecc df4dfe88
[  963.178402] fe80: c0116b24 c0115ce0 00000000 c00b3b24 df4dfeac c067b174 5437d0a4 22921900
[  963.244947] fea0: df4dfecc df4dfeb0 c00b7a50 c19ca440 df04e4d8 df04e534 dd1c73f0 000b6650
[  963.311517] fec0: df4dfefc df4dfed0 c00b7e4c c01168d8 df4dfefc df4dfee0 c19ca440 00000000
[  963.377319] fee0: df4e6000 00000000 000b6650 ffffff9c df4dff94 df4dff00 c00b80b0 c00b7d94
[  963.443083] ff00: 5437d035 00000000 dba4a8d0 d899f6e8 78ae7ba4 0000000d df4e603c 0000000c
[  963.509416] ff20: 00000000 c0009624 dd1c73f0 00000000 00000004 00000038 00000000 00000000
[  963.575556] ff40: 00024182 00000000 00800021 c04c81b4 00000001 000003e8 000003e8 00000000
[  963.641281] ff60: 0000024d 00000000 4bfad53f 000b6650 00000008 0000000c 0000000a c0009624
[  963.707194] ff80: df4de000 00000000 df4dffa4 df4dff98 c00b8e20 c00b7ed0 00000000 df4dffa8
[  963.773584] ffa0: c00094c0 c00b8e18 000b6650 00000008 000b6650 bed03990 bed03990 00008000
[  963.841022] ffc0: 000b6650 00000008 0000000c 0000000a 000b6650 00000000 b6fcc000 00000000
[  963.907530] ffe0: 00093224 bed0398c 00071284 b6efa39c 60000010 000b6650 0000ffff 0000ffff
[  963.973653] Backtrace:
[  964.032680] [<c006e3f0>] (find_get_entry) from [<c006ef68>] (pagecache_get_page+0x34/0x1fc)
[  964.100751]  r5:df020c8c r4:00000001
[  964.162591] [<c006ef34>] (pagecache_get_page) from [<c00d34c0>] (__find_get_block_slow+0x54/0x16c)
[  964.291505]  r10:df04e4d8 r9:df020bd8 r8:df020b60 r7:df020b60 r6:00000000 r5:00000000
[  964.361857]  r4:0009001a
[  964.425342] [<c00d346c>] (__find_get_block_slow) from [<c00d4828>] (__find_get_block+0xd4/0x1e4)
[  964.498345]  r9:00000181 r8:df04e4d8 r7:df020b60 r6:00000000 r5:00000000 r4:00090068
[  964.570979] [<c00d4754>] (__find_get_block) from [<c00d4e80>] (__getblk+0x24/0x358)
[  964.643833]  r8:df04e4d8 r7:df020b60 r6:dbaa2200 r5:00001000 r4:00090002
[  964.716031] [<c00d4e5c>] (__getblk) from [<c01087c0>] (__ext4_get_inode_loc+0x10c/0x454)
[  964.790734]  r10:df04e4d8 r9:00000181 r8:df04e4d8 r7:df4dfe4c r6:dbaa2200 r5:00000000
[  964.865945]  r4:00090002
[  964.934187] [<c01086b4>] (__ext4_get_inode_loc) from [<c010c5a8>] (ext4_reserve_inode_write+0x34/0x9c)
[  965.080216]  r10:df04e4d8 r9:00000000 r8:de53cf40 r7:de53cf40 r6:df4dfe4c r5:df04e4d8
[  965.159656]  r4:df04e4d8
[  965.232230] [<c010c574>] (ext4_reserve_inode_write) from [<c0115dc4>] (ext4_orphan_add+0xf4/0x218)
[  965.385687]  r7:df04e4b4 r6:dbaa2200 r5:dee12000 r4:df04e4d8
[  965.464523] [<c0115cd0>] (ext4_orphan_add) from [<c0116b24>] (ext4_unlink+0x25c/0x26c)
[  965.547430]  r10:df04e4d8 r9:df4dff04 r8:00000000 r7:de53cf40 r6:dd1c73f0 r5:df04e4d8
[  965.631429]  r4:c0fbfe00
[  965.708445] [<c01168c8>] (ext4_unlink) from [<c00b7e4c>] (vfs_unlink+0xc8/0x13c)
[  965.792677]  r8:000b6650 r7:dd1c73f0 r6:df04e534 r5:df04e4d8 r4:c19ca440
[  965.877297] [<c00b7d84>] (vfs_unlink) from [<c00b80b0>] (do_unlinkat+0x1f0/0x210)
[  965.963851]  r9:ffffff9c r8:000b6650 r7:00000000 r6:df4e6000 r5:00000000 r4:c19ca440
[  966.051666] [<c00b7ec0>] (do_unlinkat) from [<c00b8e20>] (SyS_unlink+0x18/0x1c)
[  966.139262]  r10:00000000 r9:df4de000 r8:c0009624 r7:0000000a r6:0000000c r5:00000008
[  966.228970]  r4:000b6650
[  966.311776] [<c00b8e08>] (SyS_unlink) from [<c00094c0>] (ret_fast_syscall+0x0/0x2c)
[  966.401452] Code: e1a01005 eb04553f e2503000 0a00000f (e5930000) 
[  966.608250] ---[ end trace a1b54af48fda09ed ]---
[  966.693854] Kernel panic - not syncing: Fatal exception
[  966.781707] ---[ end Kernel panic - not syncing: Fatal exception

3) Boot a good kernel:

~ # dmesg | grep GCC
[    0.000000] Linux version 3.17.0-mvebu-los_1b42 (aaro@cooljazz) (gcc version 4.9.1 (GCC) ) #1 Thu Oct 9 06:46:07 EEST 2014

4) Use the aforementioned file system and try to clean up the mess:

/mnt/test # df -h .
Filesystem                Size      Used Available Use% Mounted on
/dev/sdc1                 3.8G    796.2M      2.8G  22% /mnt/test
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc': Directory not empty
rm: can't remove 'gcc-4.8.2': Directory not empty
/mnt/test # rm -rf gcc-4.8.2
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gcc.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite/gfortran.dg': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc/testsuite': Directory not empty
rm: can't remove 'gcc-4.8.2/gcc': Directory not empty
rm: can't remove 'gcc-4.8.2': Directory not empty
/mnt/test # df -h .
Filesystem                Size      Used Available Use% Mounted on
/dev/sdc1                 3.8G     90.5M      3.5G   2% /mnt/test
/mnt/test # find gcc-4.8.2
gcc-4.8.2
gcc-4.8.2/gcc
gcc-4.8.2/gcc/testsuite
gcc-4.8.2/gcc/testsuite/gcc.dg
gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa
find: gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa/forwprop-8.c: No such file or directory
gcc-4.8.2/gcc/testsuite/gfortran.dg
find: gcc-4.8.2/gcc/testsuite/gfortran.dg/result_default_init_1.f90: No such file or directory

5) fsck to the rescue:

/mnt/test # cd /
~ # umount /mnt/test
~ # fsck /dev/sdc1
fsck 1.42.9 (28-Dec-2013)
e2fsck 1.42.9 (28-Dec-2013)
/dev/sdc1: clean, 21/262144 files, 72408/1048576 blocks
~ # fsck -f /dev/sdc1
fsck 1.42.9 (28-Dec-2013)
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Problem in HTREE directory inode 118267: block #4 has bad min hash
Problem in HTREE directory inode 118267: block #26 has bad max hash
Invalid HTREE directory inode 118267 (/gcc-4.8.2/gcc/testsuite/gfortran.dg).  Clear HTree index<y>? yes
Problem in HTREE directory inode 174218: block #8 has bad min hash
Invalid HTREE directory inode 174218 (/gcc-4.8.2/gcc/testsuite/gcc.dg/tree-ssa).  Clear HTree index<y>? yes
Pass 3: Checking directory connectivity
Pass 3A: Optimizing directories
Pass 4: Checking reference counts
Pass 5: Checking group summary information

/dev/sdc1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sdc1: 21/262144 files (19.0% non-contiguous), 72368/1048576 blocks
~ # mount /dev/sdc1 /mnt/
~ # rm -rf /mnt/gcc-4.8.2
~ # 

So in this case fsck was able to fix it.

A.

* Re: RCU bug with v3.17-rc3 ?
  2014-10-10 16:25                           ` Russell King - ARM Linux
@ 2014-10-11  1:44                             ` Nathan Lynch
  2014-10-11  2:40                               ` Peter Hurley
                                                 ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Nathan Lynch @ 2014-10-11  1:44 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> 
> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> it seems that this has been known about for some time.)

Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
are affected, as well as 4.9.0.

> We can blacklist these GCC versions quite easily.  We already have GCC
> 3.3 blacklisted, and it's trivial to add others.  I would want to include
> some proper details about the bug, just like the other existing entries
> we already have in asm-offsets.c, where we name the functions that the
> compiler is known to break where appropriate.

Before blacklisting anything, it's worth considering that simple version
checks would break existing pre-4.8.3 compilers that have been patched
for PR58854.  It looks like Yocto and Buildroot issued releases with
patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
the most we can reasonably do without breaking some correctly-behaving
toolchains is to emit a warning.

Hopefully nobody's still using gcc 4.8 from the Linaro 2013.11 toolchain
release -- since it's a 4.8.3 prerelease from before the fix was
committed you'll get GCC_VERSION == 40803 but still generate bad code.
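
To illustrate why a bare version comparison cannot separate patched from
unpatched toolchains, here is a minimal sketch in C.  It assumes the usual
GCC_VERSION encoding (major * 10000 + minor * 100 + patchlevel) and the
affected range mentioned above (4.8.x for x < 3).  The helper names
gcc_version_code() and version_check_rejects() are made up for this example
and are not kernel functions.

```c
#include <stdbool.h>

/* Same encoding as GCC_VERSION: major * 10000 + minor * 100 + patchlevel. */
static inline int gcc_version_code(int major, int minor, int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/*
 * Hypothetical helper: true when a plain version-range test would reject
 * the compiler.  The range (4.8.0 up to, but not including, 4.8.3) is an
 * assumption taken from this thread.
 */
static inline bool version_check_rejects(int version)
{
	return version >= 40800 && version < 40803;
}
```

A stock 4.8.2 and a Yocto- or Buildroot-patched 4.8.2 both encode to 40802
and are rejected alike, while a 4.8.3 prerelease from before the fix encodes
to 40803 and passes the test.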

> However, I'm rather annoyed that there are people here who have known
> for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> corruption, and have sat on their backsides doing nothing about getting
> it blacklisted for something like a year.

Mea culpa, although I hadn't drawn the connection to FS corruption
reports until now.  I have known about the issue for some time, but
figured the prevalence of the fix in downstream projects largely
mitigated the issue.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-11  1:44                             ` Nathan Lynch
@ 2014-10-11  2:40                               ` Peter Hurley
  2014-10-11  3:54                               ` Peter Chen
                                                 ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Peter Hurley @ 2014-10-11  2:40 UTC (permalink / raw)
  To: Nathan Lynch, Russell King - ARM Linux
  Cc: Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On 10/10/2014 09:44 PM, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>
>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
> 
>> We can blacklist these GCC versions quite easily.  We already have GCC
>> 3.3 blacklisted, and it's trivial to add others.  I would want to include
>> some proper details about the bug, just like the other existing entries
>> we already have in asm-offsets.c, where we name the functions that the
>> compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

Providing a manual switch to override blacklisting is way more sane
than a build warning that no one's looking at.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-11  1:44                             ` Nathan Lynch
  2014-10-11  2:40                               ` Peter Hurley
@ 2014-10-11  3:54                               ` Peter Chen
  2014-10-11 14:16                                 ` Russell King - ARM Linux
  2014-10-11 14:14                               ` Russell King - ARM Linux
                                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Peter Chen @ 2014-10-11  3:54 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Russell King - ARM Linux, Felipe Balbi, Rik van Riel,
	Paul E. McKenney, Tony Lindgren, Linux USB Mailing List,
	Linux Kernel Mailing List, josh, Rabin Vincent, Alan Stern,
	Johannes Weiner, Sasha Levin, Andrew Morton,
	Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > 
> > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
> 
> > We can blacklist these GCC versions quite easily.  We already have GCC
> > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

Yocto has a patch for the PR58854 problem.

http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy

> 
> Hopefully nobody's still using gcc 4.8 from the Linaro 2013.11 toolchain
> release -- since it's a 4.8.3 prerelease from before the fix was
> committed you'll get GCC_VERSION == 40803 but still generate bad code.
> 
> > However, I'm rather annoyed that there are people here who have known
> > for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> > corruption, and have sat on their backsides doing nothing about getting
> > it blacklisted for something like a year.
> 
> Mea culpa, although I hadn't drawn the connection to FS corruption
> reports until now.  I have known about the issue for some time, but
> figured the prevalence of the fix in downstream projects largely
> mitigated the issue.
> 

-- 
Best Regards,
Peter Chen

* Re: RCU bug with v3.17-rc3 ?
  2014-10-11  1:44                             ` Nathan Lynch
  2014-10-11  2:40                               ` Peter Hurley
  2014-10-11  3:54                               ` Peter Chen
@ 2014-10-11 14:14                               ` Russell King - ARM Linux
  2014-10-11 19:27                               ` Nathan Lynch
  2014-10-13  9:11                               ` David Laight
  4 siblings, 0 replies; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-11 14:14 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > We can blacklist these GCC versions quite easily.  We already have GCC
> > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

I wish that it was possible to just do the warning thing, but unfortunately
evidence is that many people ignore compiler warnings, because they see
them appearing from the kernel so often they have become de-sensitised
to them.

This is pretty obvious from the various nightly build systems which produce
the same warnings for months without any progress on them - some of them
can be quite serious (oops-able) where printf format strings are concerned.

> > for some time that GCC 4.8.1 and GCC 4.8.2 _can_ lead to filesystem
> > corruption, and have sat on their backsides doing nothing about getting
> > it blacklisted for something like a year.
> 
> Mea culpa, although I hadn't drawn the connection to FS corruption
> reports until now.  I have known about the issue for some time, but
> figured the prevalence of the fix in downstream projects largely
> mitigated the issue.

It's the FS corruption which swings it in favour of a #error - even if
we have a bunch of compilers around with that version which have the
problem fixed, it's /far/ better to #error out.  Those people who know
definitely that they have a fixed compiler can comment out the test
after checking that they do indeed have a fixed version, or are willing
to take the risk.

What we can't do is have kernels built by people who then run into FS
corruption because of this known issue.
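
For illustration only, a check of this kind might look roughly like the
sketch below.  The version range is the one discussed in this thread (4.8.x
for x < 3), the message wording is a placeholder, and GCC_VERSION is defined
locally only so that the fragment stands alone; none of this is the final
patch.

```c
/* Defined here only to keep the sketch self-contained. */
#ifndef GCC_VERSION
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#endif

/* Refuse to build with compiler versions in the suspect range. */
#if GCC_VERSION >= 40800 && GCC_VERSION < 40803
#error Your compiler is known to miscompile ARM kernels (GCC PR58854).
#error This can result in oopses and filesystem corruption.
#endif
```

Anyone who has verified that their 4.8.x toolchain carries the fix would
have to remove or comment out the test by hand.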

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

* Re: RCU bug with v3.17-rc3 ?
  2014-10-11  3:54                               ` Peter Chen
@ 2014-10-11 14:16                                 ` Russell King - ARM Linux
  2014-10-11 14:51                                   ` Otavio Salvador
  0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-11 14:16 UTC (permalink / raw)
  To: Peter Chen
  Cc: Nathan Lynch, Felipe Balbi, Rik van Riel, Paul E. McKenney,
	Tony Lindgren, Linux USB Mailing List, Linux Kernel Mailing List,
	josh, Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > > 
> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > it seems that this has been known about for some time.)
> > 
> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > are affected, as well as 4.9.0.
> > 
> > > We can blacklist these GCC versions quite easily.  We already have GCC
> > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > > some proper details about the bug, just like the other existing entries
> > > we already have in asm-offsets.c, where we name the functions that the
> > > compiler is known to break where appropriate.
> > 
> > Before blacklisting anything, it's worth considering that simple version
> > checks would break existing pre-4.8.3 compilers that have been patched
> > for PR58854.  It looks like Yocto and Buildroot issued releases with
> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> > the most we can reasonably do without breaking some correctly-behaving
> > toolchains is to emit a warning.
> 
> Yocto has a patch for the PR58854 problem.
> 
> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy

Right, and we can provide links to these in the comments above the #error
so people have the right places to do a bit of research into whether their
compiler is safe.

It is unfortunate that they are indistinguishable from the broken versions,
but that's really a distro problem for causing that issue themselves -
especially given how serious this bug is.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

* Re: RCU bug with v3.17-rc3 ?
  2014-10-11 14:16                                 ` Russell King - ARM Linux
@ 2014-10-11 14:51                                   ` Otavio Salvador
  2014-10-11 18:15                                     ` Peter Hurley
  0 siblings, 1 reply; 36+ messages in thread
From: Otavio Salvador @ 2014-10-11 14:51 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Peter Chen, Rik van Riel, Linux OMAP Mailing List, Tony Lindgren,
	Linux USB Mailing List, Nathan Lynch, Linux Kernel Mailing List,
	Felipe Balbi, Josh Triplett, Rabin Vincent, Alan Stern,
	Johannes Weiner, Sasha Levin, Andrew Morton, Paul E. McKenney,
	Linus Torvalds, Linux ARM Kernel Mailing List

Hello Russell,

On Sat, Oct 11, 2014 at 11:16 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
>> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
>> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>> > >
>> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> > > it seems that this has been known about for some time.)
>> >
>> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>> > are affected, as well as 4.9.0.
>> >
>> > > We can blacklist these GCC versions quite easily.  We already have GCC
>> > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
>> > > some proper details about the bug, just like the other existing entries
>> > > we already have in asm-offsets.c, where we name the functions that the
>> > > compiler is known to break where appropriate.
>> >
>> > Before blacklisting anything, it's worth considering that simple version
>> > checks would break existing pre-4.8.3 compilers that have been patched
>> > for PR58854.  It looks like Yocto and Buildroot issued releases with
>> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
>> > the most we can reasonably do without breaking some correctly-behaving
>> > toolchains is to emit a warning.
>>
>> Yocto has a patch for the PR58854 problem.
>>
>> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>
> Right, and we can provide links to these in the comments above the #error
> so people have the right places to do a bit of research into whether their
> compiler is safe.
>
> It is unfortunate that they are indistinguishable from the broken versions,
> but that's really a distro problem for causing that issue themselves -
> especially given how serious this bug is.

What about raising the error only if GCC_PR58854_FIXED is not defined?
Build systems and people could then easily define it if they know their
GCC has the fix applied.
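
A rough sketch of that idea, assuming the affected range discussed in this
thread (4.8.x for x < 3).  PR58854_SUSPECT and PR58854_OVERRIDE are
hypothetical helper macros, the #error text is a placeholder, and
GCC_VERSION is defined locally only to keep the fragment self-contained:

```c
/* Defined here only to keep the sketch self-contained. */
#ifndef GCC_VERSION
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#endif

/* Hypothetical helper: version codes a plain range test treats as suspect. */
#define PR58854_SUSPECT(v) ((v) >= 40800 && (v) < 40803)

/* GCC_PR58854_FIXED is the opt-out name proposed above. */
#ifdef GCC_PR58854_FIXED
#define PR58854_OVERRIDE 1
#else
#define PR58854_OVERRIDE 0
#endif

/* Error out unless the builder has vouched for the compiler. */
#if PR58854_SUSPECT(GCC_VERSION) && !PR58854_OVERRIDE
#error Compiler version is affected by GCC PR58854.
#error Define GCC_PR58854_FIXED only if your GCC carries the fix.
#endif
```

Someone with a verified toolchain would then add -DGCC_PR58854_FIXED to the
compiler flags; how that would be wired into the kernel build is left open
here.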

-- 
Otavio Salvador                             O.S. Systems
http://www.ossystems.com.br        http://code.ossystems.com.br
Mobile: +55 (53) 9981-7854            Mobile: +1 (347) 903-9750

* Re: RCU bug with v3.17-rc3 ?
  2014-10-11 14:51                                   ` Otavio Salvador
@ 2014-10-11 18:15                                     ` Peter Hurley
  0 siblings, 0 replies; 36+ messages in thread
From: Peter Hurley @ 2014-10-11 18:15 UTC (permalink / raw)
  To: Otavio Salvador, Russell King - ARM Linux
  Cc: Peter Chen, Rik van Riel, Linux OMAP Mailing List, Tony Lindgren,
	Linux USB Mailing List, Nathan Lynch, Linux Kernel Mailing List,
	Felipe Balbi, Josh Triplett, Rabin Vincent, Alan Stern,
	Johannes Weiner, Sasha Levin, Andrew Morton, Paul E. McKenney,
	Linus Torvalds, Linux ARM Kernel Mailing List

On 10/11/2014 10:51 AM, Otavio Salvador wrote:
> Hello Russell,
> 
> On Sat, Oct 11, 2014 at 11:16 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Sat, Oct 11, 2014 at 11:54:32AM +0800, Peter Chen wrote:
>>> On Fri, Oct 10, 2014 at 08:44:33PM -0500, Nathan Lynch wrote:
>>>> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>>>>> it seems that this has been known about for some time.)
>>>>
>>>> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>>>> are affected, as well as 4.9.0.
>>>>
>>>>> We can blacklist these GCC versions quite easily.  We already have GCC
>>>>> 3.3 blacklisted, and it's trivial to add others.  I would want to include
>>>>> some proper details about the bug, just like the other existing entries
>>>>> we already have in asm-offsets.c, where we name the functions that the
>>>>> compiler is known to break where appropriate.
>>>>
>>>> Before blacklisting anything, it's worth considering that simple version
>>>> checks would break existing pre-4.8.3 compilers that have been patched
>>>> for PR58854.  It looks like Yocto and Buildroot issued releases with
>>>> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
>>>> the most we can reasonably do without breaking some correctly-behaving
>>>> toolchains is to emit a warning.
>>>
>>> Yocto has a patch for the PR58854 problem.
>>>
>>> http://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/recipes-devtools/gcc/gcc-4.8/0048-PR58854_fix_arm_apcs_epilogue.patch?h=daisy
>>
>> Right, and we can provide links to these in the comments above the #error
>> so people have the right places to do a bit of research into whether their
>> compiler is safe.
>>
>> It is unfortunate that they are indistinguishable from the broken versions,
>> but that's really a distro problem for causing that issue themselves -
>> especially given how serious this bug is.
> 
> What about raising the error only if GCC_PR58854_FIXED is not defined?
> Build systems and people could then easily define it if they know their
> GCC has the fix applied.

If the distro/build system/individual is capable of patching gcc, then it
seems reasonable that the same distro/build system/individual is capable
of carrying a patch on top of mainline kernel for building with their
"special" compiler.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-11  1:44                             ` Nathan Lynch
                                                 ` (2 preceding siblings ...)
  2014-10-11 14:14                               ` Russell King - ARM Linux
@ 2014-10-11 19:27                               ` Nathan Lynch
  2014-10-13  9:11                               ` David Laight
  4 siblings, 0 replies; 36+ messages in thread
From: Nathan Lynch @ 2014-10-11 19:27 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Rik van Riel, Linux OMAP Mailing List, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, Felipe Balbi,
	josh, Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List

On 10/10/2014 08:44 PM, Nathan Lynch wrote:
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>
>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>> it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.

Correction -- 4.9.0 has this fixed, even though the GCC PR shows it as a
"known to fail" version.
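
As a rough illustration of the version range described above, the check
can be written as a small predicate.  This is only a sketch:
pr58854_affected() is a hypothetical helper name, not anything in the
kernel or in GCC, and the range (4.8.0 through 4.8.2 affected, 4.8.3 and
4.9.0 fixed) is taken from this thread rather than verified independently.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not kernel code: returns true for the upstream GCC
 * releases this thread treats as affected by PR58854 (4.8.0, 4.8.1, 4.8.2).
 */
bool pr58854_affected(int major, int minor, int patchlevel)
{
	/* 4.8.3 carries the fix; 4.9.0 is fixed per the correction above. */
	return major == 4 && minor == 8 && patchlevel < 3;
}
```

Note that a vendor-patched 4.8.2 reports the same version numbers as an
unpatched one, so a predicate like this cannot tell the two apart.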



* RE: RCU bug with v3.17-rc3 ?
  2014-10-11  1:44                             ` Nathan Lynch
                                                 ` (3 preceding siblings ...)
  2014-10-11 19:27                               ` Nathan Lynch
@ 2014-10-13  9:11                               ` David Laight
  2014-10-13 11:43                                 ` Russell King - ARM Linux
  4 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2014-10-13  9:11 UTC (permalink / raw)
  To: 'Nathan Lynch', Russell King - ARM Linux
  Cc: Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

From: Nathan Lynch
> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> >
> > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > it seems that this has been known about for some time.)
> 
> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> are affected, as well as 4.9.0.
> 
> > We can blacklist these GCC versions quite easily.  We already have GCC
> > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > some proper details about the bug, just like the other existing entries
> > we already have in asm-offsets.c, where we name the functions that the
> > compiler is known to break where appropriate.
> 
> Before blacklisting anything, it's worth considering that simple version
> checks would break existing pre-4.8.3 compilers that have been patched
> for PR58854.  It looks like Yocto and Buildroot issued releases with
> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> the most we can reasonably do without breaking some correctly-behaving
> toolchains is to emit a warning.

Is it possible to compile a small code fragment and check the generated
code for the bug?
Possibly predicated on the broken version number to avoid false positives.

	David





* Re: RCU bug with v3.17-rc3 ?
  2014-10-13  9:11                               ` David Laight
@ 2014-10-13 11:43                                 ` Russell King - ARM Linux
  2014-10-14  2:06                                   ` Greg KH
  0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-13 11:43 UTC (permalink / raw)
  To: David Laight
  Cc: 'Nathan Lynch',
	Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Mon, Oct 13, 2014 at 09:11:34AM +0000, David Laight wrote:
> From: Nathan Lynch
> > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > >
> > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > it seems that this has been known about for some time.)
> > 
> > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > are affected, as well as 4.9.0.
> > 
> > > We can blacklist these GCC versions quite easily.  We already have GCC
> > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > > some proper details about the bug, just like the other existing entries
> > > we already have in asm-offsets.c, where we name the functions that the
> > > compiler is known to break where appropriate.
> > 
> > Before blacklisting anything, it's worth considering that simple version
> > checks would break existing pre-4.8.3 compilers that have been patched
> > for PR58854.  It looks like Yocto and Buildroot issued releases with
> > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> > the most we can reasonably do without breaking some correctly-behaving
> > toolchains is to emit a warning.
> 
> Is it possible to compile a small code fragment and check the generated
> code for the bug?
> Possibly predicated on the broken version number to avoid false positives.

I don't see how - it looks like it requires an interrupt to occur at an
opportune moment to provoke the function to fail.  The alternative would
be to parse the assembly generated by the compiler to determine how it
is dealing with the stack.

I think the only viable solution here is that:

1. We blacklist the bad compiler versions outright in the kernel.
2. We /consider/ testing a preprocessor symbol which, when present,
   indicates that these versions are fixed and should not be blacklisted.

The argument for (2) is that /if/ distros want to patch their compilers
to fix the problem, they /also/ have the ability to patch their compilers
to make them identifiable, and that is a far more reliable solution than
trying to parse the assembly output from multiple different GCC versions.

Remember, it's the distro's choice to fix these buggy compilers, so the
onus is on _them_ to deal with the mess they've created by doing so.
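
To make (1) and (2) concrete, a minimal sketch of such a check might look
like the fragment below.  It is not the actual patch.  SKETCH_GCC_VERSION
is a local helper macro invented for this example, and GCC_PR58854_FIXED
is the hypothetical opt-out symbol suggested earlier in the thread, which
neither upstream GCC nor the kernel defines.  The 4.8.0 to 4.8.2 range is
the one discussed in this thread.

```c
/*
 * Sketch only, assuming the affected range discussed in this thread.
 * SKETCH_GCC_VERSION is a local helper; GCC_PR58854_FIXED is a hypothetical
 * symbol that a vendor-patched compiler (or a build system) would define.
 */
#define SKETCH_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)

#if SKETCH_GCC_VERSION >= 40800 && SKETCH_GCC_VERSION < 40803
# ifndef GCC_PR58854_FIXED
#  error GCC 4.8.0 to 4.8.2 are blacklisted because of PR58854
# endif
#endif

/* Reaching this point means the check above did not reject the compiler. */
enum { pr58854_check_passed = 1 };
```

A patched toolchain would then be built with something like
-DGCC_PR58854_FIXED on the command line, or would predefine the symbol
itself; everything else in the affected range fails the build.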

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-13 11:43                                 ` Russell King - ARM Linux
@ 2014-10-14  2:06                                   ` Greg KH
  2014-10-14 10:27                                     ` Peter Hurley
  2014-10-15 21:23                                     ` Russell King - ARM Linux
  0 siblings, 2 replies; 36+ messages in thread
From: Greg KH @ 2014-10-14  2:06 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: David Laight, 'Nathan Lynch',
	Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
> On Mon, Oct 13, 2014 at 09:11:34AM +0000, David Laight wrote:
> > From: Nathan Lynch
> > > On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
> > > >
> > > > Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
> > > > it seems that this has been known about for some time.)
> > > 
> > > Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
> > > are affected, as well as 4.9.0.
> > > 
> > > > We can blacklist these GCC versions quite easily.  We already have GCC
> > > > 3.3 blacklisted, and it's trivial to add others.  I would want to include
> > > > some proper details about the bug, just like the other existing entries
> > > > we already have in asm-offsets.c, where we name the functions that the
> > > > compiler is known to break where appropriate.
> > > 
> > > Before blacklisting anything, it's worth considering that simple version
> > > checks would break existing pre-4.8.3 compilers that have been patched
> > > for PR58854.  It looks like Yocto and Buildroot issued releases with
> > > patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
> > > the most we can reasonably do without breaking some correctly-behaving
> > > toolchains is to emit a warning.
> > 
> > Is it possible to compile a small code fragment and check the generated
> > code for the bug?
> > Possibly predicated on the broken version number to avoid false positives.
> 
> I don't see how - it looks like it requires an interrupt to occur at an
> opportune moment to provoke the function to fail.  The alternative would
> be to parse the assembly generated by the compiler to determine how it
> is dealing with the stack.
> 
> I think the only viable solution here is that:
> 
> 1. We blacklist the bad compiler versions outright in the kernel.

Yes, please do this, it's what we have done for other buggy compiler
versions, no need to do something different here.

> Remember, it's the distro's choice to fix these buggy compilers, so the
> onus is on _them_ to deal with the mess they've created by doing so.

I totally agree.

Is someone going to send this patch, or do I have to write it myself?

thanks,

greg k-h


* Re: RCU bug with v3.17-rc3 ?
  2014-10-14  2:06                                   ` Greg KH
@ 2014-10-14 10:27                                     ` Peter Hurley
  2014-10-15 21:23                                     ` Russell King - ARM Linux
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Hurley @ 2014-10-14 10:27 UTC (permalink / raw)
  To: Greg KH, Russell King - ARM Linux
  Cc: David Laight, 'Nathan Lynch',
	Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On 10/13/2014 10:06 PM, Greg KH wrote:
> On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
>> On Mon, Oct 13, 2014 at 09:11:34AM +0000, David Laight wrote:
>>> From: Nathan Lynch
>>>> On 10/10/2014 11:25 AM, Russell King - ARM Linux wrote:
>>>>>
>>>>> Right, so GCC 4.8.{1,2} are totally unsuitable for kernel building (and
>>>>> it seems that this has been known about for some time.)
>>>>
>>>> Looking at http://gcc.gnu.org/PR58854 it seems that all 4.8.x for x < 3
>>>> are affected, as well as 4.9.0.
>>>>
>>>>> We can blacklist these GCC versions quite easily.  We already have GCC
>>>>> 3.3 blacklisted, and it's trivial to add others.  I would want to include
>>>>> some proper details about the bug, just like the other existing entries
>>>>> we already have in asm-offsets.c, where we name the functions that the
>>>>> compiler is known to break where appropriate.
>>>>
>>>> Before blacklisting anything, it's worth considering that simple version
>>>> checks would break existing pre-4.8.3 compilers that have been patched
>>>> for PR58854.  It looks like Yocto and Buildroot issued releases with
>>>> patched 4.8.2 compilers well before the (fixed) 4.8.3 release.  I think
>>>> the most we can reasonably do without breaking some correctly-behaving
>>>> toolchains is to emit a warning.
>>>
>>> Is it possible to compile a small code fragment and check the generated
>>> code for the bug?
>>> Possibly predicated on the broken version number to avoid false positives.
>>
>> I don't see how - it looks like it requires an interrupt to occur at an
>> opportune moment to provoke the function to fail.  The alternative would
>> be to parse the assembly generated by the compiler to determine how it
>> is dealing with the stack.
>>
>> I think the only viable solution here is that:
>>
>> 1. We blacklist the bad compiler versions outright in the kernel.
> 
> Yes, please do this, it's what we have done for other buggy compiler
> versions, no need to do something different here.
> 
>> Remember, it's the distro's choice to fix these buggy compilers, so the
>> onus is on _them_ to deal with the mess they've created by doing so.
> 
> I totally agree.
> 
> Is someone going to send this patch, or do I have to write it myself?

I did on Friday (arm: Blacklist gcc 4.8.[012] ...) but Russell said he
was doing it himself.

Regards,
Peter Hurley



* Re: RCU bug with v3.17-rc3 ?
  2014-10-14  2:06                                   ` Greg KH
  2014-10-14 10:27                                     ` Peter Hurley
@ 2014-10-15 21:23                                     ` Russell King - ARM Linux
  2014-10-15 21:25                                       ` Russell King - ARM Linux
  1 sibling, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-15 21:23 UTC (permalink / raw)
  To: Greg KH
  Cc: David Laight, 'Nathan Lynch',
	Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Tue, Oct 14, 2014 at 04:06:40AM +0200, Greg KH wrote:
> On Mon, Oct 13, 2014 at 12:43:07PM +0100, Russell King - ARM Linux wrote:
> > I think the only viable solution here is that:
> > 
> > 1. We blacklist the bad compiler versions outright in the kernel.
> 
> Yes, please do this, it's what we have done for other buggy compiler
> versions, no need to do something different here.
> 
> > Remember, it's the distro's choice to fix these buggy compilers, so the
> > onus is on _them_ to deal with the mess they've created by doing so.
> 
> I totally agree.
> 
> Is someone going to send this patch, or do I have to write it myself?

As I said, I have a patch in progress, but it seems that there needed
to be some discussion about exactly which compiler versions are affected.
It seems that it's not as trivial as looking at the GCC bug entry.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-15 21:23                                     ` Russell King - ARM Linux
@ 2014-10-15 21:25                                       ` Russell King - ARM Linux
  2014-10-19  9:54                                         ` Russell King - ARM Linux
  0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-15 21:25 UTC (permalink / raw)
  To: Greg KH
  Cc: David Laight, 'Nathan Lynch',
	Felipe Balbi, Rik van Riel, Paul E. McKenney, Tony Lindgren,
	Linux USB Mailing List, Linux Kernel Mailing List, josh,
	Rabin Vincent, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Linux OMAP Mailing List, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> As I said, I have a patch in progress, but it seems that there needed
> to be some discussion about exactly which compiler versions are affected.
> It seems that it's not as trivial as looking at the GCC bug entry.

... and in any case, it has been a known bug for well over a year now,
and it seems that it doesn't affect _that_ many people.  So taking some
extra time to get it properly correct is the _right_ thing to do.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-15 21:25                                       ` Russell King - ARM Linux
@ 2014-10-19  9:54                                         ` Russell King - ARM Linux
  2014-10-19 15:28                                           ` Felipe Balbi
  0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2014-10-19  9:54 UTC (permalink / raw)
  To: Greg KH
  Cc: Rik van Riel, Linux OMAP Mailing List, Tony Lindgren,
	Linux USB Mailing List, 'Nathan Lynch',
	Linux Kernel Mailing List, Felipe Balbi, josh, Rabin Vincent,
	David Laight, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> > As I said, I have a patch in progress, but it seems that there needed
> > to be some discussion about exactly which compiler versions are affected.
> > It seems that it's not as trivial as looking at the GCC bug entry.
> 
> ... and in any case, it has been a known bug for well over a year now,
> and it seems that it doesn't affect _that_ many people.  So taking some
> extra time to get it properly correct is the _right_ thing to do.

Well, this is just great.  Pushing out the change which blacklists these
compilers takes out Olof's kernel build system...

Things are not as trivial as they seem.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.


* Re: RCU bug with v3.17-rc3 ?
  2014-10-19  9:54                                         ` Russell King - ARM Linux
@ 2014-10-19 15:28                                           ` Felipe Balbi
  2014-10-19 20:48                                             ` Olof Johansson
  0 siblings, 1 reply; 36+ messages in thread
From: Felipe Balbi @ 2014-10-19 15:28 UTC (permalink / raw)
  To: Russell King - ARM Linux, Olof Johansson
  Cc: Greg KH, Rik van Riel, Linux OMAP Mailing List, Tony Lindgren,
	Linux USB Mailing List, 'Nathan Lynch',
	Linux Kernel Mailing List, Felipe Balbi, josh, Rabin Vincent,
	David Laight, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List


Hi,

On Sun, Oct 19, 2014 at 10:54:16AM +0100, Russell King - ARM Linux wrote:
> On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
> > On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
> > > As I said, I have a patch in progress, but it seems that there needed
> > > to be some discussion about exactly which compiler versions are affected.
> > > It seems that it's not as trivial as looking at the GCC bug entry.
> > 
> > ... and in any case, it has been a known bug for well over a year now,
> > and it seems that it doesn't affect _that_ many people.  So taking some
> > extra time to get it properly correct is the _right_ thing to do.
> 
> Well, this is just great.  Pushing out the change which blacklists these
> compilers takes out Olof's kernel build system...
> 
> Things are not as trivial as they seem.

Maybe Olof just needs to update his compiler. Olof ?

-- 
balbi



* Re: RCU bug with v3.17-rc3 ?
  2014-10-19 15:28                                           ` Felipe Balbi
@ 2014-10-19 20:48                                             ` Olof Johansson
  0 siblings, 0 replies; 36+ messages in thread
From: Olof Johansson @ 2014-10-19 20:48 UTC (permalink / raw)
  To: Felipe Balbi
  Cc: Russell King - ARM Linux, Greg KH, Rik van Riel,
	Linux OMAP Mailing List, Tony Lindgren, Linux USB Mailing List,
	Nathan Lynch, Linux Kernel Mailing List, josh, Rabin Vincent,
	David Laight, Alan Stern, Johannes Weiner, Sasha Levin,
	Andrew Morton, Paul E. McKenney, Linus Torvalds,
	Linux ARM Kernel Mailing List

On Sun, Oct 19, 2014 at 8:28 AM, Felipe Balbi <balbi@ti.com> wrote:
> Hi,
>
> On Sun, Oct 19, 2014 at 10:54:16AM +0100, Russell King - ARM Linux wrote:
>> On Wed, Oct 15, 2014 at 10:25:13PM +0100, Russell King - ARM Linux wrote:
>> > On Wed, Oct 15, 2014 at 10:23:10PM +0100, Russell King - ARM Linux wrote:
>> > > As I said, I have a patch in progress, but it seems that there needed
>> > > to be some discussion about exactly which compiler versions are affected.
>> > > It seems that it's not as trivial as looking at the GCC bug entry.
>> >
>> > ... and in any case, it has been a known bug for well over a year now,
>> > and it seems that it doesn't affect _that_ many people.  So taking some
>> > extra time to get it properly correct is the _right_ thing to do.
>>
>> Well, this is just great.  Pushing out the change which blacklists these
>> compilers takes out Olof's kernel build system...
>>
>> Things are not as trivial as they seem.
>
> Maybe Olof just needs to update his compiler. Olof ?

Yep, doing a run with 4.9.1 to see how it looks. In the past, 4.9 has
been really noisy with warnings, maybe most of them have been fixed by
now.


-Olof



Thread overview: 36+ messages
2014-09-04 18:40 RCU bug with v3.17-rc3 ? Felipe Balbi
2014-09-04 19:16 ` Paul E. McKenney
2014-09-04 19:25   ` Felipe Balbi
2014-09-04 20:04     ` Felipe Balbi
2014-09-05 21:32       ` Paul E. McKenney
2014-10-08 17:13         ` Felipe Balbi
2014-10-08 17:57           ` Felipe Balbi
2014-10-08 21:29             ` Felipe Balbi
2014-10-09 16:01               ` Johannes Weiner
2014-10-09 16:26                 ` Felipe Balbi
2014-10-09 20:35                   ` Felipe Balbi
2014-10-09 20:41                   ` Rabin Vincent
2014-10-09 20:46                     ` Felipe Balbi
2014-10-09 21:07                       ` Felipe Balbi
2014-10-10 13:57                         ` Felipe Balbi
2014-10-10 16:25                           ` Russell King - ARM Linux
2014-10-11  1:44                             ` Nathan Lynch
2014-10-11  2:40                               ` Peter Hurley
2014-10-11  3:54                               ` Peter Chen
2014-10-11 14:16                                 ` Russell King - ARM Linux
2014-10-11 14:51                                   ` Otavio Salvador
2014-10-11 18:15                                     ` Peter Hurley
2014-10-11 14:14                               ` Russell King - ARM Linux
2014-10-11 19:27                               ` Nathan Lynch
2014-10-13  9:11                               ` David Laight
2014-10-13 11:43                                 ` Russell King - ARM Linux
2014-10-14  2:06                                   ` Greg KH
2014-10-14 10:27                                     ` Peter Hurley
2014-10-15 21:23                                     ` Russell King - ARM Linux
2014-10-15 21:25                                       ` Russell King - ARM Linux
2014-10-19  9:54                                         ` Russell King - ARM Linux
2014-10-19 15:28                                           ` Felipe Balbi
2014-10-19 20:48                                             ` Olof Johansson
2014-10-09 21:47                     ` Aaro Koskinen
2014-10-10 16:18                       ` Russell King - ARM Linux
2014-10-10 20:52                         ` Aaro Koskinen
