* Boot failure on Arndale with next-20131105
@ 2013-11-05 11:49 Tushar Behera
  2013-11-05 16:42 ` Tomasz Figa
  2013-11-05 19:33 ` Jens Axboe
  0 siblings, 2 replies; 17+ messages in thread
From: Tushar Behera @ 2013-11-05 11:49 UTC (permalink / raw)
  To: linux-next, lkml; +Cc: Jens Axboe, Chris Mason

Hi,

We are having a boot-time kernel panic on Samsung's Exynos5250-based
Arndale board with next-20131105. Bisection points to the following commit:

<<<
commit febca1baea1cfe2d7a0271385d89b03d5fb34f94
Author: Chris Mason <chris.mason@fusionio.com>
Date:   Thu Oct 31 13:32:42 2013 -0600

    block: setup bi_vcnt on clones

    commit 9fc6286f347d changed the cloning code to make clones cheaper for
    the case where we don't need to clone the iovec array.  But,
    the new clone needs the bi_vcnt from the original.

    Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>

With the above commit reverted, Arndale is able to boot again.

Excerpts from the boot log (in case they help with debugging):

[    1.972062] Unable to handle kernel paging request at virtual address 025e63a0
[    1.981164] pgd = c0004000
[    1.982375] [025e63a0] *pgd=00000000
[    1.985875] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[    1.991086] Modules linked in:
[    1.994076] CPU: 0 PID: 1178 Comm: mmcqd/0 Not tainted 3.12.0-rc5-00051-gfebca1b #21
[    2.001683] task: ef3530c0 ti: ee82e000 task.ti: ee82e000
[    2.006981] PC is at dma_cache_maint_page+0x84/0x174
[    2.011842] LR is at 0x6

[    2.043532] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
[    2.050708] Control: 10c5387d  Table: 4000406a  DAC: 00000015
[    2.056342] Process mmcqd/0 (pid: 1178, stack limit = 0xee82e240)
[    2.062321] Stack: (0xee82fd58 to 0xee830000)

[ ... ]

[    2.275352] [<c0015768>] (dma_cache_maint_page+0x84/0x174) from [<c0015880>] (__dma_page_cpu_to_dev+0x28/0xa0)
[    2.285170] [<c0015880>] (__dma_page_cpu_to_dev+0x28/0xa0) from [<c00159a4>] (arm_dma_map_page+0x6c/0x70)
[    2.294565] [<c00159a4>] (arm_dma_map_page+0x6c/0x70) from [<c0015d28>] (arm_dma_map_sg+0x74/0xec)
[    2.303366] [<c0015d28>] (arm_dma_map_sg+0x74/0xec) from [<c02bf534>] (dw_mci_pre_dma_transfer.isra.16+0x124/0x15c)
[    2.313614] [<c02bf534>] (dw_mci_pre_dma_transfer.isra.16+0x124/0x15c) from [<c02bf8d4>] (dw_mci_pre_req+0x44/0x50)
[    2.323863] [<c02bf8d4>] (dw_mci_pre_req+0x44/0x50) from [<c02a8970>] (mmc_start_req+0x3c/0x39c)
[    2.332486] [<c02a8970>] (mmc_start_req+0x3c/0x39c) from [<c02b606c>] (mmc_blk_issue_rw_rq+0xbc/0xa9c)
[    2.341625] [<c02b606c>] (mmc_blk_issue_rw_rq+0xbc/0xa9c) from [<c02b6c14>] (mmc_blk_issue_rq+0x1c8/0x498)
[    2.351106] [<c02b6c14>] (mmc_blk_issue_rq+0x1c8/0x498) from [<c02b75d4>] (mmc_queue_thread+0xa4/0x144)
[    2.360331] [<c02b75d4>] (mmc_queue_thread+0xa4/0x144) from [<c0038614>] (kthread+0xb4/0xb8)
[    2.368616] [<c0038614>] (kthread+0xb4/0xb8) from [<c000e2f8>] (ret_from_fork+0x14/0x3c)
[    2.376556] Code: 17e81051 10822181 e592c000 e3ccc003 (e79c2007)
[    2.382570] ---[ end trace df06b64b1b7fa443 ]---

[ ... ]

Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/mmcblk1p3 does not exist.  Dropping to a shell!
FATAL: Could not load /lib/modules/3.12.0-rc5-00051-gfebca1b/modules.dep: No such file or directory
FATAL: Could not load /lib/modules/3.12.0-rc5-00051-gfebca1b/modules.dep: No such file or directory


-- 
Tushar Behera


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 11:49 Boot failure on Arndale with next-20131105 Tushar Behera
@ 2013-11-05 16:42 ` Tomasz Figa
  2013-11-05 17:38   ` Stephen Warren
  2013-11-05 19:59   ` Jens Axboe
  2013-11-05 19:33 ` Jens Axboe
  1 sibling, 2 replies; 17+ messages in thread
From: Tomasz Figa @ 2013-11-05 16:42 UTC (permalink / raw)
  To: Tushar Behera; +Cc: linux-next, lkml, Jens Axboe, Chris Mason

Hi,

On Tuesday 05 of November 2013 17:19:00 Tushar Behera wrote:
> Hi,
> 
> We are having a boot-time kernel panic on Samsung's Exynos5250-based
> Arndale board with next-20131105. Bisect points to following commit.
> 
> <<<
> commit febca1baea1cfe2d7a0271385d89b03d5fb34f94
> Author: Chris Mason <chris.mason@fusionio.com>
> Date:   Thu Oct 31 13:32:42 2013 -0600
> 
>     block: setup bi_vcnt on clones
> 
>     commit 9fc6286f347d changed the cloning code to make clones cheaper for
>     the case where we don't need to clone the iovec array.  But,
>     the new clone needs the bi_vcnt from the original.
> 
>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>
> 
> Reverting above commit, Arndale is able to boot again.

I can confirm exactly the same behavior on Exynos 4210-based Trats board,
with exactly the same bisection results.

Also note that I spotted multiple build failures in block layer during
the bisection.

# skip: [c198aee7e9a801d32cee4607453871cff7c43e6c] ceph: Convert to immutable biovecs
git bisect skip c198aee7e9a801d32cee4607453871cff7c43e6c
# skip: [3d75d579a04be023552b45f791cd95f5b6a45ba6] block: Kill bio_segments()/bi_vcnt usage
git bisect skip 3d75d579a04be023552b45f791cd95f5b6a45ba6
# skip: [f2da8e013088387e5e61930b715ff0defea9aa58] aoe: Convert to immutable biovecs
git bisect skip f2da8e013088387e5e61930b715ff0defea9aa58
# skip: [44931ee84c6362ec8d9b97b02432760035a2b639] block: Kill bio_pair_split()
git bisect skip 44931ee84c6362ec8d9b97b02432760035a2b639
# skip: [d4fbf2c24290f237cf5989d8e4c8507969ae2299] rbd: Refactor bio cloning, don't clone biovecs
git bisect skip d4fbf2c24290f237cf5989d8e4c8507969ae2299
# skip: [cc4067bd8adeb5507829b7ae8f17211aab5d1e9d] block: Kill bio_iovec_idx(), __bio_iovec()
git bisect skip cc4067bd8adeb5507829b7ae8f17211aab5d1e9d
# skip: [a040a44b1c2b56fed3ebef3734681b6fe473fd33] dm: Refactor for new bio cloning/splitting
git bisect skip a040a44b1c2b56fed3ebef3734681b6fe473fd33
# skip: [948809ba161cce4060977970e1133a66fffc3449] block: Introduce new bio_split()
git bisect skip 948809ba161cce4060977970e1133a66fffc3449
# skip: [7e814b148e1127be7c32bb438ceaadb0b6e33042] block: Remove bi_idx hacks
git bisect skip 7e814b148e1127be7c32bb438ceaadb0b6e33042
# skip: [3dbdffcc4c1ffb7d7ac631be55cd5aab3b258614] block: Immutable bio vecs
git bisect skip 3dbdffcc4c1ffb7d7ac631be55cd5aab3b258614
# skip: [919b8823a6ef27103fe3abd05026f87ad85ed1ad] block: Convert drivers to immutable biovecs
git bisect skip 919b8823a6ef27103fe3abd05026f87ad85ed1ad
# skip: [5fbc9c23b291ac8d8ffe73cbc55cd7cb9c57fd04] block: Convert bio_copy_data() to bvec_iter
git bisect skip 5fbc9c23b291ac8d8ffe73cbc55cd7cb9c57fd04
# skip: [2771aecc0cc33d70747c8335239c20c9ff87ac67] block: Generic bio chaining
git bisect skip 2771aecc0cc33d70747c8335239c20c9ff87ac67
# skip: [9fc6286f347d00528adcdcf12396d220f47492ed] block: Don't save/copy bvec array anymore, share when cloning
git bisect skip 9fc6286f347d00528adcdcf12396d220f47492ed
# skip: [85bf1bd38f53e93712a149a8c31abe6936494d64] block: Rename bio_split() -> bio_pair_split()
git bisect skip 85bf1bd38f53e93712a149a8c31abe6936494d64
# skip: [5d1f127c3e0c57d64ce75ee04a0db2b40a3e21df] block: Convert bio_for_each_segment() to bvec_iter
git bisect skip 5d1f127c3e0c57d64ce75ee04a0db2b40a3e21df
# skip: [eb225c28a0b3f730b50f096946aee5eef2cb9969] bio-integrity: Convert to bvec_iter
git bisect skip eb225c28a0b3f730b50f096946aee5eef2cb9969

Full bisect log:

# bad: [98dd2f31c585ddcfb78ce14f8d0efcb52e5ed2e9] Add linux-next specific files for 20131105
# good: [355e62f5ad12b005c862838156262eb2df2f8dff] of/irq: Fix potential buffer overflow
git bisect start 'v3.13-sdhci-fail' '355e62f'
# good: [c2e1895eb0564667394c28e1ecd772ee6a27ea54] Merge remote-tracking branch 'crypto/master'
git bisect good c2e1895eb0564667394c28e1ecd772ee6a27ea54
# bad: [7f1546329db7b573f3a640d75cc1af40dc5ee9ed] Merge remote-tracking branch 'tip/auto-latest'
git bisect bad 7f1546329db7b573f3a640d75cc1af40dc5ee9ed
# bad: [07d76d00209c960eb8bcce9dfdf36e7edd458da3] Merge remote-tracking branch 'md/for-next'
git bisect bad 07d76d00209c960eb8bcce9dfdf36e7edd458da3
# good: [36753aaf7758b2089a55b3e67e6f1a9242462bb4] Merge remote-tracking branch 'drm-tegra/drm/for-next'
git bisect good 36753aaf7758b2089a55b3e67e6f1a9242462bb4
# good: [ca5f026efedeb01287863a9c7e1d5fdaf82d196d] Merge remote-tracking branch 'virtio/virtio-next'
git bisect good ca5f026efedeb01287863a9c7e1d5fdaf82d196d
# bad: [67b89a119b28377ced0ea844aed51f74976db36b] Merge remote-tracking branch 'block/for-next'
git bisect bad 67b89a119b28377ced0ea844aed51f74976db36b
# bad: [26f584573c613d2a7292d8c66dc063ae2bece90a] Merge branch 'for-3.13/core' into for-next
git bisect bad 26f584573c613d2a7292d8c66dc063ae2bece90a
# bad: [0023432f72015803e050e381f12a724e59eded74] dm: fix missing bi_remaining accounting
git bisect bad 0023432f72015803e050e381f12a724e59eded74
# good: [8b6df54182c8c775f346a0703ccb4c531c18a8f0] block: Use rw_copy_check_uvector()
git bisect good 8b6df54182c8c775f346a0703ccb4c531c18a8f0
# skip: [c198aee7e9a801d32cee4607453871cff7c43e6c] ceph: Convert to immutable biovecs
git bisect skip c198aee7e9a801d32cee4607453871cff7c43e6c
# skip: [3d75d579a04be023552b45f791cd95f5b6a45ba6] block: Kill bio_segments()/bi_vcnt usage
git bisect skip 3d75d579a04be023552b45f791cd95f5b6a45ba6
# bad: [febca1baea1cfe2d7a0271385d89b03d5fb34f94] block: setup bi_vcnt on clones
git bisect bad febca1baea1cfe2d7a0271385d89b03d5fb34f94
# skip: [f2da8e013088387e5e61930b715ff0defea9aa58] aoe: Convert to immutable biovecs
git bisect skip f2da8e013088387e5e61930b715ff0defea9aa58
# skip: [44931ee84c6362ec8d9b97b02432760035a2b639] block: Kill bio_pair_split()
git bisect skip 44931ee84c6362ec8d9b97b02432760035a2b639
# skip: [d4fbf2c24290f237cf5989d8e4c8507969ae2299] rbd: Refactor bio cloning, don't clone biovecs
git bisect skip d4fbf2c24290f237cf5989d8e4c8507969ae2299
# good: [971ecaf05e526fee159a3711a7ee831fe4d397ab] dm: Use bvec_iter for dm_bio_record()
git bisect good 971ecaf05e526fee159a3711a7ee831fe4d397ab
# skip: [cc4067bd8adeb5507829b7ae8f17211aab5d1e9d] block: Kill bio_iovec_idx(), __bio_iovec()
git bisect skip cc4067bd8adeb5507829b7ae8f17211aab5d1e9d
# skip: [a040a44b1c2b56fed3ebef3734681b6fe473fd33] dm: Refactor for new bio cloning/splitting
git bisect skip a040a44b1c2b56fed3ebef3734681b6fe473fd33
# skip: [948809ba161cce4060977970e1133a66fffc3449] block: Introduce new bio_split()
git bisect skip 948809ba161cce4060977970e1133a66fffc3449
# skip: [7e814b148e1127be7c32bb438ceaadb0b6e33042] block: Remove bi_idx hacks
git bisect skip 7e814b148e1127be7c32bb438ceaadb0b6e33042
# skip: [3dbdffcc4c1ffb7d7ac631be55cd5aab3b258614] block: Immutable bio vecs
git bisect skip 3dbdffcc4c1ffb7d7ac631be55cd5aab3b258614
# skip: [919b8823a6ef27103fe3abd05026f87ad85ed1ad] block: Convert drivers to immutable biovecs
git bisect skip 919b8823a6ef27103fe3abd05026f87ad85ed1ad
# skip: [5fbc9c23b291ac8d8ffe73cbc55cd7cb9c57fd04] block: Convert bio_copy_data() to bvec_iter
git bisect skip 5fbc9c23b291ac8d8ffe73cbc55cd7cb9c57fd04
# skip: [2771aecc0cc33d70747c8335239c20c9ff87ac67] block: Generic bio chaining
git bisect skip 2771aecc0cc33d70747c8335239c20c9ff87ac67
# skip: [9fc6286f347d00528adcdcf12396d220f47492ed] block: Don't save/copy bvec array anymore, share when cloning
git bisect skip 9fc6286f347d00528adcdcf12396d220f47492ed
# skip: [85bf1bd38f53e93712a149a8c31abe6936494d64] block: Rename bio_split() -> bio_pair_split()
git bisect skip 85bf1bd38f53e93712a149a8c31abe6936494d64
# skip: [5d1f127c3e0c57d64ce75ee04a0db2b40a3e21df] block: Convert bio_for_each_segment() to bvec_iter
git bisect skip 5d1f127c3e0c57d64ce75ee04a0db2b40a3e21df
# skip: [eb225c28a0b3f730b50f096946aee5eef2cb9969] bio-integrity: Convert to bvec_iter
git bisect skip eb225c28a0b3f730b50f096946aee5eef2cb9969
# good: [3d14ea51119f4afad8d0ac4d206923bca744684d] block: Convert bio_iovec() to bvec_iter
git bisect good 3d14ea51119f4afad8d0ac4d206923bca744684d
# good: [b62ad46ef438c94164b33cd58ad945ebc210c67b] block: fixup rq/bio dcache page flushing for ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
git bisect good b62ad46ef438c94164b33cd58ad945ebc210c67b
# first bad commit: [febca1baea1cfe2d7a0271385d89b03d5fb34f94] block: setup bi_vcnt on clones

Best regards,
Tomasz



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 16:42 ` Tomasz Figa
@ 2013-11-05 17:38   ` Stephen Warren
  2013-11-05 21:25     ` Jens Axboe
  2013-11-05 19:59   ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Stephen Warren @ 2013-11-05 17:38 UTC (permalink / raw)
  To: Tomasz Figa, Tushar Behera; +Cc: linux-next, lkml, Jens Axboe, Chris Mason

On 11/05/2013 09:42 AM, Tomasz Figa wrote:
> Hi,
> 
> On Tuesday 05 of November 2013 17:19:00 Tushar Behera wrote:
>> Hi,
>>
>> We are having a boot-time kernel panic on Samsung's Exynos5250-based
>> Arndale board with next-20131105. Bisect points to following commit.
>>
>> <<<
>> commit febca1baea1cfe2d7a0271385d89b03d5fb34f94
>> Author: Chris Mason <chris.mason@fusionio.com>
>> Date:   Thu Oct 31 13:32:42 2013 -0600
>>
>>     block: setup bi_vcnt on clones
>>
>>     commit 9fc6286f347d changed the cloning code to make clones cheaper for
>>     the case where we don't need to clone the iovec array.  But,
>>     the new clone needs the bi_vcnt from the original.
>>
>>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
>>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> >>>
>>
>> Reverting above commit, Arndale is able to boot again.
> 
> I can confirm exactly the same behavior on Exynos 4210-based Trats board,
> with exactly the same bisection results.

Despite the backtrace looking different, reverting that commit also
solves the boot failures on the Tegra-based "Beaver" board.

> Also note that I spotted multiple build failures in block layer during
> the bisection.

I note that compiling next-20131105 generates quite a few warnings re:
uninitialized variables. Reverting the commit doesn't solve those.

> block/blk-merge.c: In function ‘blk_bio_map_sg’:
> block/blk-merge.c:133:8: warning: ‘bvprv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:233:23: note: ‘bvprv.bv_len’ was declared here
> block/blk-merge.c:133:8: warning: ‘bvprv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:233:23: note: ‘bvprv.bv_offset’ was declared here
> block/blk-merge.c:133:8: warning: ‘bvprv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:233:23: note: ‘bvprv.bv_page’ was declared here
> block/blk-merge.c: In function ‘blk_rq_map_sg’:
> block/blk-merge.c:133:8: warning: ‘bvprv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:171:23: note: ‘bvprv.bv_page’ was declared here
> block/blk-merge.c:133:8: warning: ‘bvprv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:171:23: note: ‘bvprv.bv_offset’ was declared here
> block/blk-merge.c:133:8: warning: ‘bvprv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:171:23: note: ‘bvprv.bv_len’ was declared here
> block/blk-merge.c: In function ‘attempt_merge’:
> block/blk-merge.c:108:7: warning: ‘end_bv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:89:17: note: ‘end_bv.bv_offset’ was declared here
> block/blk-merge.c:108:7: warning: ‘end_bv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:89:17: note: ‘end_bv.bv_page’ was declared here
> block/blk-merge.c:108:7: warning: ‘end_bv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> block/blk-merge.c:89:17: note: ‘end_bv.bv_len’ was declared here
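
For readers unfamiliar with this class of warning, here is a minimal user-space sketch of the general shape that can provoke -Wmaybe-uninitialized on a by-value struct. It is not the blk-merge.c code: struct toy_vec, toy_count_segments() and the have_prev flag are hypothetical names made up for illustration. The "previous" element is only read once a separate flag is set, which the compiler cannot always prove, and zero-initialising the struct is one common way to silence the warning without changing behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a by-value segment descriptor. */
struct toy_vec {
    unsigned len;
    unsigned offset;
};

/*
 * Count segments, merging an element into the previous one when it starts
 * exactly where the previous one ended. "prev" is only meaningful once
 * have_prev is set; the compiler may not be able to prove that, so it is
 * zero-initialised here (assumption: this is one way to quiet the warning,
 * not necessarily what the kernel ended up doing).
 */
static unsigned toy_count_segments(const struct toy_vec *v, size_t n)
{
    struct toy_vec prev = { 0 };
    int have_prev = 0;
    unsigned segs = 0;

    assert(v != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Start a new segment unless this element is contiguous with prev. */
        if (!(have_prev && prev.offset + prev.len == v[i].offset))
            segs++;
        prev = v[i];
        have_prev = 1;
    }
    return segs;
}
```

Whether a given compiler version actually warns on the uninitialised variant depends on optimisation level and inlining, so treat this purely as an illustration of the pattern.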



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 11:49 Boot failure on Arndale with next-20131105 Tushar Behera
  2013-11-05 16:42 ` Tomasz Figa
@ 2013-11-05 19:33 ` Jens Axboe
  2013-11-05 20:23   ` Olof Johansson
  1 sibling, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2013-11-05 19:33 UTC (permalink / raw)
  To: Tushar Behera, linux-next, lkml; +Cc: Chris Mason

On 11/05/2013 04:49 AM, Tushar Behera wrote:
> Hi,
> 
> We are having a boot-time kernel panic on Samsung's Exynos5250-based
> Arndale board with next-20131105. Bisect points to following commit.
> 
> <<<
> commit febca1baea1cfe2d7a0271385d89b03d5fb34f94
> Author: Chris Mason <chris.mason@fusionio.com>
> Date:   Thu Oct 31 13:32:42 2013 -0600
> 
>     block: setup bi_vcnt on clones
> 
>     commit 9fc6286f347d changed the cloning code to make clones cheaper for
>     the case where we don't need to clone the iovec array.  But,
>     the new clone needs the bi_vcnt from the original.
> 
>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>
> 
> Reverting above commit, Arndale is able to boot again.
> 
> Excerpts from the boot log (just in case, it helps in debugging).
> 
> [    1.972062] Unable to handle kernel paging request at virtual address 025e63a0
> [    1.981164] pgd = c0004000
> [    1.982375] [025e63a0] *pgd=00000000
> [    1.985875] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> [    1.991086] Modules linked in:
> [    1.994076] CPU: 0 PID: 1178 Comm: mmcqd/0 Not tainted 3.12.0-rc5-00051-gfebca1b #21
> [    2.001683] task: ef3530c0 ti: ee82e000 task.ti: ee82e000
> [    2.006981] PC is at dma_cache_maint_page+0x84/0x174
> [    2.011842] LR is at 0x6
> 
> [    2.043532] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
> [    2.050708] Control: 10c5387d  Table: 4000406a  DAC: 00000015
> [    2.056342] Process mmcqd/0 (pid: 1178, stack limit = 0xee82e240)
> [    2.062321] Stack: (0xee82fd58 to 0xee830000)
> 
> [ ... ]
> 
> [    2.275352] [<c0015768>] (dma_cache_maint_page+0x84/0x174) from [<c0015880>] (__dma_page_cpu_to_dev+0x28/0xa0)
> [    2.285170] [<c0015880>] (__dma_page_cpu_to_dev+0x28/0xa0) from [<c00159a4>] (arm_dma_map_page+0x6c/0x70)
> [    2.294565] [<c00159a4>] (arm_dma_map_page+0x6c/0x70) from [<c0015d28>] (arm_dma_map_sg+0x74/0xec)
> [    2.303366] [<c0015d28>] (arm_dma_map_sg+0x74/0xec) from [<c02bf534>] (dw_mci_pre_dma_transfer.isra.16+0x124/0x15c)
> [    2.313614] [<c02bf534>] (dw_mci_pre_dma_transfer.isra.16+0x124/0x15c) from [<c02bf8d4>] (dw_mci_pre_req+0x44/0x50)
> [    2.323863] [<c02bf8d4>] (dw_mci_pre_req+0x44/0x50) from [<c02a8970>] (mmc_start_req+0x3c/0x39c)
> [    2.332486] [<c02a8970>] (mmc_start_req+0x3c/0x39c) from [<c02b606c>] (mmc_blk_issue_rw_rq+0xbc/0xa9c)
> [    2.341625] [<c02b606c>] (mmc_blk_issue_rw_rq+0xbc/0xa9c) from [<c02b6c14>] (mmc_blk_issue_rq+0x1c8/0x498)
> [    2.351106] [<c02b6c14>] (mmc_blk_issue_rq+0x1c8/0x498) from [<c02b75d4>] (mmc_queue_thread+0xa4/0x144)
> [    2.360331] [<c02b75d4>] (mmc_queue_thread+0xa4/0x144) from [<c0038614>] (kthread+0xb4/0xb8)
> [    2.368616] [<c0038614>] (kthread+0xb4/0xb8) from [<c000e2f8>] (ret_from_fork+0x14/0x3c)
> [    2.376556] Code: 17e81051 10822181 e592c000 e3ccc003 (e79c2007)
> [    2.382570] ---[ end trace df06b64b1b7fa443 ]---
> 
> [ ... ]
> 
> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
> Gave up waiting for root device.  Common problems:
>  - Boot args (cat /proc/cmdline)
>    - Check rootdelay= (did the system wait long enough?)
>    - Check root= (did the system wait for the right device?)
>  - Missing modules (cat /proc/modules; ls /dev)
> ALERT!  /dev/mmcblk1p3 does not exist.  Dropping to a shell!
> FATAL: Could not load /lib/modules/3.12.0-rc5-00051-gfebca1b/modules.dep: No such file or directory
> FATAL: Could not load /lib/modules/3.12.0-rc5-00051-gfebca1b/modules.dep: No such file or directory

Very weird! What file system is being used?

-- 
Jens Axboe



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 16:42 ` Tomasz Figa
  2013-11-05 17:38   ` Stephen Warren
@ 2013-11-05 19:59   ` Jens Axboe
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2013-11-05 19:59 UTC (permalink / raw)
  To: Tomasz Figa, Tushar Behera; +Cc: linux-next, lkml, Chris Mason

[-- Attachment #1: Type: text/plain, Size: 1031 bytes --]

On 11/05/2013 09:42 AM, Tomasz Figa wrote:
> Hi,
> 
> On Tuesday 05 of November 2013 17:19:00 Tushar Behera wrote:
>> Hi,
>>
>> We are having a boot-time kernel panic on Samsung's Exynos5250-based
>> Arndale board with next-20131105. Bisect points to following commit.
>>
>> <<<
>> commit febca1baea1cfe2d7a0271385d89b03d5fb34f94
>> Author: Chris Mason <chris.mason@fusionio.com>
>> Date:   Thu Oct 31 13:32:42 2013 -0600
>>
>>     block: setup bi_vcnt on clones
>>
>>     commit 9fc6286f347d changed the cloning code to make clones cheaper for
>>     the case where we don't need to clone the iovec array.  But,
>>     the new clone needs the bi_vcnt from the original.
>>
>>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
>>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> >>>
>>
>> Reverting above commit, Arndale is able to boot again.
> 
> I can confirm exactly the same behavior on Exynos 4210-based Trats board,
> with exactly the same bisection results.

Can either (or both) of you try this?

-- 
Jens Axboe


[-- Attachment #2: dm-clone.patch --]
[-- Type: text/x-patch, Size: 803 bytes --]

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 8e6174c..a1177e1 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1123,8 +1123,13 @@ struct clone_info {
 
 static void bio_setup_sector(struct bio *bio, sector_t sector, sector_t len)
 {
-	bio->bi_iter.bi_sector = sector;
-	bio->bi_iter.bi_size = to_bytes(len);
+	if (len) {
+		bio->bi_iter.bi_sector = sector;
+		bio->bi_iter.bi_size = to_bytes(len);
+	} else {
+		bio->bi_iter.bi_size = 0;
+		bio->bi_vcnt = 0;
+	}
 }
 
 /*
@@ -1178,8 +1183,7 @@ static void __clone_and_map_simple_bio(struct clone_info *ci,
 	 * and discard, so no need for concern about wasted bvec allocations.
 	 */
 	 __bio_clone(clone, ci->bio);
-	if (len)
-		bio_setup_sector(clone, ci->sector, len);
+	bio_setup_sector(clone, ci->sector, len);
 
 	__map_bio(tio);
 }
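
Read in isolation, the dm-clone.patch above seems to make a zero-length clone (for example a flush) carry no payload at all, instead of inheriting the size and vector count of the bio it was cloned from. A user-space sketch of that behaviour follows. struct toy_bio, toy_setup_sector() and TOY_SECTOR_SHIFT are hypothetical stand-ins, and the 512-byte sector size is an assumption; none of this is the real dm code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SECTOR_SHIFT 9  /* assumption: 512-byte sectors */

/* Hypothetical stand-in for the few bio fields the patch touches. */
struct toy_bio {
    uint64_t sector;      /* models bi_iter.bi_sector */
    uint32_t size;        /* models bi_iter.bi_size, in bytes */
    unsigned short vcnt;  /* models bi_vcnt */
};

/*
 * Mirrors the patched logic: a non-zero length positions the clone,
 * while a zero length strips the payload inherited from the original.
 */
static void toy_setup_sector(struct toy_bio *bio, uint64_t sector, uint64_t len)
{
    assert(bio != NULL);

    if (len) {
        bio->sector = sector;
        bio->size = (uint32_t)(len << TOY_SECTOR_SHIFT);
    } else {
        bio->size = 0;
        bio->vcnt = 0;
    }
}
```

With this shape the caller no longer needs its own "if (len)" guard, which matches the second hunk of the patch.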


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 19:33 ` Jens Axboe
@ 2013-11-05 20:23   ` Olof Johansson
  2013-11-05 20:33     ` Chris Mason
  2013-11-05 20:34     ` Jens Axboe
  0 siblings, 2 replies; 17+ messages in thread
From: Olof Johansson @ 2013-11-05 20:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Tushar Behera, linux-next, lkml, Chris Mason

On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 11/05/2013 04:49 AM, Tushar Behera wrote:
>> Hi,
>>
>> We are having a boot-time kernel panic on Samsung's Exynos5250-based
>> Arndale board with next-20131105. Bisect points to following commit.
>>
>> <<<
>> commit febca1baea1cfe2d7a0271385d89b03d5fb34f94
>> Author: Chris Mason <chris.mason@fusionio.com>
>> Date:   Thu Oct 31 13:32:42 2013 -0600
>>
>>     block: setup bi_vcnt on clones
>>
>>     commit 9fc6286f347d changed the cloning code to make clones cheaper for
>>     the case where we don't need to clone the iovec array.  But,
>>     the new clone needs the bi_vcnt from the original.
>>
>>     Signed-off-by: Chris Mason <chris.mason@fusionio.com>
>>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> >>>
>>
>> Reverting above commit, Arndale is able to boot again.
>>
>> Excerpts from the boot log (just in case, it helps in debugging).
>>
>> [    1.972062] Unable to handle kernel paging request at virtual address 025e63a0
>> [    1.981164] pgd = c0004000
>> [    1.982375] [025e63a0] *pgd=00000000
>> [    1.985875] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>> [    1.991086] Modules linked in:
>> [    1.994076] CPU: 0 PID: 1178 Comm: mmcqd/0 Not tainted 3.12.0-rc5-00051-gfebca1b #21
>> [    2.001683] task: ef3530c0 ti: ee82e000 task.ti: ee82e000
>> [    2.006981] PC is at dma_cache_maint_page+0x84/0x174
>> [    2.011842] LR is at 0x6
>>
>> [    2.043532] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
>> [    2.050708] Control: 10c5387d  Table: 4000406a  DAC: 00000015
>> [    2.056342] Process mmcqd/0 (pid: 1178, stack limit = 0xee82e240)
>> [    2.062321] Stack: (0xee82fd58 to 0xee830000)
>>
>> [ ... ]
>>
>> [    2.275352] [<c0015768>] (dma_cache_maint_page+0x84/0x174) from [<c0015880>] (__dma_page_cpu_to_dev+0x28/0xa0)
>> [    2.285170] [<c0015880>] (__dma_page_cpu_to_dev+0x28/0xa0) from [<c00159a4>] (arm_dma_map_page+0x6c/0x70)
>> [    2.294565] [<c00159a4>] (arm_dma_map_page+0x6c/0x70) from [<c0015d28>] (arm_dma_map_sg+0x74/0xec)
>> [    2.303366] [<c0015d28>] (arm_dma_map_sg+0x74/0xec) from [<c02bf534>] (dw_mci_pre_dma_transfer.isra.16+0x124/0x15c)
>> [    2.313614] [<c02bf534>] (dw_mci_pre_dma_transfer.isra.16+0x124/0x15c) from [<c02bf8d4>] (dw_mci_pre_req+0x44/0x50)
>> [    2.323863] [<c02bf8d4>] (dw_mci_pre_req+0x44/0x50) from [<c02a8970>] (mmc_start_req+0x3c/0x39c)
>> [    2.332486] [<c02a8970>] (mmc_start_req+0x3c/0x39c) from [<c02b606c>] (mmc_blk_issue_rw_rq+0xbc/0xa9c)
>> [    2.341625] [<c02b606c>] (mmc_blk_issue_rw_rq+0xbc/0xa9c) from [<c02b6c14>] (mmc_blk_issue_rq+0x1c8/0x498)
>> [    2.351106] [<c02b6c14>] (mmc_blk_issue_rq+0x1c8/0x498) from [<c02b75d4>] (mmc_queue_thread+0xa4/0x144)
>> [    2.360331] [<c02b75d4>] (mmc_queue_thread+0xa4/0x144) from [<c0038614>] (kthread+0xb4/0xb8)
>> [    2.368616] [<c0038614>] (kthread+0xb4/0xb8) from [<c000e2f8>] (ret_from_fork+0x14/0x3c)
>> [    2.376556] Code: 17e81051 10822181 e592c000 e3ccc003 (e79c2007)
>> [    2.382570] ---[ end trace df06b64b1b7fa443 ]---
>>
>> [ ... ]
>>
>> Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
>> Gave up waiting for root device.  Common problems:
>>  - Boot args (cat /proc/cmdline)
>>    - Check rootdelay= (did the system wait long enough?)
>>    - Check root= (did the system wait for the right device?)
>>  - Missing modules (cat /proc/modules; ls /dev)
>> ALERT!  /dev/mmcblk1p3 does not exist.  Dropping to a shell!
>> FATAL: Could not load /lib/modules/3.12.0-rc5-00051-gfebca1b/modules.dep: No such file or directory
>> FATAL: Could not load /lib/modules/3.12.0-rc5-00051-gfebca1b/modules.dep: No such file or directory
>
> Very weird! What file system is being used?

Most of my failures have happened on regular MMC cards with ext4
filesystems on them.

Note that the panic happens during device probe / partition table
scanning, not after mounting the filesystem.

Giving your patch a go now across the board. I'm very concerned about
the reports of bisectability, build failures and heaps of warnings
though. Did the 0-day builder pick up any of those? :-/


-Olof


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:23   ` Olof Johansson
@ 2013-11-05 20:33     ` Chris Mason
  2013-11-05 20:38       ` Olof Johansson
  2013-11-05 20:34     ` Jens Axboe
  1 sibling, 1 reply; 17+ messages in thread
From: Chris Mason @ 2013-11-05 20:33 UTC (permalink / raw)
  To: Olof Johansson, Jens Axboe; +Cc: Tushar Behera, linux-next, lkml

Quoting Olof Johansson (2013-11-05 15:23:51)
> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:

[ horrible crashes fixed by removing my patch ]

> > Very weird! What file system is being used?
> 
> Most of my failures have happened on regular MMC cards with ext4
> filesystems on them.
> 
> Note that the panic happens during device probe / partition table
> scanning, not after mounting the filesystem.
> 
> Giving your patch a go now across the board. I'm very concerned about
> the reports of bisectability, build failures and heaps of warnings
> though. Did the 0-day builder pick up any of those? :-/
> 

Hmmm, is bcache in your config?

-chris



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:23   ` Olof Johansson
  2013-11-05 20:33     ` Chris Mason
@ 2013-11-05 20:34     ` Jens Axboe
  1 sibling, 0 replies; 17+ messages in thread
From: Jens Axboe @ 2013-11-05 20:34 UTC (permalink / raw)
  To: Olof Johansson; +Cc: Tushar Behera, linux-next, lkml, Chris Mason

On 11/05/2013 01:23 PM, Olof Johansson wrote:
>> Very weird! What file system is being used?
> 
> Most of my failures have happened on regular MMC cards with ext4
> filesystems on them.
> 
> Note that the panic happens during device probe / partition table
> scanning, not after mounting the filesystem.

Hmm ok.

> Giving your patch a go now across the board. I'm very concerned about
> the reports of bisectability, build failures and heaps of warnings
> though. Did the 0-day builder pick up any of those? :-/

Yeah, unfortunately the immutable conversion has turned out to be quite
messy :-(

-- 
Jens Axboe



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:33     ` Chris Mason
@ 2013-11-05 20:38       ` Olof Johansson
  2013-11-05 20:56         ` Chris Mason
  0 siblings, 1 reply; 17+ messages in thread
From: Olof Johansson @ 2013-11-05 20:38 UTC (permalink / raw)
  To: Chris Mason; +Cc: Jens Axboe, Tushar Behera, linux-next, lkml

On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
> Quoting Olof Johansson (2013-11-05 15:23:51)
>> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
>
> [ horrible crashes fixed by removing my patch ]
>
>> > Very weird! What file system is being used?
>>
>> Most of my failures have happened on regular MMC cards with ext4
>> filesystems on them.
>>
>> Note that the panic happens during device probe / partition table
>> scanning, not after mounting the filesystem.
>>
>> Giving your patch a go now across the board. I'm very concerned about
>> the reports of bisectability, build failures and heaps of warnings
>> though. Did the 0-day builder pick up any of those? :-/
>>
>
> Hmmm, is bcache in your config?

Doesn't look that way -- no ARM defconfigs enable it (it's what I
build and boot), and the option defaults to off and nothing selects
it.


-Olof


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:38       ` Olof Johansson
@ 2013-11-05 20:56         ` Chris Mason
  2013-11-05 21:27           ` Olof Johansson
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Chris Mason @ 2013-11-05 20:56 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Jens Axboe, Tushar Behera, linux-next, lkml, Kent Overstreet

Quoting Olof Johansson (2013-11-05 15:38:33)
> On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
> > Quoting Olof Johansson (2013-11-05 15:23:51)
> >> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
> >
> > [ horrible crashes fixed by removing my patch ]
> >
> >> > Very weird! What file system is being used?
> >>
> >> Most of my failures have happened on regular MMC cards with ext4
> >> filesystems on them.
> >>
> >> Note that the panic happens during device probe / partition table
> >> scanning, not after mounting the filesystem.
> >>
> >> Giving your patch a go now across the board. I'm very concerned about
> >> the reports of bisectability, build failures and heaps of warnings
> >> though. Did the 0-day builder pick up any of those? :-/
> >>
> >
> > Hmmm, is bcache in your config?
> 
> Doesn't look that way -- no ARM defconfigs enable it (it's what I
> build and boot), and the option defaults to off and nothing selects
> it.

Ok, I think I see it.  My guess is that you're hitting bounce buffers.
__blk_queue_bounce is the only caller of the bio splitting code I can
see that you might be hitting.

My first patch exposed a lurking bug in bio_clone_biovec.  Basically
bio->bi_vcnt is being doubled instead of initialized.

This patch is only compile tested, but I think it'll fix it.

diff --git a/fs/bio.c b/fs/bio.c
index be93de1..3595456 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -612,6 +612,7 @@ int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask)
 	unsigned nr_iovecs = 0;
 	struct bio_vec bv, *bvl = NULL;
 	struct bvec_iter iter;
+	int i;
 
 	BUG_ON(!bio->bi_pool);
 	BUG_ON(BIO_POOL_IDX(bio) != BIO_POOL_NONE);
@@ -628,8 +629,9 @@ int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask)
 		bvl = bio->bi_inline_vecs;
 	}
 
+	i = 0;
 	bio_for_each_segment(bv, bio, iter)
-		bvl[bio->bi_vcnt++] = bv;
+		bvl[i++] = bv;
 
 	bio->bi_io_vec = bvl;
 	bio->bi_iter.bi_idx = 0;
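
For readers following along, here is a small userspace sketch of the indexing problem described above. It is not the kernel code: `toy_vec`, `toy_bio`, `MAX_VECS` and both function names are made up for illustration, and it assumes the clone already carries the parent's count (as the earlier "setup bi_vcnt on clones" patch arranges) and that every counted segment is copied.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VECS 8

/* Hypothetical, simplified stand-ins for struct bio_vec and struct bio. */
struct toy_vec { int page; };

struct toy_bio {
	unsigned short vcnt;      /* number of valid entries behind io_vec */
	struct toy_vec *io_vec;   /* initially points at the parent's array */
	/* Twice the room needed, so the buggy variant stays in bounds here. */
	struct toy_vec inline_vecs[2 * MAX_VECS];
};

/*
 * Old behaviour: vcnt doubles as the write index. That only works if vcnt
 * is 0 on entry. With vcnt already set to n, the copies land in slots
 * n..2n-1, slots 0..n-1 are never written, and vcnt ends up as 2n.
 */
void clone_vecs_buggy(struct toy_bio *bio)
{
	const struct toy_vec *src = bio->io_vec;
	unsigned short n = bio->vcnt;

	assert(n <= MAX_VECS);
	for (unsigned short s = 0; s < n; s++)
		bio->inline_vecs[bio->vcnt++] = src[s];
	bio->io_vec = bio->inline_vecs;
}

/* Patched behaviour: a separate index starts at 0 and vcnt is left alone. */
void clone_vecs_fixed(struct toy_bio *bio)
{
	const struct toy_vec *src = bio->io_vec;
	unsigned short n = bio->vcnt;
	unsigned short i = 0;

	assert(n <= MAX_VECS);
	for (unsigned short s = 0; s < n; s++)
		bio->inline_vecs[i++] = src[s];
	bio->io_vec = bio->inline_vecs;
}
```

Calling either function on a `toy_bio` whose `vcnt` is 3 and whose `io_vec` points at a three-entry parent array shows the difference: the buggy variant leaves `vcnt` at 6 with the first three private slots untouched, while the fixed variant keeps `vcnt` at 3 and fills slots 0..2. A consumer walking the doubled count would read entries that were never filled in, which would be consistent with the oops in the DMA cache maintenance path, though that link is an inference rather than something verified here.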


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 17:38   ` Stephen Warren
@ 2013-11-05 21:25     ` Jens Axboe
  2013-11-08  8:45       ` Stephen Rothwell
  0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2013-11-05 21:25 UTC (permalink / raw)
  To: Stephen Warren, Tomasz Figa, Tushar Behera; +Cc: linux-next, lkml, Chris Mason

On 11/05/2013 10:38 AM, Stephen Warren wrote:
> I note that compiling next-20131105 generates quite a few warnings re:
> uninitialized variables. Reverting the commit doesn't solve those.
> 
>> block/blk-merge.c: In function ‘blk_bio_map_sg’:
>> block/blk-merge.c:133:8: warning: ‘bvprv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:233:23: note: ‘bvprv.bv_len’ was declared here
>> block/blk-merge.c:133:8: warning: ‘bvprv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:233:23: note: ‘bvprv.bv_offset’ was declared here
>> block/blk-merge.c:133:8: warning: ‘bvprv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:233:23: note: ‘bvprv.bv_page’ was declared here
>> block/blk-merge.c: In function ‘blk_rq_map_sg’:
>> block/blk-merge.c:133:8: warning: ‘bvprv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:171:23: note: ‘bvprv.bv_page’ was declared here
>> block/blk-merge.c:133:8: warning: ‘bvprv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:171:23: note: ‘bvprv.bv_offset’ was declared here
>> block/blk-merge.c:133:8: warning: ‘bvprv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:171:23: note: ‘bvprv.bv_len’ was declared here
>> block/blk-merge.c: In function ‘attempt_merge’:
>> block/blk-merge.c:108:7: warning: ‘end_bv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:89:17: note: ‘end_bv.bv_offset’ was declared here
>> block/blk-merge.c:108:7: warning: ‘end_bv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:89:17: note: ‘end_bv.bv_page’ was declared here
>> block/blk-merge.c:108:7: warning: ‘end_bv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
>> block/blk-merge.c:89:17: note: ‘end_bv.bv_len’ was declared here

Looks like an incomplete merge. The patch to silence those warnings
(which aren't bugs, BTW) is definitely in my for-next branch.
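
As a generic illustration of how such a warning can fire on code that is not actually buggy, here is a minimal sketch. It is not taken from block/blk-merge.c; the function and variable names are hypothetical, and whether a given compiler warns depends on its version and optimization level.

```c
#include <stddef.h>

/*
 * Sums the differences between consecutive elements.
 * 'prev' has no initializer, but it is only read when 'have_prev' is set,
 * and 'have_prev' is only set after 'prev' has been assigned. A compiler
 * that cannot correlate the flag with the assignment may still report
 * "'prev' may be used uninitialized".
 */
int sum_gaps(const int *vals, size_t n)
{
	int prev;            /* intentionally left without an initializer */
	int have_prev = 0;
	int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (have_prev)
			total += vals[i] - prev;  /* reached only after prev is set */
		prev = vals[i];
		have_prev = 1;
	}
	return total;
}
```

Giving the variable an initial value at its declaration is one common way to quiet this kind of diagnostic without changing behaviour; whether the for-next patch takes that approach is not stated in this thread.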

-- 
Jens Axboe



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:56         ` Chris Mason
@ 2013-11-05 21:27           ` Olof Johansson
  2013-11-05 22:06           ` Stephen Warren
  2013-11-06  6:15           ` Tushar Behera
  2 siblings, 0 replies; 17+ messages in thread
From: Olof Johansson @ 2013-11-05 21:27 UTC (permalink / raw)
  To: Chris Mason; +Cc: Jens Axboe, Tushar Behera, linux-next, lkml, Kent Overstreet

On Tue, Nov 5, 2013 at 12:56 PM, Chris Mason <chris.mason@fusionio.com> wrote:
> Quoting Olof Johansson (2013-11-05 15:38:33)
>> On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
>> > Quoting Olof Johansson (2013-11-05 15:23:51)
>> >> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
>> >
>> > [ horrible crashes fixed by removing my patch ]
>> >
>> >> > Very weird! What file system is being used?
>> >>
>> >> Most of my failures have happened on regular MMC cards with ext4
>> >> filesystems on them.
>> >>
>> >> Note that the panic happens during device probe / partition table
>> >> scanning, not after mounting the filesystem.
>> >>
>> >> Giving your patch a go now across the board. I'm very concerned about
>> >> the reports of bisectability, build failures and heaps of warnings
>> >> though. Did the 0-day builder pick up any of those? :-/
>> >>
>> >
>> > Hmmm, is bcache in your config?
>>
>> Doesn't look that way -- no ARM defconfigs enable it (it's what I
>> build and boot), and the option defaults to off and nothing selects
>> it.
>
> Ok, I think I see it.  My guess is that you're hitting bounce buffers.
> __blk_queue_bounce is the only caller of the bio splitting code I can
> see that you might be hitting.
>
> My first patch exposed a lurking bug in bio_clone_biovec.  Basically
> bio->bi_vcnt is being doubled instead of initialized.
>
> This patch is only compile tested, but I think it'll fix it.

Thanks, giving it a go now (will have results in 30+ minutes). Jens'
patch didn't make a difference, which makes sense given lack of dm
usage.


-Olof


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:56         ` Chris Mason
  2013-11-05 21:27           ` Olof Johansson
@ 2013-11-05 22:06           ` Stephen Warren
  2013-11-05 22:41             ` Olof Johansson
  2013-11-06  6:15           ` Tushar Behera
  2 siblings, 1 reply; 17+ messages in thread
From: Stephen Warren @ 2013-11-05 22:06 UTC (permalink / raw)
  To: Chris Mason, Olof Johansson
  Cc: Jens Axboe, Tushar Behera, linux-next, lkml, Kent Overstreet

On 11/05/2013 01:56 PM, Chris Mason wrote:
> Quoting Olof Johansson (2013-11-05 15:38:33)
>> On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
>>> Quoting Olof Johansson (2013-11-05 15:23:51)
>>>> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> [ horrible crashes fixed by removing my patch ]
>>>
...
> Ok, I think I see it.  My guess is that you're hitting bounce buffers.
> __blk_queue_bounce is the only caller of the bio splitting code I can
> see that you might be hitting.
> 
> My first patch exposed a lurking bug in bio_clone_biovec.  Basically
> bio->bi_vcnt is being doubled instead of initialized.
> 
> This patch is only compile tested, but I think it'll fix it.

Tested-by: Stephen Warren <swarren@nvidia.com>
(this fixes the issue on Tegra30/Beaver at least)


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 22:06           ` Stephen Warren
@ 2013-11-05 22:41             ` Olof Johansson
  2013-11-06  0:04               ` Chris Mason
  0 siblings, 1 reply; 17+ messages in thread
From: Olof Johansson @ 2013-11-05 22:41 UTC (permalink / raw)
  To: Stephen Warren
  Cc: Chris Mason, Jens Axboe, Tushar Behera, linux-next, lkml,
	Kent Overstreet

On Tue, Nov 5, 2013 at 2:06 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> On 11/05/2013 01:56 PM, Chris Mason wrote:
>> Quoting Olof Johansson (2013-11-05 15:38:33)
>>> On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
>>>> Quoting Olof Johansson (2013-11-05 15:23:51)
>>>>> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> [ horrible crashes fixed by removing my patch ]
>>>>
> ...
>> Ok, I think I see it.  My guess is that you're hitting bounce buffers.
>> __blk_queue_bounce is the only caller of the bio splitting code I can
>> see that you might be hitting.
>>
>> My first patch exposed a lurking bug in bio_clone_biovec.  Basically
>> bio->bi_vcnt is being doubled instead of initialized.
>>
>> This patch is only compile tested, but I think it'll fix it.
>
> Tested-by: Stephen Warren <swarren@nvidia.com>
> (this fixes the issue on Tegra30/Beaver at least)

Tested-by: Olof Johansson <olof@lixom.net>

This resolves boot failures on:
* Tegra30/beaver
* OMAP4/panda
* i.MX6/wandboard

Thanks Chris!

-Olof


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 22:41             ` Olof Johansson
@ 2013-11-06  0:04               ` Chris Mason
  0 siblings, 0 replies; 17+ messages in thread
From: Chris Mason @ 2013-11-06  0:04 UTC (permalink / raw)
  To: Olof Johansson, Stephen Warren
  Cc: Jens Axboe, Tushar Behera, linux-next, lkml, Kent Overstreet

Quoting Olof Johansson (2013-11-05 17:41:42)
> On Tue, Nov 5, 2013 at 2:06 PM, Stephen Warren <swarren@wwwdotorg.org> wrote:
> > On 11/05/2013 01:56 PM, Chris Mason wrote:
> >> Quoting Olof Johansson (2013-11-05 15:38:33)
> >>> On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
> >>>> Quoting Olof Johansson (2013-11-05 15:23:51)
> >>>>> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:
> >>>>
> >>>> [ horrible crashes fixed by removing my patch ]
> >>>>
> > ...
> >> Ok, I think I see it.  My guess is that you're hitting bounce buffers.
> >> __blk_queue_bounce is the only caller of the bio splitting code I can
> >> see that you might be hitting.
> >>
> >> My first patch exposed a lurking bug in bio_clone_biovec.  Basically
> >> bio->bi_vcnt is being doubled instead of initialized.
> >>
> >> This patch is only compile tested, but I think it'll fix it.
> >
> > Tested-by: Stephen Warren <swarren@nvidia.com>
> > (this fixes the issue on Tegra30/Beaver at least)
> 
> Tested-by: Olof Johansson <olof@lixom.net>
> 
> This resolves boot failures on:
> * Tegra30/beaver
> * OMAP4/panda
> * i.MX6/wandboard
> 
> Thanks Chris!

Perfect, thanks for bisecting and trying the patches.  Kent, if things
get rebased, could you please fold this patch and my bi_vcnt patch in?

-chris



* Re: Boot failure on Arndale with next-20131105
  2013-11-05 20:56         ` Chris Mason
  2013-11-05 21:27           ` Olof Johansson
  2013-11-05 22:06           ` Stephen Warren
@ 2013-11-06  6:15           ` Tushar Behera
  2 siblings, 0 replies; 17+ messages in thread
From: Tushar Behera @ 2013-11-06  6:15 UTC (permalink / raw)
  To: Chris Mason; +Cc: Olof Johansson, Jens Axboe, linux-next, lkml, Kent Overstreet

On 6 November 2013 02:26, Chris Mason <chris.mason@fusionio.com> wrote:
> Quoting Olof Johansson (2013-11-05 15:38:33)
>> On Tue, Nov 5, 2013 at 12:33 PM, Chris Mason <chris.mason@fusionio.com> wrote:
>> > Quoting Olof Johansson (2013-11-05 15:23:51)
>> >> On Tue, Nov 5, 2013 at 11:33 AM, Jens Axboe <axboe@kernel.dk> wrote:

> This patch is only compile tested, but I think it'll fix it.
>
> diff --git a/fs/bio.c b/fs/bio.c
> index be93de1..3595456 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -612,6 +612,7 @@ int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask)
>         unsigned nr_iovecs = 0;
>         struct bio_vec bv, *bvl = NULL;
>         struct bvec_iter iter;
> +       int i;
>
>         BUG_ON(!bio->bi_pool);
>         BUG_ON(BIO_POOL_IDX(bio) != BIO_POOL_NONE);
> @@ -628,8 +629,9 @@ int bio_clone_biovec(struct bio *bio, gfp_t gfp_mask)
>                 bvl = bio->bi_inline_vecs;
>         }
>
> +       i = 0;
>         bio_for_each_segment(bv, bio, iter)
> -               bvl[bio->bi_vcnt++] = bv;
> +               bvl[i++] = bv;
>
>         bio->bi_io_vec = bvl;
>         bio->bi_iter.bi_idx = 0;


Tested-by: Tushar Behera <tushar.behera@linaro.org>
(Fixes boot failure on Exynos5250-based Arndale board.)

-- 
Tushar Behera


* Re: Boot failure on Arndale with next-20131105
  2013-11-05 21:25     ` Jens Axboe
@ 2013-11-08  8:45       ` Stephen Rothwell
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Rothwell @ 2013-11-08  8:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Stephen Warren, Tomasz Figa, Tushar Behera, linux-next, lkml,
	Chris Mason


Hi Jens,

On Tue, 05 Nov 2013 14:25:00 -0700 Jens Axboe <axboe@kernel.dk> wrote:
>
> On 11/05/2013 10:38 AM, Stephen Warren wrote:
> > I note that compiling next-20131105 generates quite a few warnings re:
> > uninitialized variables. Reverting the commit doesn't solve those.
> > 
> >> block/blk-merge.c: In function ‘blk_bio_map_sg’:
> >> block/blk-merge.c:133:8: warning: ‘bvprv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:233:23: note: ‘bvprv.bv_len’ was declared here
> >> block/blk-merge.c:133:8: warning: ‘bvprv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:233:23: note: ‘bvprv.bv_offset’ was declared here
> >> block/blk-merge.c:133:8: warning: ‘bvprv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:233:23: note: ‘bvprv.bv_page’ was declared here
> >> block/blk-merge.c: In function ‘blk_rq_map_sg’:
> >> block/blk-merge.c:133:8: warning: ‘bvprv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:171:23: note: ‘bvprv.bv_page’ was declared here
> >> block/blk-merge.c:133:8: warning: ‘bvprv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:171:23: note: ‘bvprv.bv_offset’ was declared here
> >> block/blk-merge.c:133:8: warning: ‘bvprv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:171:23: note: ‘bvprv.bv_len’ was declared here
> >> block/blk-merge.c: In function ‘attempt_merge’:
> >> block/blk-merge.c:108:7: warning: ‘end_bv.bv_offset’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:89:17: note: ‘end_bv.bv_offset’ was declared here
> >> block/blk-merge.c:108:7: warning: ‘end_bv.bv_page’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:89:17: note: ‘end_bv.bv_page’ was declared here
> >> block/blk-merge.c:108:7: warning: ‘end_bv.bv_len’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> >> block/blk-merge.c:89:17: note: ‘end_bv.bv_len’ was declared here
> 
> Looks like an incomplete merge. The patch to silence those warnings
> (which aren't bugs, BTW) is definitely in my for-next branch.

I am still getting those warnings in linux-next for various builds
(including i386 defconfig).  Any hints would be good.

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au



end of thread, other threads:[~2013-11-08  8:45 UTC | newest]

Thread overview: 17+ messages
2013-11-05 11:49 Boot failure on Arndale with next-20131105 Tushar Behera
2013-11-05 16:42 ` Tomasz Figa
2013-11-05 17:38   ` Stephen Warren
2013-11-05 21:25     ` Jens Axboe
2013-11-08  8:45       ` Stephen Rothwell
2013-11-05 19:59   ` Jens Axboe
2013-11-05 19:33 ` Jens Axboe
2013-11-05 20:23   ` Olof Johansson
2013-11-05 20:33     ` Chris Mason
2013-11-05 20:38       ` Olof Johansson
2013-11-05 20:56         ` Chris Mason
2013-11-05 21:27           ` Olof Johansson
2013-11-05 22:06           ` Stephen Warren
2013-11-05 22:41             ` Olof Johansson
2013-11-06  0:04               ` Chris Mason
2013-11-06  6:15           ` Tushar Behera
2013-11-05 20:34     ` Jens Axboe
