From: Vyacheslav Dubeyko <slava-yeENwD64cLxBDgjK7y7TUQ@public.gmane.org>
To: Ryusuke Konishi
	<konishi.ryusuke-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Cc: linux-nilfs <linux-nilfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Jérôme Poulin"
	<jeromepoulin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Subject: Re: Kernel Bug: unable to handle kernel paging request
Date: Fri, 09 Aug 2013 17:15:25 +0400
Message-ID: <1376054125.2272.84.camel@slavad-ubuntu>
In-Reply-To: <CALJXSJqV5nYb_t6GMS0FpWyf1aRehAgpvebwgbJzMJfctf1b2A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

Hi Ryusuke,

I have been investigating this issue for the last two weeks, and I think
it is time to share my current results and thoughts. I feel the need to
discuss the possible causes of the issue. Maybe I am missing something,
and you can point me toward a better way of investigating it.

I can reproduce the issue by starting a Linux kernel compilation and an
apt-get update in parallel on the NILFS2 root filesystem. The issue
results in the following crash:

[  220.130662] BUG: unable to handle kernel paging request at 0000000000004612
[  220.130666] IP: [<ffffffff812b55ae>] nilfs_end_page_io+0x3e/0x180

[  220.130574] Call Trace:
[  220.130587]  [<ffffffff816c6b57>] dump_stack+0x19/0x1b
[  220.130593]  [<ffffffff812b5667>] nilfs_end_page_io+0xf7/0x180
[  220.130598]  [<ffffffff812ba2c4>] nilfs_segctor_do_construct+0x1984/0x2410
[  220.130603]  [<ffffffff812bb1f3>] nilfs_segctor_construct+0x1c3/0x450
[  220.130608]  [<ffffffff812bb5da>] nilfs_segctor_thread+0x15a/0x4c0
[  220.130612]  [<ffffffff816cad1f>] ? __schedule+0x3cf/0x810
[  220.130617]  [<ffffffff812bb480>] ? nilfs_segctor_construct+0x450/0x450
[  220.130622]  [<ffffffff81069760>] kthread+0xc0/0xd0
[  220.130626]  [<ffffffff810696a0>] ? flush_kthread_worker+0xb0/0xb0
[  220.130631]  [<ffffffff816d519c>] ret_from_fork+0x7c/0xb0
[  220.130635]  [<ffffffff810696a0>] ? flush_kthread_worker+0xb0/0xb0

I don't have a completely clear picture of the issue yet, but I do have
some steadily reproducible results from the investigation.

As far as I can see, the issue is reproduced when many blocks of a big
file (for example, 1518 blocks) are written to the volume and the buffer
head chain also contains some number of blocks from other small files.
Usually, the issue occurs for a buffer head chain of about 1500 - 2000
blocks. The excerpt below shows how that chain is built.
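For reference, the chain in question is the segment buffer's payload
list, onto which each dirty buffer head is linked through its
b_assoc_buffers field. A rough paraphrase of the helper that builds it
(written from memory, so not a verbatim copy of fs/nilfs2/segbuf.c)
looks like this:

#include <linux/list.h>
#include <linux/buffer_head.h>
#include "segbuf.h"	/* struct nilfs_segment_buffer */

/*
 * Paraphrase of nilfs_segbuf_add_payload_buffer(): every dirty buffer
 * head is queued at the tail of the segment buffer's payload list via
 * its b_assoc_buffers list_head, and the per-segment block counter is
 * incremented (this is the "nblocks" value in the debug output below).
 */
void nilfs_segbuf_add_payload_buffer(struct nilfs_segment_buffer *segbuf,
				     struct buffer_head *bh)
{
	list_add_tail(&bh->b_assoc_buffers, &segbuf->sb_payload_buffers);
	segbuf->sb_sum.nblocks++;
}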

This is the picture I see during the payload-buffer-adding phase:

[  959.803987] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579166, i_ino 3, i_size 0, nblocks 1762
[  959.803990] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209838a08
[  959.803993] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880209836a70, listp->next ffff880209839ad8
[  959.803997] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579166, bh->b_size 4096, bh->b_page ffffea000895db40
[  959.804000] NILFS [nilfs_segctor_apply_buffers]:1160 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209838a08
[  959.804006] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579167, i_ino 3, i_size 0, nblocks 1763
[  959.804009] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff8802267aac78
[  959.804013] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880209836a70, listp->next ffff880209836ad8
[  959.804016] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579167, bh->b_size 4096, bh->b_page ffffea00082b73c0
[  959.804025] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579168, i_ino 3, i_size 0, nblocks 1764
[  959.804028] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209839ad8
[  959.804032] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880209836a70, listp->next ffff880209836a70
[  959.804035] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579168, bh->b_size 4096, bh->b_page ffffea00082afc00
[  959.804044] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22579169, i_ino 3, i_size 0, nblocks 1765
[  959.804047] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836ad8
[  959.804051] NILFS [nilfs_segctor_apply_buffers]:1158 listp ffff880220345ba8, listp->prev ffff880220345ba8, listp->next ffff880220345ba8
[  959.804054] NILFS [nilfs_segctor_apply_buffers]:1159 bh->b_blocknr 22579169, bh->b_size 4096, bh->b_page ffffea00082a9b40
[  959.804058] NILFS [nilfs_segctor_apply_buffers]:1160 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836ad8
[  959.804092] NILFS [nilfs_segbuf_add_payload_buffer]:167 page->index 22583013, i_ino 0, i_size 242770509824, nblocks 1766
[  959.804096] NILFS [nilfs_segbuf_add_payload_buffer]:168 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836a70

From this output it is possible to see that:
(1) 1766 blocks were added to the list.
(2) The last blocks belong to inode 3: #1762, #1763, #1764, #1765.
(3) The last buffer head's next pointer, ffff8802247e3af8, points (as I
understand it) to the first buffer head in the list. The dump helper
sketched below makes this linkage easier to inspect.
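Here is a small debug helper (hypothetical; it is not part of my current
instrumentation or of NILFS2 itself, and it assumes the same includes as
the excerpt above) that could dump the b_assoc_buffers linkage of every
buffer queued on the payload list:

/*
 * Hypothetical debug helper: dump the b_assoc_buffers linkage of every
 * buffer head on the segment buffer's payload list.  list_head lists
 * are circular, so the walk ends when .next comes back around to
 * &segbuf->sb_payload_buffers.
 */
static void nilfs_dbg_dump_payload_list(struct nilfs_segment_buffer *segbuf)
{
	struct buffer_head *bh;

	list_for_each_entry(bh, &segbuf->sb_payload_buffers, b_assoc_buffers)
		printk(KERN_DEBUG "bh %p blocknr %llu next %p prev %p\n",
		       bh, (unsigned long long)bh->b_blocknr,
		       bh->b_assoc_buffers.next, bh->b_assoc_buffers.prev);
}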

But at the complete-write stage we have the following picture:

[  959.848722] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1
[  959.848735] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 21394345, bh->b_size 4096, bh->b_page ffffea00076ffd80
[  959.848739] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff88021de434c0, bh->b_assoc_buffers.prev ffff8802247e3828
[  959.848744] NILFS [nilfs_segctor_complete_write]:2227 page->index 12, i_ino 1005398, i_size 77824
[  959.848752] NILFS [nilfs_segctor_complete_write]:2224 bh_count 2
[  959.848756] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 21394887, bh->b_size 4096, bh->b_page ffffea00078db900
[  959.848759] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff88021de10048, bh->b_assoc_buffers.prev ffff88021de42048
[  959.848763] NILFS [nilfs_segctor_complete_write]:2227 page->index 13, i_ino 1005398, i_size 77824
[  959.848771] NILFS [nilfs_segctor_complete_write]:2224 bh_count 3
[  959.848774] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 50231152, bh->b_size 4096, bh->b_page ffffea000889ae80
[  959.848778] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182abab40, bh->b_assoc_buffers.prev ffff88021de434c0
[  959.848782] NILFS [nilfs_segctor_complete_write]:2227 page->index 50231152, i_ino 1005398, i_size 77824

[............................................................................................................................................]

[  959.874242] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1761
[  959.874245] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22583012, bh->b_size 4096, bh->b_page ffffea00082a9b40
[  959.874249] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182b97fb8, bh->b_assoc_buffers.prev ffff880209836ad8
[  959.874252] NILFS [nilfs_segctor_complete_write]:2227 page->index 22583012, i_ino 3, i_size 0
[  959.874255] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1762
[  959.874259] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22583013, bh->b_size 4096, bh->b_page ffffea0005fe3080
[  959.874262] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev ffff880209836a70
[  959.874266] NILFS [nilfs_segctor_complete_write]:2227 page->index 22583013, i_ino 0, i_size 242770509824
[  959.874270] NILFS [nilfs_segctor_complete_write]:2224 bh_count 1763
[  959.874274] NILFS [nilfs_segctor_complete_write]:2225 bh->b_blocknr 22581248, bh->b_size 22583295, bh->b_page 0000000000002b13
[  959.874277] NILFS [nilfs_segctor_complete_write]:2226 bh->b_assoc_buffers.next ffff880182abab40, bh->b_assoc_buffers.prev ffff880182b97fb8


It is possible to see that the buffer head {page->index 22583013, i_ino 0,
i_size 242770509824, nblocks 1766} has index #1762 during the
complete-write phase, and that it is precisely the next item in the list
that raises the crash, because of the illegal page address {bh->b_page
0000000000002b13}. The whole content of that next item looks very
strange, so I think it is not the list's memory at all. Even stranger,
bh->b_assoc_buffers.prev of this corrupted item (ffff880182b97fb8) points
to the previous good item (the item that was last in the list). As far as
I can see, item #1762 {page->index 22583013, i_ino 0, i_size
242770509824} has unchanged next and prev pointers
{bh->b_assoc_buffers.next ffff8802247e3af8, bh->b_assoc_buffers.prev
ffff880209836a70}. So I suspect that the cause of the issue lies
somewhere between the add-payload-buffer and complete-write phases. But,
currently, I don't have a clear understanding of the whole picture or of
the root cause.
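One way to narrow the window might be a consistency check along these
lines (a hypothetical sketch, not existing NILFS2 code), called at a few
points between the add-payload-buffer and complete-write phases. It
verifies that every entry's neighbours point back at it, similar in
spirit to what CONFIG_DEBUG_LIST does for list insertions and deletions,
so a broken link would be reported where it first appears instead of as
a paging fault in nilfs_end_page_io():

/*
 * Hypothetical consistency check: walk the raw link pointers of the
 * payload list and warn on the first entry whose forward link is
 * bogus or whose neighbour does not point back at it.
 */
static void nilfs_dbg_check_payload_list(struct nilfs_segment_buffer *segbuf)
{
	struct list_head *head = &segbuf->sb_payload_buffers;
	struct list_head *p;

	for (p = head->next; p != head; p = p->next) {
		if (WARN(!virt_addr_valid(p->next) || p->next->prev != p,
			 "nilfs: b_assoc_buffers link corrupted at %p (next %p, prev %p)\n",
			 p, p->next, p->prev))
			break;	/* do not follow a bogus pointer */
	}
}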

I think it makes sense to try to simplify the reproduction environment in
order to investigate the issue more deeply. But maybe you can suggest
something else.

Do you have any ideas about the cause of the issue? Could you share your
view of the possible cause? In any case, I am continuing to investigate,
but, unfortunately, I haven't caught the root cause yet.

With the best regards,
Vyacheslav Dubeyko.


