* ubifs: assertion fails @ 2014-03-24 6:03 Dolev Raviv 2014-03-31 10:14 ` Artem Bityutskiy 0 siblings, 1 reply; 8+ messages in thread From: Dolev Raviv @ 2014-03-24 6:03 UTC (permalink / raw) To: linux-mtd Hi all, Im doing my first steps learning ubifs and Im trying to understand a something that does not make much sense to me. In fs/ubifs/shrinker.c, at shrink_tnc(), there is an assert condition that shows up every once I a while (after stressing). ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0); In another place in the same file in the function ubifs_shrinker(), I found the following comment: /* * Due to the way UBIFS updates the clean znode counter it may * temporarily be negative. */ Could the assertion condition be wrong? Can anyone share information on what are those times that the counter can be negative? Thanks, Dolev -- QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ubifs: assertion fails 2014-03-24 6:03 ubifs: assertion fails Dolev Raviv @ 2014-03-31 10:14 ` Artem Bityutskiy 2014-04-01 5:52 ` Tanya Brokhman 0 siblings, 1 reply; 8+ messages in thread From: Artem Bityutskiy @ 2014-03-31 10:14 UTC (permalink / raw) To: Dolev Raviv; +Cc: linux-mtd On Mon, 2014-03-24 at 06:03 +0000, Dolev Raviv wrote: > Hi all, > > Im doing my first steps learning ubifs and Im trying to understand a > something that does not make much sense to me. > > In fs/ubifs/shrinker.c, at shrink_tnc(), there is an assert condition that > shows up every once I a while (after stressing). > ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0); When this happens, do you then see a storm of similar assertions from other parts of the code? I am trying to understand if this assertion is incorrect, or you really get the accounting screwed when shrinking happens. In the former case, this would probably be a single assertion, on the latter you'd probably see many similar warnings from other code. E.g., when you unmount. > In another place in the same file in the function ubifs_shrinker(), I > found the following comment: > /* > * Due to the way UBIFS updates the clean znode counter it may > * temporarily be negative. > */ Yeah. The key here is this 'c->next'. If it is NULL, the accounting must be correct, if it is not NULL, it may be incorrect. And it will be correct when the on-going commit operation finishes and 'free_obsolete_znodes()' is called. > Could the assertion condition be wrong? Could be, but could also show that there is an accounting error happening when shrinker starts. And I saw misterious errors when shrinker starts working at some point, but did not have time to dig this. So there is at least 1 bug in the shrinker path which I saw. > Can anyone share information on what are those times that the counter can > be negative? When the commit operation starts, it grabs the tnc_mutex, prepares the list of nodes to commit, and release tnc_mutex. Now the accounting is incorrect. When the commit finishes, it grabs the mutex again, does some stuff, and also fixes the accounting. Then drops the mutex. The idea was to make sure that commit does not block I/O. Meaning that you can still write files while commit is going on. -- Best Regards, Artem Bityutskiy ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ubifs: assertion fails 2014-03-31 10:14 ` Artem Bityutskiy @ 2014-04-01 5:52 ` Tanya Brokhman 2014-04-07 11:43 ` Artem Bityutskiy 0 siblings, 1 reply; 8+ messages in thread From: Tanya Brokhman @ 2014-04-01 5:52 UTC (permalink / raw) To: dedekind1, Dolev Raviv; +Cc: linux-mtd Hi Artem On 3/31/2014 1:14 PM, Artem Bityutskiy wrote: > On Mon, 2014-03-24 at 06:03 +0000, Dolev Raviv wrote: >> Hi all, >> >> Im doing my first steps learning ubifs and Im trying to understand a >> something that does not make much sense to me. >> >> In fs/ubifs/shrinker.c, at shrink_tnc(), there is an assert condition that >> shows up every once I a while (after stressing). >> ubifs_assert(atomic_long_read(&c->clean_zn_cnt) >= 0); > > When this happens, do you then see a storm of similar assertions from > other parts of the code? I am trying to understand if this assertion is > incorrect, or you really get the accounting screwed when shrinking > happens. > > In the former case, this would probably be a single assertion, on the > latter you'd probably see many similar warnings from other code. E.g., > when you unmount. > The log isn't flooded with the above mentioned error, but it does repeat several times. > >> Could the assertion condition be wrong? > > Could be, but could also show that there is an accounting error > happening when shrinker starts. > > And I saw misterious errors when shrinker starts working at some point, > but did not have time to dig this. So there is at least 1 bug in the > shrinker path which I saw. Is there any way we can help in debugging and fixing this? Also, we're running on a 3.10 based kernel and I saw a lot of patches that change the shrinker after 3.10 on linux-next. What kernel version did you see the shrinker errors on? > >> Can anyone share information on what are those times that the counter can >> be negative? > > When the commit operation starts, it grabs the tnc_mutex, prepares the > list of nodes to commit, and release tnc_mutex. Now the accounting is > incorrect. When the commit finishes, it grabs the mutex again, does some > stuff, and also fixes the accounting. Then drops the mutex. > > The idea was to make sure that commit does not block I/O. Meaning that > you can still write files while commit is going on. > Thanks, Tanya Brokhman -- QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ubifs: assertion fails 2014-04-01 5:52 ` Tanya Brokhman @ 2014-04-07 11:43 ` Artem Bityutskiy 2014-04-27 12:12 ` Dolev Raviv 0 siblings, 1 reply; 8+ messages in thread From: Artem Bityutskiy @ 2014-04-07 11:43 UTC (permalink / raw) To: Tanya Brokhman; +Cc: Dolev Raviv, linux-mtd On Tue, 2014-04-01 at 08:52 +0300, Tanya Brokhman wrote: > The log isn't flooded with the above mentioned error, but it does repeat > several times. Even though you do a lot of I/O operations? If the accounting becomes incorrect, I think you'd see a warning for each and every I/O operations. Also, do you have several UBIFS file-systems mounted? > Is there any way we can help in debugging and fixing this? Also, we're > running on a 3.10 based kernel and I saw a lot of patches that change > the shrinker after 3.10 on linux-next. What kernel version did you see > the shrinker errors on? Well, you just need to stress the shrinker. I did it by simply adding a hack to ubifs module init which kmalloc()'ed a lot of memory. Not too-much, but enough to cause some pressure when doing a lot of I/O operations and get UBIFS shrinker be invoked. This took me some amount of time to get a good hack. I use a normal PC with some Fedora, and nandsime. This makes it a lot easier to debug, comparing with doing this on a real target. After all, shrinker is a generic component. I think that PC had 4GB of RAM. I do not remember how much I allocated in my hack. I think I used older version than 3.10 back then. -- Best Regards, Artem Bityutskiy ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ubifs: assertion fails 2014-04-07 11:43 ` Artem Bityutskiy @ 2014-04-27 12:12 ` Dolev Raviv 2014-05-29 1:55 ` hujianyang 0 siblings, 1 reply; 8+ messages in thread From: Dolev Raviv @ 2014-04-27 12:12 UTC (permalink / raw) To: dedekind1; +Cc: Dolev Raviv, linux-mtd, Tanya Brokhman > On Tue, 2014-04-01 at 08:52 +0300, Tanya Brokhman wrote: >> The log isn't flooded with the above mentioned error, but it does repeat >> several times. > > Even though you do a lot of I/O operations? > > If the accounting becomes incorrect, I think you'd see a warning for > each and every I/O operations. Yes, the log seems clean to me. The error actually occurs while stressing with iozone benchmark. Could the error be a result of the corner case you described in your previous mail. > > Also, do you have several UBIFS file-systems mounted? > Yes, we have prepared images based on ubifs. So almost all the mounted partitions are ubifs. >> Is there any way we can help in debugging and fixing this? Also, we're >> running on a 3.10 based kernel and I saw a lot of patches that change >> the shrinker after 3.10 on linux-next. What kernel version did you see >> the shrinker errors on? > > Well, you just need to stress the shrinker. I did it by simply adding a > hack to ubifs module init which kmalloc()'ed a lot of memory. Not > too-much, but enough to cause some pressure when doing a lot of I/O > operations and get UBIFS shrinker be invoked. This took me some amount > of time to get a good hack. I use a normal PC with some Fedora, and > nandsime. This makes it a lot easier to debug, comparing with doing this > on a real target. After all, shrinker is a generic component. I think > that PC had 4GB of RAM. I do not remember how much I allocated in my > hack. I'll probably will take a look at this once I finish other more urgent tasks. > > I think I used older version than 3.10 back then. > > -- > Best Regards, > Artem Bityutskiy > > -- QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ubifs: assertion fails 2014-04-27 12:12 ` Dolev Raviv @ 2014-05-29 1:55 ` hujianyang 2014-05-29 7:24 ` Dolev Raviv 2014-05-29 7:42 ` Artem Bityutskiy 0 siblings, 2 replies; 8+ messages in thread From: hujianyang @ 2014-05-29 1:55 UTC (permalink / raw) To: Dolev Raviv, Artem Bityutskiy; +Cc: linux-mtd Hi Dolev and Artem, I hit the same assert failed in Kernel v3.10 shows like this: [ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297) [ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1 [ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24) [ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28) [ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) [ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) [ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) [ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) [ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) [ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) [ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238) [ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) [ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) [ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) [ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) [ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418) [ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530) [ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54) [ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184) [ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac) [ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0) [ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40) Did you fix it in recent patches? If not, I will take some times on it. I hit this when I was doing stress test, only once and no other failed messages. I haven't do umount or rmmod until now so I don't know further information about it. Thanks! Hu ^ permalink raw reply [flat|nested] 8+ messages in thread
* RE: ubifs: assertion fails 2014-05-29 1:55 ` hujianyang @ 2014-05-29 7:24 ` Dolev Raviv 2014-05-29 7:42 ` Artem Bityutskiy 1 sibling, 0 replies; 8+ messages in thread From: Dolev Raviv @ 2014-05-29 7:24 UTC (permalink / raw) To: 'hujianyang', 'Artem Bityutskiy'; +Cc: 'linux-mtd' Hi, I still see this. I was blocked recently by other tasks. I guess I'll get to it in the near future. If you have an insight on this it will be very helpful. Thanks, Dolev -- QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation -----Original Message----- From: hujianyang [mailto:hujianyang@huawei.com] Sent: Thursday, May 29, 2014 4:56 AM To: Dolev Raviv; Artem Bityutskiy Cc: linux-mtd Subject: Re: ubifs: assertion fails Hi Dolev and Artem, I hit the same assert failed in Kernel v3.10 shows like this: [ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297) [ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1 [ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24) [ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28) [ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) [ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) [ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) [ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) [ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) [ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) [ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238) [ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) [ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) [ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) [ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) [ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418) [ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530) [ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54) [ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184) [ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac) [ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0) [ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40) Did you fix it in recent patches? If not, I will take some times on it. I hit this when I was doing stress test, only once and no other failed messages. I haven't do umount or rmmod until now so I don't know further information about it. Thanks! Hu ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: ubifs: assertion fails 2014-05-29 1:55 ` hujianyang 2014-05-29 7:24 ` Dolev Raviv @ 2014-05-29 7:42 ` Artem Bityutskiy 1 sibling, 0 replies; 8+ messages in thread From: Artem Bityutskiy @ 2014-05-29 7:42 UTC (permalink / raw) To: hujianyang; +Cc: Dolev Raviv, linux-mtd On Thu, 2014-05-29 at 09:55 +0800, hujianyang wrote: > Hi Dolev and Artem, > > I hit the same assert failed in Kernel v3.10 shows like this: > > [ 9641.164028] UBIFS assert failed in shrink_tnc at 131 (pid 13297) > [ 9641.234078] CPU: 1 PID: 13297 Comm: mmap.test Tainted: G O 3.10.40 #1 > [ 9641.234116] [<c0011a6c>] (unwind_backtrace+0x0/0x12c) from [<c000d0b0>] (show_stack+0x20/0x24) > [ 9641.234137] [<c000d0b0>] (show_stack+0x20/0x24) from [<c0311134>] (dump_stack+0x20/0x28) > [ 9641.234188] [<c0311134>] (dump_stack+0x20/0x28) from [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) > [ 9641.234265] [<bf22425c>] (shrink_tnc_trees+0x25c/0x350 [ubifs]) from [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) > [ 9641.234307] [<bf2245ac>] (ubifs_shrinker+0x25c/0x310 [ubifs]) from [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) > [ 9641.234327] [<c00cdad8>] (shrink_slab+0x1d4/0x2f8) from [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) > [ 9641.234344] [<c00d03d0>] (do_try_to_free_pages+0x300/0x544) from [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) > [ 9641.234363] [<c00d0a44>] (try_to_free_pages+0x2d0/0x398) from [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) > [ 9641.234382] [<c00c6a60>] (__alloc_pages_nodemask+0x494/0x7e8) from [<c00f62d8>] (new_slab+0x78/0x238) > [ 9641.234400] [<c00f62d8>] (new_slab+0x78/0x238) from [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) > [ 9641.234419] [<c031081c>] (__slab_alloc.constprop.42+0x1a4/0x50c) from [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) > [ 9641.234459] [<c00f80e8>] (kmem_cache_alloc_trace+0x54/0x188) from [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) > [ 9641.234553] [<bf227908>] (do_readpage+0x168/0x468 [ubifs]) from [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) > [ 9641.234606] [<bf2296a0>] (ubifs_readpage+0x424/0x464 [ubifs]) from [<c00c17c0>] (filemap_fault+0x304/0x418) > [ 9641.234638] [<c00c17c0>] (filemap_fault+0x304/0x418) from [<c00de694>] (__do_fault+0xd4/0x530) > [ 9641.234665] [<c00de694>] (__do_fault+0xd4/0x530) from [<c00e10c0>] (handle_pte_fault+0x480/0xf54) > [ 9641.234690] [<c00e10c0>] (handle_pte_fault+0x480/0xf54) from [<c00e2bf8>] (handle_mm_fault+0x140/0x184) > [ 9641.234716] [<c00e2bf8>] (handle_mm_fault+0x140/0x184) from [<c0316688>] (do_page_fault+0x150/0x3ac) > [ 9641.234737] [<c0316688>] (do_page_fault+0x150/0x3ac) from [<c000842c>] (do_DataAbort+0x3c/0xa0) > [ 9641.234759] [<c000842c>] (do_DataAbort+0x3c/0xa0) from [<c0314e38>] (__dabt_usr+0x38/0x40) > > Did you fix it in recent patches? If not, I will take some times > on it. I did not. I know there are issues in shrinker, but I never have time to investigate them. For example, I remember I tried to emulate memory pressure by limiting the amount of memory available to Linux using 'mem=' boot parameter, and writing a fake module which just keeps allocating and then freeing a lot of memory using kmalloc in a loop with a small delay. And I was running UBIFS stress test. This way I made sure UBIFS shrinker is called often. And I saw some oopes, but did not have time to investigate. And this was 3 years ago, so may be this is not longer the case. -- Best Regards, Artem Bityutskiy ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2014-05-29 7:43 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-03-24 6:03 ubifs: assertion fails Dolev Raviv 2014-03-31 10:14 ` Artem Bityutskiy 2014-04-01 5:52 ` Tanya Brokhman 2014-04-07 11:43 ` Artem Bityutskiy 2014-04-27 12:12 ` Dolev Raviv 2014-05-29 1:55 ` hujianyang 2014-05-29 7:24 ` Dolev Raviv 2014-05-29 7:42 ` Artem Bityutskiy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).