All of lore.kernel.org
 help / color / mirror / Atom feed
* UBIFS question: Power-cuts after ubifs_leb_unmap()
@ 2018-07-09 10:11 Richard Weinberger
  2018-07-09 12:21 ` Adrian Hunter
  2018-07-24  8:01 ` Artem Bityutskiy
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Weinberger @ 2018-07-09 10:11 UTC (permalink / raw)
  To: Artem Bityutskiy, Adrian Hunter; +Cc: linux-mtd

Artem, Adrian,

While playing with a new UBI/UBIFS test framework I managed to hit this error, 
with lprops self-checks enabled:

[ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad accounting of 
LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty 0

LEB 11 is unmapped but LPT still thinks that some data is used,
lp->free + lp >dirty < leb-size.
Even without lprobs self-checks, the same filesystem will later hit this 
assertion in ubifs_garbage_collect_leb():

ubifs_assert(!list_empty(&sleb->nodes));

The assert makes sure that the LEB actually contains nodes.
ubifs_garbage_collect_leb() handles the special case lp->free + lp->dirty == 
c->leb_size.
But not lp->free + lp->dirty < leb-size.

Now I'm not sure where to fix that, maybe you can remember some design 
decisions.
1. Shall we massage ubifs_garbage_collect_leb() to deal with this special case 
too?
2. Is it already a bug when this case happens?

>From reviewing the code, I think the said situation can arise when we face 
power-cut
in ubifs_garbage_collect_leb():

if (snod->type == UBIFS_IDX_NODE) {
	...
} else {
	...

	err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);

	...

	err = ubifs_leb_unmap(c, lnum);

	// POWER CUT
}

We mark the LEB as free and unmap it.
ubifs_change_one_lp() does not immediately write a new LPT, if we lose power 
right after ubifs_leb_unmap() it can happen that the LEB already got erased 
but the LPT has the old accounting information.
UBIFS seems to expect such situations only for
lp->free + lp->dirty == c->leb_size

What do you think?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-09 10:11 UBIFS question: Power-cuts after ubifs_leb_unmap() Richard Weinberger
@ 2018-07-09 12:21 ` Adrian Hunter
  2018-07-09 13:11   ` Richard Weinberger
  2018-07-24  8:01 ` Artem Bityutskiy
  1 sibling, 1 reply; 10+ messages in thread
From: Adrian Hunter @ 2018-07-09 12:21 UTC (permalink / raw)
  To: Richard Weinberger, Artem Bityutskiy; +Cc: linux-mtd

On 09/07/18 13:11, Richard Weinberger wrote:
> Artem, Adrian,
> 
> While playing with a new UBI/UBIFS test framework I managed to hit this error, 
> with lprops self-checks enabled:
> 
> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad accounting of 
> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty 0
> 
> LEB 11 is unmapped but LPT still thinks that some data is used,
> lp->free + lp >dirty < leb-size.
> Even without lprobs self-checks, the same filesystem will later hit this 
> assertion in ubifs_garbage_collect_leb():
> 
> ubifs_assert(!list_empty(&sleb->nodes));
> 
> The assert makes sure that the LEB actually contains nodes.
> ubifs_garbage_collect_leb() handles the special case lp->free + lp->dirty == 
> c->leb_size.
> But not lp->free + lp->dirty < leb-size.
> 
> Now I'm not sure where to fix that, maybe you can remember some design 
> decisions.
> 1. Shall we massage ubifs_garbage_collect_leb() to deal with this special case 
> too?
> 2. Is it already a bug when this case happens?
> 
>>From reviewing the code, I think the said situation can arise when we face 
> power-cut
> in ubifs_garbage_collect_leb():
> 
> if (snod->type == UBIFS_IDX_NODE) {
> 	...
> } else {
> 	...
> 
> 	err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);
> 
> 	...
> 
> 	err = ubifs_leb_unmap(c, lnum);
> 
> 	// POWER CUT
> }
> 
> We mark the LEB as free and unmap it.
> ubifs_change_one_lp() does not immediately write a new LPT, if we lose power 
> right after ubifs_leb_unmap() it can happen that the LEB already got erased 
> but the LPT has the old accounting information.

Doesn't GC copy the nodes into the journal, so after the journal is
replayed, the old nodes will be dirtied and lprops will be correct again.

> UBIFS seems to expect such situations only for
> lp->free + lp->dirty == c->leb_size
> 
> What do you think?
> 
> Thanks,
> //richard
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-09 12:21 ` Adrian Hunter
@ 2018-07-09 13:11   ` Richard Weinberger
  2018-07-10  6:58     ` Adrian Hunter
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2018-07-09 13:11 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Artem Bityutskiy, linux-mtd

Am Montag, 9. Juli 2018, 14:21:33 CEST schrieb Adrian Hunter:
> On 09/07/18 13:11, Richard Weinberger wrote:
> > Artem, Adrian,
> > 
> > While playing with a new UBI/UBIFS test framework I managed to hit this error, 
> > with lprops self-checks enabled:
> > 
> > [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad accounting of 
> > LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty 0
> > 
> > LEB 11 is unmapped but LPT still thinks that some data is used,
> > lp->free + lp >dirty < leb-size.
> > Even without lprobs self-checks, the same filesystem will later hit this 
> > assertion in ubifs_garbage_collect_leb():
> > 
> > ubifs_assert(!list_empty(&sleb->nodes));
> > 
> > The assert makes sure that the LEB actually contains nodes.
> > ubifs_garbage_collect_leb() handles the special case lp->free + lp->dirty == 
> > c->leb_size.
> > But not lp->free + lp->dirty < leb-size.
> > 
> > Now I'm not sure where to fix that, maybe you can remember some design 
> > decisions.
> > 1. Shall we massage ubifs_garbage_collect_leb() to deal with this special case 
> > too?
> > 2. Is it already a bug when this case happens?
> > 
> >>From reviewing the code, I think the said situation can arise when we face 
> > power-cut
> > in ubifs_garbage_collect_leb():
> > 
> > if (snod->type == UBIFS_IDX_NODE) {
> > 	...
> > } else {
> > 	...
> > 
> > 	err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);
> > 
> > 	...
> > 
> > 	err = ubifs_leb_unmap(c, lnum);
> > 
> > 	// POWER CUT
> > }
> > 
> > We mark the LEB as free and unmap it.
> > ubifs_change_one_lp() does not immediately write a new LPT, if we lose power 
> > right after ubifs_leb_unmap() it can happen that the LEB already got erased 
> > but the LPT has the old accounting information.
> 
> Doesn't GC copy the nodes into the journal, so after the journal is
> replayed, the old nodes will be dirtied and lprops will be correct again.

Yes, this is the theory. But the assert proves that something is not as we expect it. ;-\

Thanks,
//richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-09 13:11   ` Richard Weinberger
@ 2018-07-10  6:58     ` Adrian Hunter
  0 siblings, 0 replies; 10+ messages in thread
From: Adrian Hunter @ 2018-07-10  6:58 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: Artem Bityutskiy, linux-mtd

On 09/07/18 16:11, Richard Weinberger wrote:
> Am Montag, 9. Juli 2018, 14:21:33 CEST schrieb Adrian Hunter:
>> On 09/07/18 13:11, Richard Weinberger wrote:
>>> Artem, Adrian,
>>>
>>> While playing with a new UBI/UBIFS test framework I managed to hit this error, 
>>> with lprops self-checks enabled:
>>>
>>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad accounting of 
>>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty 0
>>>
>>> LEB 11 is unmapped but LPT still thinks that some data is used,
>>> lp->free + lp >dirty < leb-size.
>>> Even without lprobs self-checks, the same filesystem will later hit this 
>>> assertion in ubifs_garbage_collect_leb():
>>>
>>> ubifs_assert(!list_empty(&sleb->nodes));
>>>
>>> The assert makes sure that the LEB actually contains nodes.
>>> ubifs_garbage_collect_leb() handles the special case lp->free + lp->dirty == 
>>> c->leb_size.
>>> But not lp->free + lp->dirty < leb-size.
>>>
>>> Now I'm not sure where to fix that, maybe you can remember some design 
>>> decisions.
>>> 1. Shall we massage ubifs_garbage_collect_leb() to deal with this special case 
>>> too?
>>> 2. Is it already a bug when this case happens?

If it is caused by a power-cut then the error would appear during the next
mount, probably due to a failure in the replay to correctly account for the
dirty space.

The downside of papering over the problem in GC is the possibility that that
then hides a real problem.

>>>
>>> >From reviewing the code, I think the said situation can arise when we face 
>>> power-cut
>>> in ubifs_garbage_collect_leb():
>>>
>>> if (snod->type == UBIFS_IDX_NODE) {
>>> 	...
>>> } else {
>>> 	...
>>>
>>> 	err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);
>>>
>>> 	...
>>>
>>> 	err = ubifs_leb_unmap(c, lnum);
>>>
>>> 	// POWER CUT
>>> }
>>>
>>> We mark the LEB as free and unmap it.
>>> ubifs_change_one_lp() does not immediately write a new LPT, if we lose power 
>>> right after ubifs_leb_unmap() it can happen that the LEB already got erased 
>>> but the LPT has the old accounting information.
>>
>> Doesn't GC copy the nodes into the journal, so after the journal is
>> replayed, the old nodes will be dirtied and lprops will be correct again.
> 
> Yes, this is the theory. But the assert proves that something is not as we expect it. ;-\

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-09 10:11 UBIFS question: Power-cuts after ubifs_leb_unmap() Richard Weinberger
  2018-07-09 12:21 ` Adrian Hunter
@ 2018-07-24  8:01 ` Artem Bityutskiy
  2018-07-25 22:29   ` Richard Weinberger
  1 sibling, 1 reply; 10+ messages in thread
From: Artem Bityutskiy @ 2018-07-24  8:01 UTC (permalink / raw)
  To: Richard Weinberger, Adrian Hunter; +Cc: linux-mtd

Hi, I looked at this quickly and talked to Adrian. I cannot solve this
for you but here are some thoughts.

On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
> Artem, Adrian,
> 
> While playing with a new UBI/UBIFS test framework I managed to hit
> this error, 
> with lprops self-checks enabled:
> 
> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
> accounting of 
> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
> 0

So it has 126976 - 118072 = 8904 of used space.

> LEB 11 is unmapped but LPT still thinks that some data is used,
> lp->free + lp >dirty < leb-size.

And the problem is this used space. We think that most probably the bug
is in replay and these 9804 bytes should have been marked as dirty
during replay, but they were not. If they were marked as dirty, we
probably would not have problem.

It is unlikely this is a bug in GC. The recovery path is the most
tricky thing, and recovery happens in slightly different order, and if
we were expecting a but, we'd suspect the recovery replay.

> Even without lprobs self-checks, the same filesystem will later hit
> this 
> assertion in ubifs_garbage_collect_leb():
> 
> ubifs_assert(!list_empty(&sleb->nodes));
> 
> The assert makes sure that the LEB actually contains nodes.
> ubifs_garbage_collect_leb() handles the special case lp->free + lp-
> >dirty == 
> c->leb_size.
> But not lp->free + lp->dirty < leb-size.

> Now I'm not sure where to fix that, maybe you can remember some
> design 
> decisions.
> 1. Shall we massage ubifs_garbage_collect_leb() to deal with this
> special case 
> too?

lp->free + lp->dirty < leb-size means there is used data, and we GC by
moving. So this is not a special case.

> 2. Is it already a bug when this case happens?

I do not think so.

> From reviewing the code, I think the said situation can arise when we
> face 
> power-cut
> in ubifs_garbage_collect_leb():
> 
> if (snod->type == UBIFS_IDX_NODE) {
> 	...
> } else {
> 	...
> 

OK, we are here. This means that lprops in memory belives this LEB has
no used data, only dirt and free space. It can be recycled.

Now, if we have power cut right here, next time we'll go to recovery
replay. If the replay is correct, we'll end up with same lprops in
memory, and we'll recycle the LEB anyway.

> 	err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);

We change lprops in memory.

> 
> 	...
> 
> 	err = ubifs_leb_unmap(c, lnum);
> 
> 	// POWER CUT
> }
> 
> We mark the LEB as free and unmap it.
> ubifs_change_one_lp() does not immediately write a new LPT, if we
> lose power 
> right after ubifs_leb_unmap() it can happen that the LEB already got
> erased 
> but the LPT has the old accounting information.

Yes, and during recovery we'll notice that lprops says an LEB is
freeable (has no used data), and we'll adjust lprops and probably re-
unmap the LEB (I not remember 100%).

In the error you showed - there is 8K of used, and this is the problem.

> What do you think?

1. If you believe some specific place like the one you pointed contains
a bug, then emulate a power cut in that place and try to reproduce the
bug.

2. Try to focus on recovery replay, this is the main suspect. Try to
figure out what is that used space in that LEB?

-- 
Best Regards,
Artem Bityutskiy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-24  8:01 ` Artem Bityutskiy
@ 2018-07-25 22:29   ` Richard Weinberger
  2018-07-30  6:55     ` Adrian Hunter
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2018-07-25 22:29 UTC (permalink / raw)
  To: artem.bityutskiy; +Cc: Adrian Hunter, linux-mtd

Artem, Adrian,

Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
> for you but here are some thoughts.

Both of your thoughts are much appreciated!

> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
> > Artem, Adrian,
> > 
> > While playing with a new UBI/UBIFS test framework I managed to hit
> > this error, 
> > with lprops self-checks enabled:
> > 
> > [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
> > accounting of 
> > LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
> > 0
> 
> So it has 126976 - 118072 = 8904 of used space.
> 

[...]

> 1. If you believe some specific place like the one you pointed contains
> a bug, then emulate a power cut in that place and try to reproduce the
> bug.
> 
> 2. Try to focus on recovery replay, this is the main suspect. Try to
> figure out what is that used space in that LEB?

After spending the third night in a row staring at hexdumps, I think I know
what is going on.

I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
as used.
To double check that value I've analyzed the index tree. To my surprise it was 
referencing nodes at LEB 11!
Summing up the length of these nodes is exactly 8904. So, the LPT is correct.

All nodes on LEB 11 that are referenced by the index tree are of type 
UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
So, our problem is xattr related. We forget xattr inodes and the free space 
accounting goes nuts sooner or later.

Inspecting the journal gave more details. Only buds of head type BASEHD and 
DATAHD are present. No GCHD. Therefore GC was not interrupted.
With the IO transaction log from before the power-cut I was able to figure 
where the orphaned xattr inodes belong to.
And indeed, the host inode of said xattr is in the journal for deletion.

In theory during replay, removal of the host inode should also remove all
xattr inodes and we are good.
So the question is, why fails UBIFS to remove the xattr inodes even if we have 
the host inode in the journal?

Another round of digging though the index uncovered the problem. The 
UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
This means upon replay, when UBIFS tries to locate all xattr entries
of the host inode, it faces the dangling branch case and the loop in 
ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
removed, but the xattr inodes stay.

Therefore the root of the problem is that UBIFS places only the host inode 
into the journal upon removal of a file with xattrs and assumes that upon 
replay we can discover all xattr inodes. That will fail if due to GC the LEB 
got already unmapped.

As intermediate solution I suggest adding also xattr inodes to the journal
upon unlink. This will make unlink more expensive but for now it is IMHO ok
until a more sophisticated solution is found.
Files usually have only a few xattr, so the cost is not that bad.

What do you guys think? Does my analysis make sense to you?

Thanks,
//richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-25 22:29   ` Richard Weinberger
@ 2018-07-30  6:55     ` Adrian Hunter
  2018-07-30  7:28       ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Adrian Hunter @ 2018-07-30  6:55 UTC (permalink / raw)
  To: Richard Weinberger, artem.bityutskiy; +Cc: linux-mtd

On 26/07/18 01:29, Richard Weinberger wrote:
> Artem, Adrian,
> 
> Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
>> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
>> for you but here are some thoughts.
> 
> Both of your thoughts are much appreciated!
> 
>> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
>>> Artem, Adrian,
>>>
>>> While playing with a new UBI/UBIFS test framework I managed to hit
>>> this error, 
>>> with lprops self-checks enabled:
>>>
>>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
>>> accounting of 
>>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
>>> 0
>>
>> So it has 126976 - 118072 = 8904 of used space.
>>
> 
> [...]
> 
>> 1. If you believe some specific place like the one you pointed contains
>> a bug, then emulate a power cut in that place and try to reproduce the
>> bug.
>>
>> 2. Try to focus on recovery replay, this is the main suspect. Try to
>> figure out what is that used space in that LEB?
> 
> After spending the third night in a row staring at hexdumps, I think I know
> what is going on.
> 
> I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
> as used.
> To double check that value I've analyzed the index tree. To my surprise it was 
> referencing nodes at LEB 11!
> Summing up the length of these nodes is exactly 8904. So, the LPT is correct.
> 
> All nodes on LEB 11 that are referenced by the index tree are of type 
> UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
> So, our problem is xattr related. We forget xattr inodes and the free space 
> accounting goes nuts sooner or later.
> 
> Inspecting the journal gave more details. Only buds of head type BASEHD and 
> DATAHD are present. No GCHD. Therefore GC was not interrupted.
> With the IO transaction log from before the power-cut I was able to figure 
> where the orphaned xattr inodes belong to.
> And indeed, the host inode of said xattr is in the journal for deletion.
> 
> In theory during replay, removal of the host inode should also remove all
> xattr inodes and we are good.
> So the question is, why fails UBIFS to remove the xattr inodes even if we have 
> the host inode in the journal?
> 
> Another round of digging though the index uncovered the problem. The 
> UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
> This means upon replay, when UBIFS tries to locate all xattr entries
> of the host inode, it faces the dangling branch case and the loop in 
> ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
> removed, but the xattr inodes stay.
> 
> Therefore the root of the problem is that UBIFS places only the host inode 
> into the journal upon removal of a file with xattrs and assumes that upon 
> replay we can discover all xattr inodes. That will fail if due to GC the LEB 
> got already unmapped.
> 
> As intermediate solution I suggest adding also xattr inodes to the journal
> upon unlink. This will make unlink more expensive but for now it is IMHO ok
> until a more sophisticated solution is found.
> Files usually have only a few xattr, so the cost is not that bad.
> 
> What do you guys think? Does my analysis make sense to you?

It doesn't seem like there is a way to avoid adding more information to the
journal upon deletion, but you probably need to cater for the possibility
that there is a power cut while writing that information e.g. it wouldn't be
good to remove half the xattrs but leave the rest

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-30  6:55     ` Adrian Hunter
@ 2018-07-30  7:28       ` Richard Weinberger
  2018-07-30  8:07         ` Adrian Hunter
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2018-07-30  7:28 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: artem.bityutskiy, linux-mtd

Am Montag, 30. Juli 2018, 08:55:24 CEST schrieb Adrian Hunter:
> On 26/07/18 01:29, Richard Weinberger wrote:
> > Artem, Adrian,
> > 
> > Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
> >> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
> >> for you but here are some thoughts.
> > 
> > Both of your thoughts are much appreciated!
> > 
> >> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
> >>> Artem, Adrian,
> >>>
> >>> While playing with a new UBI/UBIFS test framework I managed to hit
> >>> this error, 
> >>> with lprops self-checks enabled:
> >>>
> >>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
> >>> accounting of 
> >>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
> >>> 0
> >>
> >> So it has 126976 - 118072 = 8904 of used space.
> >>
> > 
> > [...]
> > 
> >> 1. If you believe some specific place like the one you pointed contains
> >> a bug, then emulate a power cut in that place and try to reproduce the
> >> bug.
> >>
> >> 2. Try to focus on recovery replay, this is the main suspect. Try to
> >> figure out what is that used space in that LEB?
> > 
> > After spending the third night in a row staring at hexdumps, I think I know
> > what is going on.
> > 
> > I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
> > as used.
> > To double check that value I've analyzed the index tree. To my surprise it was 
> > referencing nodes at LEB 11!
> > Summing up the length of these nodes is exactly 8904. So, the LPT is correct.
> > 
> > All nodes on LEB 11 that are referenced by the index tree are of type 
> > UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
> > So, our problem is xattr related. We forget xattr inodes and the free space 
> > accounting goes nuts sooner or later.
> > 
> > Inspecting the journal gave more details. Only buds of head type BASEHD and 
> > DATAHD are present. No GCHD. Therefore GC was not interrupted.
> > With the IO transaction log from before the power-cut I was able to figure 
> > where the orphaned xattr inodes belong to.
> > And indeed, the host inode of said xattr is in the journal for deletion.
> > 
> > In theory during replay, removal of the host inode should also remove all
> > xattr inodes and we are good.
> > So the question is, why fails UBIFS to remove the xattr inodes even if we have 
> > the host inode in the journal?
> > 
> > Another round of digging though the index uncovered the problem. The 
> > UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
> > This means upon replay, when UBIFS tries to locate all xattr entries
> > of the host inode, it faces the dangling branch case and the loop in 
> > ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
> > removed, but the xattr inodes stay.
> > 
> > Therefore the root of the problem is that UBIFS places only the host inode 
> > into the journal upon removal of a file with xattrs and assumes that upon 
> > replay we can discover all xattr inodes. That will fail if due to GC the LEB 
> > got already unmapped.
> > 
> > As intermediate solution I suggest adding also xattr inodes to the journal
> > upon unlink. This will make unlink more expensive but for now it is IMHO ok
> > until a more sophisticated solution is found.
> > Files usually have only a few xattr, so the cost is not that bad.
> > 
> > What do you guys think? Does my analysis make sense to you?
> 
> It doesn't seem like there is a way to avoid adding more information to the
> journal upon deletion, but you probably need to cater for the possibility
> that there is a power cut while writing that information e.g. it wouldn't be
> good to remove half the xattrs but leave the rest

With grouped nodes a power-cut is okay, the bigger problem is the number of xattrs.
We support up to 2^16. Do delete an inode via journal we need UBIFS_INO_NODE_SZ bytes
in the journal.
So, in worst case we need to write 2^16 times UBIFS_INO_NODE_SZ bytes to the journal.
I'm still looking into this option right now, just to be sure that this is really the
solution to the problem I see. Finding a stable reproducer is also not easy...

Another possibility is playing with ubifs_tnc_remove_ino(), upon unlink() it removes
all entries from the TNC and marks them as dirty.
The problem is that also all xattr inodes are marked as dirty.
If we manage to find a way that these get marked only as dirty when we are sure that
they are gone the GC will no longer unmap a "used" LEB.
But that all is very tricky.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-30  7:28       ` Richard Weinberger
@ 2018-07-30  8:07         ` Adrian Hunter
  2018-07-31 21:28           ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Adrian Hunter @ 2018-07-30  8:07 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: artem.bityutskiy, linux-mtd

On 30/07/18 10:28, Richard Weinberger wrote:
> Am Montag, 30. Juli 2018, 08:55:24 CEST schrieb Adrian Hunter:
>> On 26/07/18 01:29, Richard Weinberger wrote:
>>> Artem, Adrian,
>>>
>>> Am Dienstag, 24. Juli 2018, 10:01:30 CEST schrieb Artem Bityutskiy:
>>>> Hi, I looked at this quickly and talked to Adrian. I cannot solve this
>>>> for you but here are some thoughts.
>>>
>>> Both of your thoughts are much appreciated!
>>>
>>>> On Mon, 2018-07-09 at 12:11 +0200, Richard Weinberger wrote:
>>>>> Artem, Adrian,
>>>>>
>>>>> While playing with a new UBI/UBIFS test framework I managed to hit
>>>>> this error, 
>>>>> with lprops self-checks enabled:
>>>>>
>>>>> [ 2412.268964] UBIFS error (ubi0:0 pid 708): scan_check_cb: bad
>>>>> accounting of 
>>>>> LEB 11: free 0, dirty 118072 flags 0x1, should be free 126976, dirty
>>>>> 0
>>>>
>>>> So it has 126976 - 118072 = 8904 of used space.
>>>>
>>>
>>> [...]
>>>
>>>> 1. If you believe some specific place like the one you pointed contains
>>>> a bug, then emulate a power cut in that place and try to reproduce the
>>>> bug.
>>>>
>>>> 2. Try to focus on recovery replay, this is the main suspect. Try to
>>>> figure out what is that used space in that LEB?
>>>
>>> After spending the third night in a row staring at hexdumps, I think I know
>>> what is going on.
>>>
>>> I see that LEB 11 recently got unmapped, but the LPT still accounts 8904 bytes
>>> as used.
>>> To double check that value I've analyzed the index tree. To my surprise it was 
>>> referencing nodes at LEB 11!
>>> Summing up the length of these nodes is exactly 8904. So, the LPT is correct.
>>>
>>> All nodes on LEB 11 that are referenced by the index tree are of type 
>>> UBIFS_INO_NODE and have UBIFS_XATTR_FL set.
>>> So, our problem is xattr related. We forget xattr inodes and the free space 
>>> accounting goes nuts sooner or later.
>>>
>>> Inspecting the journal gave more details. Only buds of head type BASEHD and 
>>> DATAHD are present. No GCHD. Therefore GC was not interrupted.
>>> With the IO transaction log from before the power-cut I was able to figure 
>>> where the orphaned xattr inodes belong to.
>>> And indeed, the host inode of said xattr is in the journal for deletion.
>>>
>>> In theory during replay, removal of the host inode should also remove all
>>> xattr inodes and we are good.
>>> So the question is, why fails UBIFS to remove the xattr inodes even if we have 
>>> the host inode in the journal?
>>>
>>> Another round of digging though the index uncovered the problem. The 
>>> UBIFS_XENT_NODE nodes for all orphaned xattr inodes sat on LEB 11.
>>> This means upon replay, when UBIFS tries to locate all xattr entries
>>> of the host inode, it faces the dangling branch case and the loop in 
>>> ubifs_tnc_remove_ino() terminates with -ENOENT. The host inode will be 
>>> removed, but the xattr inodes stay.
>>>
>>> Therefore the root of the problem is that UBIFS places only the host inode 
>>> into the journal upon removal of a file with xattrs and assumes that upon 
>>> replay we can discover all xattr inodes. That will fail if due to GC the LEB 
>>> got already unmapped.
>>>
>>> As intermediate solution I suggest adding also xattr inodes to the journal
>>> upon unlink. This will make unlink more expensive but for now it is IMHO ok
>>> until a more sophisticated solution is found.
>>> Files usually have only a few xattr, so the cost is not that bad.
>>>
>>> What do you guys think? Does my analysis make sense to you?
>>
>> It doesn't seem like there is a way to avoid adding more information to the
>> journal upon deletion, but you probably need to cater for the possibility
>> that there is a power cut while writing that information e.g. it wouldn't be
>> good to remove half the xattrs but leave the rest
> 
> With grouped nodes a power-cut is okay, the bigger problem is the number of xattrs.
> We support up to 2^16. Do delete an inode via journal we need UBIFS_INO_NODE_SZ bytes
> in the journal.
> So, in worst case we need to write 2^16 times UBIFS_INO_NODE_SZ bytes to the journal.
> I'm still looking into this option right now, just to be sure that this is really the
> solution to the problem I see. Finding a stable reproducer is also not easy...
> 
> Another possibility is playing with ubifs_tnc_remove_ino(), upon unlink() it removes
> all entries from the TNC and marks them as dirty.
> The problem is that also all xattr inodes are marked as dirty.
> If we manage to find a way that these get marked only as dirty when we are sure that
> they are gone the GC will no longer unmap a "used" LEB.
> But that all is very tricky.

What about: add the inode to be deleted to orphans, then delete the xattrs
one at a time, then when they are all gone, delete the inode and remove it
from orphans.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: UBIFS question: Power-cuts after ubifs_leb_unmap()
  2018-07-30  8:07         ` Adrian Hunter
@ 2018-07-31 21:28           ` Richard Weinberger
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Weinberger @ 2018-07-31 21:28 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: artem.bityutskiy, linux-mtd

Am Montag, 30. Juli 2018, 10:07:25 CEST schrieb Adrian Hunter:
> > With grouped nodes a power-cut is okay, the bigger problem is the number of xattrs.
> > We support up to 2^16. Do delete an inode via journal we need UBIFS_INO_NODE_SZ bytes
> > in the journal.
> > So, in worst case we need to write 2^16 times UBIFS_INO_NODE_SZ bytes to the journal.
> > I'm still looking into this option right now, just to be sure that this is really the
> > solution to the problem I see. Finding a stable reproducer is also not easy...
> > 
> > Another possibility is playing with ubifs_tnc_remove_ino(), upon unlink() it removes
> > all entries from the TNC and marks them as dirty.
> > The problem is that also all xattr inodes are marked as dirty.
> > If we manage to find a way that these get marked only as dirty when we are sure that
> > they are gone the GC will no longer unmap a "used" LEB.
> > But that all is very tricky.
> 
> What about: add the inode to be deleted to orphans, then delete the xattrs
> one at a time, then when they are all gone, delete the inode and remove it
> from orphans.

Hmm, deleting one at a time still means that UBIFS has to write up to 2^16 times
to the journal.
But I agree this approach is better than allocating space for 2^16 inodes at once
in the journal.
Using the orphan list we can make sure that deletion still is atomic.

Thanks,
//richard

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-07-31 21:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-09 10:11 UBIFS question: Power-cuts after ubifs_leb_unmap() Richard Weinberger
2018-07-09 12:21 ` Adrian Hunter
2018-07-09 13:11   ` Richard Weinberger
2018-07-10  6:58     ` Adrian Hunter
2018-07-24  8:01 ` Artem Bityutskiy
2018-07-25 22:29   ` Richard Weinberger
2018-07-30  6:55     ` Adrian Hunter
2018-07-30  7:28       ` Richard Weinberger
2018-07-30  8:07         ` Adrian Hunter
2018-07-31 21:28           ` Richard Weinberger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.