linux-lvm.redhat.com archive mirror
* [linux-lvm] LVM archive management (/etc/lvm/archives) expiry / retention misbehaves after index #100,000.
@ 2017-08-13 12:05 Mark Mielke
  2017-08-14  3:49 ` Mark Mielke
From: Mark Mielke @ 2017-08-13 12:05 UTC
  To: LVM general discussion and development

I searched around for this a bit, and although other users may have hit
this, I didn't find a good explanation offered. I suspect the users clean
it up manually and then it disappears for another 2 years. I hope this
message will get captured by Google, and help somebody else out. Also, I
hope to have some discussion about this as it seems like an easily
preventable problem.

The archive file names are generated like:

                if (dm_snprintf(archive_name, sizeof(archive_name),
                                 "%s/%s_%05u-%d.vg",
                                 dir, vg->name, ix, rnum) < 0) {
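
To make the naming scheme concrete, here is a minimal sketch that applies the
same format string with plain snprintf(). The helper name
format_archive_name() is hypothetical and is not part of LVM; it only
illustrates that "%05u" pads to at least 5 digits but does not cap the width.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not LVM's API): builds an archive file name using the
 * same format string as the snippet above, but with standard snprintf().
 * Returns 0 on success, -1 on a formatting error or truncation.
 */
static int format_archive_name(char *buf, size_t len, const char *dir,
                               const char *vg_name, unsigned ix, int rnum)
{
    /* %05u zero-pads to a minimum of 5 digits; larger values simply grow. */
    int n = snprintf(buf, len, "%s/%s_%05u-%d.vg", dir, vg_name, ix, rnum);

    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

With this sketch, index 42 yields a name such as "vg0_00042-7.vg", while
index 100000 yields "vg0_100000-7.vg", one character wider than the rest.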

The directory scanning code that loads the archive file names into memory
recognizes a problem, although it isn't explicit about what the problem is:

        /* Sort fails beyond 5-digit indexes */
        if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
                log_error("Couldn't scan the archive directory (%s).", dir);
                return 0;
        }

The file names encode the index zero-padded to 5 digits, like "00000". The
sorting code uses "alphasort", which compares the names as strings, so it
only orders them correctly while the index stays within 5 digits. As soon
as the index exceeds 5 digits, "100000" sorts to the beginning and "99999"
to the end, because "1" compares lower than "9". From then on, new
archives seem to *all* get index "100000". We had some 40,000 archives
with index "100000" before we noticed. And, because the index is followed
by a random number, expiry would only remove a few of the "100000"
archives before it hit one that was younger than the 30 day retention
period set by default. When I reduced the retention period to 7 days, it
expired only about 12 of the 40,000 archive files. This behaviour is
probably due to the random number distribution ensuring that there are
always some recent records near 0?
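
The ordering problem can be reproduced outside LVM. The sketch below sorts a
few made-up archive names with qsort() and strcmp(), which stands in for
alphasort() here (alphasort() compares with strcoll(), which I am assuming
behaves like strcmp() in the "C" locale). The names sort_names() and
cmp_names() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: byte-wise string comparison of two name pointers. */
static int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;

    return strcmp(*sa, *sb);
}

/* Sort an array of file names the way a plain string sort would. */
static void sort_names(const char **names, size_t count)
{
    qsort(names, count, sizeof(*names), cmp_names);
}
```

Given "vg0_99999-5.vg", "vg0_100000-9.vg", "vg0_00001-3.vg" and
"vg0_100000-2.vg", this places both "100000" names directly after "00001"
and leaves "99999" last, so the numerically newest archives no longer sit
at the end of the list.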

This issue eventually affects everyone, although obviously the people that
use features like snapshots more frequently (we use them every 15 minutes,
across multiple volumes) will hit it sooner.

There are a few fixes possible... Probably, "alphasort" should not be used
at all. Instead, a context-aware sort should be used that can filter and
sort as it goes, decoding the index correctly as a number and comparing it
as a number. Then, if performance and scalability matter, it would be
ideal if it did this in a single pass, buffering only the minimum needed
to expire the correct archive files.
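
As a rough illustration of the "compare it as a number" idea only, here is a
sketch of a comparator that decodes the index before comparing. It is not a
patch against LVM and does not attempt the single-pass expiry; the names
parse_archive_index() and cmp_by_index() are hypothetical, and it assumes
file names of the form "<vg>_<index>-<rnum>.vg" as produced by the format
string quoted earlier.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the numeric index from a name of the form
 * "<vg>_<index>-<rnum>.vg". The last '_' is used because a VG name may
 * itself contain underscores. Returns 0 on success, -1 if the name does
 * not match.
 */
static int parse_archive_index(const char *name, unsigned long *ix)
{
    const char *us = strrchr(name, '_');
    char *end;

    if (!us || us[1] < '0' || us[1] > '9')
        return -1;
    *ix = strtoul(us + 1, &end, 10);
    return (*end == '-') ? 0 : -1;
}

/* qsort() comparator: order by decoded index, then by name to break ties. */
static int cmp_by_index(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    unsigned long ia = 0, ib = 0;
    int fa = parse_archive_index(*sa, &ia);
    int fb = parse_archive_index(*sb, &ib);

    if (fa != fb)
        return fa < fb ? -1 : 1;   /* unparsable names sort first */
    if (fa == 0 && ia != ib)
        return ia < ib ? -1 : 1;
    return strcmp(*sa, *sb);
}
```

Passing cmp_by_index to qsort() over an array of name pointers orders
"99999" before "100000" regardless of how many digits the index has.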

We hit this on RHEL 7.2. I wasn't surprised to find it in RHEL 7.2, but I
was surprised that it still exists on "master". "git blame" says this has
been an issue since 2002:

5be981bab5 (Alasdair Kergon  2002-05-07 12:47:11 +0000 139)     /* Sort fails beyond 5-digit indexes */
59d6420b9a (Joe Thornber     2002-02-08 11:58:18 +0000 140)     if ((count = scandir(dir, &dirent, NULL, alphasort)) < 0) {
b8f47d5f69 (Alasdair Kergon  2009-07-15 20:02:46 +0000 141)             log_error("Couldn't scan the archive directory (%s).", dir);
952d12a5f5 (Alasdair Kergon  2002-01-09 19:16:48 +0000 142)             return 0;
952d12a5f5 (Alasdair Kergon  2002-01-09 19:16:48 +0000 143)     }

Ouch... :-)

For anybody that does hit this... Pruning the archive files with index <
100000 is effective. It then carries on counting from 100000, and you now
have 9X more life before it will happen again... :-)

-- 
Mark Mielke <mark.mielke@gmail.com>


* Re: [linux-lvm] LVM archive management (/etc/lvm/archives) expiry / retention misbehaves after index #100,000.
  2017-08-13 12:05 [linux-lvm] LVM archive management (/etc/lvm/archives) expiry / retention misbehaves after index #100,000 Mark Mielke
@ 2017-08-14  3:49 ` Mark Mielke
From: Mark Mielke @ 2017-08-14  3:49 UTC
  To: LVM general discussion and development

I opened this Bugzilla issue for tracking purposes:

https://bugzilla.redhat.com/show_bug.cgi?id=1481085

-- 
Mark Mielke <mark.mielke@gmail.com>
