* [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list)
@ 2018-08-07  3:17 Robin H. Johnson
  2018-10-01  2:45 ` Chris Knadle
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Robin H. Johnson @ 2018-08-07  3:17 UTC
  To: mlmmj

[-- Attachment #1: Type: text/plain, Size: 557 bytes --]

One of the larger gentoo.org lists recently passed 1M mails, and while
performance hasn't taken a noticeable hit, doing admin ops in such large
directories is painful.

Has anybody started on patches to use a nested directory structure for
archive? Ideally something that supports cleanly migrating from the old
flat layout.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 1113 bytes --]


* Re: [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list)
  2018-08-07  3:17 [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list) Robin H. Johnson
@ 2018-10-01  2:45 ` Chris Knadle
  2018-10-23 21:27 ` Robin H. Johnson
  2018-11-03  3:23 ` Chris Knadle
  2 siblings, 0 replies; 4+ messages in thread
From: Chris Knadle @ 2018-10-01  2:45 UTC
  To: mlmmj

Robin H. Johnson:
> One of the larger gentoo.org lists recently passed 1M mails, and while
> performance hasn't taken a noticeable hit, doing admin ops in such large
> directories is painful.

Mmm. Yeah that's a lot for one directory.

> Has anybody started on patches to use a nested directory structure for
> archive? Ideally something that supports cleanly migrating from the old
> flat layout.

Not that I know of, but I think there's a reason.

Mlmmj doesn't include a "web archiver", so I think the only reason to keep old
archives in the same directory is for getting previous messages from the list
via email by sending mail to <listname>-get-N@<list-domain-name> where N is the
number of the desired message.  It's very unlikely that a user would want to use
this to retrieve very many messages.

Thus realistically older mail archives can be moved into directories by month or
year, and the web archiver configured to look for those archives there.

I hope that helps.
   -- Chris

-- 
Chris Knadle
Chris.Knadle@coredump.us



* Re: [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list)
  2018-08-07  3:17 [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list) Robin H. Johnson
  2018-10-01  2:45 ` Chris Knadle
@ 2018-10-23 21:27 ` Robin H. Johnson
  2018-11-03  3:23 ` Chris Knadle
  2 siblings, 0 replies; 4+ messages in thread
From: Robin H. Johnson @ 2018-10-23 21:27 UTC
  To: mlmmj

[-- Attachment #1: Type: text/plain, Size: 1590 bytes --]

(Sorry for the lag, busy month so far)

On Mon, Oct 01, 2018 at 02:45:00AM +0000, Chris Knadle wrote:
> Robin H. Johnson:
> > One of the larger gentoo.org lists recently passed 1M mails, and while
> > performance hasn't taken a noticeable hit, doing admin ops in such large
> > directories is painful.
> Mmm. Yeah that's a lot for one directory.
> 
> > Has anybody started on patches to use a nested directory structure for
> > archive? Ideally something that supports cleanly migrating from the old
> > flat layout.
> Not that I know of, but I think there's a reason.
> 
> Mlmmj doesn't include a "web archiver", so I think the only reason to keep old
> archives in the same directory is for getting previous messages from the list
> via email by sending mail to <listname>-get-N@<list-domain-name> where N is the
> number of the desired message.  It's very unlikely that a user would want to use
> this to retrieve very many messages.
Isn't the archive directory also used in the delayed/retry delivery
cases? (I need to check the moderation cases as well).

The get-N functionality does get usage in Gentoo, as we document it in
counterpart to the 'you missed some mail because your address was
bouncing'

For web-archiving, we do it entirely outside of mlmmj (subscribe the
archiver to the list, use get-N for missing mail, profit).

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 1113 bytes --]


* Re: [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list)
  2018-08-07  3:17 [mlmmj] Patches wanted for hashing/nesting archive directory (1M+ mails in list) Robin H. Johnson
  2018-10-01  2:45 ` Chris Knadle
  2018-10-23 21:27 ` Robin H. Johnson
@ 2018-11-03  3:23 ` Chris Knadle
  2 siblings, 0 replies; 4+ messages in thread
From: Chris Knadle @ 2018-11-03  3:23 UTC
  To: mlmmj


[-- Attachment #1.1: Type: text/plain, Size: 2889 bytes --]

Robin H. Johnson:
> (Sorry for the lag, busy month so far)
> 
> On Mon, Oct 01, 2018 at 02:45:00AM +0000, Chris Knadle wrote:
>> Robin H. Johnson:
>>> One of the larger gentoo.org lists recently passed 1M mails, and while
>>> performance hasn't taken a noticeable hit, doing admin ops in such large
>>> directories is painful.
>> Mmm. Yeah that's a lot for one directory.
>>
>>> Has anybody started on patches to use a nested directory structure for
>>> archive? Ideally something that supports cleanly migrating from the old
>>> flat layout.
>> Not that I know of, but I think there's a reason.
>>
>> Mlmmj doesn't include a "web archiver", so I think the only reason to keep old
>> archives in the same directory is for getting previous messages from the list
>> via email by sending mail to <listname>-get-N@<list-domain-name> where N is the
>> number of the desired message.  It's very unlikely that a user would want to use
>> this to retrieve very many messages.
>
> Isn't the archive directory also used in the delayed/retry delivery
> cases? (I need to check the moderation cases as well).

I think no; I believe the /queue, /requeue, and /bounce directories for the
particular mailing list are used for that.  (I haven't examined the code.)

> The get-N functionality does get usage in Gentoo, as we document it in
> counterpart to the 'you missed some mail because your address was
> bouncing'
> 
> For web-archiving, we do it entirely outside of mlmmj (subscribe the
> archiver to the list, use get-N for missing mail, profit).

If I were in the position of dealing with the mail archives, I would leave about
3 months of archives within MLMMJ to allow get-N mail retrieval, and break the
rest up into sub-directories by month, e.g. "2018-08".  I would then work out
some method of having the web archives updated as new mail comes in, and figure
out how to handle the file move transition for the months whose raw mail
archives moved into sub-directories.
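
A rough sketch of the monthly split described above, in Python purely for
illustration. Nothing here is mlmmj code or an mlmmj feature. It assumes that
archived mails are plain files named by their decimal message number, and that
a file's modification time is close enough to its arrival date; the function
names and the 90-day default are hypothetical.

```python
import os
import shutil
import time
from pathlib import Path
from typing import Optional


def month_bucket(mtime: float) -> str:
    """Return a "YYYY-MM" directory name for a modification time (UTC)."""
    return time.strftime("%Y-%m", time.gmtime(mtime))


def move_old_mails(archive: Path, keep_days: int = 90,
                   now: Optional[float] = None) -> int:
    """Move numbered mail files older than keep_days into month subdirectories.

    Returns the number of files moved. Subdirectories, symlinks and
    non-numeric file names are left untouched.
    """
    now = time.time() if now is None else now
    cutoff = now - keep_days * 86400

    # Collect candidates first so the directory is not modified mid-scan.
    with os.scandir(archive) as entries:
        candidates = [e.path for e in entries
                      if e.name.isdigit() and e.is_file(follow_symlinks=False)]

    moved = 0
    for path in candidates:
        mtime = os.stat(path).st_mtime
        if mtime >= cutoff:
            continue  # recent mail stays in place for get-N retrieval
        target_dir = archive / month_bucket(mtime)
        target_dir.mkdir(exist_ok=True)
        shutil.move(path, str(target_dir / os.path.basename(path)))
        moved += 1
    return moved
```

Run from cron against a copy of the archive first. A more careful version would
probably take the month from each mail's Date header rather than the file
modification time, which can change if the archive is copied or restored.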

MLMMJ only needs a file with the unique message number to retrieve the message,
so I understand your suggestion for allowing nesting of archives within
subdirectories.  However I also realize that would come with a cost -- there'd
be some I/O performance impact for the search if an archive file was not
directly in /archive.
Still ... I like the idea.  It seems like a logical thing I'd want.
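
To make the lookup cost concrete, here is a minimal Python sketch of one
possible nested layout. It is not how mlmmj works today and not a proposed
patch. The bucket size, the "b" prefix on bucket directory names (chosen so a
bucket directory can never collide with a numbered mail file) and both
function names are assumptions made for the example.

```python
from pathlib import Path
from typing import Optional

BUCKET_SIZE = 1000  # hypothetical: number of mails per subdirectory


def nested_path(archive: Path, number: int) -> Path:
    """Path of a mail in the assumed nested layout, e.g. archive/b1234/1234567."""
    return archive / f"b{number // BUCKET_SIZE}" / str(number)


def find_mail(archive: Path, number: int) -> Optional[Path]:
    """Locate a mail by number, trying the flat layout before the nested one."""
    flat = archive / str(number)
    if flat.is_file():
        return flat  # new or not yet migrated mail
    nested = nested_path(archive, number)
    if nested.is_file():
        return nested  # migrated mail
    return None
```

Because the subdirectory is computed from the message number, a miss in the
flat directory costs one further file check under this scheme rather than a
search through subdirectories. Checking the flat location first also lets old
and new layouts coexist while a migration is in progress.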

Another thought I had was that it's also possible to make soft-links within
/archive to point to /archive/<subdirectory> mails.  This wouldn't help with the
number of files in /archive because that wouldn't change, but an administrator
looking within /archive could see which messages were new (i.e. which ones were
actual files) rather than old (softlinks).
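
The soft-link idea could look roughly like the following Python sketch. It
assumes a POSIX filesystem with the subdirectory on the same filesystem as
/archive, and the function names are hypothetical. Whether mlmmj follows
symlinks when serving get-N requests is not something this thread confirms,
so that would need checking before relying on it.

```python
import os
from pathlib import Path
from typing import List, Tuple


def archive_with_symlink(archive: Path, name: str, subdir: str) -> Path:
    """Move archive/name into archive/subdir and leave a relative symlink behind."""
    target_dir = archive / subdir
    target_dir.mkdir(exist_ok=True)
    source = archive / name
    target = target_dir / name
    os.replace(source, target)  # a rename, since both paths share a filesystem
    # Relative link, so the archive directory can be moved as a whole.
    os.symlink(os.path.join(subdir, name), source)
    return target


def split_new_and_old(archive: Path) -> Tuple[List[str], List[str]]:
    """Return (regular files, symlinks) found directly inside the archive."""
    new, old = [], []
    with os.scandir(archive) as entries:
        for entry in entries:
            if entry.is_symlink():
                old.append(entry.name)
            elif entry.is_file(follow_symlinks=False):
                new.append(entry.name)
    return sorted(new), sorted(old)
```

Note that between the rename and the symlink creation the message is briefly
missing from its old path, so a get-N request arriving in that window could
fail.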

  -- Chris

-- 
Chris Knadle
Chris.Knadle@coredump.us


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

