* [mlmmj] mlmmj patches from distributions: Gentoo
@ 2022-12-08  1:13 Robin H. Johnson
  2022-12-08  8:01 ` Baptiste Daroussin
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Robin H. Johnson @ 2022-12-08  1:13 UTC (permalink / raw)
  To: mlmmj


[-- Attachment #1.1: Type: text/plain, Size: 1071 bytes --]

It's great to see more active development again.
Gentoo Linux still runs mlmmj for all our lists, and we've got some
patches that would be great to land upstream.

mlmmj-1.3.0-gcc-10.patch - GCC 10 fix
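
For context on why this fix is needed: GCC 10 changed its default from -fcommon to -fno-common, so an array defined without `extern` in a header that several .c files include becomes a multiple-definition error at link time. The sketch below condenses the usual header/source split into a single file. The enum, array and function names are illustrative only and are not taken from mlmmj.

```c
#include <assert.h>

/* --- what would live in the shared header (hypothetical names) --- */
enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE, COLOR_COUNT };

/* Declaration only: no storage is allocated here, so any number of
 * translation units can include the header. */
extern const char *color_strs[COLOR_COUNT];

/* --- what would live in exactly one .c file --- */
const char *color_strs[COLOR_COUNT] = { "red", "green", "blue" };

/* Return the printable name for an enum value. */
const char *color_name(enum color c)
{
	assert((int)c >= 0 && (int)c < COLOR_COUNT);
	return color_strs[c];
}
```

Keeping the array size tied to the enum (here via COLOR_COUNT) is one way to avoid the hand-maintained counts that the patched header comments refer to.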

mlmmj-1.2.19.0-listcontrol-customheaders.patch
Include the customheaders file content in list control messages.

mlmmj-1.3.0-logging.patch - this is a brand new patch, not tested yet.
When dropping messages from non-subscribers, we wanted a better trail
about it. Ideally we'd 
1) log the message-id
2) give a per-message SMTP-time rejection (need postfix filter stuff)
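
As a rough illustration of item 1, here is a minimal sketch of pulling the Message-ID out of a mail's header section so that it could be passed to a logging call. It is not mlmmj code: `find_message_id` is a hypothetical helper, and it deliberately ignores folded (continued) header lines and lines longer than its buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Copy the value of the first Message-ID header in the header section
 * of `mail` into `out`.
 * Returns 0 on success, -1 if the header is absent, empty, or does not
 * fit into `out`.
 */
int find_message_id(FILE *mail, char *out, size_t outlen)
{
	static const char name[] = "Message-ID:";
	char line[1024];

	assert(mail != NULL && out != NULL && outlen > 0);
	while (fgets(line, sizeof(line), mail) != NULL) {
		/* An empty line ends the header section. */
		if (line[0] == '\n' || line[0] == '\r')
			break;
		/* Header names are case-insensitive. */
		if (strncasecmp(line, name, sizeof(name) - 1) != 0)
			continue;
		const char *val = line + sizeof(name) - 1;
		val += strspn(val, " \t");         /* skip leading blanks */
		size_t len = strcspn(val, "\r\n"); /* drop the line ending */
		if (len == 0 || len >= outlen)
			return -1;
		memcpy(out, val, len);
		out[len] = '\0';
		return 0;
	}
	return -1;
}
```

The result could then be appended to the existing log lines as one more `key=value` pair, next to the list, poster and envelope addresses.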

There's one scaling discussion we need to have, but the handling needs
to include how to incrementally get there:

How to have millions of mails in /archive/!

Our most active list is now approaching 1.5M emails in that directory,
and it worries me.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

[-- Attachment #1.2: mlmmj-1.3.0-logging.patch --]
[-- Type: text/plain, Size: 2632 bytes --]

On a high-mail system, it's hard to link errors back to specific mails.
Log the list address, poster address and envelope to aid that.

Better work here would be capturing the message-id and logging that, but it's
not presently captured, so that is a more invasive change.

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

--- mlmmj-1.3.0.orig/src/mlmmj-process.c	2022-11-24 16:09:30.839848253 -0800
+++ mlmmj-1.3.0/src/mlmmj-process.c	2022-11-24 16:52:32.311365699 -0800
@@ -193,7 +193,8 @@ static void newmoderated(const char *lis
 	if (notifymod) {
 		childpid = fork();
 		if(childpid < 0)
-			log_error(LOG_ARGS, "Could not fork; poster not notified");
+			log_error(LOG_ARGS, "Could not fork; poster not notified"
+ 					"; list=%s poster=%s envelope=%s", listaddr, posteraddr, efromsender);
 	} else
 		childpid = -1;
 
@@ -919,8 +920,10 @@ int main(int argc, char **argv)
 			log_error(LOG_ARGS, "Discarding %s because list"
 					" address was not in To: or Cc:,"
 					" and From: was the list or"
-					" notoccdenymails was set",
-					mailfile);
+					" notoccdenymails was set"
+ 					"; list=%s poster=%s envelope=%s",
+					mailfile,
+					listaddr, posteraddr, efrom);
 			myfree(listaddr);
 			unlink(donemailname);
 			myfree(donemailname);
@@ -971,8 +974,10 @@ int main(int argc, char **argv)
 					" it was denied by an access"
 					" rule, and From: was the list"
 					" address or noaccessdenymails"
-					" was set",
-					mailfile);
+					" was set"
+ 					"; list=%s poster=%s envelope=%s",
+					mailfile,
+					listaddr, posteraddr, efrom);
 				myfree(listaddr);
 				unlink(donemailname);
 				myfree(donemailname);
@@ -1046,8 +1051,10 @@ int main(int argc, char **argv)
 		if (strcasecmp(listaddr, posteraddr) == 0) {
 			log_error(LOG_ARGS, "Discarding %s because"
 					" there are sender restrictions but"
-					" From: was the list address",
-					mailfile);
+					" From: was the list address"
+ 					"; list=%s poster=%s envelope=%s",
+					mailfile,
+					listaddr, posteraddr, efrom);
 			myfree(listaddr);
 			unlink(donemailname);
 			myfree(donemailname);
@@ -1072,8 +1079,10 @@ int main(int argc, char **argv)
 				    (modonlypost &&
 				    statctrl(listdir, "nomodonlydenymails"))) {
 				log_error(LOG_ARGS, "Discarding %s because"
-					" no{sub|mod}onlydenymails was set",
-					mailfile);
+					" no{sub|mod}onlydenymails was set"
+ 					"; list=%s poster=%s envelope=%s",
+					mailfile,
+					listaddr, posteraddr, efrom);
 				myfree(listaddr);
 				unlink(donemailname);
 				myfree(donemailname);

[-- Attachment #1.3: mlmmj-1.3.0-gcc-10.patch --]
[-- Type: text/plain, Size: 656 bytes --]

--- a/include/mlmmj.h
+++ b/include/mlmmj.h
@@ -81,7 +81,7 @@ enum subtype {
 	SUB_NONE /* For when an address is not subscribed at all */
 };
 
-char *subtype_strs[7]; /* count matches enum above; defined in subscriberfuncs.c */
+extern char *subtype_strs[7]; /* count matches enum above; defined in subscriberfuncs.c */
 
 enum subreason {
 	SUB_REQUEST,
@@ -92,7 +92,7 @@ enum subreason {
 	SUB_SWITCH
 };
 
-char * subreason_strs[6]; /* count matches enum above; defined in subscriberfuncs.c */
+extern char * subreason_strs[6]; /* count matches enum above; defined in subscriberfuncs.c */
 
 void print_version(const char *prg);
 

[-- Attachment #1.4: mlmmj-1.2.19.0-listcontrol-customheaders.patch --]
[-- Type: text/plain, Size: 1231 bytes --]

List control emails do not include customheaders, and can lead to RBL issues
for forged senders.

Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>

diff -Nuar --exclude '*~' mlmmj-1.2.19.0.orig/src/mlmmj-process.c mlmmj-1.2.19.0/src/mlmmj-process.c
--- mlmmj-1.2.19.0.orig/src/mlmmj-process.c	2014-03-23 17:57:24.000000000 -0700
+++ mlmmj-1.2.19.0/src/mlmmj-process.c	2016-05-04 13:50:26.034174788 -0700
@@ -702,8 +702,19 @@
 						    "output mail file");
 				exit(EXIT_FAILURE);
 			}
-			if(do_all_the_voodoo_here(rawmailfd, donemailfd, -1,
-					-1, delheaders,
+			/* hdrfd is checked in do_all_the_voodoo_here(), because the
+			 * customheaders file might not exist */
+			headerfilename = concatstr(2, listdir, "/control/customheaders");
+			hdrfd = open(headerfilename, O_RDONLY);
+			myfree(headerfilename);
+
+			/* footfd is checked in do_all_the_voodoo_here(), see above */
+			footerfilename = concatstr(2, listdir, "/control/footer");
+			footfd = open(footerfilename, O_RDONLY);
+			myfree(footerfilename);
+
+			if(do_all_the_voodoo_here(rawmailfd, donemailfd, hdrfd,
+					footfd, delheaders,
 					NULL, &allheaders, NULL) < 0) {
 				log_error(LOG_ARGS, "do_all_the_voodoo_here");
 				exit(EXIT_FAILURE);


* Re: [mlmmj] mlmmj patches from distributions: Gentoo
  2022-12-08  1:13 [mlmmj] mlmmj patches from distributions: Gentoo Robin H. Johnson
@ 2022-12-08  8:01 ` Baptiste Daroussin
  2022-12-08 12:49 ` Franky Van Liedekerke
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Baptiste Daroussin @ 2022-12-08  8:01 UTC (permalink / raw)
  To: mlmmj

On Thu, Dec 08, 2022 at 01:13:26AM +0000, Robin H. Johnson wrote:
> It's great to see more active development again.
> Gentoo Linux still runs mlmmj for all our lists, and we've got some
> patches that would be great to land upstream.
> 
> mlmmj-1.3.0-gcc-10.patch - GCC 10 fix

This one is already in.
> 
> mlmmj-1.2.19.0-listcontrol-customheaders.patch
> Include the customheaders file content in list control messages.

I will look at it and see about incorporating it.
> 
> mlmmj-1.3.0-logging.patch - this is a brand new patch, not tested yet.
> When dropping messages from non-subscribers, we wanted a better trail
> about it. Ideally we'd 
> 1) log the message-id
> 2) give a per-message SMTP-time rejection (need postfix filter stuff)

I will look into it as well.
> 
> There's one scaling discussion we need to have, but the handling needs
> to include how to incrementally get there:
> 
> How to have millions of mails in /archive/!
> 
> Our most active list is now approaching 1.5M emails in that directory,
> and it worries me.

Why is it worrying you? A directory can hold way more than that.

On FreeBSD, when creating the public archives with
https://fossil.nours.eu/mlmmj-archiver/doc/trunk/README.md (this may move to
Codeberg), I append the archived mails to an mbox, so I can clean up whatever is
in the archives directory if I need to (I have not needed to so far).

DISCLAIMER: mlmmj-archiver is not portable at all yet, and is not even in alpha
state :D (I am happily accepting portability patches, but this is not my focus
yet).

Best regards,
Bapt



* Re: [mlmmj] mlmmj patches from distributions: Gentoo
  2022-12-08  1:13 [mlmmj] mlmmj patches from distributions: Gentoo Robin H. Johnson
  2022-12-08  8:01 ` Baptiste Daroussin
@ 2022-12-08 12:49 ` Franky Van Liedekerke
  2022-12-08 23:18 ` Robin H. Johnson
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Franky Van Liedekerke @ 2022-12-08 12:49 UTC (permalink / raw)
  To: mlmmj

[-- Attachment #1: Type: text/plain, Size: 1314 bytes --]

On Thu, 2022-12-08 at 09:01 +0100, Baptiste Daroussin wrote:
> On Thu, Dec 08, 2022 at 01:13:26AM +0000, Robin H. Johnson wrote:
> > There's one scaling discussion we need to have, but the handling needs
> > to include how to incrementally get there:
> > 
> > How to have millions of mails in /archive/!
> > 
> > Our most active list is now approaching 1.5M emails in that directory,
> > and it worries me.
> 
> Why is it worrying you? A directory can hold way more than that.
> 
> On FreeBSD, when creating the public archives with
> https://fossil.nours.eu/mlmmj-archiver/doc/trunk/README.md (this may move to
> Codeberg), I append the archived mails to an mbox, so I can clean up whatever is
> in the archives directory if I need to (I have not needed to so far).
> 
> DISCLAIMER: mlmmj-archiver is not portable at all yet, and is not even in alpha
> state :D (I am happily accepting portability patches, but this is not my focus
> yet).

I can share some stuff I did based on mhonarc to create a public
archive (also based on mlmmj-webarchiver in fact), but in the end:

- mlmmj archives are already in subdirectories archive/1 ... archive/15
- the mhonarc archive (public) is in per-month subdirectories, so I'm
not worried at all

Franky



* Re: [mlmmj] mlmmj patches from distributions: Gentoo
  2022-12-08  1:13 [mlmmj] mlmmj patches from distributions: Gentoo Robin H. Johnson
  2022-12-08  8:01 ` Baptiste Daroussin
  2022-12-08 12:49 ` Franky Van Liedekerke
@ 2022-12-08 23:18 ` Robin H. Johnson
  2022-12-09  5:25 ` Baptiste Daroussin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Robin H. Johnson @ 2022-12-08 23:18 UTC (permalink / raw)
  To: mlmmj

On Thu, Dec 08, 2022 at 09:01:29AM +0100, Baptiste Daroussin wrote:
> > There's one scaling discussion we need to have, but the handling needs
> > to include how to incrementally get there:
> > 
> > How to have millions of mails in /archive/!
> > 
> > Our most active list is now approaching 1.5M emails in that directory,
> > and it worries me.
> 
> Why is it worrying you? A directory can hold way more than that.
>
> On FreeBSD, when creating the public archives with
> https://fossil.nours.eu/mlmmj-archiver/doc/trunk/README.md (this may move to
> Codeberg), I append the archived mails to an mbox, so I can clean up whatever is
> in the archives directory if I need to (I have not needed to so far).
Cleaning up the archives directory will break the +get-NNNN
functionality.

Having that many files in one directory makes it take extremely long to scan
during backups.
I think it should be partitioned more, but any +get-NNNN code will need
to support both partitioned & non-partitioned.
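
A minimal sketch of what a lookup that supports both layouts could look like. Nothing here is mlmmj code: `archive_find`, the partition size of 100000 and the `.d` directory suffix are all assumptions made for illustration. The suffix is there because a partition directory with a purely numeric name (say `archive/1`) would collide with the flat file for mail number 1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical partition size and directory suffix; neither is from mlmmj. */
#define ARCHIVE_PART_SIZE 100000L
#define ARCHIVE_PART_SUFFIX ".d"

/* Return 1 if `path` names an existing regular file, 0 otherwise. */
static int is_regular_file(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/*
 * Write the path of archived mail number `index` into `buf`.
 * Returns 0 if the mail was found in either layout, -1 otherwise.
 */
int archive_find(const char *listdir, long index, char *buf, size_t buflen)
{
	int n;

	assert(listdir != NULL && buf != NULL && index >= 0);

	/* 1. Legacy flat layout: <listdir>/archive/<index> */
	n = snprintf(buf, buflen, "%s/archive/%ld", listdir, index);
	if (n > 0 && (size_t)n < buflen && is_regular_file(buf))
		return 0;

	/* 2. Partitioned layout: <listdir>/archive/<index / size>.d/<index> */
	n = snprintf(buf, buflen, "%s/archive/%ld%s/%ld", listdir,
	    index / ARCHIVE_PART_SIZE, ARCHIVE_PART_SUFFIX, index);
	if (n > 0 && (size_t)n < buflen && is_regular_file(buf))
		return 0;

	return -1;
}
```

Checking the flat location first means an archive that has not been migrated yet, or is only partly migrated, keeps answering +get-NNNN requests.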

mbox support for +get would be nice, but that's much more work than just
partitioning the directory.

We do have our own archives site:
https://archives.gentoo.org/

The older versions used to use mhonarc, but also ran into scaling
concerns, so we built something more custom (earlier versions used
ElasticSearch as a DB; I don't recall what the current version uses
underneath, but we can re-ingest everything if we need to).

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



* Re: [mlmmj] mlmmj patches from distributions: Gentoo
  2022-12-08  1:13 [mlmmj] mlmmj patches from distributions: Gentoo Robin H. Johnson
                   ` (2 preceding siblings ...)
  2022-12-08 23:18 ` Robin H. Johnson
@ 2022-12-09  5:25 ` Baptiste Daroussin
  2022-12-09  5:36 ` Robin H. Johnson
  2022-12-09  5:56 ` Baptiste Daroussin
  5 siblings, 0 replies; 7+ messages in thread
From: Baptiste Daroussin @ 2022-12-09  5:25 UTC (permalink / raw)
  To: mlmmj



On 9 December 2022 00:18:46 GMT+01:00, "Robin H. Johnson" <robbat2@gentoo.org> wrote:
>On Thu, Dec 08, 2022 at 09:01:29AM +0100, Baptiste Daroussin wrote:
>> > There's one scaling discussion we need to have, but the handling needs
>> > to include how to incrementally get there:
>> > 
>> > How to have millions of mails in /archive/!
>> > 
>> > Our most active list is now approaching 1.5M emails in that directory,
>> > and it worries me.
>> 
>> Why is it worrying you? A directory can hold way more than that.
>>
>> On FreeBSD, when creating the public archives with
>> https://fossil.nours.eu/mlmmj-archiver/doc/trunk/README.md (this may move to
>> Codeberg), I append the archived mails to an mbox, so I can clean up whatever is
>> in the archives directory if I need to (I have not needed to so far).
>Cleaning up the archives directory will break the +get-NNNN
>functionality.
>
>It makes the directory take extremely long to scan during backups.
>I think it should be partitioned more, but any +get-NNNN code will need
>to support both partitioned & non-partitioned.
>

Good point, and easy to implement; I'll look into it.

>mbox support for +get would be nice, but that's much more work than just
>partitioning the directory.
>
>We do have our own archives site:
>https://archives.gentoo.org/
>
>The older versions used to use mhonarc, but also ran into scaling
>concerns, so we built something more custom (earlier versions used
>ElasticSearch as a DB; I don't recall what the current version uses
>underneath, but we can re-ingest everything if we need to).
>

If this is public, I am interested in it; I wrote mlmmj-archiver because of scaling issues as well.


Best regards,
Bapt



* Re: [mlmmj] mlmmj patches from distributions: Gentoo
  2022-12-08  1:13 [mlmmj] mlmmj patches from distributions: Gentoo Robin H. Johnson
                   ` (3 preceding siblings ...)
  2022-12-09  5:25 ` Baptiste Daroussin
@ 2022-12-09  5:36 ` Robin H. Johnson
  2022-12-09  5:56 ` Baptiste Daroussin
  5 siblings, 0 replies; 7+ messages in thread
From: Robin H. Johnson @ 2022-12-09  5:36 UTC (permalink / raw)
  To: mlmmj

On Fri, Dec 09, 2022 at 06:25:58AM +0100, Baptiste Daroussin wrote:
> >Cleaning up the archives directory will break the +get-NNNN
> >functionality.
> >
> >It makes the directory take extremely long to scan during backups.
> >I think it should be partitioned more, but any +get-NNNN code will need
> >to support both partitioned & non-partitioned.
> Good point, and easy to implement; I'll look into it.
Great to hear. Please do consider how to migrate existing /archive/
directories into it.

> >mbox support for +get would be nice, but that's much more work than just
> >partitioning the directory.
> >
> >We do have our own archives site:
> >https://archives.gentoo.org/
> >
> >The older versions used to use mhonarc, but also ran into scaling
> >concerns, so we built something more custom (earlier versions used
> >ElasticSearch as a DB; I don't recall what the current version uses
> >underneath, but we can re-ingest everything if we need to).
> If this is public, I am interested in it; I wrote mlmmj-archiver because of
> scaling issues as well.
Yes, it's public (and still running ES as it turns out):
https://gitweb.gentoo.org/sites/archives/frontend.git/
https://gitweb.gentoo.org/sites/archives/backend.git/

Some crappy instructions here:
https://wiki.gentoo.org/wiki/Project:Infrastructure/Service_Catalog/Archives

This is the 4th or 5th rewrite of that service in the last 20 years.


-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



* Re: [mlmmj] mlmmj patches from distributions: Gentoo
  2022-12-08  1:13 [mlmmj] mlmmj patches from distributions: Gentoo Robin H. Johnson
                   ` (4 preceding siblings ...)
  2022-12-09  5:36 ` Robin H. Johnson
@ 2022-12-09  5:56 ` Baptiste Daroussin
  5 siblings, 0 replies; 7+ messages in thread
From: Baptiste Daroussin @ 2022-12-09  5:56 UTC (permalink / raw)
  To: mlmmj



On 9 December 2022 06:36:54 GMT+01:00, "Robin H. Johnson" <robbat2@gentoo.org> wrote:
>On Fri, Dec 09, 2022 at 06:25:58AM +0100, Baptiste Daroussin wrote:
>> >Cleaning up the archives directory will break the +get-NNNN
>> >functionality.
>> >
>> >It makes the directory take extremely long to scan during backups.
>> >I think it should be partitioned more, but any +get-NNNN code will need
>> >to support both partitioned & non-partitioned.
>> Good point, and easy to implement; I'll look into it.
>Great to hear. Please do consider how to migrate existing /archive/
>directories into it.
>
My current plan is pretty simple: partition the archive in chunks of 100k via mlmmj-maintd, which scans the archive and moves mails around. That way the rest of the code remains the same.
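
A minimal sketch of the kind of pass described above, under stated assumptions: it is not mlmmj-maintd code, `archive_partition` is a hypothetical name, the directory mode is arbitrary, and the `.d` suffix on partition directories is an assumption added so that a directory name can never collide with a numerically named mail file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

#define ARCHIVE_PART_SIZE 100000L /* 100k mails per directory */

/*
 * Move every flat <archivedir>/N file to <archivedir>/<N / size>.d/N.
 * Returns the number of mails moved, or -1 on error.
 */
long archive_partition(const char *archivedir)
{
	DIR *d;
	struct dirent *de;
	long moved = 0;

	assert(archivedir != NULL);
	d = opendir(archivedir);
	if (d == NULL)
		return -1;

	while ((de = readdir(d)) != NULL) {
		char src[1024], dstdir[1024], dst[1024];
		const char *name = de->d_name;
		struct stat st;
		long n;

		/* Archived mails have purely numeric names; skip the rest. */
		if (name[0] == '\0' || name[strspn(name, "0123456789")] != '\0')
			continue;
		errno = 0;
		n = strtol(name, NULL, 10);
		if (errno != 0)
			continue;

		/* Skip entries whose paths would not fit the buffers. */
		if (snprintf(src, sizeof(src), "%s/%s", archivedir, name)
		        >= (int)sizeof(src) ||
		    snprintf(dstdir, sizeof(dstdir), "%s/%ld.d", archivedir,
		        n / ARCHIVE_PART_SIZE) >= (int)sizeof(dstdir) ||
		    snprintf(dst, sizeof(dst), "%s/%s", dstdir, name)
		        >= (int)sizeof(dst))
			continue;
		if (stat(src, &st) != 0 || !S_ISREG(st.st_mode))
			continue;

		/* Create the partition directory on first use. */
		if (mkdir(dstdir, 0700) != 0 && errno != EEXIST)
			goto fail;
		if (rename(src, dst) != 0)
			goto fail;
		moved++;
	}
	closedir(d);
	return moved;
fail:
	closedir(d);
	return -1;
}
```

Because rename() stays within one filesystem and already-moved mails are no longer matched, the pass can be interrupted and re-run, which fits an incremental migration of an existing archive.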

+get will look in two places: the root of the archive and the partitioned directory.

This should (famous last words) be easy and not really intrusive.

Best regards
Bapt

