Kernel Newbies archive on lore.kernel.org
* [RFC] LKML Archive in Maildir Format
@ 2018-12-16 19:06 Joey Pabalinas
  2018-12-16 19:17 ` Joe Perches
  2018-12-16 19:46 ` Konstantin Ryabitsev
  0 siblings, 2 replies; 7+ messages in thread
From: Joey Pabalinas @ 2018-12-16 19:06 UTC
  To: Linux Kernel Mailing List
  Cc: Greg Kroah-Hartman, Linus Torvalds, Joey Pabalinas, kernelnewbies

I spent a lot of time trying to find an LKML archive in Maildir format
that I could use for local searches with notmuch or something, but all
the links I was able to find were dead.

I ended up just compiling one myself and I currently host it at:

https://alyptik.org/lkml.tar.xz

It's possible I'm the only weirdo who finds this kind of thing useful, but
I figured I should share it just in case I'm not.

It's about 1.1 million files. Does anyone have an idea of a
better way to host this? I've tried GitHub and GitLab, but they don't
appreciate repos with that many files, hah.

Open to suggestions, thanks!

-- 
Cheers,
Joey Pabalinas

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@kernelnewbies.org
https://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies


* Re: [RFC] LKML Archive in Maildir Format
  2018-12-16 19:06 [RFC] LKML Archive in Maildir Format Joey Pabalinas
@ 2018-12-16 19:17 ` Joe Perches
  2018-12-16 19:21   ` Joey Pabalinas
  2018-12-16 19:46 ` Konstantin Ryabitsev
  1 sibling, 1 reply; 7+ messages in thread
From: Joe Perches @ 2018-12-16 19:17 UTC
  To: Joey Pabalinas, Linux Kernel Mailing List
  Cc: Greg Kroah-Hartman, Linus Torvalds, kernelnewbies

On Sun, 2018-12-16 at 09:06 -1000, Joey Pabalinas wrote:
> I spent a lot of time trying to find an LKML archive in Maildir format
> that I could use for local searches with notmuch or something, but all
> the links I was able to find were dead.

You might instead use

https://www.kernel.org/lore.html
https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git/




* Re: [RFC] LKML Archive in Maildir Format
  2018-12-16 19:17 ` Joe Perches
@ 2018-12-16 19:21   ` Joey Pabalinas
  2018-12-16 19:55     ` Konstantin Ryabitsev
  0 siblings, 1 reply; 7+ messages in thread
From: Joey Pabalinas @ 2018-12-16 19:21 UTC
  To: Joe Perches
  Cc: Greg Kroah-Hartman, Linus Torvalds, Linux Kernel Mailing List,
	Joey Pabalinas, kernelnewbies

On Sun, Dec 16, 2018 at 11:17:34AM -0800, Joe Perches wrote:
> On Sun, 2018-12-16 at 09:06 -1000, Joey Pabalinas wrote:
> > I spent a lot of time trying to find an LKML archive in Maildir format
> > that I could use for local searches with notmuch or something, but all
> > the links I was able to find were dead.
> 
> You might instead use
> 
> https://www.kernel.org/lore.html
> https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git/

That was my first attempt, but the documentation for the public-inbox
format is sort of terrible, and after a few hours trying to convert it
to Maildir I just gave up.

I ended up just slowly scraping lkml.org for a couple weeks so I
wouldn't disrupt anything and it worked fairly well. Just looking for
advice on where to host this now so others might be able to use it.

-- 
Cheers,
Joey Pabalinas


* Re: [RFC] LKML Archive in Maildir Format
  2018-12-16 19:06 [RFC] LKML Archive in Maildir Format Joey Pabalinas
  2018-12-16 19:17 ` Joe Perches
@ 2018-12-16 19:46 ` Konstantin Ryabitsev
  2018-12-16 19:53   ` Joey Pabalinas
  1 sibling, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2018-12-16 19:46 UTC
  To: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies,
	Linus Torvalds, Greg Kroah-Hartman

On Sun, Dec 16, 2018 at 09:06:39AM -1000, Joey Pabalinas wrote:
> I spent a lot of time trying to find an LKML archive in Maildir format
> that I could use for local searches with notmuch or something, but all
> the links I was able to find were dead.
> 
> I ended up just compiling one myself and I currently host it at:
> 
> https://alyptik.org/lkml.tar.xz

You seem to have duplicated a lot of effort that has already been done
to compile the archive on lore.kernel.org.

> It's possible I'm the only weirdo who finds this kind of thing useful, but
> I figured I should share it just in case I'm not.

The maildir format is kind of terrible for LKML, because having millions
of messages in a single directory is very hard on the underlying FS. If
you break it up into multiple folders, then it becomes difficult to
search. This is the main reason why we have chosen to go with the
public-inbox format, which solves both of these problems and allows for
very efficient archive updating and replication using git.

> It's about 1.1 million files. Does anyone have an idea of a
> better way to host this? I've tried GitHub and GitLab, but they don't
> appreciate repos with that many files, hah.

Like I said, you seem to be going down the road we've already tried and
rejected. :)

-K


* Re: [RFC] LKML Archive in Maildir Format
  2018-12-16 19:46 ` Konstantin Ryabitsev
@ 2018-12-16 19:53   ` Joey Pabalinas
  0 siblings, 0 replies; 7+ messages in thread
From: Joey Pabalinas @ 2018-12-16 19:53 UTC
  To: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies,
	Linus Torvalds, Greg Kroah-Hartman

On Sun, Dec 16, 2018 at 02:46:49PM -0500, Konstantin Ryabitsev wrote:
> On Sun, Dec 16, 2018 at 09:06:39AM -1000, Joey Pabalinas wrote:
> > I spent a lot of time trying to find an LKML archive in Maildir format
> > that I could use for local searches with notmuch or something, but all
> > the links I was able to find were dead.
> > 
> > I ended up just compiling one myself and I currently host it at:
> > 
> > https://alyptik.org/lkml.tar.xz
> 
> You seem to have duplicated a lot of effort that has already been done
> to compile the archive on lore.kernel.org.

Absolutely correct, haha.

> 
> > It's possible I'm the only weirdo who finds this kind of thing useful, but
> > I figured I should share it just in case I'm not.
> 
> The maildir format is kind of terrible for LKML, because having millions
> of messages in a single directory is very hard on the underlying FS. If
> you break it up into multiple folders, then it becomes difficult to
> search. This is the main reason why we have chosen to go with the
> public-inbox format, which solves both of these problems and allows for
> very efficient archive updating and replication using git.
> 
> > It's about 1.1 million files. Does anyone have an idea of a
> > better way to host this? I've tried GitHub and GitLab, but they don't
> > appreciate repos with that many files, hah.
> 
> Like I said, you seem to be going down the road we've already tried and
> rejected. :)

Yes, I had a strong suspicion I might be the only crazy person who prefers this
kind of format :)

My only comment on the public-inbox choice is that the documentation
is very sparse and erratic. A couple of other people and I just
couldn't figure out how to convert that format to Maildir or some other
format you could feed into a reader like neomutt.

Do you have any advice on how to convert those public-inbox files
correctly?

-- 
Cheers,
Joey Pabalinas


* Re: [RFC] LKML Archive in Maildir Format
  2018-12-16 19:21   ` Joey Pabalinas
@ 2018-12-16 19:55     ` Konstantin Ryabitsev
  2018-12-16 21:55       ` Joey Pabalinas
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2018-12-16 19:55 UTC
  To: Joey Pabalinas, Joe Perches, Linux Kernel Mailing List,
	kernelnewbies, Linus Torvalds, Greg Kroah-Hartman

On Sun, Dec 16, 2018 at 09:21:35AM -1000, Joey Pabalinas wrote:
> That was my first attempt, but the documentation for the public-inbox
> format is sort of terrible, 

I'm surprised you think so, because it's basically a simple file called
"m" that is updated on each commit and contains the raw message
(headers and body).

> and after a few hours trying to convert it to Maildir I just gave up.

It's as easy as something like this:

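mkdir -p maildir/cur maildir/new maildir/tmp
for commit in $(git rev-list master); do
  # each commit stores one message in a file named "m"
  git show "$commit:m" > "maildir/new/$commit"
done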

You have to do this for each of the shards to get the complete archive.

-K
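
The loop above converts a single shard. A minimal sketch of a per-shard
helper follows. It is not from the thread and rests on assumptions: each
shard is a git directory with a "master" branch, commits that carry no
"m" file are simply skipped, and the paths in the usage comment are
hypothetical.

```shell
#!/bin/sh
# Sketch only: export every message of one public-inbox shard to a Maildir.
# Assumptions (not stated in the thread): the branch is "master" and
# commits without an "m" file can be skipped.
export_shard() {
    shard=$1      # path to the shard's git directory
    maildir=$2    # destination Maildir (created if missing)
    mkdir -p "$maildir/cur" "$maildir/new" "$maildir/tmp"
    git --git-dir="$shard" rev-list master | while read -r commit; do
        # Skip commits whose tree has no "m" file.
        git --git-dir="$shard" cat-file -e "$commit:m" 2>/dev/null || continue
        # Name each message after its commit hash, as in the loop above.
        git --git-dir="$shard" show "$commit:m" > "$maildir/new/$commit"
    done
}

# Usage, once per shard (hypothetical paths):
# for shard in lkml/git/*.git; do export_shard "$shard" lkml-maildir; done
```

Using the commit hash as the file name keeps names unique across shards,
so all shards can share one destination Maildir.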


* Re: [RFC] LKML Archive in Maildir Format
  2018-12-16 19:55     ` Konstantin Ryabitsev
@ 2018-12-16 21:55       ` Joey Pabalinas
  0 siblings, 0 replies; 7+ messages in thread
From: Joey Pabalinas @ 2018-12-16 21:55 UTC
  To: Joey Pabalinas, Joe Perches, Linux Kernel Mailing List,
	kernelnewbies, Linus Torvalds, Greg Kroah-Hartman

On Sun, Dec 16, 2018 at 02:55:05PM -0500, Konstantin Ryabitsev wrote:
> On Sun, Dec 16, 2018 at 09:21:35AM -1000, Joey Pabalinas wrote:
> > That was my first attempt, but the documentation for the public-inbox
> > format is sort of terrible, 
> 
> I'm surprised you think so, because it's basically a simple file called
> "m" that is updated on each commit and contains the raw message
> (headers and body).
> 
> > and after a few hours trying to convert it to Maildir I just gave up.
> 
> It's as easy as something like this:
> 
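> mkdir -p maildir/cur maildir/new maildir/tmp
> for commit in $(git rev-list master); do
>   # each commit stores one message in a file named "m"
>   git show "$commit:m" > "maildir/new/$commit"
> done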
> 
> You have to do this for each of the shards to get the complete archive.

Ah dang, I was trying to use stuff like ssoma to split it, no wonder it
didn't work.  Not sure why I didn't think to try any git commands...

Well, at least now I know, ha. Thanks!

-- 
Cheers,
Joey Pabalinas


end of thread

Thread overview: 7+ messages
2018-12-16 19:06 [RFC] LKML Archive in Maildir Format Joey Pabalinas
2018-12-16 19:17 ` Joe Perches
2018-12-16 19:21   ` Joey Pabalinas
2018-12-16 19:55     ` Konstantin Ryabitsev
2018-12-16 21:55       ` Joey Pabalinas
2018-12-16 19:46 ` Konstantin Ryabitsev
2018-12-16 19:53   ` Joey Pabalinas


Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernelnewbies/0 kernelnewbies/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernelnewbies kernelnewbies/ https://lore.kernel.org/kernelnewbies \
		kernelnewbies@kernelnewbies.org kernelnewbies@archiver.kernel.org
	public-inbox-index kernelnewbies


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernelnewbies.kernelnewbies


AGPL code for this site: git clone https://public-inbox.org/ public-inbox