* [RFC] LKML Archive in Maildir Format @ 2018-12-16 19:06 Joey Pabalinas 2018-12-16 19:17 ` Joe Perches 2018-12-16 19:46 ` Konstantin Ryabitsev 0 siblings, 2 replies; 14+ messages in thread From: Joey Pabalinas @ 2018-12-16 19:06 UTC (permalink / raw) To: Linux Kernel Mailing List Cc: kernelnewbies, Linus Torvalds, Greg Kroah-Hartman, Joey Pabalinas [-- Attachment #1: Type: text/plain, Size: 677 bytes --] I spent a lot of time trying to find an LKML archive in Maildir format that I could use for local searches with notmuch or something, but all the links I was able to find were all dead. I ended up just compiling one myself and I currently host it at: https://alyptik.org/lkml.tar.xz It's possible I'm the only weirdo who finds this kind of thing useful, but I figured I should share it just in case I'm not. It's about 1.1 million files; I was wondering if anyone had an idea of a better way to host this? I've tried GitHub and GitLab, but they don't appreciate repos with that many files, hah. Open to suggestions, thanks! -- Cheers, Joey Pabalinas [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:06 [RFC] LKML Archive in Maildir Format Joey Pabalinas @ 2018-12-16 19:17 ` Joe Perches 2018-12-16 19:21 ` Joey Pabalinas 2018-12-16 19:46 ` Konstantin Ryabitsev 1 sibling, 1 reply; 14+ messages in thread From: Joe Perches @ 2018-12-16 19:17 UTC (permalink / raw) To: Joey Pabalinas, Linux Kernel Mailing List Cc: kernelnewbies, Linus Torvalds, Greg Kroah-Hartman On Sun, 2018-12-16 at 09:06 -1000, Joey Pabalinas wrote: > I spent a lot of time trying to find an LKML archive in Maildir format > that I could use for local searches with nutmuch or something, but all > the links I was able to find were all dead. You might instead use https://www.kernel.org/lore.html https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git/ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:17 ` Joe Perches @ 2018-12-16 19:21 ` Joey Pabalinas 2018-12-16 19:55 ` Konstantin Ryabitsev 2018-12-18 20:26 ` Jasper Spaans 0 siblings, 2 replies; 14+ messages in thread From: Joey Pabalinas @ 2018-12-16 19:21 UTC (permalink / raw) To: Joe Perches Cc: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman [-- Attachment #1: Type: text/plain, Size: 877 bytes --] On Sun, Dec 16, 2018 at 11:17:34AM -0800, Joe Perches wrote: > On Sun, 2018-12-16 at 09:06 -1000, Joey Pabalinas wrote: > > I spent a lot of time trying to find an LKML archive in Maildir format > > that I could use for local searches with notmuch or something, but all > > the links I was able to find were all dead. > > You might instead use > > https://www.kernel.org/lore.html > https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git/ That was my first attempt, but the documentation for the public-inbox format is sort of terrible, and after a few hours trying to convert it to Maildir I just gave up. I ended up just slowly scraping lkml.org for a couple of weeks so I wouldn't disrupt anything, and it worked fairly well. Just looking for advice on where to host this now so others might be able to use it. -- Cheers, Joey Pabalinas [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:21 ` Joey Pabalinas @ 2018-12-16 19:55 ` Konstantin Ryabitsev 2018-12-16 21:55 ` Joey Pabalinas 2018-12-18 20:26 ` Jasper Spaans 1 sibling, 1 reply; 14+ messages in thread From: Konstantin Ryabitsev @ 2018-12-16 19:55 UTC (permalink / raw) To: Joey Pabalinas, Joe Perches, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman

On Sun, Dec 16, 2018 at 09:21:35AM -1000, Joey Pabalinas wrote:
> That was my first attempt, but the documentation for the public-inbox
> format is sort of terrible,

I'm surprised you think so, because it's basically a simple file called "m" that is updated on each commit and contains the body of the message.

> and after a few hours trying to convert it to Maildir I just gave up.

It's as easy as something like this:

    for commit in $(git rev-list master); do
        git show $commit:m > maildir/new/$commit
    done

You have to do it for each of the shards to get the complete archive.

-K

^ permalink raw reply [flat|nested] 14+ messages in thread
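The loop above can also be sketched in Python for a single shard. This is a minimal sketch under stated assumptions, not the public-inbox tooling: `iter_shard` and `export_messages` are hypothetical helper names, the `master` branch comes from the shell version, and skipping commits that have no `m` blob is an assumption about how to treat commits that carry no message. It uses the standard `mailbox` module, so file naming is left to the library instead of reusing commit IDs.

```python
import mailbox
import subprocess
from typing import Iterable, Iterator, Tuple


def iter_shard(repo: str, branch: str = "master") -> Iterator[Tuple[str, bytes]]:
    """Yield (commit, raw message) pairs from one public-inbox shard, oldest first."""
    revs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--reverse", branch],
        check=True, capture_output=True, text=True,
    ).stdout.split()
    for commit in revs:
        shown = subprocess.run(
            ["git", "-C", repo, "show", f"{commit}:m"],
            capture_output=True,
        )
        # Assumption: a commit without an "m" blob carries no message, so skip it.
        if shown.returncode == 0:
            yield commit, shown.stdout


def export_messages(messages: Iterable[Tuple[str, bytes]], maildir: str) -> int:
    """Write raw messages into a Maildir and return how many were stored."""
    box = mailbox.Maildir(maildir, create=True)  # creates cur/, new/ and tmp/
    count = 0
    for _commit, raw in messages:
        box.add(raw)  # the mailbox module picks a unique file name
        count += 1
    return count
```

A call such as `export_messages(iter_shard("git/6.git"), "maildir")` would then be repeated once per shard; the shard path here is only an example.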
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:55 ` Konstantin Ryabitsev @ 2018-12-16 21:55 ` Joey Pabalinas 0 siblings, 0 replies; 14+ messages in thread From: Joey Pabalinas @ 2018-12-16 21:55 UTC (permalink / raw) To: Joey Pabalinas, Joe Perches, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman [-- Attachment #1: Type: text/plain, Size: 924 bytes --] On Sun, Dec 16, 2018 at 02:55:05PM -0500, Konstantin Ryabitsev wrote: > On Sun, Dec 16, 2018 at 09:21:35AM -1000, Joey Pabalinas wrote: > > That was my first attempt, but the ducumentation for the public-inbox > > format is sort of terrible, > > I'm surprised you think so, because it's basically a simple file called > "m" that is updated on each commit and contains the body of the > message. > > > and after a few hours trying to convert it to Maildir I just gave up. > > It's as easy as something like this: > > for commit in $(git rev-list master); do: > git show $commit:m > maildir/new/$commit > done > > You have to do it per each of the shards for the complete archive. Ah dang, I was trying to use stuff like ssoma to split it, no wonder it didn't work. Not sure why I didn't think to try any git commands... Well, at least now I know, ha. Thanks! -- Cheers, Joey Pabalinas [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:21 ` Joey Pabalinas 2018-12-16 19:55 ` Konstantin Ryabitsev @ 2018-12-18 20:26 ` Jasper Spaans 2018-12-18 22:53 ` Joey Pabalinas 1 sibling, 1 reply; 14+ messages in thread From: Jasper Spaans @ 2018-12-18 20:26 UTC (permalink / raw) To: Joey Pabalinas, Joe Perches, Linux Kernel Mailing List [-- Attachment #1.1: Type: text/plain, Size: 1510 bytes --] Hi Joey, On Sun, Dec 16, 2018 at 09:21:35AM -1000, Joey Pabalinas wrote: > > > I spent a lot of time trying to find an LKML archive in Maildir format > > > that I could use for local searches with notmuch or something, but all > > > the links I was able to find were all dead. > > > > You might instead use > > > > https://www.kernel.org/lore.html > > https://git.kernel.org/pub/scm/public-inbox/vger.kernel.org/git.git/ > > That was my first attempt, but the documentation for the public-inbox > format is sort of terrible, and after a few hours trying to convert it > to Maildir I just gave up. > > I ended up just slowly scraping lkml.org for a couple weeks so I > wouldn't disrupt anything and it worked fairly well. Just looking for > advice on where to host this now so others might be able to use it. Now you've caught my attention; first of all, there are more than 3M messages stored in the lkml.org database, so I guess you've missed some messages or something is really broken. Besides, unless you figured out how to get to the raw data, you've just scraped a rendering which discards stuff like PGP signatures and has very incomplete headers. Unless you don't care for those of course :) Note that I've also been toying with the lore dataset, and wrote a tiny tool to get Maildir-like data out of it; this code is a bit of a single-use jig, so you'll need to do some coding if you really want to use it. Attached anyway. 
All the best and enjoy,
Jasper

[-- Attachment #1.2: Pipfile --]
[-- Type: text/plain, Size: 168 bytes --]

[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
gitpython = "*"
ipython = "*"

[dev-packages]

[requires]
python_version = "3.7"

[-- Attachment #1.3: test.py --]
[-- Type: text/x-python, Size: 1130 bytes --]

from email.parser import BytesParser
from email.message import EmailMessage
from email.policy import default

from git import Repo

our_last_id = '<dc4d502c-bc3c-46e3-a984-41271951a5f7@mellanox.com>'
#'<20180711142744.GN3593@linux.vnet.ibm.com>'

repo = Repo('/Users/spaans/xsrc/lkml/lkml/git/6.git')

commit = repo.commit("master")

counter = 5000
froms = set()

while True:
    tree = commit.tree
    blob = tree['m']
    data = blob.data_stream.read()
    msg = BytesParser(policy=default).parsebytes(data)
    msgid = msg['Message-ID']
    from_ = msg['From']
    froms.add(from_)
    print(msgid)
    #import pdb; pdb.set_trace()
    if len(froms) > 1000:
        print("HAVE LOTS OF FRIENDS NOW")
        break
    if msgid == our_last_id:
        print("LADIES & GENTLEMEN, WE'VE GOT HIM")
        break
    parents = commit.parents
    if len(parents) != 1:
        print("WUH")
        break
    else:
        commit = commit.parents[0]
    #with open("output/%04d.eml" % counter, "bw") as f:
    #    f.write(data)
    counter -= 1

import pprint
pprint.pprint(froms)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1528 bytes --]

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-18 20:26 ` Jasper Spaans @ 2018-12-18 22:53 ` Joey Pabalinas 0 siblings, 0 replies; 14+ messages in thread From: Joey Pabalinas @ 2018-12-18 22:53 UTC (permalink / raw) To: Jasper Spaans; +Cc: Joey Pabalinas, Joe Perches, Linux Kernel Mailing List [-- Attachment #1: Type: text/plain, Size: 979 bytes --] On Tue, Dec 18, 2018 at 09:26:27PM +0100, Jasper Spaans wrote: > Now you've caught my attention; first of all, there are more than 3M > messages stored in the lkml.org datase, so I guess you've missed some > messages or something is really broken. > > Besides, unless you figured out how to get to the raw data, you've just > scraped a rendering which discards stuff like pgp signatures etc and has > very incomplete headers. Unless you don't care for those of course :) > > Note that I've also been toying with the lore dataset, and wrote a tiny tool > to get Maildir-like data out of it; this code is a bit of a single-use-jig > so you'll need to do some coding if you really want to use it. Attached > anyway. Yeah, after looking closer at it last week, something here is very weird. This is definitely far from complete. When I have some free time I'm just going to give it another go with the public-inbox conversion. -- Cheers, Joey Pabalinas [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:06 [RFC] LKML Archive in Maildir Format Joey Pabalinas 2018-12-16 19:17 ` Joe Perches @ 2018-12-16 19:46 ` Konstantin Ryabitsev 2018-12-16 19:53 ` Joey Pabalinas 1 sibling, 1 reply; 14+ messages in thread From: Konstantin Ryabitsev @ 2018-12-16 19:46 UTC (permalink / raw) To: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman On Sun, Dec 16, 2018 at 09:06:39AM -1000, Joey Pabalinas wrote: > I spent a lot of time trying to find an LKML archive in Maildir format > that I could use for local searches with nutmuch or something, but all > the links I was able to find were all dead. > > I ended up just compiling one myself and I currently host it at: > > https://alyptik.org/lkml.tar.xz You seem to have duplicated a lot of effort that has already been done to compile the archive on lore.kernel.org. > It's possible I'm the only weirdo who finds this kind of thing useful, but > I figured I should share it just in case I'm not. The maildir format is kind of terrible for LKML, because having millions of messages in a single directory is very hard on the underlying FS. If you break it up into multiple folders, then it becomes difficult to search. This is the main reason why we have chosen to go with the public-inbox format, which solves both of these problems and allows for a very efficient archive updating and replication using git. > It's about 1.1 million files, I was wondering if anyone had an idea of a > better way to host this? I've tried Github and GitLab, but they don't > appreciate repos with that many files, hah. Like I said, you seem to be going down the road we've already tried and rejected. :) -K ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:46 ` Konstantin Ryabitsev @ 2018-12-16 19:53 ` Joey Pabalinas 2019-01-04 1:35 ` Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Joey Pabalinas @ 2018-12-16 19:53 UTC (permalink / raw) To: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman [-- Attachment #1: Type: text/plain, Size: 1967 bytes --] On Sun, Dec 16, 2018 at 02:46:49PM -0500, Konstantin Ryabitsev wrote: > On Sun, Dec 16, 2018 at 09:06:39AM -1000, Joey Pabalinas wrote: > > I spent a lot of time trying to find an LKML archive in Maildir format > > that I could use for local searches with nutmuch or something, but all > > the links I was able to find were all dead. > > > > I ended up just compiling one myself and I currently host it at: > > > > https://alyptik.org/lkml.tar.xz > > You seem to have duplicated a lot of effort that has already been done > to compile the archive on lore.kernel.org. Absolutely correct, haha. > > > It's possible I'm the only weirdo who finds this kind of thing useful, but > > I figured I should share it just in case I'm not. > > The maildir format is kind of terrible for LKML, because having millions > of messages in a single directory is very hard on the underlying FS. If > you break it up into multiple folders, then it becomes difficult to > search. This is the main reason why we have chosen to go with the > public-inbox format, which solves both of these problems and allows for > a very efficient archive updating and replication using git. > > > It's about 1.1 million files, I was wondering if anyone had an idea of a > > better way to host this? I've tried Github and GitLab, but they don't > > appreciate repos with that many files, hah. > > Like I said, you seem to be going down the road we've already tried and > rejected. 
:) Yes, I had a strong suspicion I might be the only crazy person who prefers this kind of format :) My only comment on the public-inbox choice is that the documentation is very sparse and erratic. A couple of other people and I just couldn't figure out how to convert that format to Maildir or some other format you could feed into a reader like neomutt. Do you have any advice on how to convert those public-inbox files correctly? -- Cheers, Joey Pabalinas [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 833 bytes --] ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2018-12-16 19:53 ` Joey Pabalinas @ 2019-01-04 1:35 ` Eric Wong 2019-03-05 20:48 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Eric Wong @ 2019-01-04 1:35 UTC (permalink / raw) To: Joey Pabalinas Cc: linux-kernel, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman Joey Pabalinas <joeypabalinas@gmail.com> wrote: > My only comment on the public-mailbox choice is that the documentation > is very sparse and erratic. Myself and a couple other people just > couldn't figure out how to convert that format to Maildir or some other > format you could feed into a reader like neomutt. Sorry, I didn't notice this before. I started making some attempts at improving documentation (among other things, when time permits) to public-inbox: https://public-inbox.org/meta/20190102083305.30473-1-e@80x24.org/ And without knowing anything about git or public-inbox, you can get NNTP messages into Maildir or mboxrd pretty easily. Nothing new to learn :) I wrote a one-off Ruby years ago (before public-inbox) for converting slrnspools to Maildir (sample slrnpull.conf below). But yeah, I wouldn't recommend 3M+ messages in a Maildir... 
==> slrnspool2maildir <==
#!/usr/bin/ruby
require 'socket'
require 'fileutils'
HOSTNAME = Socket.gethostname
usage = "Usage #$0 <spooldir> <maildir>"
spooldir = ARGV[0] or abort usage
maildir = ARGV[1] or abort usage
f = base = nil
nr = 0
%w(cur new tmp).each { |x| FileUtils.mkpath("#{maildir}/#{x}") }
Dir.glob("#{spooldir}/*").each do |src|
  File.file?(src) or next
  base = File.basename(src)
  dest = "#{maildir}/new/#{Time.now.to_i}_#{base}_0.#{HOSTNAME}:2,"
  begin
    File.link(src, dest)
  rescue Errno::EEXIST
    warn "#{dest} already exists"
    next
  end
  File.unlink(src)
end
__END__

==> slrnpull.conf <==
# group_name max expire headers_only
inbox.com.example.news.group.name 1000000000 1000000000 0

# usage: slrnpull -d $PWD -h news.example.com --no-post

# Wouldn't be hard to script something using Net::NNTP in Perl
# to write directly to Maildirs, either.

^ permalink raw reply [flat|nested] 14+ messages in thread
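For readers who do not have Ruby at hand, the same hard-link approach can be sketched in Python. This is only a rough port under assumptions: `link_spool_to_maildir` is a hypothetical name, and the destination file naming simply mirrors the Ruby script above rather than any particular Maildir delivery convention.

```python
import os
import socket
import time


def link_spool_to_maildir(spooldir: str, maildir: str) -> int:
    """Hard-link every regular file in spooldir into maildir/new, then drop the original."""
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    host = socket.gethostname()
    moved = 0
    for name in sorted(os.listdir(spooldir)):
        src = os.path.join(spooldir, name)
        if not os.path.isfile(src):
            continue  # skip subdirectories and other non-files
        # Same naming scheme as the Ruby script: <time>_<article>_0.<host>:2,
        dest = os.path.join(maildir, "new", f"{int(time.time())}_{name}_0.{host}:2,")
        try:
            os.link(src, dest)
        except FileExistsError:
            print(f"{dest} already exists")
            continue
        os.unlink(src)
        moved += 1
    return moved
```

Hard links only work when the spool and the Maildir live on the same filesystem, which is also true of the Ruby version.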
* Re: [RFC] LKML Archive in Maildir Format 2019-01-04 1:35 ` Eric Wong @ 2019-03-05 20:48 ` Bjorn Helgaas 2019-03-05 23:26 ` Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2019-03-05 20:48 UTC (permalink / raw) To: Eric Wong Cc: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman, Konstantin Ryabitsev, Eric Biederman, Jasper Spaans OK, so I understand how to clone archives from lore.kernel.org and how to convert a git archive to a maildir (thanks, Konstantin!) What I *don't* understand is how to effectively read this locally. Ideally I'd like to run mutt, possibly with notmuch for indexing. But a maildir with 3M files seems impractical. I did actually try it (without notmuch), but it takes mutt about 5 minutes to start up. And the maildir is about 23G, compared with 7.5G for the git archive. Any pointers? I guess there's no mutt backend that can read a public-inbox archive directly? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2019-03-05 20:48 ` Bjorn Helgaas @ 2019-03-05 23:26 ` Eric Wong 2019-03-06 20:50 ` Bjorn Helgaas 0 siblings, 1 reply; 14+ messages in thread From: Eric Wong @ 2019-03-05 23:26 UTC (permalink / raw) To: Bjorn Helgaas Cc: Joey Pabalinas, linux-kernel, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman, Konstantin Ryabitsev, Eric Biederman, Jasper Spaans

Bjorn Helgaas <bhelgaas@google.com> wrote:
> OK, so I understand how to clone archives from lore.kernel.org and how
> to convert a git archive to a maildir (thanks, Konstantin!)
>
> What I *don't* understand is how to effectively read this locally.
> Ideally I'd like to run mutt, possibly with notmuch for indexing. But
> a maildir with 3M files seems impractical. I did actually try it
> (without notmuch), but it takes mutt about 5 minutes to start up. And
> the maildir is about 23G, compared with 7.5G for the git archive.

Right, relying on Maildir for long-term storage of giant archives is not a usable solution with any general purpose FSes I know about. git itself had the same problem with loose object scalability in the old days and packs were invented as a result.

> Any pointers? I guess there's no mutt backend that can read a
> public-inbox archive directly?

There's mutt patches to support reading over NNTP, so that works:

    mutt -f news://$INBOX_HOST/$INBOX_NEWSGROUP

I don't think mutt handles mboxrd 100% correctly, but it's close enough that you can download the gzipped mboxrd of a search query and open it via "mutt -f /path/to/downloaded/mbox.gz"

    curl -XPOST -OJ "$INBOX_URL/?q=$SEARCH_QUERY&x=m"

POST is required(*), and -OJ lets it use the Content-Disposition: header for a meaningful server-generated name, but you can also redirect the result to whatever you want.

For all messages since March 1, you could use:

    SEARCH_QUERY=d:20190301..

All the supported search queries are documented in $INBOX_URL/_/text/help/ and the search prefixes (e.g. "d:", "s:", "b:") are modeled after what's in mairix. You'll need to escape the queries for URIs (e.g. " " => "+", and so on).

Xapian requires date ranges to be denoted with ".." whereas mairix uses "-" for ranges.

The main thing public-inbox search misses from mairix is support for "-t" which grabs non-matching messages from the same thread. I would like to support that someday, but don't have enough time (or funding) to make it happen at the moment.

(*) to reliably avoid wasting resources from spiders/prefetchers

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] LKML Archive in Maildir Format 2019-03-05 23:26 ` Eric Wong @ 2019-03-06 20:50 ` Bjorn Helgaas 2019-03-07 3:44 ` Eric Wong 0 siblings, 1 reply; 14+ messages in thread From: Bjorn Helgaas @ 2019-03-06 20:50 UTC (permalink / raw) To: Eric Wong Cc: Joey Pabalinas, Linux Kernel Mailing List, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman, Konstantin Ryabitsev, Eric Biederman, Jasper Spaans On Tue, Mar 5, 2019 at 5:26 PM Eric Wong <e@80x24.org> wrote: > Bjorn Helgaas <bhelgaas@google.com> wrote: > > Any pointers? I guess there's no mutt backend that can read a > > public-inbox archive directly? > > There's mutt patches to support reading over NNTP, so that > works: > > mutt -f news://$INBOX_HOST/$INBOX_NEWSGROUP Neomutt includes NNTP support, so I tried this: neomutt -f news://nntp.lore.kernel.org/org.kernel.vger.linux-kernel which worked OK but (1) I only see the most recent 1000 messages and (2) obviously isn't reading a *local* archive. Neomutt took about 45 seconds to start up over my wimpy ISP. I assume I could probably have a local archive and run a local NNTP server and point neomutt at that local server. But I don't know how full-archive searching would work there. > I don't think mutt handles mboxrd 100% correctly, but it's close > enough that you can can download the gzipped mboxrd of a search > query and open it via "mutt -f /path/to/downloaded/mbox.gz" > > curl -XPOST -OJ "$INBOX_URL/?q=$SEARCH_QUERY&x=m" I got nothing at all with -XPOST, but this: curl -OJ "https://lore.kernel.org/linux-pci/?q=d:20190301..&x=m" got me the HTML source. Nothing that looks like mboxrd. I assume this is stupid user error on my part, but even with that resolved, it wouldn't have the nice git fetch properties of the git archive, i.e., incremental updates of only new stuff, would it? I think my ideal solution would be a mutt that could read the git archive directly, plus a notmuch index. 
But AFAIK, mutt can't do that, and notmuch only works with one message per file, not with the git archive. Something that might work would be to use Konstantin's "git archive to maildir" hint but shard into a bunch of smaller maildirs instead of one big one, then have notmuch index those, and use mutt or vim with notmuch queries instead of having it read in a maildir. But I feel like I must be missing the solution that's obvious to everybody but me. Bjorn ^ permalink raw reply [flat|nested] 14+ messages in thread
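The sharding idea in the last paragraph could be sketched as follows: each raw message goes into a per-month Maildir chosen from its Date header. This is a minimal sketch, assuming month-sized shards are small enough; `shard_name` and `add_to_shard` are hypothetical names, the "undated" fallback is an arbitrary choice, and whether notmuch actually copes better with many small Maildirs is not something this code settles.

```python
import mailbox
import os
from email import message_from_bytes
from email.utils import parsedate_to_datetime


def shard_name(raw: bytes) -> str:
    """Pick a shard such as "2019-03" from the Date header, or "undated"."""
    date = message_from_bytes(raw)["Date"]
    if date is None:
        return "undated"
    try:
        # Uses the month as written by the sender, without timezone conversion.
        return parsedate_to_datetime(str(date)).strftime("%Y-%m")
    except (TypeError, ValueError):
        return "undated"


def add_to_shard(root: str, raw: bytes) -> str:
    """Store one raw message in root/<shard>/ and return the shard name."""
    shard = shard_name(raw)
    os.makedirs(root, exist_ok=True)
    mailbox.Maildir(os.path.join(root, shard), create=True).add(raw)
    return shard
```

Fed with the raw blobs extracted from the git archive, this yields directories like `root/2019-03/new/`, each of which stays far below the millions-of-files range.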
* Re: [RFC] LKML Archive in Maildir Format 2019-03-06 20:50 ` Bjorn Helgaas @ 2019-03-07 3:44 ` Eric Wong 0 siblings, 0 replies; 14+ messages in thread From: Eric Wong @ 2019-03-07 3:44 UTC (permalink / raw) To: Bjorn Helgaas Cc: Joey Pabalinas, linux-kernel, kernelnewbies, Linus Torvalds, Greg Kroah-Hartman, Konstantin Ryabitsev, Eric Biederman, Jasper Spaans Bjorn Helgaas <bhelgaas@google.com> wrote: > On Tue, Mar 5, 2019 at 5:26 PM Eric Wong <e@80x24.org> wrote: > > Bjorn Helgaas <bhelgaas@google.com> wrote: > > > > Any pointers? I guess there's no mutt backend that can read a > > > public-inbox archive directly? > > > > There's mutt patches to support reading over NNTP, so that > > works: > > > > mutt -f news://$INBOX_HOST/$INBOX_NEWSGROUP > > Neomutt includes NNTP support, so I tried this: > > neomutt -f news://nntp.lore.kernel.org/org.kernel.vger.linux-kernel > > which worked OK but (1) I only see the most recent 1000 messages and > (2) obviously isn't reading a *local* archive. Neomutt took about 45 > seconds to start up over my wimpy ISP. > > I assume I could probably have a local archive and run a local NNTP > server and point neomutt at that local server. But I don't know how > full-archive searching would work there. Right. AFAIK there isn't a good solution for search via NNTP. > > I don't think mutt handles mboxrd 100% correctly, but it's close > > enough that you can can download the gzipped mboxrd of a search > > query and open it via "mutt -f /path/to/downloaded/mbox.gz" > > > > curl -XPOST -OJ "$INBOX_URL/?q=$SEARCH_QUERY&x=m" > > I got nothing at all with -XPOST, but this: Ah, I guess nginx (or something in AWS) rejects POST without Content-Length headers. Adding "-HContent-Length:0" to the command-line with -XPOST works for lore. > curl -OJ "https://lore.kernel.org/linux-pci/?q=d:20190301..&x=m" > > got me the HTML source. Nothing that looks like mboxrd. I assume Right. 
The "x=m" requests an mbox; but it's only available via POST requests (to prevent search engine spiders from wasting time on non-HTML content). With the HTML output in a browser, the "mbox.gz" button makes the POST request and allows you to download the mbox. > this is stupid user error on my part, but even with that resolved, it > wouldn't have the nice git fetch properties of the git archive, i.e., > incremental updates of only new stuff, would it? You could bump d:YYYYMMDD (there's also "dt:" for date-time if you need more precision). > I think my ideal solution would be a mutt that could read the git > archive directly, plus a notmuch index. But AFAIK, mutt can't do > that, and notmuch only works with one message per file, not with the > git archive. > > Something that might work would be to use Konstantin's "git archive to > maildir" hint but shard into a bunch of smaller maildirs instead of > one big one, then have notmuch index those, and use mutt or vim with > notmuch queries instead of having it read in a maildir. Small Maildirs work great, but large ones fall over. I don't think having a bunch of smaller Maildirs would help notmuch since notmuch still needs to know each file path. The only way I could see notmuch/Maildir working well is to keep the overall number of messages relatively small. One of my longer-term goals is to write a mairix-like tool in Perl which works with public-inbox archives; but I barely have enough time for public-inbox these days :< mairix works with gzipped mboxes, which is great for large archives; but the indexing falls over since it rewrites the entire search index every time. SSDs have died as a result :< > But I feel like I must be missing the solution that's obvious to > everybody but me. Nope, you're not alone :) There's not a lot of mail software which can handle LKML-sized histories efficiently. ^ permalink raw reply [flat|nested] 14+ messages in thread
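The search download and the "bump d:YYYYMMDD" suggestion discussed in this thread can be sketched together in Python. This is a hedged sketch, not a public-inbox client: `since_query`, `build_search_url` and `download_mbox` are hypothetical names, the one-day overlap is an arbitrary safety margin (so repeated fetches may return duplicates that would need filtering by Message-ID), and passing an empty body is my way of getting `urllib` to send a zero Content-Length with the POST.

```python
import urllib.parse
import urllib.request
from datetime import date, timedelta


def since_query(last_fetch: date, overlap_days: int = 1) -> str:
    """Build a "d:YYYYMMDD.." query starting slightly before the last fetch."""
    start = last_fetch - timedelta(days=overlap_days)
    return f"d:{start:%Y%m%d}.."


def build_search_url(inbox_url: str, query: str) -> str:
    """Return the URL that asks an inbox for an mbox of search results."""
    # urlencode escapes the query for use in a URI (" " becomes "+").
    params = urllib.parse.urlencode({"q": query, "x": "m"})
    return f"{inbox_url.rstrip('/')}/?{params}"


def download_mbox(inbox_url: str, query: str, dest: str) -> None:
    """POST the search and save the gzipped mbox response to dest."""
    req = urllib.request.Request(
        build_search_url(inbox_url, query),
        data=b"",  # assumption: empty body so a Content-Length of 0 is sent
        method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

For example, `download_mbox("https://lore.kernel.org/linux-pci/", since_query(date.today()), "new.mbox.gz")` would fetch everything from yesterday onward; the output file name is only an illustration.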
end of thread, other threads:[~2019-03-07 3:44 UTC | newest]

Thread overview: 14+ messages
2018-12-16 19:06 [RFC] LKML Archive in Maildir Format Joey Pabalinas
2018-12-16 19:17 ` Joe Perches
2018-12-16 19:21   ` Joey Pabalinas
2018-12-16 19:55     ` Konstantin Ryabitsev
2018-12-16 21:55       ` Joey Pabalinas
2018-12-18 20:26     ` Jasper Spaans
2018-12-18 22:53       ` Joey Pabalinas
2018-12-16 19:46 ` Konstantin Ryabitsev
2018-12-16 19:53   ` Joey Pabalinas
2019-01-04  1:35     ` Eric Wong
2019-03-05 20:48       ` Bjorn Helgaas
2019-03-05 23:26         ` Eric Wong
2019-03-06 20:50           ` Bjorn Helgaas
2019-03-07  3:44             ` Eric Wong