Emad Elsaid Download RSS Offline

Wednesday 12 October 2022

#ruby

Couple weeks ago I changed my workflow regarding reading/sending emails. So to have more control over my emails I started using offlineimap which can download your emails offline to a directory on your filesystem. I then used Mu which index this directory so you can search for emails offline using some queries, like "show me unread emails from inboxes" or you can search for a word in all your emails from all inboxes. Over all of that I used Mu4e which is an email client inside Emacs (my default editor). As I'm using Spacemacs So I added a binding that opens Mu4e using SPC M and boom I can see all my emails, I can search with s and I have a bookmark that shows all unread emails using bi.

Now I want the same for RSS.

Here is the problem, I made some research and couldn't find similar tools that does the same for RSS, although it would be easier, there are no authentication required like IMAP/SMTP servers, So I spent an hour or so writing a small script that does the same as offlineimap, this script on my machine is called offlinerss, it's the first piece of the puzzle and it looks like that

#!/usr/bin/env ruby
# frozen_string_literal: true

require 'bundler/inline'
require 'open-uri'
require 'fileutils'
require 'digest'
require 'yaml'

gemfile do
  source 'https://rubygems.org'
  gem 'rss'
end

def mkdir(*paths)
  path = File.join(*paths)
  FileUtils.mkdir(path) unless Dir.exist?(path)
  path
end

destination = mkdir(File.expand_path('~/rss/'))
inbox = mkdir(destination, 'INBOX')
meta_dir = mkdir(destination, '.meta')

config_file = File.join(destination, 'config.yml')
config = YAML.load_file(config_file)
urls = config['urls']

urls.each do |url|
  url_digest = Digest::SHA1.hexdigest(url)

  URI.open(url) do |rss|
    content = rss.read
    feed = RSS::Parser.parse(content)

    feed.items.each do |item|
      id = item.respond_to?(:id) ? item.id : item.guid
      id_digest = Digest::SHA1.hexdigest(id.content)
      file_basename = url_digest + '-' + id_digest + '.xml'

      next unless Dir.glob(File.join(destination, '**', file_basename)).empty?

      filename = File.join(inbox, file_basename)
      File.write(filename, item.to_s)
    end

    [{ start_tag: '<entry>', end_tag: '</entry>' }, { start_tag: '<item>', end_tag: '</item>' }].each do |tag|
      next unless content.include?(tag[:start_tag])

      content[content.index(tag[:start_tag])...(content.rindex(tag[:end_tag]) + tag[:end_tag].length)] = ''
    end

    metafile = File.join(meta_dir, url_digest + '.xml')
    File.write(metafile, content)
  end
end

I have a small config file in ~/rss/config.yml which has all the URLs I care for, so far just ruby/rails/go main blogs to be alerted by the latest versions.

urls:
   - https://server.tld/feed.rss
   - https://server.tld/feed.atom

This just reads the URLs, and saves each entry to a file on your machine ~/rss/INBOX if the file doesn't exist in any sub directory in ~/rss. Then it removes all entries/items from the feed and save the rest to ~/rss/.meta.

The file names of the RSS item is sha1(url)-sha1(item.id).xml and the meta file name is sha1(url).xml very simple.

So now I need to write a client that reads files in ~/rss/ and render the XML files and some actions to create directories under ~/rss/ and actions to move the file to another directory after it's read or the user want to move it to read-later or something, just like emails directories.

Another piece of the puzzle is indexer like what Mu does for the emails.

I was surprised that it was easy to just sit down and write the thing for myself, than searching for days for a solution.

Backlinks

🌻 Home

See Also

Cloning All Your GitHub Repositories or Updating Them in One Command
Deploy rails application with docker compose and capistrano
ERB Templates as Standalone Executables
Enforcing Project Structure With Rspec
Git as Messages Queue
🌻 Home
Implementing Hover Cards With Minimum Javascript
Integerating Jekyll and Octopress in Emacs
⌨️ Programming
Projects Files Flame Graph
Rails Accepts_nested_attributes for polymorphic relation solution
Representing Named Entity Mention in Postgres for Fast Queries
Tracing Ruby Applications Execution in 4 Lines