I used Devdocs.io for couple years now. I appreciate the effort that went to this application. I use it for DOM, Ruby, HTTP, Rails and others.
So I started to think about how hard could it be to do something similar for Linux man pages. How hard is it to collect all Linux man pages and render them in HTML and push them to the web. I see some website are already doing that but they didn’t get updated for some time now. the style and readability needs abit of work.
Linux man pages are written in a typesetting language called roff Linux comes with a tool groff to render it to other formats like PDF, ASCII, PS. if you have a man page file and you want to render it in HTML for example you can do that
groff -mandoc -Thtml /usr/share/man/man1/411toppm.1
which will convert it to HTML page and write to STDOUT.
The problem now is that we need all the man pages written for all software, this is a quest that’s hard to fulfill so I started by downloading all software in the Archlinux repositories and extract their man pages from their pages.
Archlinux repositories are HTTP servers that we use them to download packages, each server had different directory for different type of software like:
Each directory will have a .db file with the same name that has a list of all software in the directory, their names and other meta information for example core/ directory has core.db file. it’s a tar directory that you need to extract and inside it there is a desc file with meta information for each package.
The format of the desc file is nothing I have seen before, it’s a text file starts with a line of a label followed by a line for the value like
So I had to get all these files parse them and get the file name for each package.
But as we don’t need the whole package we just need the man files. Instead of
downloading the package to the disk I streamed it to
tar and extracted only
usr/share/man/ to the disk. So I ended up with all the man pages
of all packages minus the files I don’t need.
The really hard part for me was parsing all these files and discovering their formats. they’re not documented anywhere so I had to learn it through experimentation.
The end results can be found here Ag(1) This is the man page of Ag silver searcher.