I have used DevDocs.io for a couple of years now, and I appreciate the effort that went into this application. I use it for DOM, Ruby, HTTP, Rails, and others.
So I started to think about how hard it could be to do something similar for Linux man pages. How hard is it to collect all Linux man pages, render them as HTML, and push them to the web? I see some websites already doing that, but they haven't been updated for some time, and their style and readability need a bit of work.
Linux man pages are written in a typesetting language called roff. Linux comes with a tool, groff, that renders it to other formats such as PDF, ASCII, and PostScript. If you have a man page file and want to render it as HTML, for example, you can run:
groff -mandoc -Thtml /usr/share/man/man1/411toppm.1
This converts the page to HTML and writes it to STDOUT.
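To run the same command over many files, a small wrapper helps. This is a minimal sketch, not the code behind the site: the function names and the output layout are made up for illustration, and it assumes that `groff` is on the PATH and that some man pages are stored gzip-compressed (`.gz`), in which case they are decompressed and piped to groff on standard input.

```python
import gzip
import subprocess
from pathlib import Path


def read_man_source(path: Path) -> bytes:
    """Return the roff source of a man page, decompressing .gz files."""
    data = path.read_bytes()
    if path.suffix == ".gz":
        data = gzip.decompress(data)
    return data


def html_name(path: Path) -> str:
    """Map 'example.1' or 'example.1.gz' to 'example.1.html'."""
    name = path.name
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    return name + ".html"


def render_html(path: Path, out_dir: Path) -> Path:
    """Render one man page to HTML with the groff flags used above."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # With no file argument, groff reads the page from standard input.
    result = subprocess.run(
        ["groff", "-mandoc", "-Thtml"],
        input=read_man_source(path),
        stdout=subprocess.PIPE,
        check=True,
    )
    target = out_dir / html_name(path)
    target.write_bytes(result.stdout)
    return target
```

Calling `render_html(Path("/usr/share/man/man1/411toppm.1"), Path("html"))` would write `html/411toppm.1.html`, provided that file exists on your machine.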
The problem now is that we need the man pages for all software, which is a hard quest to fulfill. So I started by downloading all the software in the Arch Linux repositories and extracting the man pages from their packages.
Arch Linux repositories are HTTP servers that we download packages from. Each server has a separate directory for each type of software, such as core/.
Each directory has a .db file with the same name, listing all the software in the directory along with names and other meta information. For example, the core/ directory has a core.db file. It's a tar archive that you need to extract, and inside it there is a desc file with meta information for each package.
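Reading the desc files does not require unpacking the archive to disk. The sketch below is one way to do it with Python's standard `tarfile` module. It assumes each package has its own directory inside the archive containing a file named `desc`, and that the archive uses a compression `tarfile` can detect. `iter_desc_files` is a name invented for this example.

```python
import tarfile
from typing import Iterator, Tuple


def iter_desc_files(db_path: str) -> Iterator[Tuple[str, str]]:
    """Yield (package directory, desc text) for each package in a .db archive."""
    # "r:*" asks tarfile to detect the compression (gzip, bzip2, xz) itself.
    with tarfile.open(db_path, mode="r:*") as db:
        for member in db:
            # Assumption: entries look like "<package directory>/desc".
            if member.isfile() and member.name.endswith("/desc"):
                package_dir = member.name.rsplit("/", 1)[0]
                text = db.extractfile(member).read().decode("utf-8")
                yield package_dir, text
```

With a downloaded `core.db` in the current directory, `for pkg, text in iter_desc_files("core.db")` would walk every package entry in turn.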
The format of the desc file is unlike anything I had seen before: it's a text file in which each field is a label line followed by a line holding the value.
So I had to fetch all these files and parse them to get the file name of each package.
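A parser for this label-then-value layout can be quite small. The sketch below rests on assumptions the text above does not spell out: that labels are wrapped in percent signs (for example `%FILENAME%`), that a field may span several value lines, and that a blank line ends a field. The sample package is invented.

```python
from typing import Dict, List, Optional


def parse_desc(text: str) -> Dict[str, List[str]]:
    """Parse desc text into a mapping of label -> list of value lines."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            current = None  # a blank line ends the current field
        elif line.startswith("%") and line.endswith("%") and len(line) > 2:
            current = line[1:-1]  # assumed label syntax: %LABEL%
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return fields


# Hypothetical sample, not taken from a real repository.
sample = (
    "%FILENAME%\nexample-1.0-1-x86_64.pkg.tar.zst\n\n"
    "%NAME%\nexample\n\n"
    "%DEPENDS%\nglibc\nzlib\n"
)
print(parse_desc(sample)["FILENAME"][0])  # prints example-1.0-1-x86_64.pkg.tar.zst
```

Keeping every value as a list means single-valued fields like the file name and multi-valued fields like dependencies can be handled the same way.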
But we don't need the whole package, only the man files. Instead of downloading the package to disk, I streamed it to tar and extracted only the files inside usr/share/man/ to disk. So I ended up with all the man pages of all packages, minus the files I don't need.
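The same idea can be sketched without shelling out to tar. The function below is an illustration, not the code I ran: it reads a tar stream once from front to back and writes only regular files under `usr/share/man/`. It assumes the stream uses a compression Python's `tarfile` can detect (gzip, bzip2, or xz). A package compressed with something else would have to be decompressed before being passed in. The name `extract_man_pages` is made up.

```python
import tarfile
from pathlib import Path, PurePosixPath
from typing import BinaryIO, List

MAN_PREFIX = "usr/share/man/"


def extract_man_pages(stream: BinaryIO, dest: Path) -> List[Path]:
    """Read a tar stream sequentially and keep only the man page files."""
    written: List[Path] = []
    # "r|*" is tarfile's stream mode: no seeking, compression auto-detected.
    with tarfile.open(fileobj=stream, mode="r|*") as archive:
        for member in archive:
            name = member.name
            if name.startswith("./"):
                name = name[2:]
            if not (member.isfile() and name.startswith(MAN_PREFIX)):
                continue  # skip directories, links, and non-man files
            if ".." in PurePosixPath(name).parts:
                continue  # never write outside dest
            target = dest / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(archive.extractfile(member).read())
            written.append(target)
    return written
```

Because stream mode only needs a `read()` method, the response object returned by `urllib.request.urlopen` can be passed in directly, so the package never has to be stored on disk as a whole.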
The really hard part for me was parsing all these files and discovering their formats. I couldn't find them documented anywhere, so I had to learn them through experimentation.
The end result can be found here: Ag(1), the man page of Ag, the Silver Searcher.