Linkspider
Linkspider was originally designed as a "linkback checker": feed it a list of URLs, and it crawls those sites and tells you which ones link back to your sites. This is very helpful if you're operating a large-scale link exchange program.
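To make the linkback idea concrete, here is a minimal sketch in C of the check applied to one page that has already been downloaded. It is not Linkspider's actual code: the function name `page_links_to` and the plain substring match on `href` values are assumptions made for illustration, and a real checker would lean on a proper HTML parser instead.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive prefix test: does s start with prefix? */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Case-insensitive search for needle within the first len bytes of hay. */
static int contains_ci(const char *hay, size_t len, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || nlen > len)
        return 0;
    for (size_t i = 0; i + nlen <= len; i++) {
        size_t j = 0;
        while (j < nlen &&
               tolower((unsigned char)hay[i + j]) ==
               tolower((unsigned char)needle[j]))
            j++;
        if (j == nlen)
            return 1;
    }
    return 0;
}

/* Hypothetical helper (not part of Linkspider): returns 1 if any href
 * attribute value in html mentions host, 0 otherwise. This is a naive
 * scan, not a full HTML parse. */
int page_links_to(const char *html, const char *host)
{
    assert(html != NULL && host != NULL);
    for (const char *p = html; *p; p++) {
        if (!starts_with_ci(p, "href"))
            continue;
        const char *v = p + 4;
        while (isspace((unsigned char)*v))
            v++;
        if (*v != '=')
            continue;
        v++;
        while (isspace((unsigned char)*v))
            v++;
        /* The value may be double-quoted, single-quoted, or bare. */
        char quote = 0;
        if (*v == '"' || *v == '\'')
            quote = *v++;
        size_t len = 0;
        while (v[len] &&
               (quote ? (v[len] != quote)
                      : (!isspace((unsigned char)v[len]) && v[len] != '>')))
            len++;
        if (contains_ci(v, len, host))
            return 1;
    }
    return 0;
}
```

Calling `page_links_to(page_body, "example.com")` on each crawled page would then sort the list into sites that link back and sites that do not. Only `href` values are inspected, so a page that merely mentions the host in its visible text does not count as a linkback.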
Beyond that, it's a good framework for building any kind of web spider or crawler. Here are some key points about it:
- Written in C
- Known to build on Linux and Solaris
- Multithreaded
- Uses libcurl for connections
- Uses the HTML parsing and hash tables from GNU wget-1.11
- Very small, and fairly easy to work with
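As a rough illustration of what a multithreaded crawler frame can look like, here is a minimal POSIX threads sketch in which a fixed number of workers pull URLs from a shared list. It is an assumption about the general shape, not Linkspider's actual design: the names `crawl_all`, `url_handler`, and `count_url` are hypothetical, and the handler is a stand-in for the step where a real spider would fetch the page with libcurl and parse it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16

/* Hypothetical per-URL callback. It is called from several threads at
 * once, so it must be thread-safe. */
typedef void (*url_handler)(const char *url, void *user);

struct work_queue {
    const char **urls;     /* shared, read-only list of URLs */
    size_t count;          /* number of entries in urls */
    size_t next;           /* index of the next unclaimed URL */
    pthread_mutex_t lock;  /* protects next */
    url_handler handle;
    void *user;
};

/* Each worker claims one index at a time until the list is exhausted. */
static void *worker(void *arg)
{
    struct work_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        size_t i = q->next;
        if (i < q->count)
            q->next++;
        pthread_mutex_unlock(&q->lock);
        if (i >= q->count)
            break;
        q->handle(q->urls[i], q->user);
    }
    return NULL;
}

/* Processes every URL using up to nthreads workers.
 * Returns 0 on success, -1 if no worker thread could be started. */
int crawl_all(const char **urls, size_t count, int nthreads,
              url_handler handle, void *user)
{
    assert(urls != NULL && handle != NULL);
    if (nthreads < 1)
        nthreads = 1;
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    struct work_queue q = { urls, count, 0, {0}, handle, user };
    pthread_mutex_init(&q.lock, NULL);

    pthread_t tids[MAX_THREADS];
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, &q) == 0)
            started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_mutex_destroy(&q.lock);
    return started > 0 ? 0 : -1;
}

/* Example handler: counts URLs using an atomic counter passed as user. */
void count_url(const char *url, void *user)
{
    (void)url;
    atomic_fetch_add((atomic_int *)user, 1);
}
```

On most systems this builds with `cc -pthread`. Handing out one index at a time under a mutex keeps the workers evenly loaded even when some sites respond much more slowly than others.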
Because it relies on components from wget, it will be offered under the same GNU General Public License (GPL). The code needs a few minor tweaks before release, but it will be made available in the near future.