amass is a high-throughput enterprise-grade web-crawler that crawls specific URLs. It can be used to fetch
millions of url’s per hour.
amass is different than other crawlers like
crawler4j that it does not crawl
the nested pages. Instead it just gathers and collects various URLs as supplied. Thus, it amasses specific
data from the internet, and hence the name
- Enterprise-grade: crawl milions of URLs without worry
- A priority based queue for crawling urgent URLs faster
- Support for pre-crawl and post-crawl handler
- Mechanism to prevent crawling via the pre-crawl handler
- Support for multiple submission of a URL, which increase its priority
- Nano-time accuracy for ordering when priority is the same
The library is released under the terms of Apache Public License Version 2.