We develop communication with weLaika Advertising

Middleman Crawler

Tags: middleman, ruby, crawler, gem, extension
Filippo Gangi Dino

Yesterday we released a Ruby gem called Middleman Crawler. It's a Middleman extension that aims to find any 4xx or 5xx errors in your static site.

As described in more detail in the GitHub repository, you can crawl your static site with the command:

  middleman crawler [options]

where [options] are:
  --environment, -e <s>: The Middleman Enviroment
  --username, -u <s>: HTTP Basic Username
  --password, -p <s>: HTTP Basic Password
  --wait, -w <f>: Seconds to wait between requests, may be fractional e.g. '1.5' (default: 3.0)
  --log, -l: Log results to file rawler_log.txt
  --logfile, -o <s>: Specify logfile, implies --log (default: rawler_log.txt)
  --css, -c: Check CSS links
  --skip, -s <s>: Skip URLs that match a regexp
  --iskip, -i <s>: Skip URLs that match a case insensitive regexp
  --include <s>: Only include URLs that match a regexp
  --iinclude <s>: Only include URLs that match a case insensitive regexp
  --local <s>: Restrict to the given URL and below. Equivalent to '--include ^http://mysite.com/*'.
  --ignore_fragments: Strips any fragment from parsed links
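
To make the difference between the skip and include options concrete, here is a minimal Ruby sketch of how such regexp filters could be combined. It is an illustration under assumptions, not the gem's actual implementation: the `crawlable?` method, its `skip:` and `only:` parameters, and the sample URLs are all hypothetical. The case-insensitive variants (`--iskip`, `--iinclude`) are modelled simply by adding the `i` flag to the regexp.

```ruby
# Hypothetical helper, for illustration only (not part of Middleman Crawler).
# Returns true when a URL should be visited:
# - it must NOT match the skip pattern (if one is given), and
# - it MUST match the include pattern (if one is given).
def crawlable?(url, skip: nil, only: nil)
  return false if skip && url.match?(skip)
  return false if only && !url.match?(only)
  true
end

# Case-sensitive skip: "A.PDF" does not match /\.pdf$/, so it is still crawled.
puts crawlable?("http://mysite.com/docs/A.PDF", skip: /\.pdf$/)

# Case-insensitive skip (like --iskip): the same URL is now skipped.
puts crawlable?("http://mysite.com/docs/A.PDF", skip: /\.pdf$/i)

# Include pattern (like --include): URLs outside the site are left out.
puts crawlable?("http://other.com/page", only: %r{^http://mysite\.com/})
```

In this sketch a URL that matches both patterns is skipped, because the skip check runs first; whether the real extension resolves that conflict the same way is not something the option list states.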

It returns the list of pages that responded with an HTTP error code (4xx or 5xx).
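
As a rough picture of what "error" means here, the following Ruby sketch keeps only the URLs whose status code falls in the 4xx or 5xx range. It is not taken from the gem: the `error_pages` helper and the sample data are hypothetical and exist only to illustrate the classification.

```ruby
# Hypothetical helper, for illustration only (not part of Middleman Crawler).
# Takes a hash of URL => HTTP status code and returns the URLs
# whose status is a client (4xx) or server (5xx) error.
def error_pages(statuses)
  statuses.select { |_url, code| (400..599).cover?(code) }.keys
end

# Made-up sample data: one OK page, one redirect, two errors.
statuses = {
  "http://mysite.com/"         => 200,
  "http://mysite.com/redirect" => 301,
  "http://mysite.com/old-page" => 404,
  "http://mysite.com/broken"   => 500
}

# Prints the two failing URLs, one per line.
puts error_pages(statuses)
```

Successful (2xx) and redirect (3xx) responses are left out, so the resulting list contains only the pages that need attention.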

We developed this extension because it is very helpful when you must verify a site's integrity after data entry that needs a thorough check, especially when dynamic pages are generated from data.

For a complete guide, or to contribute, see the GitHub repository.