
DeadLinks

A tool for crawling and finding links to URLs which no longer exist. deadlinks supports the HTTP(S) and Gemini protocols, and is intended for periodically checking links on personal websites and blogs.

Library

The deadlinks package is designed to be easily embedded into an existing process, with its results displayed in something like a status page.

See the godocs for more info.
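As a rough sketch of that embedding pattern, a host process might periodically re-run a check and serve the latest results from an HTTP handler. Note that this is not the package's actual API: checkDeadLinks, the check interval, and the /status/deadlinks path are all placeholder assumptions, and the real calls into the deadlinks package are documented in the godocs.

package main

import (
    "fmt"
    "net/http"
    "sync"
    "time"
)

// checkDeadLinks is a stand-in for calling into the deadlinks package; it
// should perform a crawl and return a human-readable report of dead links.
func checkDeadLinks() string {
    return "no dead links found (placeholder)"
}

func main() {
    var (
        mu         sync.Mutex
        lastReport = "no check has completed yet"
    )

    // Periodically re-run the check, keeping only the most recent report.
    go func() {
        for {
            report := checkDeadLinks()
            mu.Lock()
            lastReport = report
            mu.Unlock()
            time.Sleep(24 * time.Hour)
        }
    }()

    // Serve the most recent report as a plain-text status page.
    http.HandleFunc("/status/deadlinks", func(w http.ResponseWriter, r *http.Request) {
        mu.Lock()
        defer mu.Unlock()
        fmt.Fprintln(w, lastReport)
    })

    http.ListenAndServe(":8080", nil)
}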

Command-Line

The command-line utility can be installed using go install:

go install code.betamike.com/mediocregopher/deadlinks/cmd/deadlinks@latest

The -urls parameter is required. Given one or more comma-separated URLs, deadlinks will check each one for any dead links:

deadlinks -urls 'https://mediocregopher.com,gemini://mediocregopher.com'

Any links which are dead will be output to stdout as YAML objects, each containing the dead URL, the error encountered, and the pages which link to it.
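For illustration only, an entry might look something like the following (the field names here are assumptions, not taken from the tool's documented output; run the command to see the authoritative format):

- url: gemini://mediocregopher.com/some-dead-page
  error: remote host not found
  linkedBy:
    - https://mediocregopher.com/posts/some-post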

To recursively crawl through links, you can give one or more regex patterns via the -patterns parameter. Any URL which matches a pattern will have its own links checked as well (and if any of those URLs match a pattern, their links will be checked too, and so on):

deadlinks \
    -urls 'https://mediocregopher.com,gemini://mediocregopher.com' \
    -patterns '://mediocregopher.com'

There are further options available which affect the utility's behavior; see deadlinks -h for more.