Automated Reference URL Verification: Designing a Script to Mechanically Check AI-Generated Links
AI can generate URLs that do not exist
When I ask an AI to generate references for an article, the URLs it produces look correct at first glance. When I actually visit them, however, some point to pages that do not exist.
This follows from how generative AI works. The model has learned what URLs typically look like, so it produces plausible-looking URLs without checking whether they are actually accessible.
To address this, I built a script that automatically verifies whether reference URLs in articles can actually be accessed.
This article covers only mechanical URL availability checks. Citation decisions, and the question of whether a source supports its claim, are covered in the twelve-step review workflow.
How the script works
The URL-checking script runs the following steps.
Step 1: Extract URLs
The script reads each article’s Markdown file and pulls out the URLs in the references section, which it identifies by the heading ## 参考文献 (Japanese for “References”) or ## References.
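As an illustration of Step 1, the sketch below extracts URLs from the references section of a Markdown string. It is not the site's actual script: the function name extractReferenceUrls, the constant REFERENCE_HEADINGS, and the rule that the section ends at the next level-1 or level-2 heading are assumptions made for this example. Only the two heading strings come from the description above.

```typescript
// Headings that open a references section (the two named in this article).
const REFERENCE_HEADINGS = ["## 参考文献", "## References"];

// Hypothetical helper: collect http(s) URLs that appear between a references
// heading and the next level-1 or level-2 heading (or the end of the file).
function extractReferenceUrls(markdown: string): string[] {
  const urls: string[] = [];
  let inReferences = false;

  for (const line of markdown.split(/\r?\n/)) {
    const trimmed = line.trim();

    if (/^#{1,2}\s/.test(trimmed)) {
      // A level-1 or level-2 heading starts a new section, so this either
      // enters the references section or leaves it.
      inReferences = REFERENCE_HEADINGS.indexOf(trimmed) !== -1;
      continue;
    }
    if (!inReferences) continue;

    // Simplification: stop at whitespace, brackets, parentheses and quotes.
    // URLs that themselves contain parentheses are cut short by this pattern.
    const matches = line.match(/https?:\/\/[^\s<>()\[\]"']+/g) || [];
    for (const raw of matches) {
      // Drop trailing sentence punctuation that is not part of the URL.
      const url = raw.replace(/[.,;:!?]+$/, "");
      if (urls.indexOf(url) === -1) urls.push(url);
    }
  }
  return urls;
}
```

URLs in the article body are ignored on purpose, and duplicates inside the references section are returned only once so that each URL is requested a single time.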
Step 2: Check access
The script sends an actual HTTP request to each URL and checks the response. In effect, it does what a person does when opening a URL in a browser, but automatically and for every URL in the list.
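A minimal sketch of Step 2 follows, assuming Node.js 18 or later, where fetch and AbortController are available as globals. The names checkUrl, fetchRequester, Requester, and CheckOutcome are hypothetical, and the 10-second default is an assumption, since the article does not state the wait time the real script uses. The request function is passed in as a parameter so the logic can be exercised without touching the network.

```typescript
// What the checker needs to know about one HTTP response.
interface RawResponse {
  status: number;
  location: string | null; // Location header, present on redirects
}

// A function that performs the request. Injectable so it can be stubbed.
type Requester = (url: string, timeoutMs: number) => Promise<RawResponse>;

type CheckOutcome =
  | { url: string; kind: "response"; status: number; location: string | null }
  | { url: string; kind: "timeout" }
  | { url: string; kind: "network-error"; message: string };

// Default requester built on the global fetch of Node.js 18 or later.
const fetchRequester: Requester = async (url, timeoutMs) => {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // GET is used because some servers answer HEAD differently from GET.
    const response = await fetch(url, {
      redirect: "manual", // report a redirect instead of following it
      signal: controller.signal,
    });
    const result = {
      status: response.status,
      location: response.headers.get("location"),
    };
    await response.body?.cancel(); // only the status matters, discard the body
    return result;
  } finally {
    clearTimeout(timer);
  }
};

async function checkUrl(
  url: string,
  timeoutMs: number = 10000, // assumed default; the article gives no value
  request: Requester = fetchRequester,
): Promise<CheckOutcome> {
  try {
    const { status, location } = await request(url, timeoutMs);
    return { url, kind: "response", status, location };
  } catch (error) {
    // fetch rejects with an error named "AbortError" when the signal aborts.
    if (error instanceof Error && error.name === "AbortError") {
      return { url, kind: "timeout" };
    }
    const message = error instanceof Error ? error.message : String(error);
    return { url, kind: "network-error", message };
  }
}
```

Keeping timeouts and other network failures (such as DNS errors) as separate outcomes lets a later step decide which ones deserve a recheck and which ones point to a URL that was never valid.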
Step 3: Classify results
The script assigns each URL to a category based on the HTTP status of the response, or on the absence of a response.
How results are classified
Accessible (normal)
An HTTP status in the 2xx range (200 to 299) means the request succeeded and the page was returned.
404 (page does not exist)
A 404 response means “this page does not exist.” A URL returning 404 cannot be used as a reference, so it is treated as a Critical issue. The reference must be removed or replaced with a correct URL.
Authentication error (login required)
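A 401 or 403 status indicates that access is restricted: 401 means authentication is required, and 403 means the server refuses the request, often because a login or a specific permission is needed. Some sites also return 403 to automated clients, so it is worth confirming the result in a browser. If readers of a published article cannot open the page, replacing it with a different source is worth considering.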
Timeout
The server did not respond within the configured wait time. Because this can be caused by a temporary network issue, the script does not immediately flag it as Critical. Instead, it marks the URL for a later recheck.
Redirect
The URL responds with a 3xx status and forwards to a different URL. If the destination is appropriate there is no problem, but I recommend updating the article to cite the final destination URL directly.
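The categories above can be sketched as a single function that maps a status code to a category and a follow-up action. This is an illustration, not the script's real code: classifyResult and the action labels are hypothetical names (only Critical is a term the article itself uses), treating 401 and 403 as the authentication-related statuses is an assumption, and the unclassified bucket is added here for statuses the article does not discuss.

```typescript
type Category =
  | "accessible"
  | "not-found"
  | "auth-error"
  | "timeout"
  | "redirect"
  | "unclassified";

type Action =
  | "none"
  | "critical"
  | "consider-replacing"
  | "recheck-later"
  | "update-url"
  | "manual-review";

interface Verdict {
  category: Category;
  action: Action;
}

// status is the HTTP status code, or null when no response arrived in time.
function classifyResult(status: number | null): Verdict {
  if (status === null) {
    return { category: "timeout", action: "recheck-later" };
  }
  if (status >= 200 && status < 300) {
    return { category: "accessible", action: "none" };
  }
  // Simplification: every 3xx status is treated as a redirect.
  if (status >= 300 && status < 400) {
    return { category: "redirect", action: "update-url" };
  }
  if (status === 404) {
    return { category: "not-found", action: "critical" };
  }
  if (status === 401 || status === 403) {
    return { category: "auth-error", action: "consider-replacing" };
  }
  // Statuses the article does not cover (for example 410 or 5xx).
  return { category: "unclassified", action: "manual-review" };
}
```

For example, classifyResult(404) yields the not-found category with the critical action, while classifyResult(null) yields timeout with recheck-later, which matches the rule that a timeout is not escalated right away.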
An important caveat: accessible does not mean accurate
The URL-checking script answers the question “can this URL be accessed?” It does not answer the question “does the content at this URL support the claim in the article?”
A URL may be accessible but contain content that is not appropriate as a reference. Also, a page’s content may have been updated since the article was written, meaning the source no longer says what it once did.
Checking whether a URL exists can be automated, but deciding whether a source is appropriate for the claim it is cited for requires human judgment.
When the script runs
On this site, I run the URL check with npm run review:references:live. Running this script is part of the review process before publishing any article.
Because this script makes requests to external URLs, a network connection is required. To avoid placing unnecessary load on external servers, the script waits between requests rather than sending them all at once.
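The pacing described above can be sketched as a loop that checks one URL at a time and pauses between requests. The helper names sleep and checkSequentially are hypothetical, and the one-second default is an assumption, since the article does not state the actual interval. The check function is passed in as a parameter, so any function that takes a URL and returns a promise can be used.

```typescript
// Resolve after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Run `check` for each URL in order, never in parallel, and wait `delayMs`
// between consecutive requests.
async function checkSequentially<T>(
  urls: string[],
  check: (url: string) => Promise<T>,
  delayMs: number = 1000, // assumed default; the real interval is not stated
): Promise<T[]> {
  const results: T[] = [];
  let first = true;

  for (const url of urls) {
    // Pause before every request except the first one.
    if (!first) await sleep(delayMs);
    first = false;
    results.push(await check(url));
  }
  return results;
}
```

The trade-off is run time: with N URLs the run takes at least (N - 1) times the delay, which is usually acceptable for a pre-publication check that runs once per article.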
Summary
AI-generated reference URLs can include links to pages that do not exist. Using a URL-checking script makes it possible to detect broken links before an article is published. However, verifying that a URL is accessible and verifying that a source is appropriate are two separate tasks. Combining automated checks with human review is what makes the overall process reliable.