DOI Validator

Validate DOI format (10.PREFIX/SUFFIX) per Crossref. Format only — does not check existence.

DOI validation: the persistent identifier of scholarly content

The DOI (Digital Object Identifier) is the persistent identifier of choice for academic articles, books, datasets, preprints, conference papers and any other "digital object" that needs a stable, citable link. It is governed by the DOI Foundation, a nonprofit, and standardised as ISO 26324. Unlike a plain URL, a DOI resolves through https://doi.org/ and is meant to keep pointing to the current location of the resource after publisher migrations, journal name changes or domain expirations, provided the registrant keeps the registered URL up to date.

Validating a DOI is a syntactic exercise — there is no check digit. A DOI must (1) start with the literal prefix 10., (2) be followed by a registrant code of four to nine digits, (3) include a forward slash, and (4) end with an opaque, case-insensitive suffix made of letters, digits and a small set of punctuation marks. Whether the DOI is actually registered is a separate question that requires a network resolution.

Anatomy of a DOI

A DOI has the form 10.{registrant}/{suffix}. Examples:

  • 10.1038/nature12373 — Nature article.
  • 10.1093/ajae/aaq063 — American Journal of Agricultural Economics.
  • 10.1109/5.771073 — IEEE archive paper.
  • 10.1000/182 — the DOI Handbook itself (sample 10.1000 registrant).

The prefix 10. is fixed by the standard and never changes. The registrant code (four to nine digits after the dot in practice) is issued through a registration agency, which assigns it to a publisher. The suffix is opaque: the publisher chooses any scheme it likes (sequential numbers, article codes, mnemonic strings) and validators must treat it as a black box. The pattern used on this page accepts ASCII letters, digits and the punctuation - . _ ; ( ) / : in the suffix.
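The anatomy above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the names DoiParts and split_doi are hypothetical, and the function only separates the parts without judging which characters are allowed.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    directory: str   # the fixed "10"
    registrant: str  # code after the dot
    suffix: str      # opaque, publisher-chosen string


def split_doi(doi: str) -> DoiParts:
    """Split a bare DOI at the first slash; later slashes stay in the suffix."""
    prefix, slash, suffix = doi.partition("/")
    if not slash or not suffix:
        raise ValueError("a DOI needs a prefix, a slash and a non-empty suffix")
    directory, dot, registrant = prefix.partition(".")
    if directory != "10" or not dot or not registrant:
        raise ValueError("the prefix must look like 10.<registrant>")
    return DoiParts(directory, registrant, suffix)
```

Splitting on the first slash only is the important design choice: for 10.1093/ajae/aaq063 the suffix is ajae/aaq063, not aaq063.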

Reasonable validation regex

A practical pattern, after normalising the input to uppercase, is:

^10\.\d{4,9}/[-._;()/:A-Z0-9]+$

This regex catches the vast majority of real-world DOIs while still permitting the punctuation publishers actually use. Some legacy DOIs include unusual characters such as < and > in the suffix, so Crossref's guidance is to add a more permissive fallback, for example accepting any printable ASCII in the suffix. Either way, a regex check is the floor, not the ceiling: confirming actual registration is the next step.
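A minimal Python sketch of both options follows. STRICT is the pattern shown above; PERMISSIVE is one possible reading of "any printable ASCII in the suffix" (characters 0x21 to 0x7E, so no spaces) and is an assumption, as are the names is_valid_doi and permissive. The SICI-style DOI in the usage note is illustrative and has not been checked for registration.

```python
import re

# The pattern from the text; input is uppercased before matching.
STRICT = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+")

# Assumed permissive fallback: any printable, non-space ASCII in the suffix.
PERMISSIVE = re.compile(r"10\.\d{4,9}/[\x21-\x7E]+")


def is_valid_doi(doi: str, permissive: bool = False) -> bool:
    """Format check only; says nothing about whether the DOI is registered."""
    candidate = doi.strip().upper()
    pattern = PERMISSIVE if permissive else STRICT
    # fullmatch anchors both ends, so no ^ or $ is needed in the patterns.
    return pattern.fullmatch(candidate) is not None
```

With this sketch, is_valid_doi("10.1038/nature12373") passes and is_valid_doi("10,1038/nature12373") fails. A suffix such as 10.1002/(SICI)1097-4571(199205)43:4<284::AID-ASI3>3.0.CO;2-0 fails the strict pattern because of the angle brackets but passes with permissive=True.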

Registration agencies

Several agencies issue DOIs under licence from the DOI Foundation:

  • Crossref — the largest, covering most academic papers worldwide.
  • DataCite — datasets, research software, theses and grey literature (used by Zenodo, Figshare, OSF).
  • mEDRA — European DOI agency, common for Italian and EU publishers.
  • Airiti — Taiwan / East Asian publishers.
  • KISTI: Korea Institute of Science and Technology Information.
  • JaLC — Japan Link Center, Japanese journals and datasets.
  • SciELO (Brazil) is not an agency itself; it registers DOIs through its Crossref membership.

Resolution and content negotiation

Beyond validation, you usually want to resolve the DOI: send an HTTP HEAD or GET to https://doi.org/{doi} and observe the 30x redirect to the publisher landing page. Tools that need bibliographic metadata can use DOI Content Negotiation:

curl -LH "Accept: application/x-bibtex" https://doi.org/10.1038/nature12373
curl -LH "Accept: application/vnd.citationstyles.csl+json" https://doi.org/10.1038/nature12373

The server returns BibTeX, CSL-JSON, RIS or RDF depending on the Accept header — ideal for reference managers (Mendeley, Zotero, Papers) and citation engines.
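The same requests can be made from Python's standard library. This is a sketch under assumptions: the helper names build_metadata_request and fetch_metadata are hypothetical, only the two media types from the curl examples are included, and the response is assumed to be UTF-8 text.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Media types taken from the curl examples above.
FORMATS = {
    "bibtex": "application/x-bibtex",
    "csl-json": "application/vnd.citationstyles.csl+json",
}


def build_metadata_request(doi: str, fmt: str = "bibtex") -> Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    # Keep slashes literal; percent-encode anything unsafe in a URL path.
    url = "https://doi.org/" + quote(doi, safe="/")
    return Request(url, headers={"Accept": FORMATS[fmt]})


def fetch_metadata(doi: str, fmt: str = "bibtex", timeout: float = 10.0) -> str:
    """Send the request; urlopen follows the redirects like curl -L does."""
    with urlopen(build_metadata_request(doi, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")  # assumes a UTF-8 body
```

Keeping request construction separate from the network call makes the header logic easy to test offline; fetch_metadata("10.1038/nature12373") would then perform the actual lookup.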

DOI vs ORCID vs ISSN vs ISBN

  • DOI identifies the work (article, dataset, chapter).
  • ORCID identifies the researcher (16 characters: 15 digits plus an ISO 7064 MOD 11-2 check character, which may be X; the scheme is shared with ISNI).
  • ISSN identifies the journal in which the article was published.
  • ISBN identifies the book edition.
  • arXiv ID and PMID are alternative identifiers; arXiv preprints today carry a DOI from DataCite as well.
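For contrast with the DOI, which has no check digit, the ORCID check character can be verified locally. The sketch below follows the ISO 7064 MOD 11-2 calculation; the function names are hypothetical, and 0000-0002-1825-0097 is the well-known ORCID sample identifier.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits and the final check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_char(compact[:15]) == compact[15]
```

is_valid_orcid("0000-0002-1825-0097") returns True, while changing the last character to 8 makes it return False.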

Pitfalls

  • Case-sensitivity confusion: a DOI is case-insensitive, but the URL of the destination publisher may not be. The doi.org resolver normalises the DOI before redirecting, so always compare uppercased forms.
  • Mistaking URL for DOI: https://doi.org/10.1000/182 is the URL; 10.1000/182 is the DOI itself. Strip the host before validating.
  • Mistyping the prefix: 10,1038/... (comma) and 10.1038\... (backslash) are common copy-paste bugs.
  • Trailing punctuation: citation text often ends a sentence with 10.1038/nature12373. — strip the final dot before validating.
  • Suffix with slashes: 10.1093/ajae/aaq063 is valid, and everything after the first slash is the suffix; some validators wrongly reject or truncate suffixes that contain a slash.
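Most of these pitfalls can be handled by normalising the input before validating it. The following is a minimal sketch: the function name normalise_doi is hypothetical, and handling of the legacy dx.doi.org host and of a leading "doi:" label are assumptions that go beyond the cases listed above.

```python
import re

# Resolver host, with optional scheme and the legacy "dx." subdomain (assumed).
_RESOLVER = re.compile(r"^(?:https?://)?(?:dx\.)?doi\.org/", re.IGNORECASE)
# A leading "doi:" label, as often seen in citations (assumed).
_LABEL = re.compile(r"^doi:\s*", re.IGNORECASE)


def normalise_doi(raw: str) -> str:
    """Reduce pasted text to a bare, uppercased DOI candidate."""
    text = raw.strip()
    text = _RESOLVER.sub("", text)   # strip the host: URL -> DOI
    text = _LABEL.sub("", text)      # strip a "doi:" label
    text = text.rstrip(".,")         # heuristic: drop sentence punctuation
    return text.upper()              # DOIs compare case-insensitively
```

The trailing-punctuation step is a heuristic: a suffix that genuinely ends in a dot would be altered, which is rare but possible. The result is only a candidate and still has to pass the format check.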

FAQ

Can a DOI start with anything other than 10.? No. Every DOI in existence begins with the literal prefix 10. — it is the directory indicator of the Handle System.

Can I confirm a DOI by checking whether it resolves? Yes: send a HEAD request to https://doi.org/{doi}. A 30x redirect (typically 302) means the resolver knows the DOI; a 404 means it does not. This page only does the syntactic check, locally.
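A resolution check along these lines might look as follows. It is a sketch, not what this page runs: the function names are hypothetical, any status other than a redirect or 404 is treated as inconclusive (an assumption, to cover timeouts on the server side, rate limiting and similar), and http.client is used because it does not follow redirects, so the resolver's own status code is visible.

```python
import http.client
from urllib.parse import quote


def interpret_status(status: int) -> str:
    """Map the resolver's HTTP status to a coarse verdict."""
    if 300 <= status < 400:
        return "registered"      # resolver redirects to a landing page
    if status == 404:
        return "not found"       # resolver does not know this DOI
    return "inconclusive"        # assumed bucket for everything else


def head_status(doi: str, timeout: float = 10.0) -> int:
    """Send one HEAD request to doi.org and return the raw status code."""
    conn = http.client.HTTPSConnection("doi.org", timeout=timeout)
    try:
        conn.request("HEAD", "/" + quote(doi, safe="/"))
        return conn.getresponse().status
    finally:
        conn.close()
```

Calling interpret_status(head_status("10.1000/182")) performs the network lookup; the format check should still run first so that malformed input never reaches the resolver.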

Is the DOI case-sensitive? No. The string is case-insensitive by standard. Compare after normalising to a common case.

How much does a DOI cost? Publishers pay around US$ 1 per DOI through Crossref, plus an annual membership fee starting at US$ 275. DataCite has similar tiered pricing.

Does the suffix have meaning? Only to the publisher who chose it. Some encode the article number, others use a UUID or a slug. Validators must treat it as an opaque string.
