# hast-util-from-html

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]

[hast][] utility that turns HTML into a syntax tree.

## Contents

* [What is this?](#what-is-this)
* [When should I use this?](#when-should-i-use-this)
* [Install](#install)
* [Use](#use)
* [API](#api)
  * [`fromHtml(value[, options])`](#fromhtmlvalue-options)
  * [`ErrorCode`](#errorcode)
  * [`ErrorSeverity`](#errorseverity)
  * [`OnError`](#onerror)
  * [`Options`](#options)
* [Examples](#examples)
  * [Example: fragment versus document](#example-fragment-versus-document)
  * [Example: whitespace around and inside `<html>`](#example-whitespace-around-and-inside-html)
  * [Example: parse errors](#example-parse-errors)
* [Syntax](#syntax)
* [Types](#types-2)
* [Compatibility](#compatibility)
* [Security](#security)
* [Related](#related)
* [Contribute](#contribute)
* [License](#license)

## What is this?

This package is a utility that takes HTML input and turns it into a hast syntax tree.

## When should I use this?

If you want to handle syntax trees manually, use this.
Use [`parse5`][parse5] instead when you just want to parse HTML and don’t care about [hast][].

You can also use [`hast-util-from-parse5`][hast-util-from-parse5] and [`parse5`][parse5] yourself, or use the rehype plugin [`rehype-parse`][rehype-parse], which wraps this utility to also parse HTML at a higher-level (easier) abstraction.

[`xast-util-from-xml`][xast-util-from-xml] can be used if you are dealing with XML instead of HTML.

If you might run in a browser and prefer a lighter alternative, while not caring about positional info, parse errors, and consistency across browsers, use [`hast-util-from-html-isomorphic`][hast-util-from-html-isomorphic], which wraps this in Node and uses browser APIs otherwise.

Finally, you can use the utility [`hast-util-to-html`][hast-util-to-html] for the inverse of this utility.
It turns hast into HTML.

## Install

This package is [ESM only][esm].
In Node.js (version 16+), install with [npm][]:

```sh
npm install hast-util-from-html
```

In Deno with [`esm.sh`][esmsh]:

```js
import {fromHtml} from 'https://esm.sh/hast-util-from-html@2'
```

In browsers with [`esm.sh`][esmsh]:

```html
<script type="module">
  import {fromHtml} from 'https://esm.sh/hast-util-from-html@2?bundle'
</script>
```

## Use

```js
import {fromHtml} from 'hast-util-from-html'

const tree = fromHtml('<h1>Hello, world!</h1>', {fragment: true})

console.log(tree)
```

Yields:

```js
{
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'h1',
      properties: {},
      children: [Array],
      position: [Object]
    }
  ],
  data: { quirksMode: false },
  position: {
    start: { line: 1, column: 1, offset: 0 },
    end: { line: 1, column: 23, offset: 22 }
  }
}
```

## API

This package exports the identifier [`fromHtml`][api-from-html].
There is no default export.

### `fromHtml(value[, options])`

Turn serialized HTML into a hast tree.

###### Parameters

* `value` ([`Compatible`][compatible]) — serialized HTML to parse
* `options` ([`Options`][api-options], optional) — configuration

###### Returns

Tree ([`Root`][root]).

### `ErrorCode`

Known names of parse errors (TypeScript type).

###### Types

```ts
type ErrorCode =
  | 'abandonedHeadElementChild'
  | 'abruptClosingOfEmptyComment'
  | 'abruptDoctypePublicIdentifier'
  // … see readme on `options[key in ErrorCode]` below.
```

### `ErrorSeverity`

Error severity (TypeScript type).

###### Types

```ts
export type ErrorSeverity =
  // Turn the parse error off:
  | 0
  | false
  // Turn the parse error into a warning:
  | 1
  | true
  // Turn the parse error into an actual error: processing stops.
  | 2
```

### `OnError`

Function called when encountering [HTML parse errors][parse-errors].

###### Parameters

* `error` ([`VFileMessage`][vfile-message]) — message

###### Returns

Nothing (`void`).

### `Options`

Configuration (TypeScript type).

##### Fields

###### `options.space`

Which space the document is in (`'html'` or `'svg'`, default: `'html'`).

When an `<svg>` element is found in the HTML space, `hast-util-from-html` automatically switches to and from the SVG space when entering and exiting it.

> 👉 **Note**: this is not an XML parser.
> It supports SVG as embedded in HTML.
> It does not support the features available in XML.
> Passing SVG files might break but fragments of modern SVG should be fine.
> Use [`xast-util-from-xml`][xast-util-from-xml] to parse XML.

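
As a rough illustration of the `ErrorSeverity` values documented above, the following sketch maps each accepted value to a label and applies that to an options object that configures two parse errors by name.
`describeSeverity`, `options`, and `labels` are hypothetical names used only for this example; they are not exports of `hast-util-from-html`, and the snippet does not call the parser itself.

```js
// Hypothetical helper for illustration only; not part of `hast-util-from-html`.
// It mirrors the documented `ErrorSeverity` values:
// `0`/`false` (off), `1`/`true` (warning), `2` (error, processing stops).

/**
 * @param {0 | 1 | 2 | boolean} severity
 * @returns {'off' | 'warning' | 'error'}
 */
function describeSeverity(severity) {
  if (severity === 0 || severity === false) return 'off'
  if (severity === 1 || severity === true) return 'warning'
  if (severity === 2) return 'error'
  throw new TypeError('Unknown severity: ' + String(severity))
}

// Example options: parse as a fragment and configure two parse errors,
// using their `ErrorCode` names as keys.
const options = {
  fragment: true,
  missingDoctype: 0,
  duplicateAttribute: 2
}

const labels = {
  missingDoctype: describeSeverity(options.missingDoctype),
  duplicateAttribute: describeSeverity(options.duplicateAttribute)
}

console.log(labels)
// Prints: { missingDoctype: 'off', duplicateAttribute: 'error' }
```

An object shaped like `options` could then be passed as the second argument to `fromHtml`.
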
> 👉 **Note**: make sure to set `fragment: true` if `space: 'svg'`.

###### `options.verbose`

Add extra positional info about attributes, start tags, and end tags (`boolean`, default: `false`).

###### `options.fragment`

Whether to parse as a fragment (`boolean`, default: `false`).
The default is to expect a whole document.
In document mode, unopened `html`, `head`, and `body` elements are opened.

###### `options.onerror`

Function called when encountering [HTML parse errors][parse-errors] ([`OnError`][api-on-error], optional).

###### `options[key in ErrorCode]`

Specific parse errors can be configured by setting their identifiers (see [`ErrorCode`][api-error-code]) as keys directly in `options` to an [`ErrorSeverity`][api-error-severity] as value.

The list of parse errors:

* `abandonedHeadElementChild` — unexpected metadata element after head ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/abandoned-head-element-child/index.html))
* [`abruptClosingOfEmptyComment`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-closing-of-empty-comment) — unexpected abruptly closed empty comment ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/abrupt-closing-of-empty-comment/index.html))
* [`abruptDoctypePublicIdentifier`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-doctype-public-identifier) — unexpected abruptly closed public identifier ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/abrupt-doctype-public-identifier/index.html))
* [`abruptDoctypeSystemIdentifier`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-abrupt-doctype-system-identifier) — unexpected abruptly closed system identifier ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/abrupt-doctype-system-identifier/index.html))
* [`absenceOfDigitsInNumericCharacterReference`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-absence-of-digits-in-numeric-character-reference) — unexpected non-digit at start of numeric character reference ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/absence-of-digits-in-numeric-character-reference/index.html))
* [`cdataInHtmlContent`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-cdata-in-html-content) — unexpected CDATA section in HTML ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/cdata-in-html-content/index.html))
* [`characterReferenceOutsideUnicodeRange`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-character-reference-outside-unicode-range) — unexpected too big numeric character reference ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/character-reference-outside-unicode-range/index.html))
* `closingOfElementWithOpenChildElements` — unexpected closing tag with open child elements ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/closing-of-element-with-open-child-elements/index.html))
* [`controlCharacterInInputStream`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-control-character-in-input-stream) — unexpected control character ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/control-character-in-input-stream/index.html))
* [`controlCharacterReference`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-control-character-reference) — unexpected control character reference ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/control-character-reference/index.html))
* `disallowedContentInNoscriptInHead` — disallowed content inside `<noscript>` ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/disallowed-content-in-noscript-in-head/index.html))
* [`duplicateAttribute`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-duplicate-attribute) — unexpected duplicate attribute ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/duplicate-attribute/index.html))
* [`endTagWithAttributes`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-end-tag-with-attributes) — unexpected attribute on closing tag ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/end-tag-with-attributes/index.html))
* [`endTagWithTrailingSolidus`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-end-tag-with-trailing-solidus) — unexpected slash at end of closing tag ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/end-tag-with-trailing-solidus/index.html))
* `endTagWithoutMatchingOpenElement` — unexpected unopened end tag ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/end-tag-without-matching-open-element/index.html))
* [`eofBeforeTagName`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-before-tag-name) — unexpected end of file ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-before-tag-name/index.html))
* [`eofInCdata`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-cdata) — unexpected end of file in CDATA ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-in-cdata/index.html))
* [`eofInComment`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-comment) — unexpected end of file in comment ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-in-comment/index.html))
* [`eofInDoctype`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-doctype) — unexpected end of file in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-in-doctype/index.html))
* `eofInElementThatCanContainOnlyText` — unexpected end of file in element that can only contain text ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-in-element-that-can-contain-only-text/index.html))
* [`eofInScriptHtmlCommentLikeText`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-script-html-comment-like-text) — unexpected end of file in comment inside script ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-in-script-html-comment-like-text/index.html))
* [`eofInTag`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-eof-in-tag) — unexpected end of file in tag ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/eof-in-tag/index.html))
* [`incorrectlyClosedComment`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-closed-comment) — incorrectly closed comment ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/incorrectly-closed-comment/index.html))
* [`incorrectlyOpenedComment`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-incorrectly-opened-comment) — incorrectly opened comment ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/incorrectly-opened-comment/index.html))
* [`invalidCharacterSequenceAfterDoctypeName`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-invalid-character-sequence-after-doctype-name) — invalid sequence after doctype name ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/invalid-character-sequence-after-doctype-name/index.html))
* [`invalidFirstCharacterOfTagName`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-invalid-first-character-of-tag-name) — invalid first character in tag name ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/invalid-first-character-of-tag-name/index.html))
* `misplacedDoctype` — misplaced doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/misplaced-doctype/index.html))
* `misplacedStartTagForHeadElement` — misplaced `<head>` start tag ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/misplaced-start-tag-for-head-element/index.html))
* [`missingAttributeValue`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-attribute-value) — missing attribute value ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-attribute-value/index.html))
* `missingDoctype` — missing doctype before other content ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-doctype/index.html))
* [`missingDoctypeName`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-doctype-name) — missing doctype name ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-doctype-name/index.html))
* [`missingDoctypePublicIdentifier`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-doctype-public-identifier) — missing public identifier in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-doctype-public-identifier/index.html))
* [`missingDoctypeSystemIdentifier`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-doctype-system-identifier) — missing system identifier in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-doctype-system-identifier/index.html))
* [`missingEndTagName`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-end-tag-name) — missing name in end tag ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-end-tag-name/index.html))
* [`missingQuoteBeforeDoctypePublicIdentifier`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-quote-before-doctype-public-identifier) — missing quote before public identifier in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-quote-before-doctype-public-identifier/index.html))
* [`missingQuoteBeforeDoctypeSystemIdentifier`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-quote-before-doctype-system-identifier) — missing quote before system identifier in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-quote-before-doctype-system-identifier/index.html))
* [`missingSemicolonAfterCharacterReference`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-semicolon-after-character-reference) — missing semicolon after character reference ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-semicolon-after-character-reference/index.html))
* [`missingWhitespaceAfterDoctypePublicKeyword`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-whitespace-after-doctype-public-keyword) — missing whitespace after public keyword in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-whitespace-after-doctype-public-keyword/index.html))
* [`missingWhitespaceAfterDoctypeSystemKeyword`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-whitespace-after-doctype-system-keyword) — missing whitespace after system keyword in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-whitespace-after-doctype-system-keyword/index.html))
* [`missingWhitespaceBeforeDoctypeName`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-whitespace-before-doctype-name) — missing whitespace before doctype name ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-whitespace-before-doctype-name/index.html))
* [`missingWhitespaceBetweenAttributes`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-whitespace-between-attributes) — missing whitespace between attributes ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-whitespace-between-attributes/index.html))
* [`missingWhitespaceBetweenDoctypePublicAndSystemIdentifiers`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-missing-whitespace-between-doctype-public-and-system-identifiers) — missing whitespace between public and system identifiers in doctype ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/missing-whitespace-between-doctype-public-and-system-identifiers/index.html))
* [`nestedComment`](https://html.spec.whatwg.org/multipage/parsing.html#parse-error-nested-comment) — unexpected nested comment ([example](https://github.com/syntax-tree/hast-util-from-html/blob/main/test/parse-error/nested-comment/index.html))
* `nestedNoscriptInHead` — unexpected nested `