html-conduit: Parse HTML documents using xml-conduit datatypes.

[ conduit, library, mit, text, web ]

This package uses tagstream-conduit for its parser. It automatically balances mismatched tags, so there should not be any parse failures. It does not perform full HTML document normalization, such as adding missing html and head tags. Note that, since version 1.3.1, it uses an inlined copy of tagstream-conduit with entity decoding bugfixes applied.
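To make the tag balancing concrete, here is a minimal sketch that parses a deliberately sloppy fragment with `parseLBS` from `Text.HTML.DOM` and inspects the result. The input string and the names `sloppy` and `boldText` are illustrative assumptions, not part of the package.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text         (Text)
import qualified Data.Text.Lazy.IO as TL
import           Text.HTML.DOM     (parseLBS)
import           Text.XML          (Document (..), Element (..), def, renderText)
import           Text.XML.Cursor   (content, element, fromDocument, ($//), (&/))

-- Illustrative input: <b> is never closed and </i> was never opened.
sloppy :: Document
sloppy = parseLBS "<p>Hello <b>world</i></p>"

-- Text found directly inside any <b> element of the parsed tree.
boldText :: [Text]
boldText = fromDocument sloppy $// element "b" &/ content

main :: IO ()
main = do
    -- Print the balanced tree as the parser reconstructed it.
    TL.putStrLn (renderText def sloppy)
    -- With a single top-level element in the input, that element is the
    -- document root; no <head> or <body> is added around it.
    print (elementName (documentRoot sloppy))
    print boldText
```

The stray `</i>` is dropped and the open `<b>` is closed when `</p>` arrives, so the parse yields an ordinary `Document` rather than an error.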


Versions 0.0.0, 0.0.1, 0.1.0, 0.1.0.1, 0.1.0.2, 0.1.0.3, 0.1.0.4, 1.1.0, 1.1.0.1, 1.1.0.2, 1.1.0.3, 1.1.0.4, 1.1.0.5, 1.1.0.6, 1.1.1, 1.1.1.1, 1.1.1.2, 1.2.0, 1.2.1, 1.2.1.1, 1.2.1.2, 1.3.0, 1.3.1, 1.3.2, 1.3.2.1, 1.3.2.2
Change log ChangeLog.md
Dependencies attoparsec, base (>=4 && <5), bytestring, conduit (>=1.3), conduit-extra, containers, resourcet (>=1.2), text, transformers, xml-conduit (>=1.3), xml-types (>=0.3 && <0.4)
License MIT
Author Michael Snoyman
Maintainer michael@snoyman.com
Category Web, Text, Conduit
Home page https://github.com/snoyberg/xml
Source repo head: git clone git://github.com/snoyberg/xml.git
Uploaded by MichaelSnoyman at 2021-08-16T04:14:08Z
Distributions Arch:1.3.2.2, Debian:1.3.2.1, Fedora:1.3.2.2, FreeBSD:1.2.0, LTSHaskell:1.3.2.2, NixOS:1.3.2.2, Stackage:1.3.2.2
Reverse Dependencies 19 direct, 345 indirect
Downloads 45142 total (80 in the last 30 days)
Status Docs available
Last success reported on 2021-08-16

Readme for html-conduit-1.3.2.2



Simple usage example:

    #!/usr/bin/env stack
    {- stack --install-ghc --resolver lts-6.23 runghc --package http-conduit --package html-conduit -}
    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Text.IO        as T
    import           Network.HTTP.Simple (httpSink)
    import           Text.HTML.DOM       (sinkDoc)
    import           Text.XML.Cursor     (attributeIs, content, element,
                                          fromDocument, ($//), (&/), (&//))

    main :: IO ()
    main = do
        doc <- httpSink "http://www.yesodweb.com/book" $ const sinkDoc
        let cursor = fromDocument doc
        T.putStrLn "Chapters in the Yesod book:\n"
        mapM_ T.putStrLn
            $ cursor
            $// attributeIs "class" "main-listing"
            &// element "li"
            &/ element "a"
            &/ content
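The same cursor query can be tried without a network connection by parsing an in-memory string with `parseLBS`. The sketch below assumes a small hand-written fragment as a stand-in for the fetched page; the markup and the name `chapterTitles` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Text       (Text)
import qualified Data.Text.IO    as T
import           Text.HTML.DOM   (parseLBS)
import           Text.XML.Cursor (attributeIs, content, element, fromDocument,
                                  ($//), (&/), (&//))

-- Link text of every <a> that is a child of an <li> somewhere below an
-- element whose class attribute is exactly "main-listing".
chapterTitles :: [Text]
chapterTitles =
    fromDocument (parseLBS html)
        $// attributeIs "class" "main-listing"  -- any descendant of the root
        &// element "li"                        -- then any descendant <li>
        &/ element "a"                          -- then a direct child <a>
        &/ content                              -- then its direct text nodes
  where
    -- Hypothetical markup standing in for the downloaded page.
    html =
        "<div>\
        \<ul class=\"main-listing\">\
        \<li><a href=\"/intro\">Introduction</a></li>\
        \<li><a href=\"/basics\">Basics</a></li>\
        \</ul>\
        \<ul><li><a href=\"/other\">Not a chapter</a></li></ul>\
        \</div>"

main :: IO ()
main = mapM_ T.putStrLn chapterTitles
```

Only the links under the `main-listing` list are selected; the second list is skipped because the `attributeIs` step never matches it.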