A collection of tools for processing XML with Haskell.
readDocument :: SysConfigList -> String -> IOStateArrow s b XmlTree

hxt -Text.XML.HXT.Arrow.ReadDocument  

the main document input filter

this filter can be configured by a list of configuration options, each a value of type SysConfig

for all available options see module SystemConfig

  • withValidate yes/no : switch on/off DTD validation. Only for XML parsed documents, not for HTML parsing.
  • withSubstDTDEntities yes/no : switch on/off entity substitution for general entities defined in the DTD. Default is yes. Switching off both this option and validation can lead to faster parsing, because reading the DTD documents is then no longer necessary. Only used with XML parsed documents, not with HTML parsing.
  • withSubstHTMLEntities yes/no : switch on/off entity substitution for the general entities defined for HTML. Default is no. Switching this option on and switching validation and withSubstDTDEntities off can lead to faster parsing: reading the DTD documents is then no longer necessary, but HTML general entities are still substituted. Only used with XML parsed documents, not with HTML parsing.
  • withParseHTML yes/no : switch on/off HTML parsing.
  • withParseByMimeType yes/no : select XML/HTML parser by document mime type. text/xml and text/xhtml are parsed as XML, text/html as HTML.
  • withCheckNamespaces yes/no : switch on/off namespace propagation and checking.
  • withInputEncoding : Set default encoding.
  • withTagSoup : use the lightweight and lazy parser based on the tagsoup library. This is only available when package hxt-tagsoup is installed and the source contains an import Text.XML.HXT.TagSoup.
  • withRelaxNG : validate the document with Relax NG; the parameter is the schema URI. This implies using the XML parser, no validation against the DTD, and canonicalisation.
  • withCurl [...] : Use the libCurl binding for HTTP access. This is only available when package hxt-curl is installed and the source contains an import Text.XML.HXT.Curl.
  • withHTTP [...] : Use the Haskell HTTP package for HTTP access. This is only available when package hxt-http is installed and the source contains an import Text.XML.HXT.HTTP.

examples:

readDocument [] "test.xml"

reads and validates a document "test.xml", no namespace propagation, only canonicalization is performed

...
import Text.XML.HXT.Curl
...

readDocument [ withValidate        no
             , withInputEncoding   isoLatin1
             , withParseByMimeType yes
             , withCurl []
             ] "http://localhost/test.php"

reads document "test.php", parses it as HTML or XML depending on the MIME type sent by the server, but without validation; the default encoding is isoLatin1. HTTP access is done via libCurl.

readDocument [ withParseHTML       yes
             , withInputEncoding   isoLatin1
             ] ""

reads an HTML document from standard input; no validation is done when parsing HTML, and the default encoding is isoLatin1.

readDocument [ withInputEncoding  isoLatin1
             , withValidate       no
             , withMimeTypeFile   "/etc/mime.types"
             , withStrictInput    yes
             ] "test.svg"

reads an SVG document from "test.svg" and sets the mime type by looking in the system mimetype config file; the default encoding is isoLatin1.

...
import Text.XML.HXT.Curl
import Text.XML.HXT.TagSoup
...

readDocument [ withParseHTML      yes
             , withTagSoup
             , withProxy          "www-cache:3128"
             , withCurl           []
             , withWarnings       no
             ] "http://www.haskell.org/"

reads the Haskell homepage with the HTML parser, ignoring any warnings (at the time of writing, there were some HTML errors), with HTTP access via the libCurl interface and proxy "www-cache" at port 3128. Parsing is done with the tagsoup HTML parser. This requires packages "hxt-curl" and "hxt-tagsoup" to be installed.

readDocument [ withValidate          yes
             , withCheckNamespaces   yes
             , withRemoveWS          yes
             , withTrace             2
             , withHTTP              []
             ] "http://www.w3c.org/"

reads the W3C home page (XHTML), validates it and checks namespaces, removes whitespace between tags, and traces activities at level 2. HTTP access is done with the Haskell HTTP package.

readDocument [ withValidate          no
             , withSubstDTDEntities  no
             ...
             ] "http://www.w3c.org/"

reads the W3C home page (XHTML), but without accessing the DTD given in that document. Only the predefined XML general entity refs are substituted.

readDocument [ withValidate          no
             , withSubstDTDEntities  no
             , withSubstHTMLEntities yes
             ...
             ] "http://www.w3c.org/"

same as above, but with substitution of all general entity refs defined in XHTML.

for minimal complete examples see writeDocument and runX, the main starting point for running an XML arrow.
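
As a rough sketch of how readDocument and runX fit together, the following program reads a small file without DTD validation and collects some text nodes. It assumes the hxt package is installed; the file contents, the element name "item" and the helper name itemTexts are made up for illustration and do not come from the library documentation.

```haskell
import Text.XML.HXT.Core

-- Read a document without DTD validation and collect the text content
-- of every <item> element (the element name is a hypothetical choice).
itemTexts :: FilePath -> IO [String]
itemTexts path =
  runX $
    readDocument [withValidate no, withRemoveWS yes] path
      >>> deep (isElem >>> hasName "item")
      >>> getChildren
      >>> getText

main :: IO ()
main = do
  -- Create a tiny input file so that the sketch is self-contained.
  writeFile "test.xml" "<list><item>a</item><item>b</item></list>"
  itemTexts "test.xml" >>= print
```

runX returns all results of the arrow as a list in IO, so the same pattern works for any filter chain appended after readDocument.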

catchA :: a b c -> a SomeException c -> a b c Class Method
tryA :: a b c -> a b (Either SomeException c) Class Method

hxt -Control.Arrow.ArrowIO  

the interface for converting an IO action into an arrow

hxt -Control.Arrow.ArrowIO  

the interface for converting an IO predicate into a list arrow

arrIO :: (b -> IO c) -> a b c Class Method

hxt -Control.Arrow.ArrowIO  

construct an arrow from an IO action

arrIO0 :: IO c -> a b c Class Method

hxt -Control.Arrow.ArrowIO  

construct an arrow from an IO action without any parameter
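
A small sketch of both constructors, using the IOLA arrow type from Control.Arrow.IOListArrow as the concrete instance and arrIO0 as the parameterless variant. The counter stored in an IORef and the names addTo and current are assumptions chosen only to make the effect visible.

```haskell
import Control.Arrow ((>>>))
import Control.Arrow.ArrowIO (arrIO, arrIO0)
import Control.Arrow.IOListArrow (IOLA, runIOLA)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- arrIO lifts a one-argument IO action: add the arrow input to a counter
-- and pass the new total on.
addTo :: IORef Int -> IOLA Int Int
addTo ref = arrIO $ \n -> do
  modifyIORef ref (+ n)
  readIORef ref

-- arrIO0 lifts an IO action that ignores the arrow input.
current :: IORef Int -> IOLA b Int
current ref = arrIO0 (readIORef ref)

main :: IO ()
main = do
  ref <- newIORef 10
  xs <- runIOLA (addTo ref >>> current ref) 5
  print xs  -- [15]
```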

arrIO2 :: (b1 -> b2 -> IO c) -> a (b1, b2) c Class Method

hxt -Control.Arrow.ArrowIO  

construction of a 2 argument arrow from a binary IO action

example: a1 &&& a2 >>> arrIO2 f
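
The pattern can be sketched with the IOLA instance as follows. The action logSum and the arrow name sumOfDoubleAndSquare are hypothetical; only arrIO2, (&&&) and (>>>) are taken from the libraries.

```haskell
import Control.Arrow (arr, (&&&), (>>>))
import Control.Arrow.ArrowIO (arrIO2)
import Control.Arrow.IOListArrow (IOLA, runIOLA)

-- Hypothetical binary IO action: log both arguments and return their sum.
logSum :: Int -> Int -> IO Int
logSum x y = do
  putStrLn ("adding " ++ show x ++ " and " ++ show y)
  return (x + y)

-- (&&&) pairs the results of the two arrows; arrIO2 feeds the pair
-- to the binary action as two separate arguments.
sumOfDoubleAndSquare :: IOLA Int Int
sumOfDoubleAndSquare = arr (* 2) &&& arr (\n -> n * n) >>> arrIO2 logSum

main :: IO ()
main = runIOLA sumOfDoubleAndSquare 3 >>= print  -- logs "adding 6 and 9", then prints [15]
```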

arrIO3 :: (b1 -> b2 -> b3 -> IO c) -> a (b1, (b2, b3)) c Class Method

hxt -Control.Arrow.ArrowIO  

construction of a 3 argument arrow from a 3-ary IO action

example: a1 &&& a2 &&& a3 >>> arrIO3 f
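
The nested tuple in the signature matches how (&&&) associates to the right. A minimal sketch, again with IOLA as the arrow type; the action report and the arrow withBounds are illustrative names, not part of hxt.

```haskell
import Control.Arrow (arr, (&&&), (>>>))
import Control.Arrow.ArrowIO (arrIO3)
import Control.Arrow.IOListArrow (IOLA, runIOLA)

-- Hypothetical 3-ary IO action: print a value between two bounds
-- and return the printed line.
report :: Int -> Int -> Int -> IO String
report lo x hi = do
  let line = show lo ++ " <= " ++ show x ++ " <= " ++ show hi
  putStrLn line
  return line

-- a1 &&& a2 &&& a3 parses as a1 &&& (a2 &&& a3), giving (b1, (b2, b3)),
-- which is exactly the input shape arrIO3 expects.
withBounds :: IOLA Int String
withBounds = arr (subtract 1) &&& arr id &&& arr (+ 1) >>> arrIO3 report

main :: IO ()
main = runIOLA withBounds 5 >>= print  -- ["4 <= 5 <= 6"]
```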

arrIO4 :: (b1 -> b2 -> b3 -> b4 -> IO c) -> a (b1, (b2, (b3, b4))) c Class Method

hxt -Control.Arrow.ArrowIO  

construction of a 4 argument arrow from a 4-ary IO action

example: a1 &&& a2 &&& a3 &&& a4 >>> arrIO4 f

isIOA :: (b -> IO Bool) -> a b b Class Method

hxt -Control.Arrow.ArrowIO  

builds an arrow from an IO predicate

if the predicate holds, the singleton list containing the input is returned, else the empty list, similar to isA
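
A sketch of this behaviour with the IOLA instance. The predicate consults a limit kept in an IORef; that setup and the name belowLimit are assumptions for illustration.

```haskell
import Control.Arrow.ArrowIO (isIOA)
import Control.Arrow.IOListArrow (IOLA, runIOLA)
import Data.IORef (IORef, newIORef, readIORef)

-- An IO predicate used as a filter: the input passes through unchanged
-- if it is below the limit currently stored in the IORef.
belowLimit :: IORef Int -> IOLA Int Int
belowLimit ref = isIOA (\n -> (n <) <$> readIORef ref)

main :: IO ()
main = do
  ref <- newIORef 10
  runIOLA (belowLimit ref) 3 >>= print   -- [3]
  runIOLA (belowLimit ref) 12 >>= print  -- []
```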

hxt -Control.Arrow.ArrowIf  

Conditionals for List Arrows

This module defines conditional combinators for list arrows.

An empty list as result represents False, a non-empty list True.

hxt -Control.Arrow.ArrowIf  

The interface for arrows as conditionals.

Requires list arrows because False is represented as the empty list, True as non-empty lists.

Only ifA and orElse don't have default implementations
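
The two primitive combinators can be tried out with the pure list arrow LA from Control.Arrow.ListArrow. This is only a sketch; collatzStep and atLeastTen are made-up example arrows.

```haskell
import Control.Arrow (arr)
import Control.Arrow.ArrowIf (ifA, orElse)
import Control.Arrow.ArrowList (constA, isA)
import Control.Arrow.ListArrow (LA, runLA)

-- ifA: a non-empty result of the condition arrow selects the "then"
-- arrow, an empty result the "else" arrow.
collatzStep :: LA Int Int
collatzStep = ifA (isA even) (arr (`div` 2)) (arr (\n -> 3 * n + 1))

-- orElse: use the first arrow unless it yields the empty list.
atLeastTen :: LA Int Int
atLeastTen = isA (>= 10) `orElse` constA 10

main :: IO ()
main = do
  print (runLA collatzStep 6)  -- [3]
  print (runLA collatzStep 7)  -- [22]
  print (runLA atLeastTen 3)   -- [10]
  print (runLA atLeastTen 42)  -- [42]
```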

choiceA :: [IfThen (a b c) (a b d)] -> a b d Class Method

hxt -Control.Arrow.ArrowIf  

generalisation of orElse for multi-way branches like in case expressions.

An auxiliary data type IfThen with an infix constructor :-> is used for writing multi-way branches

example: choiceA [ p1 :-> e1, p2 :-> e2, this :-> default ]
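
A concrete instance of this pattern, sketched with the pure list arrow LA. The arrow classify and its three branches are illustrative only; the final branch uses this as the always-true condition, as in the example above.

```haskell
import Control.Arrow.ArrowIf (IfThen (..), choiceA)
import Control.Arrow.ArrowList (constA, isA, this)
import Control.Arrow.ListArrow (LA, runLA)

-- The first branch whose condition yields a non-empty list is taken;
-- "this" always succeeds and therefore acts as the default branch.
classify :: LA Int String
classify =
  choiceA
    [ isA (< 0)  :-> constA "negative"
    , isA (== 0) :-> constA "zero"
    , this       :-> constA "positive"
    ]

main :: IO ()
main = mapM_ (print . runLA classify) [-3, 0, 7]
-- prints ["negative"], ["zero"], ["positive"] on separate lines
```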