| Portability | GHC |
|---|---|
| Stability | experimental |
| Maintainer | bos@serpentine.com, rtharper@aftereternity.co.uk, duncan@haskell.org |
Data.Text
Description
A time- and space-efficient implementation of Unicode text using packed Word16 arrays. Suitable for performance-critical use, both in terms of large data quantities and high speed.
This module is intended to be imported qualified, to avoid name clashes with Prelude functions, e.g.
import qualified Data.Text as T
- data Text
- pack :: String -> Text
- unpack :: Text -> String
- singleton :: Char -> Text
- empty :: Text
- cons :: Char -> Text -> Text
- snoc :: Text -> Char -> Text
- append :: Text -> Text -> Text
- uncons :: Text -> Maybe (Char, Text)
- head :: Text -> Char
- last :: Text -> Char
- tail :: Text -> Text
- init :: Text -> Text
- null :: Text -> Bool
- length :: Text -> Int
- map :: (Char -> Char) -> Text -> Text
- intercalate :: Text -> [Text] -> Text
- intersperse :: Char -> Text -> Text
- transpose :: [Text] -> [Text]
- reverse :: Text -> Text
- replace :: Text -> Text -> Text -> Text
- toCaseFold :: Text -> Text
- toLower :: Text -> Text
- toUpper :: Text -> Text
- justifyLeft :: Int -> Char -> Text -> Text
- justifyRight :: Int -> Char -> Text -> Text
- center :: Int -> Char -> Text -> Text
- foldl :: (b -> Char -> b) -> b -> Text -> b
- foldl' :: (b -> Char -> b) -> b -> Text -> b
- foldl1 :: (Char -> Char -> Char) -> Text -> Char
- foldl1' :: (Char -> Char -> Char) -> Text -> Char
- foldr :: (Char -> b -> b) -> b -> Text -> b
- foldr1 :: (Char -> Char -> Char) -> Text -> Char
- concat :: [Text] -> Text
- concatMap :: (Char -> Text) -> Text -> Text
- any :: (Char -> Bool) -> Text -> Bool
- all :: (Char -> Bool) -> Text -> Bool
- maximum :: Text -> Char
- minimum :: Text -> Char
- scanl :: (Char -> Char -> Char) -> Char -> Text -> Text
- scanl1 :: (Char -> Char -> Char) -> Text -> Text
- scanr :: (Char -> Char -> Char) -> Char -> Text -> Text
- scanr1 :: (Char -> Char -> Char) -> Text -> Text
- mapAccumL :: (a -> Char -> (a, Char)) -> a -> Text -> (a, Text)
- mapAccumR :: (a -> Char -> (a, Char)) -> a -> Text -> (a, Text)
- replicate :: Int -> Text -> Text
- unfoldr :: (a -> Maybe (Char, a)) -> a -> Text
- unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
- take :: Int -> Text -> Text
- drop :: Int -> Text -> Text
- takeWhile :: (Char -> Bool) -> Text -> Text
- dropWhile :: (Char -> Bool) -> Text -> Text
- dropWhileEnd :: (Char -> Bool) -> Text -> Text
- dropAround :: (Char -> Bool) -> Text -> Text
- strip :: Text -> Text
- stripStart :: Text -> Text
- stripEnd :: Text -> Text
- splitAt :: Int -> Text -> (Text, Text)
- spanBy :: (Char -> Bool) -> Text -> (Text, Text)
- break :: Text -> Text -> (Text, Text)
- breakBy :: (Char -> Bool) -> Text -> (Text, Text)
- group :: Text -> [Text]
- groupBy :: (Char -> Char -> Bool) -> Text -> [Text]
- inits :: Text -> [Text]
- tails :: Text -> [Text]
- split :: Text -> Text -> [Text]
- splitBy :: (Char -> Bool) -> Text -> [Text]
- chunksOf :: Int -> Text -> [Text]
- lines :: Text -> [Text]
- words :: Text -> [Text]
- unlines :: [Text] -> Text
- unwords :: [Text] -> Text
- isPrefixOf :: Text -> Text -> Bool
- isSuffixOf :: Text -> Text -> Bool
- isInfixOf :: Text -> Text -> Bool
- filter :: (Char -> Bool) -> Text -> Text
- find :: Text -> Text -> (Text, [(Text, Text)])
- findBy :: (Char -> Bool) -> Text -> Maybe Char
- partitionBy :: (Char -> Bool) -> Text -> (Text, Text)
- index :: Text -> Int -> Char
- findIndex :: (Char -> Bool) -> Text -> Maybe Int
- count :: Text -> Text -> Int
- zip :: Text -> Text -> [(Char, Char)]
- zipWith :: (Char -> Char -> Char) -> Text -> Text -> Text
Fusion
Most of the functions in this module are subject to fusion, meaning that a pipeline of such functions will usually allocate at most one Text value.
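As a rough illustration of such a pipeline, the sketch below chains three functions from this module. The name countLetters is hypothetical, and whether the intermediate values are actually fused away depends on the library version and on compiling with optimisation, so treat the comment as an expectation rather than a guarantee.

```haskell
import qualified Data.Text as T
import Data.Char (isAlpha, toUpper)

-- Counts the alphabetic characters of the upper-cased input.
-- map, filter and length are applied back to back, so the intermediate
-- Text values are candidates for fusion rather than separate allocations.
countLetters :: T.Text -> Int
countLetters = T.length . T.filter isAlpha . T.map toUpper
```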
Types
A space efficient, packed, unboxed Unicode text type.
Creation and elimination
Basic interface
cons :: Char -> Text -> Text
O(n) Adds a character to the front of a Text. This function is more costly than its List counterpart because it requires copying a new array. Subject to fusion.
snoc :: Text -> Char -> Text
O(n) Adds a character to the end of a Text. This copies the entire array in the process, unless fused. Subject to fusion.
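Because each unfused snoc may copy the whole array, building a Text one character at a time can become quadratic. The sketch below contrasts that with a single pack. Both function names are hypothetical and serve only to make the cost difference concrete.

```haskell
import qualified Data.Text as T
import Data.List (foldl')

-- Appends one character at a time. Each snoc may copy the array built so
-- far, so the total cost can approach O(n^2) when the calls are not fused.
slowFromChars :: String -> T.Text
slowFromChars = foldl' T.snoc T.empty

-- Builds the same value in a single pass over the input.
fromChars :: String -> T.Text
fromChars = T.pack
```

Both functions return equal Text values; only the amount of copying differs.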
head :: Text -> Char
O(1) Returns the first character of a Text, which must be non-empty. Subject to fusion.
last :: Text -> Char
O(1) Returns the last character of a Text, which must be non-empty. Subject to fusion.
tail :: Text -> Text
O(1) Returns all characters after the head of a Text, which must be non-empty. Subject to fusion.
init :: Text -> Text
O(1) Returns all but the last character of a Text, which must be non-empty. Subject to fusion.
Transformations
intercalate :: Text -> [Text] -> Text
O(n) The intercalate function takes a Text and a list of Texts and concatenates the list after interspersing the first argument between each element of the list.
intersperse :: Char -> Text -> Text
O(n) The intersperse function takes a character and places it between the characters of a Text. Subject to fusion.
replace :: Text -> Text -> Text -> Text
O(m*n) Replace every occurrence of one substring with another.
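A small sketch of the three transformations above. The helper names joinPath, spellOut and dotted are hypothetical, and the argument order used for replace (needle, then replacement, then input) is an assumption that this page does not spell out.

```haskell
import qualified Data.Text as T

-- Joins path components with a separator Text.
joinPath :: [T.Text] -> T.Text
joinPath = T.intercalate (T.pack "/")

-- Places a dash between every pair of adjacent characters.
spellOut :: T.Text -> T.Text
spellOut = T.intersperse '-'

-- Replaces every "::" with ".".
-- Assumed argument order: needle, replacement, input.
dotted :: T.Text -> T.Text
dotted = T.replace (T.pack "::") (T.pack ".")
```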
Case conversion
When case converting Text values, do not use combinators like map toUpper to case convert each character of a string individually, as this gives incorrect results according to the rules of some writing systems. The whole-string case conversion functions from this module, such as toUpper, obey the correct case conversion rules. As a result, these functions may map one input character to two or three output characters. For examples, see the documentation of each function.
toCaseFold :: Text -> Text
O(n) Convert a string to folded case. This function is mainly useful for performing caseless (also known as case insensitive) string comparisons.
A string x is a caseless match for a string y if and only if:
toCaseFold x == toCaseFold y
The result string may be longer than the input string, and may differ from applying toLower to the input string. For instance, the Armenian small ligature "ﬓ" (men now, U+FB13) is case folded to the sequence "մ" (men, U+0574) followed by "ն" (now, U+0576), while the Greek "µ" (micro sign, U+00B5) is case folded to "μ" (small letter mu, U+03BC) instead of itself.
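The caseless-match rule above can be wrapped in a small helper. The name caselessEq is hypothetical; the body is just the equation given earlier.

```haskell
import qualified Data.Text as T

-- True when the two inputs are equal after case folding.
caselessEq :: T.Text -> T.Text -> Bool
caselessEq x y = T.toCaseFold x == T.toCaseFold y
```

Comparing folded values, rather than lower-casing both sides, also matches pairs such as "HELLO" and "hello" while handling characters whose folded form is longer than one character.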
toLower :: Text -> Text
O(n) Convert a string to lower case, using simple case conversion. The result string may be longer than the input string. For instance, "İ" (Latin capital letter I with dot above, U+0130) maps to the sequence "i" (Latin small letter i, U+0069) followed by " ̇" (combining dot above, U+0307).
toUpper :: Text -> Text
O(n) Convert a string to upper case, using simple case conversion. The result string may be longer than the input string. For instance, the German "ß" (eszett, U+00DF) maps to the two-letter sequence "SS".
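To make the warning about per-character conversion concrete, the sketch below applies both approaches to a word containing "ß" (written as the escape \223). The names perChar and wholeString are hypothetical.

```haskell
import qualified Data.Text as T
import qualified Data.Char as C

-- Converts each character on its own. A character such as U+00DF has no
-- single-character upper-case form, so it passes through unchanged.
perChar :: T.Text -> T.Text
perChar = T.map C.toUpper

-- Converts the string as a whole, so one input character may expand
-- into several output characters.
wholeString :: T.Text -> T.Text
wholeString = T.toUpper
```

For the input "stra\223e", perChar keeps the length at 6, while wholeString produces the 7-character "STRASSE".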
Justification
justifyLeft :: Int -> Char -> Text -> Text
O(n) Left-justify a string to the given length, using the specified fill character on the right. Subject to fusion. Examples:
justifyLeft 7 'x' "foo" == "fooxxxx"
justifyLeft 3 'x' "foobar" == "foobar"
justifyRight :: Int -> Char -> Text -> Text
O(n) Right-justify a string to the given length, using the specified fill character on the left. Examples:
justifyRight 7 'x' "bar" == "xxxxbar"
justifyRight 3 'x' "foobar" == "foobar"
center :: Int -> Char -> Text -> Text
O(n) Center a string to the given length, using the specified fill character on either side. Examples:
center 8 'x' "HS" == "xxxHSxxx"
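One common use of the justification functions is lining up columns in plain-text output. The sketch below is a minimal example; the name formatRow and the column widths 10 and 5 are arbitrary choices made for illustration.

```haskell
import qualified Data.Text as T

-- Formats a label and a quantity as two fixed-width columns:
-- the label is padded on the right with dots, the number on the left
-- with spaces.
formatRow :: T.Text -> Int -> T.Text
formatRow name qty =
  T.append (T.justifyLeft 10 '.' name)
           (T.justifyRight 5 ' ' (T.pack (show qty)))
```

For example, formatRow applied to "apples" and 42 gives "apples....   42".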
Folds
foldl' :: (b -> Char -> b) -> b -> Text -> b
O(n) A strict version of foldl. Subject to fusion.
foldl1' :: (Char -> Char -> Char) -> Text -> Char
O(n) A strict version of foldl1. Subject to fusion.
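A strict left fold suits accumulating a small value over every character. The sketch below reads a run of decimal digits into an Int; the name digitsToInt is hypothetical, and the function assumes its input contains only the characters '0' to '9'.

```haskell
import qualified Data.Text as T
import Data.Char (digitToInt)

-- Interprets a Text made of decimal digits as an Int.
-- Assumes every character is a decimal digit; no validation is done.
digitsToInt :: T.Text -> Int
digitsToInt = T.foldl' step 0
  where
    step acc c = acc * 10 + digitToInt c
```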
Special folds
Construction
Scans
Accumulating maps
Generation and unfolding
unfoldr :: (a -> Maybe (Char, a)) -> a -> Text
O(n), where n is the length of the result. The unfoldr function is analogous to the List unfoldr. unfoldr builds a Text from a seed value. The function takes the element and returns Nothing if it is done producing the Text, otherwise Just (a,b). In this case, a is the next Char in the string, and b is the seed value for further production. Subject to fusion.
unfoldrN :: Int -> (a -> Maybe (Char, a)) -> a -> Text
O(n) Like unfoldr, unfoldrN builds a Text from a seed value. However, the length of the result should be limited by the first argument to unfoldrN. This function is more efficient than unfoldr when the maximum length of the result is known and correct, otherwise its performance is similar to unfoldr. Subject to fusion.
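The sketch below shows both unfolds driven by the same step function. The names step, countdown and countdownAtMost are hypothetical, and the seed is assumed to be at most 9 so that each value maps to a single decimal digit.

```haskell
import qualified Data.Text as T
import Data.Char (intToDigit)

-- Produces the digit for the current seed, then decrements the seed.
-- Returns Nothing once the seed drops below zero, which ends the Text.
step :: Int -> Maybe (Char, Int)
step k
  | k < 0     = Nothing
  | otherwise = Just (intToDigit k, k - 1)

-- Counts down from the seed to zero, e.g. a seed of 5 yields "543210".
countdown :: Int -> T.Text
countdown = T.unfoldr step

-- Same generator, but the result length is capped by the first argument.
countdownAtMost :: Int -> Int -> T.Text
countdownAtMost limit = T.unfoldrN limit step
```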
Substrings
Breaking strings
dropWhileEnd :: (Char -> Bool) -> Text -> Text
O(n) dropWhileEnd p t returns the prefix remaining after dropping characters that satisfy the predicate p from the end of t. Subject to fusion. Examples:
dropWhileEnd (=='.') "foo..." == "foo"
dropAround :: (Char -> Bool) -> Text -> Text
O(n) dropAround p t returns the substring remaining after dropping characters that satisfy the predicate p from both the beginning and end of t. Subject to fusion.
strip :: Text -> Text
O(n) Remove leading and trailing white space from a string. Equivalent to:
dropAround isSpace
stripStart :: Text -> Text
O(n) Remove leading white space from a string. Equivalent to:
dropWhile isSpace
stripEnd :: Text -> Text
O(n) Remove trailing white space from a string. Equivalent to:
dropWhileEnd isSpace
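The trimming functions compose naturally when cleaning up input fields. The sketch below is one possible arrangement; unquote, cleanField and dropTrailingDots are hypothetical helper names.

```haskell
import qualified Data.Text as T

-- Removes double-quote characters from both ends of the input.
unquote :: T.Text -> T.Text
unquote = T.dropAround (== '"')

-- Trims surrounding white space first, then surrounding quotes.
cleanField :: T.Text -> T.Text
cleanField = unquote . T.strip

-- Removes any run of '.' characters at the end of the input.
dropTrailingDots :: T.Text -> T.Text
dropTrailingDots = T.dropWhileEnd (== '.')
```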
spanBy :: (Char -> Bool) -> Text -> (Text, Text)
O(n) spanBy, applied to a predicate p and text t, returns a pair whose first element is the longest prefix (possibly empty) of t of elements that satisfy p, and whose second is the remainder of the text.
break :: Text -> Text -> (Text, Text)
O(n+m) Find the first instance of needle (which must be non-null) in haystack. The first element of the returned tuple is the prefix of haystack before needle is matched. The second is the remainder of haystack, starting with the match.
Examples:
break "::" "a::b::c" ==> ("a", "::b::c")
break "/" "foobar" ==> ("foobar", "")
Laws:
append prefix match == haystack
  where (prefix, match) = break needle haystack
If you need to break a string by a substring repeatedly (e.g. you want to break on every instance of a substring), use find instead, as it has lower startup overhead.
In (unlikely) bad cases, this function's time complexity degrades towards O(n*m).
groupBy :: (Char -> Char -> Bool) -> Text -> [Text]
O(n) Group characters in a string according to a predicate.
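As an illustration, the sketch below groups a string into alternating runs of digits and non-digits. The name digitRuns is hypothetical; the predicate treats two characters as belonging together when both are digits or both are not.

```haskell
import qualified Data.Text as T
import Data.Char (isDigit)

-- Splits the input into maximal runs that are either all digits
-- or contain no digits at all.
digitRuns :: T.Text -> [T.Text]
digitRuns = T.groupBy (\a b -> isDigit a == isDigit b)
```

For example, "ab12c" is grouped into "ab", "12" and "c".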
Breaking into many substrings
Splitting functions in this library do not perform character-wise copies to create substrings; they just construct new Texts that are slices of the original.
split :: Text -> Text -> [Text]
O(m+n) Break a Text into pieces separated by the first Text argument, consuming the delimiter. An empty delimiter is invalid, and will cause an error to be raised.
Examples:
split "\r\n" "a\r\nb\r\nd\r\ne" == ["a","b","d","e"]
split "aaa" "aaaXaaaXaaaXaaa" == ["","X","X","X",""]
split "x" "x" == ["",""]
and
intercalate s . split s == id
split (singleton c) == splitBy (==c)
In (unlikely) bad cases, this function's time complexity degrades towards O(n*m).
splitBy :: (Char -> Bool) -> Text -> [Text]
O(n) Splits a Text into components delimited by separators, where the predicate returns True for a separator element. The resulting components do not contain the separators. Two adjacent separators result in an empty component in the output. For example:
splitBy (=='a') "aabbaca" == ["","","bb","c",""]
splitBy (=='a') "" == [""]
chunksOf :: Int -> Text -> [Text]
O(n) Splits a Text into components of length k. The last element may be shorter than the other chunks, depending on the length of the input. Examples:
chunksOf 3 "foobarbaz" == ["foo","bar","baz"]
chunksOf 4 "haskell.org" == ["hask","ell.","org"]
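A typical use of chunksOf is regrouping a long string for display. The sketch below inserts a space after every two characters of a hex dump; the name groupHex is hypothetical.

```haskell
import qualified Data.Text as T

-- Breaks the input into two-character chunks and joins them with spaces.
-- If the length is odd, the final chunk holds a single character.
groupHex :: T.Text -> T.Text
groupHex = T.intercalate (T.pack " ") . T.chunksOf 2
```

For example, "deadbeef" becomes "de ad be ef".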
Breaking into lines and words
lines :: Text -> [Text]
O(n) Portably breaks a Text up into a list of Texts at line boundaries.
A line boundary is considered to be either a line feed, a carriage return immediately followed by a line feed, or a carriage return. This accounts for both Unix and Windows line ending conventions, and for the old convention used on Mac OS 9 and earlier.
unlines :: [Text] -> Text
O(n) Joins lines, after appending a terminating newline to each.
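Breaking a Text into lines and joining them again pair up naturally for per-line processing. The sketch below prefixes each line with its number; the name numberLines is hypothetical, and the example input uses only line feeds as terminators.

```haskell
import qualified Data.Text as T

-- Splits the input into lines, prefixes each with "<n>: ", and joins the
-- result, ending every line with a newline.
numberLines :: T.Text -> T.Text
numberLines = T.unlines . zipWith prefix [1 :: Int ..] . T.lines
  where
    prefix n l = T.append (T.pack (show n ++ ": ")) l
```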
Predicates
isPrefixOf :: Text -> Text -> Bool
O(n) The isPrefixOf function takes two Texts and returns True iff the first is a prefix of the second. This function is subject to fusion.
isSuffixOf :: Text -> Text -> Bool
O(n) The isSuffixOf function takes two Texts and returns True iff the first is a suffix of the second.
Searching
find :: Text -> Text -> (Text, [(Text, Text)])
O(n+m) Find all non-overlapping instances of needle in haystack. The first element of the returned pair is the prefix of haystack prior to any matches of needle. The second is a list of pairs.
The first element of each pair in the list is a span from the beginning of a match to the beginning of the next match, while the second is a span from the beginning of the match to the end of the input.
Examples:
find "::" "" ==> ("", [])
find "/" "a/b/c/d" ==> ("a", [("/b","/b/c/d"), ("/c","/c/d"), ("/d","/d")])
In (unlikely) bad cases, this function's time complexity degrades towards O(n*m).
partitionBy :: (Char -> Bool) -> Text -> (Text, Text)
O(n) The partitionBy function takes a predicate and a Text, and returns the pair of Texts with elements which do and do not satisfy the predicate, respectively; i.e.
partitionBy p t == (filter p t, filter (not . p) t)
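The right-hand side of that equation can be written out directly with filter alone. The sketch below does so under the hypothetical name partitionViaFilter; it illustrates the law and is not the library's own implementation.

```haskell
import qualified Data.Text as T
import Data.Char (isUpper)

-- Pairs the characters that satisfy the predicate with those that do not,
-- preserving their original order within each half.
partitionViaFilter :: (Char -> Bool) -> T.Text -> (T.Text, T.Text)
partitionViaFilter p t = (T.filter p t, T.filter (not . p) t)

-- Example: separates upper-case characters from everything else.
splitCase :: T.Text -> (T.Text, T.Text)
splitCase = partitionViaFilter isUpper
```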
Indexing
If you think of a Text value as an array of Char values (which it is not), you run the risk of writing inefficient code.
An idiom that is common in some languages is to find the numeric offset of a character or substring, then use that number to split or trim the searched string. With a Text value, this approach would require two O(n) operations: one to perform the search, and one to operate from wherever the search ended.
For example, suppose you have a string that you want to split on the substring "::", such as "foo::bar::quux". Instead of searching for the index of "::" and taking the substrings before and after that index, you would instead use find "::".
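The sketch below shows the index-free style on that input. It rests on an assumption this page does not state: that the installed release of the text package exports the substring-based break under the name breakOn, as later releases do. The helper name splitOnce is hypothetical.

```haskell
import qualified Data.Text as T

-- Splits the input around the first occurrence of a non-empty separator,
-- without ever computing a numeric offset for the match.
-- Assumes T.breakOn is available (an assumption about the library version).
splitOnce :: T.Text -> T.Text -> Maybe (T.Text, T.Text)
splitOnce sep t =
  case T.breakOn sep t of
    (before, rest)
      | T.null rest -> Nothing  -- separator not found
      | otherwise   -> Just (before, T.drop (T.length sep) rest)
```

Applied to the separator "::" and the input "foo::bar::quux", this yields the pair of "foo" and "bar::quux".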