Skip to content

Conversation

@Bodigrim
Copy link
Contributor

Related to our discussion with @rhendric at https://discourse.haskell.org/t/the-quest-to-completely-eradicate-string-awkwardness/10111/75?u=bodigrim, let's implement Data.Text.unpack directly via iterArray instead of streaming / unstreaming. My immediate goal is to simplify dependencies between modules of text, but I imagine that performance could get better as well.

@Bodigrim Bodigrim requested a review from Lysxia August 14, 2024 22:07
@rhendric
Copy link

I might be benchmarking this incorrectly, but I think it has a negative impact on performance.

Main.hs
module Main (main) where import Criterion.Main import Data.Text.Internal (Text(..)) import Data.Text.Unsafe (Iter(..), iterArray) import Data.Text qualified as T unpack :: Text -> String unpack (Text arr off len) = go 0 where go !i | i >= len = [] | otherwise = c : go (i + l) where Iter c l = iterArray arr (off+i) {-# INLINE [1] unpack #-} f1 :: String -> String f1 = T.unpack . T.toTitle . T.pack {-# NOINLINE f1 #-} f2 :: String -> String f2 = unpack . T.toTitle . T.pack {-# NOINLINE f2 #-} testInput :: String testInput = "this is a pretty short string, all things considered." testInputT :: Text testInputT = T.pack testInput {-# NOINLINE testInputT #-} main :: IO () main = defaultMain [ bgroup "fusion" [ bench "old" $ nf f1 testInput , bench "new" $ nf f2 testInput ] , bgroup "just unpack" [ bench "old" $ nf T.unpack testInputT , bench "new" $ nf unpack testInputT ] ]
benchmarking fusion/old time 1.148 μs (1.144 μs .. 1.154 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.145 μs (1.141 μs .. 1.151 μs) std dev 16.06 ns (13.55 ns .. 19.22 ns) variance introduced by outliers: 13% (moderately inflated) benchmarking fusion/new time 1.914 μs (1.909 μs .. 1.920 μs) 1.000 R² (1.000 R² .. 1.000 R²) mean 1.911 μs (1.906 μs .. 1.917 μs) std dev 19.80 ns (15.78 ns .. 28.69 ns) benchmarking just unpack/old time 474.1 ns (467.0 ns .. 481.2 ns) 0.999 R² (0.998 R² .. 1.000 R²) mean 470.3 ns (467.4 ns .. 476.1 ns) std dev 12.96 ns (7.541 ns .. 20.76 ns) variance introduced by outliers: 39% (moderately inflated) benchmarking just unpack/new time 943.0 ns (935.2 ns .. 956.6 ns) 0.999 R² (0.998 R² .. 1.000 R²) mean 964.7 ns (956.4 ns .. 972.0 ns) std dev 26.70 ns (23.38 ns .. 31.77 ns) variance introduced by outliers: 37% (moderately inflated) 
@Bodigrim Bodigrim force-pushed the unpack branch 3 times, most recently from a1eba87 to d289882 Compare August 17, 2024 19:32
@Bodigrim Bodigrim marked this pull request as draft August 20, 2024 18:26
@Bodigrim
Copy link
Contributor Author

Thanks @rhendric, should be fixed now.

@Bodigrim Bodigrim marked this pull request as ready for review October 14, 2024 21:33
@rhendric
Copy link

My comment wasn't about toTitle specifically, but about any function with a stream-based implementation. Modify f1 and f2 to use any other function implemented via unstream, and there's still a performance regression. Example:

f1 :: String -> String f1 = T.unpack . T.scanl max '\0' . T.pack {-# NOINLINE f1 #-} f2 :: String -> String f2 = unpack . T.scanl max '\0' . T.pack {-# NOINLINE f2 #-}
benchmarking fusion/old time 836.4 ns (832.7 ns .. 841.6 ns) 1.000 R² (1.000 R² .. 1.000 R²) mean 833.7 ns (831.3 ns .. 837.1 ns) std dev 9.667 ns (6.892 ns .. 14.57 ns) benchmarking fusion/new time 1.085 μs (1.081 μs .. 1.092 μs) 1.000 R² (0.999 R² .. 1.000 R²) mean 1.094 μs (1.087 μs .. 1.104 μs) std dev 27.77 ns (20.48 ns .. 45.82 ns) variance introduced by outliers: 33% (moderately inflated) 
@Bodigrim
Copy link
Contributor Author

@rhendric we are gradually rewriting all stream-based functions to their non-streaming counterparts. E. g., see map and mapAccumL. Unless you have a particularly long pipeline, non-streaming implementation is faster. If you happen to have a long transformation pipeline (to be honest, I know exactly zero use cases for Data.Text.scanl), you are welcome to use Stream explicitly. See #513 for some context.

@rhendric
Copy link

Ah, okay then!

@Bodigrim Bodigrim merged commit 1d26a61 into haskell:master Oct 16, 2024
2 of 3 checks passed
@Bodigrim Bodigrim deleted the unpack branch October 16, 2024 22:41
@Lysxia Lysxia changed the title Implement Data.Text.unpack directly, without streaming Implement Data.Text.unpack and Data.Text.toTitle directly, without streaming Oct 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants