Haskell Problem 1: Unix grep (part 4)
May 25, 2009
The previous solution is a lot slower than the system provided grep:
Macintosh-2:Haskell clark$ ghc --make Main.hs [1 of 1] Compiling Main ( Main.hs, Main.o ) Linking Main ... Macintosh-2:Haskell clark$ time ./Main -r teen regex-tdfa-1.1.2 real 0m0.643s user 0m0.546s sys 0m0.089s Macintosh-2:Haskell clark$ time grep -r teen regex-tdfa-1.1.2 real 0m0.018s user 0m0.008s sys 0m0.010s
ghc has a profiler which we can use to find out where the code is spending its time and how much memory is getting allocated.
The profiler requires that options be passed to it on the command line. This is inconvenient when the program to be profiled itself takes command line options. Thus, to profile the program, the first step was to hardcode some command line arguments.
-- a <- getArgs
a <- return [ "-r", "teen", "regex-tdfa-1.1.2" ]
Next compile a profiled version of the code:
ghc --make -prof -auto-all Main.hs
Then run the executable:
./Main +RTS -p
The profiling output appears in Main.prof:
Mon May 25 09:02 2009 Time and Allocation Profiling Report (Final)
Main +RTS -p -RTS
total time = 0.25 secs (5 ticks @ 50 ms)
total alloc = 179,197,136 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
doesLineMatch Main 60.0 32.1
grepHandlePerLine Main 40.0 67.4
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 1 0 0.0 0.0 100.0 100.0
main Main 262 0 0.0 0.0 0.0 0.0
grepFiles Main 263 0 0.0 0.0 0.0 0.0
grepFile Main 264 0 0.0 0.0 0.0 0.0
CAF Main 236 9 0.0 0.0 100.0 100.0
grepHandle Main 252 0 0.0 0.0 0.0 0.0
grepHandlePerLine Main 251 0 0.0 0.0 0.0 0.0
main Main 242 1 0.0 0.0 100.0 100.0
getGrepPattern Main 259 1 0.0 0.0 0.0 0.0
matchCaseInsensitive Main 260 1 0.0 0.0 0.0 0.0
grepFiles Main 245 47 0.0 0.0 100.0 100.0
grepFile Main 246 37 0.0 0.1 100.0 100.0
grepHandle Main 253 10705 0.0 0.0 100.0 99.9
grepHandlePerLine Main 255 10705 40.0 67.4 100.0 99.8
outputNonMatch Main 261 10677 0.0 0.3 0.0 0.3
doesLineMatch Main 256 10677 60.0 32.1 60.0 32.1
matchCaseInsensitive Main 258 10677 0.0 0.0 0.0 0.0
matchRegularExpression Main 257 10677 0.0 0.0 0.0 0.0
outputPerFile Main 254 10705 0.0 0.1 0.0 0.1
recursiveDir Main 250 9 0.0 0.0 0.0 0.0
getOptions Main 244 4 0.0 0.0 0.0 0.0
getNonOptionArguments Main 243 4 0.0 0.0 0.0 0.0
CAF GHC.Int 201 1 0.0 0.0 0.0 0.0
CAF GHC.Handle 189 6 0.0 0.0 0.0 0.0
CAF System.Posix.Internals 172 7 0.0 0.0 0.0 0.0
CAF System.Directory 125 2 0.0 0.0 0.0 0.0
main Main 247 0 0.0 0.0 0.0 0.0
grepFiles Main 248 0 0.0 0.0 0.0 0.0
grepFile Main 249 0 0.0 0.0 0.0 0.0
The profiling did not immediately suggest ways to improve performance. It appears to be an IO issue, and this article demonstrates faster ways to do IO in Haskell.