The previous solution is a lot slower than the system provided grep:

Macintosh-2:Haskell clark$ ghc --make Main.hs
[1 of 1] Compiling Main             ( Main.hs, Main.o )
Linking Main ...
Macintosh-2:Haskell clark$ time ./Main -r teen regex-tdfa-1.1.2

real	0m0.643s
user	0m0.546s
sys	0m0.089s
Macintosh-2:Haskell clark$ time grep -r teen regex-tdfa-1.1.2

real	0m0.018s
user	0m0.008s
sys	0m0.010s

ghc has a profiler which we can use to find out where the code is spending its time and how much memory is getting allocated.

The profiler requires that options be passed to it on the command line. This is inconvenient when the program to be profiled itself takes command line options. Thus, to profile the program, the first step was to hardcode some command line arguments.

-- a <- getArgs
a <- return [ "-r", "teen", "regex-tdfa-1.1.2" ]

Next compile a profiled version of the code:

ghc --make -prof -auto-all Main.hs

Then run the executable:

./Main +RTS -p

The profiling output appears in Main.prof:

        Mon May 25 09:02 2009 Time and Allocation Profiling Report  (Final)

           Main +RTS -p -RTS

        total time  =        0.25 secs   (5 ticks @ 50 ms)
        total alloc = 179,197,136 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

doesLineMatch                  Main                  60.0   32.1
grepHandlePerLine              Main                  40.0   67.4

                                                                                               individual    inherited
COST CENTRE              MODULE                                               no.    entries  %time %alloc   %time %alloc

MAIN                     MAIN                                                   1           0   0.0    0.0   100.0  100.0
  main                   Main                                                 262           0   0.0    0.0     0.0    0.0
   grepFiles             Main                                                 263           0   0.0    0.0     0.0    0.0
    grepFile             Main                                                 264           0   0.0    0.0     0.0    0.0
 CAF                     Main                                                 236           9   0.0    0.0   100.0  100.0
  grepHandle             Main                                                 252           0   0.0    0.0     0.0    0.0
  grepHandlePerLine      Main                                                 251           0   0.0    0.0     0.0    0.0
  main                   Main                                                 242           1   0.0    0.0   100.0  100.0
   getGrepPattern        Main                                                 259           1   0.0    0.0     0.0    0.0
    matchCaseInsensitive Main                                                 260           1   0.0    0.0     0.0    0.0
   grepFiles             Main                                                 245          47   0.0    0.0   100.0  100.0
    grepFile             Main                                                 246          37   0.0    0.1   100.0  100.0
     grepHandle          Main                                                 253       10705   0.0    0.0   100.0   99.9
      grepHandlePerLine  Main                                                 255       10705  40.0   67.4   100.0   99.8
       outputNonMatch    Main                                                 261       10677   0.0    0.3     0.0    0.3
       doesLineMatch     Main                                                 256       10677  60.0   32.1    60.0   32.1
        matchCaseInsensitive Main                                                 258       10677   0.0    0.0     0.0    0.0
        matchRegularExpression Main                                                 257       10677   0.0    0.0     0.0    0.0
      outputPerFile      Main                                                 254       10705   0.0    0.1     0.0    0.1
     recursiveDir        Main                                                 250           9   0.0    0.0     0.0    0.0
   getOptions            Main                                                 244           4   0.0    0.0     0.0    0.0
   getNonOptionArguments Main                                                 243           4   0.0    0.0     0.0    0.0
 CAF                     GHC.Int                                              201           1   0.0    0.0     0.0    0.0
 CAF                     GHC.Handle                                           189           6   0.0    0.0     0.0    0.0
 CAF                     System.Posix.Internals                               172           7   0.0    0.0     0.0    0.0
 CAF                     System.Directory                                     125           2   0.0    0.0     0.0    0.0
  main                   Main                                                 247           0   0.0    0.0     0.0    0.0
   grepFiles             Main                                                 248           0   0.0    0.0     0.0    0.0
    grepFile             Main                                                 249           0   0.0    0.0     0.0    0.0

The profiling did not immediately suggest ways to improve performance. It appears to be an IO issue, and this article demonstrates faster ways to do IO in Haskell.

Leave a Reply