But in Haskell arrays are NOT constant-time updatable, and the lazy-evaluated nature of Haskell is largely to blame for this "problem". But there is more to it than that. The implications of a lazy-evaluated language are often difficult to grasp, but in a nutshell, it allows you to express your program as a mathematical equation so the compiler can "prove" your equations are correct and efficient. The compiler can therefore (ususally) choose the most efficient low-level implementation of your algorithm. So, using "Data.Array.IO" to create arrays, and to do all read/updates in the IO monad is NOT necessarily going to produce the fastest executable program.
Using the IO monad basically forces execution to happen in a certain order. This means the compiler is unable to employ a vast range of optimizing strategies on IO computations. Using an unboxed array in the IO monad, you may gain some efficiency from constant-time updates, but you have likely lost efficiency because the compiler can no longer prove that all the steps leading up to each write operation were as efficient as they could have been.
For example, what if your code does this:
main = do
a <- newArray (0,1024) 0 :: IOUArray Int IO Int
writeArray a 0 999
something_else
something_else = do
writeArray a 0 555
The "something_else" function overwrites the value written by the "main". But, it happened in the IO monad, so the compiler promises it will execute both writes, so it is obliged to not optimize away that first write from "main". Had your code been pure (i.e. not in the IO monad) the compiler may have been able to prove that the first array operation is always unnecessary. If the "main" function were some other function that is executed a million times, that would cause quite a performance hit. Of course, this is an oversimplified example, and your Haskell compiler may well be clever enough to optimize-out the needless array writes. But this problem becomes especially hard to catch as programs get larger and more complex.
So, keep these things in mind:
- Arrays are simply mappings from integers to elements.
- Arrays are similar to "Data.IntMap"
- Only use arrays if you want constant-time reads, not writes.
- For histograms, use "Data.Array.Diff" which has Array structures optimized for that purpose.
- Use an "IntMap" for fast writes of arbitrary elements.
- Use "Control.Monad.State.Lazy" with an "IntMap" or "DiffArray" as the state.
Try to structure your program so all the writes are done in one broad stroke, and then use "Data.IntMap" for writing. Then, when you need not make anymore updates, "freeze" your IntMap into an immutable array and let the rest of your program enjoy constant time lookups.
import MySetup (lots_of_writes)
import Data.Array.IArray
import Data.IntMap
import Control.Monad.State.Lazy
lb = 0 -- lower bound
ub = 1024 -- upper bound
main = do
-- First do all "write" operations:
let my_intmap = execState (lots_of_writes lb ub) (Data.IntMap.empty)
-- Then "freeze" it, make an array for lots of reads:
let my_array = array (lb, ub) (Data.IntMap.elems my_intmap)
-- Then do lots of reads:
ixlist <- (readFile "./indecies" >>= return . read) :: IO [Int]
mapM_ (\i -> putStrLn $ show (my_array ! i)) ixlist
No comments:
Post a Comment