It may be true that FP makes concurrency easier, but I don't think this has been demonstrated to be generally true yet.
Perhaps not generally true, but I think MapReduce and LINQ are two prominent examples of how a "functional programming-like" model lends itself to easy parallelization and distribution.
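To make the "functional programming-like model" concrete: the essence of MapReduce is a pure, per-element map step followed by a fold with an associative operator. A minimal single-machine sketch in C++ (using std::transform and std::accumulate; real MapReduce distributes the same shape across machines):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sketch of the map/reduce shape: a pure "map" applied independently to
// each element, then an associative "reduce". Because the map step has
// no shared state, the elements could be processed on different cores
// or machines without any coordination.
int sum_of_squares(const std::vector<int>& xs) {
    std::vector<int> squared(xs.size());
    // map: independent per element, trivially parallelizable
    std::transform(xs.begin(), xs.end(), squared.begin(),
                   [](int x) { return x * x; });
    // reduce: fold with an associative operator (+)
    return std::accumulate(squared.begin(), squared.end(), 0);
}
```

The parallelization argument rests entirely on the map step being side-effect free, which is exactly what the functional style encourages.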
And there are also enough 'prominent examples' for impure imperative languages. For instance, OpenMP makes parallelizing some loops in C/C++ a walk in the park, and I have parallelized some of my programs by adding only a few pragmas.
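For readers who haven't used OpenMP, this is roughly what "adding only a few pragmas" looks like; a minimal sketch (not the poster's actual code). The pragma tells the compiler the iterations are independent and the partial sums should be combined; compiled without -fopenmp it is ignored and the loop runs sequentially with the same result:

```cpp
#include <cassert>
#include <vector>

// Each iteration is independent, so one pragma parallelizes the loop.
// reduction(+ : sum) gives each thread a private partial sum and
// combines them at the end, avoiding a data race on `sum`.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < (long)a.size(); ++i)
        sum += a[i] * b[i];
    return sum;
}
```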
'FP makes concurrency easier' is an often-used selling point, but I think the jury is still out. First of all, many functional languages are impure, and do not have this advantage in the same way that e.g. Haskell does. Second, purity can be an expensive trade-off due to the copying involved. For lots of things (e.g. fast linear algebra) you'll still want to work in an impure world.
I think a better selling point of some functional languages is clarity. For instance, some classes of problems can be solved very elegantly in Haskell and ML using algebraic data types, pattern matching, etc. But then again, some other classes of problems can be solved elegantly in Prolog.
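As an illustration of the algebraic-data-type idea, here is a small sum type and "pattern match" rendered in C++ with std::variant/std::visit (kept in C++ for consistency with the OpenMP examples; in Haskell or ML the same thing is a two-line `data` declaration plus pattern-matching equations, which is precisely the clarity advantage being claimed):

```cpp
#include <cassert>
#include <type_traits>
#include <variant>

// A closed sum type: a Shape is either a Circle or a Rect.
struct Circle { double r; };
struct Rect   { double w, h; };
using Shape = std::variant<Circle, Rect>;

// "Pattern match" over the alternatives with std::visit.
double area(const Shape& s) {
    return std::visit([](const auto& sh) -> double {
        using T = std::decay_t<decltype(sh)>;
        if constexpr (std::is_same_v<T, Circle>)
            return 3.14159265 * sh.r * sh.r;
        else
            return sh.w * sh.h;
    }, s);
}
```

The compiler knows the variant is closed, so adding a third shape forces every visit site to be revisited, much like an exhaustiveness warning in ML or Haskell.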
I'm curious -- were the programs that were easily parallelized with OpenMP largely data parallel to begin with? I haven't used OpenMP extensively, but I'd suspect that the ease of parallelizing a program with OpenMP probably varies inversely with the amount of explicit coordination/synchronization/update of shared state that the program requires.
Mostly, yes. This particular program evaluates the effectiveness of features given an existing log-linear model. So, you can evaluate the effectiveness of features in parallel. For each feature, you can also partition the training data, process the data in parallel, and apply a reduction step[1].
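The outer, embarrassingly parallel loop over features might look like the following hypothetical sketch; score_feature() is a stand-in for the real per-feature evaluation against the log-linear model, not the poster's actual code:

```cpp
#include <cassert>
#include <vector>

// Stand-in for evaluating one feature over the training data.
double score_feature(int feature, const std::vector<double>& data) {
    double s = 0.0;
    for (double d : data) s += d * feature;  // placeholder computation
    return s;
}

// Each feature is scored independently, so the loop over features
// parallelizes with a single pragma: every iteration writes only to
// its own slot in `scores`, so there is no shared mutable state.
std::vector<double> score_all(int n_features, const std::vector<double>& data) {
    std::vector<double> scores(n_features);
    #pragma omp parallel for
    for (int f = 0; f < n_features; ++f)
        scores[f] = score_feature(f, data);
    return scores;
}
```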
But 'map' in MapReduce is also a typical data-parallel task.
[1] In practice, there is a trade-off: for the average training set the vectors are usually so large that, for memory efficiency, you do not really want to copy them per thread, so the mapping and reduction are interleaved, requiring some locking.
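A minimal sketch of that trade-off (illustrative, not the original code): instead of giving each worker a private copy of a large accumulator vector and merging afterwards, workers fold their partial results into one shared vector under a lock, trading some contention for memory:

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Interleaved map/reduce: each iteration produces ("maps") a partial
// result, then immediately folds it into the single shared accumulator
// under a mutex, so no per-thread copy of `acc` is ever made.
void accumulate_shared(const std::vector<std::vector<double>>& partials,
                       std::vector<double>& acc, std::mutex& m) {
    #pragma omp parallel for
    for (long i = 0; i < (long)partials.size(); ++i) {
        const std::vector<double>& p = partials[i];  // worker i's result
        std::lock_guard<std::mutex> lock(m);         // interleaved reduce
        for (size_t j = 0; j < p.size(); ++j)
            acc[j] += p[j];
    }
}
```

With OpenMP specifically, the lock could also be expressed as an `#pragma omp critical` section; either way, the locking is the price paid for not copying the vectors.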