Currently Dyalog APL provides green-threaded concurrent programming using a traditional "fork" model as well as a CSP-style distributed computing model called "Isolates." In addition, Dyalog will utilize vectorized operations on the CPU for a number of primitives and code patterns. Co-dfns provides for additional functionality by providing vector level parallelism on the GPU/CPU and a true threading "fork" (this feature is only part of the language extension, and has not yet been implemented). So, if you write your code in a Co-dfns compatible manner (closed single Namespace of dfns, and some limitations in the current version which are being lifted rapidly), then you will be able to execute it on the GPU, CPU, and SMP machines.
As a trivial example, say you wanted to compute the quadratic prime numbers algorithm in parallel:
Primes←{(2=+⌿0=X∘.|X)/X←⍳⍵}
You would simply compile this function inside of a namespace using Co-dfns and you could then run this function on the GPU to improve performance. The compiled code would run on either the CPU, GPU, or SMP machines.