Promise as a basic language feature

Posted on by Idorobots

Just the other day, while programming in Scala, I was thinking how nice it is that Haskell hides non-strictness (a.k.a. laziness) underneath, so no obnoxious lazy stream creation/forcing needs to be done. And then it struck me: how cool would it be to treat Promises in a language the same way that Haskell treats lazy values? Would such a language even make sense? How would you program in such a language?

Let's explore! You can find all the code included in this post in a more useful form here.

Promise

Here's a very simple Promise implementation written in Scheme, with support for asynchronous resolving & error rejection, which we'll need in order to explore this weird proposition:

(define *promises* '())

(define (make-promise value state then handle thunk)
  (list '&promise value state then handle thunk))

;; Field accessors omitted.

(define (promise fun)
  (let* ((val '())
         (state 'pending)
         (on-resolve id)
         (on-reject id)
         (resolve (lambda (v)
                    (set! val v)
                    (set! state 'resolved)
                    (on-resolve val)))
         (reject (lambda (e)
                   (set! val e)
                   (set! state 'rejected)
                   (on-reject val)))
         (then (lambda (t)
                 (if (equal? state 'resolved)
                     (t val)
                     (set! on-resolve t))))
         (handle (lambda (h)
                   (if (equal? state 'rejected)
                       (h val)
                       (set! on-reject h))))
         (p (make-promise (lambda () val)
                          (lambda () state)
                          then
                          handle
                          (lambda () (fun resolve reject)))))
    (set! *promises* (cons p *promises*))
    p))

A promise created with promise function is just a list of several getters/setters for different values - the value this promise will eventually resolve to, the state of the promise (either pending, resolved or rejected), two callbacks then and handle that will be called on state transitions and the actual, captured computation thunk. The thunk is the meat of the thing; the chunk of computation this promise represents.

To actually use these promises we need two more functions - then which will allow us to perform certain tasks when the promise resolves, and handle which will facilitate error recovery if anything goes wrong:

(define (then p fun)
  (promise (lambda (resolve reject)
             ((promise-handle p) reject)
             ((promise-then p) (lambda (val)
                                 (resolve (fun val)))))))

(define (handle p handler)
  (promise (lambda (resolve _)
             ((promise-handle p) (lambda (error)
                                   (resolve (handler error)))))))

Both of these conceptually create a new promise which will resolve to the value of the original promise mapped by the function passed in the second argument.

Let's add a way to actually run the thunk's until everything is resolved:

(define (run! p)
  ((promise-thunk p)))

(define (run-all!)
  (let ((ps *promises*))
    (set! *promises* '())
    (map run! ps)
    (unless (equal? *promises* '())
      (run-all!))))

run! and run-all! take care of traversing the list of currently defined *promises* and executing their corresponding thunks. It's worth mentioning, that during the execution of a thunk more promises can be created, so run-all! has to loop until no more promises need to be run.

With these functions defined we're nearly set, but in order to build wrappers for primitive values & operations we need one last function - a way to bind promises to one another:

(define (>>= p0 fun)
  (promise (lambda (resolve reject)
             ((promise-handle p0) reject)
             ((promise-then p0) (lambda (val)
                                  (let ((p1 (fun val)))
                                    ((promise-handle p1) reject)
                                    ((promise-then p1) resolve)))))))

Binding promises allows us to replace the value of one promise with another promise in a seamless way, without the need for recursive traversal or anything like that. Some implementations actually combine >>= and then for brevity, but that's not a concern for us, since all this crap will be hidden in the language itself. This is how we define primitive operators:

(define (primop op)
  (lambda (a b)
    (>>= a (lambda (a)
             (then b (lambda (b)
                       (op a b)))))))
(define &equal? (primop equal?))
(define &* (primop *))
(define &- (primop -))
(define &+ (primop +))
(define &<= (primop <=))

(define (& value) ;; Wraps each native value in a resolved promise.
  (make-promise (lambda () value)
                (lambda () 'resolved)
                (lambda (t) (t value))
                (lambda (h) value)
                (lambda () value)))

(define-syntax &if
  (syntax-rules ()
    ((&if c t e)
     (>>= c (lambda (r)
              (if r t e))))))

Now, that we have primitive values & operations figured out we can finally write some purely-asynchronous code - let's start with a function that computes factorial of a number:

(define (factorial n)
  (&if (&equal? n (& 0))
       (& 1)
       (&* n (factorial (&- n (& 1))))))

(then (factorial (& 20))
      (lambda (result) (display "Success!")))

(run-all!)

There's nothing inherently asynchronous about this implementation of factorial, but one could imagine &equal? performing arbitrary, asynchronous operations under the hood - for instance, it could send an email to a privileged user and resolve only when a response asserting equality with no hidden biases is received. We wouldn't know and the code would still look and work exactly the same, save for execution time. This actually is a pretty decent use case for a whole class of DB-talky, HTTP-sendy CRUD apps, which mostly consist of asynchronous operations of one kind or another. No special handling of promises needed.

Here's another example - error handling using promises:

(define (&/ a b)
  (>>= a (lambda (a)
           (>>= b (lambda (b)
                    (promise (lambda (resolve reject)
                               (if (equal? b 0)
                                   (reject "Can't divide by 0!")
                                   (resolve (/ a b))))))))))

(define (&catch value handler)
  (>>= handler
       (lambda (h)
         (>>=-handle value
                     (lambda (error)
                       (h (& error)))))))

(handle (then (&catch (&/ (& 10) (& 0))
              (& (lambda (error)
                   (&/ (& 10) (& 0.001))))) ;; Close enough!
              (lambda (result) (display "Success!")))
        (lambda (error) (display "This will never print!!")))

(run-all!)

A primitive operation that found itself in a less than desirable state can simply... Not resolve and terminate via rejection. This rejection can be optionally handled elsewhere - in this case via &catch. Admittedly though, implementing &catch requires >>=-handle which is a binder analogous to handle in operation, except it takes a a function returning promise as its second argument, much like >>=.

Now let's focus on adding concurrency to the mix. Assuming atomic updates to *promises* and each promise individually we merely need to parallelize run-all!, and since each thunk is fairly self-contained this ought to be pretty easy. Moreover, since we're using asynchronous promises we don't need to worry at all about the order of execution! We just need to run all thunks in any order whatsoever and that's it:

(set! *promises* (shuffle *promises*)) ;; Everything still works.
(run-all!)

It's worth mentioning that this implementation does not impose very many restrictions on the underlying concurrency model. There's nothing inherently binding promise execution strategy to the actual computation it conveys. All we need under the hood is a queue of promises awaiting execution and something that will execute them - for instance a thread pool would do nicely here and is fairly simple to implement.

So... Now that we have it laid out like this it really does look quite similar to how Haskell handles non-strictness and the concurrency perks are nice...

Wait a minute! Does this mean this weird idea is actually viable!?

CPS

Let's instead focus for a moment on an alternative approach to implementing concurrency in a programming language - the Actor Model. In Actor Model all concurrent computation is performed by so-called Actors - lightweight processes that synchronize themselves using message passing.

Implementing such a thing is quite a bit more complex than using a thread pool, especially if we aim to compile down to an Actor Model-based language. Fortunately enough, there's a clever technique that makes it a little bit easier - it's called Continuation Passing Style or CPS for short. CPS itself is quite straightforward - all you're really doing is calling another function instead of returning a value, but just as these things tend to turn out, it has some mind-boggling implications. For starters, sometimes it's really hard to understand what's going on in the code - and I mean really hard -, especially if some clever use of continuations is in progress. Another complex thing about CPS is the CPS-transform - a nontrivial task of converting direct-style (the regular kind) code into continuation passing style. For these reasons I won't dive too deep into the CPS, so let's assume we both read and understood bits of the wikipedia page on the topic, and let's see some spaghetti code. ( ͡° ͜ʖ ͡°)

(define (cps-primop op)
  (lambda (a b cont handler)
    (cont (op a b))))

(define ^equal? (cps-primop equal?))
(define ^* (cps-primop *))
(define ^- (cps-primop -))
(define ^+ (cps-primop +))
(define ^<= (cps-primop <=))

(define (fact-cps n cont handler)
  (^equal? n 0
           (lambda (comp)
             (if comp
                 (cont 1)
                 (^- n
                     1
                     (lambda (sub)
                       (fact-cps sub
                                 (lambda (fact)
                                   (^* n
                                       fact
                                       cont
                                       handler))
                                 handler))
                     handler)))
           handler))

(fact-cps 20
          (lambda (result) (display "Success!"))
          (lambda (error) (display "ERROR!")))

This is a CPS version of a factorial function along with a couple of primitive operations wrappers. Noticeably, in CPS there's no need to wrap any values in anything, we just use their... Well, values. The second obvious thing is the similarity to using Promises - what we have here is two callbacks, one resolving and the other rejecting a value, ordered in such a way that running one resolve cascades and recursively resolves more promises. Pretty clever. On the other hand, error handling is not clever at all - all we need to do in order to install a new error handler is... Well, just pass a different one along, that's it. Language-wise, this requires CPS-transform support, but conceptually beats &catch any time.

Unfortunately, we can't do anything fun with this factorial, since it just executes until completion. If we were to run one inside of an actor - it would hog the entire processor core, preventing any other actors from running on it. Needlessly to say, this is undesirable and we need to fix it. Fortunately, the fix for this is quite simple - we'll voluntarily yield execution and let the runtime decide which continuation to run next:

(define *continuation* '())

(define (%yield val cont handler)
  (set! *continuation* (list val cont handler)))

(define (cps-yield-primop op)
  (lambda (a b cont handler)
    (%yield (op a b) cont handler)))

(define %equal? (cps-yield-primop equal?))
(define %* (cps-yield-primop *))
(define %- (cps-yield-primop -))
(define %+ (cps-yield-primop +))
(define %<= (cps-yield-primop <=))

(define (fact-cps-yield n cont handler)
  (%equal? n 0
           (lambda (comp)
             (if comp
                 (%yield 1 cont handler)
                 (%- n
                     1
                     (lambda (sub)
                       (fact-cps-yield sub
                                       (lambda (fact)
                                         (%* n
                                             fact
                                             cont
                                             handler))
                                       handler))
                     handler)))
           handler))

Our new version of factorial is pretty much identical to the previous one, save for the %yield call in the then branch of the if expression. What changed however is the fact that all primitive operations now %yield execution by default and we store the previously yielded continuation - in a manner similar to storing a promise for later execution.

In order to execute this factorial function, we need something to step through the yielded chain of continuations and another function that will schedule a chunk of computation for execution:

(define (step! cont)
  ((cadr cont) (car cont)))

(define (step-all!)
  (unless (equal? *continuation* '())
    (let ((c *continuation*))
      (set! *continuation* '())
      (step! c)
      (step-all!))))

(define (schedule thunk)
  (%yield 23
          (lambda (_)
            (thunk))
          (lambda (e) e)))

We can schedule a call to factorial like this:

(schedule (lambda ()
            (fact-cps-yield 20
                            (lambda (result) (display "Success!"))
                            (lambda (error) (display "ERROR!")))))
(step-all!)

In terms of concurrency, we could store several such continuations and step through them arbitrarily, yielding execution at each step and swapping for another continuation if need be. This works especially well with the Actor Model, where we already have a perfect place to store such a continuation (the actor itself), plus an actor already represents an on-going process, which an unterminated continuation is.

Asynchronity is achievable on inter-continuation level. What I mean by that is, basically, that we can spin up a new actor or a few new actors that will perform asynchronous computations and return results whenever available. It's not as sexy as Promises, but if it's good enough for Erlang, then it's definitely good enough for me.

Conclusions & benchmark

Now, back to the viability question. We've seen two potential implementations of concurrency in a programming language runtime - both having their perks and trade-offs. The CPS-based one has a somewhat proven track record of me implementing an Actor Model-based programming language, where CPS enabled some nice additional features. However, some people argue that Promises are the way to go and Actor Model should be avoided. I don't buy into that argument, but having uniform and transparent treatment of synchronous and asynchronous code sure does look nice... Let's do some half-assed performance benchmarking to help us decide!

(All benchmarks are presented with a baseline of direct-style, synchronous Scheme implementation and are meant to be taken with a grain of salt, after all this is a sample of 1 and my Promise implementation probably is a bit wonky. What should be noted though, is the performance to straightforwardness of implementation ratio of different approaches, which this benchmark highlights. All presented times are in milliseconds.)

Running 1000 iterations of four versions of (fibonacci 18) on a single core:

baseline:  ((real-time . 0.032)  (cpu-time . 0.03)   (gc-time . 0))
cps:       ((real-time . 0.159)  (cpu-time . 0.156)  (gc-time . 0.049))
cps-yield: ((real-time . 1.013)  (cpu-time . 1.01)   (gc-time . 0.084))
promise:   ((real-time . 62.481) (cpu-time . 62.273) (gc-time . 37.052))

Running 100 iterations of four versions of (fibonacci 23) on a single core:

baseline:  ((real-time . 0.36)    (cpu-time . 0.36)    (gc-time . 0))
cps:       ((real-time . 1.81)    (cpu-time . 1.83)    (gc-time . 0.54))
cps-yield: ((real-time . 9.92)    (cpu-time . 9.87)    (gc-time . 0.61))
promise:   ((real-time . 1647.02) (cpu-time . 1648.17) (gc-time . 1257.45))

Running 1000 iterations of four versions of (fibonacci 23) on 8 cores (unfortunately, only real-time seems to be somewhat valid for multi-core benchmarks):

baseline:  ((real-time . 0.587)    (cpu-time . 3.782)    (gc-time . 0))
cps:       ((real-time . 7.615)    (cpu-time . 47.98)    (gc-time . 10.92))
cps-yield: ((real-time . 53.387)   (cpu-time . 353.529)  (gc-time . 26.831))
promise:   ((real-time . 5381.291) (cpu-time . 30913.62) (gc-time . 22178.393))

Ouch, Promise-based implementation really stands out here and not in a good way. What happened? As previously briefly mentioned, promises have a lot in common with laziness and, unfortunately, share some of its drawbacks as well. In this case, the benchmark creates a large number of simultaneously existing promises resulting in excessive memory allocation, coping and GC pauses. There are ways to fix it, but these entail much complication to the implementation. Not something I want to explore right now.

Interestingly enough, the CPS version with yield seems to be about 10 times slower than the plain CPS version, which in turn is about 10 times slower than the baseline. Also, running the benchmark on multiple cores at the same time kinda, sorta appears to slow things down? ¯\_(ツ)_/¯ I blame Racket's time-apply function.

Now, just for fun, let's run 10 iterations of four versions of (fibonacci 35) on a single core:

baseline:  ((real-time . 98.1)   (cpu-time . 98.4)   (gc-time . 0))
cps:       ((real-time . 384.1)  (cpu-time . 382.0)  (gc-time . 12.5))
cps-yield: ((real-time . 3039.6) (cpu-time . 3041.7) (gc-time . 91.0))
promise:   ;; Munched through 16 GB of memory at which point I killed it.

To sum it up...

CPS:

  • Fits Actor Model really well.
  • First class control structures - via exposed continuations.
  • No sexy asynchronity support.
  • CPS-transform is hard to implement.
  • Pretty performant out of the box.

Promises:

  • Little restrictions on concurrency model.
  • Seamless asynchronity.
  • Easy compilation.
  • No first class control structures.
  • Poor performance or complex implementation - choose one..

So, are Promises a viable basic programming language construct? Probably yes. The real question, though, seems to be: is there a middle-ground between CPS and Promises? That's definitely something to be explored further...