blag/drafts/2017-12-12-dataflow.org
2018-02-06 17:52:16 -05:00


Dataflow paradigm (working title)

I don't know if there's actually anything to write here.

There is a sort of parallel between the declarative nature of computational graphs in TensorFlow and functional programming (possibly function-level programming in particular - think of the J language and how central rank is to its computations).
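To make the "declarative graph" idea concrete, here is a minimal sketch in plain Python - this is illustrative only, not the TensorFlow API. The point is the build-then-execute split: nodes describe the computation up front, and nothing actually runs until you ask for it.

```python
# Illustrative deferred-computation graph (not TensorFlow code).
# Building a Node records *what* to compute; run() decides *when*.

class Node:
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def run(self):
        # Recursively evaluate inputs, then apply this node's op.
        return self.op(*(n.run() for n in self.inputs))

def const(v):
    return Node(lambda: v)

def add(a, b):
    return Node(lambda x, y: x + y, a, b)

def mul(a, b):
    return Node(lambda x, y: x * y, a, b)

# Build the graph declaratively...
graph = mul(add(const(2), const(3)), const(4))
# ...then execute it separately.
print(graph.run())  # 20
```

Because the graph exists as a data structure before execution, a framework is free to inspect, optimize, or distribute it first - which is exactly what TensorFlow does with its graphs.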

Apache Spark and TensorFlow are very similar in a lot of ways. The key difference I see is that Spark's internal data types are suited to databases, records, tables, and relational data generally, while TensorFlow's are, well, tensors (arbitrary-dimensional arrays).
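The contrast in data models can be shown with the same trivial operation expressed both ways - hypothetical data in plain Python, not actual Spark or TensorFlow code:

```python
# The same "double each value" task in the two data models.

# Relational/record view (Spark-style rows):
rows = [{"id": 1, "value": 10}, {"id": 2, "value": 20}]
doubled_rows = [{**r, "value": r["value"] * 2} for r in rows]

# Tensor view (TensorFlow-style n-dimensional array, as a nested list here):
tensor = [[10, 20], [30, 40]]
doubled_tensor = [[v * 2 for v in row] for row in tensor]
```

Same computation, but one framework thinks in named fields per record and the other in positions within a regular array.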

The interesting part to me with both of these is how they've moved "bulk" computations into first-class objects (ish) and permitted some level of introspection into them before they run, as they run, and after they run. Like I noted in Notes - Paper, 2016-11-13, "One interesting (to me) facet is how the computation process has been split out and instrumented enough to allow some meaningful introspection with it. It hasn't precisely made it a first-class construct, but still, this feature pervades all of Spark's major abstractions (RDD, DataFrame, Dataset)."
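A rough sketch of what that instrumentation looks like, again in plain Python rather than Spark or TensorFlow: once the computation is reified as a plan, callers can inspect it before it runs, watch it as it runs, and read stats afterward. The `Pipeline` class and its method names are invented for illustration.

```python
# Hypothetical sketch: a computation as an inspectable list of named stages.

class Pipeline:
    def __init__(self):
        self.stages = []   # the "plan": inspectable before execution
        self.stats = {}    # populated as stages execute

    def stage(self, name, fn):
        self.stages.append((name, fn))
        return self

    def explain(self):
        # Pre-run introspection, akin to asking for a query plan.
        return [name for name, _ in self.stages]

    def run(self, data):
        for name, fn in self.stages:
            data = fn(data)
            self.stats[name] = len(data)   # per-stage instrumentation
        return data

p = (Pipeline()
     .stage("double", lambda xs: [x * 2 for x in xs])
     .stage("keep_big", lambda xs: [x for x in xs if x >= 4]))

print(p.explain())       # ['double', 'keep_big'], before anything runs
print(p.run([1, 2, 3]))  # [4, 6]
print(p.stats)           # {'double': 3, 'keep_big': 2}
```

Spark's RDD lineage and explain plans, and TensorFlow's graph visualization, are (much richer) versions of the same move: the computation is a thing you can hold and examine, not just something that happens.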

Spark does this with database-style relational data. TensorFlow does it with numerical calculations. Node-RED does it with irregular, asynchronous data.