#+TITLE: Dataflow paradigm (working title)
#+AUTHOR: Chris Hodapp
#+DATE: December 12, 2017
#+TAGS: technobabble

There is a parallel between the declarative nature of computational
graphs in TensorFlow and functional programming (possibly
function-level programming: think of the J language and how central
rank is to its computations).

Apache Spark and TensorFlow are similar in a lot of ways. The key
difference I see is that Spark's internal data types are suited to
databases, records, tables, and generally relational data, while
TensorFlow is, well, tensors (arbitrary-dimensional arrays).

The interesting part to me with both of these is how they've moved
"bulk" computations into first-class objects (ish) and permitted some
level of introspection into them before they run, as they run, and
after they run. As I noted in Notes - Paper, 2016-11-13, "One
interesting (to me) facet is how the computation process has been
split out and instrumented enough to allow some meaningful
introspection with it. It hasn't precisely made it a first-class
construct, but still, this feature pervades all of Spark's major
abstractions (RDD, DataFrame, Dataset)."
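
As a rough illustration of what "first-class (ish) computation" could
mean, here is a minimal pure-Python sketch. It is not TensorFlow's or
Spark's API: ~Node~, ~const~, ~describe~, and ~run~ are hypothetical
names made up for this example. The only point is that the graph is
built without computing anything, can be inspected before it runs, and
can be traced as it runs.

#+BEGIN_SRC python
import operator

# Hypothetical op table for this sketch; real frameworks have far more.
OPS = {"add": operator.add, "mul": operator.mul}


class Node:
    """One step in a deferred computation; building it evaluates nothing."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.value = value

    def __add__(self, other):
        return Node("add", (self, other))

    def __mul__(self, other):
        return Node("mul", (self, other))


def const(value):
    """Wrap a plain value as a leaf of the graph."""
    return Node("const", value=value)


def describe(node, depth=0):
    """Introspect before running: return the graph as indented lines."""
    label = f"const({node.value})" if node.op == "const" else node.op
    lines = ["  " * depth + label]
    for child in node.inputs:
        lines.extend(describe(child, depth + 1))
    return lines


def run(node, trace=None):
    """Evaluate the graph, optionally recording (op, result) per node."""
    if node.op == "const":
        result = node.value
    else:
        args = [run(child, trace) for child in node.inputs]
        result = OPS[node.op](*args)
    if trace is not None:
        trace.append((node.op, result))
    return result


graph = (const(2) + const(3)) * const(4)  # nothing is computed yet
plan = describe(graph)                    # inspect before running
trace = []
result = run(graph, trace)                # inspect as (and after) it runs
#+END_SRC

Here ~plan~ holds a description of the work to be done and ~trace~ holds
a record of the work that was done, both as ordinary data. That is the
property I find interesting, at toy scale.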

# Show TensorBoard example here
# Screenshots may be a good idea too

Spark does this with relational, database-style data. TensorFlow does
it with numerical calculations. Node-RED does it with irregular,
asynchronous data.