diff --git a/posts/2015-06-23-stupidity-catalogue-genericstruct.md b/posts/2015-06-23-stupidity-catalogue-genericstruct.md
new file mode 100644
index 0000000..2ba6498
--- /dev/null
+++ b/posts/2015-06-23-stupidity-catalogue-genericstruct.md
@@ -0,0 +1,239 @@
+---
+title: Catalogue of My Stupidity: My Haskell 'GenericStruct' Nonsense
+author: Chris Hodapp
+date: June 23, 2015
+tags: stupidity, Technobabble
+---
+
+*(A note: I took these notes during my time at Urbanalta, intending
+them to be a private reference to myself on how to learn from some
+mistakes. I've tried to scrub the proprietary bits out and leave the
+general things behind. I do reference some other notes that probably
+will still stay private.)*
+
+# Background
+
+Some background on this: These are some notes on a small Haskell module
+I wrote at Urbanalta which I called `GenericStruct`. Most of this post
+is very Haskell-heavy and perhaps more suited to [HaskellEmbedded][]
+as it's a very niche usage even within Haskell.
+
+I talk about this much more extensively in my handwritten work notes,
+circa 2015-05-05 to 2015-05-20, and in a source file
+`GenericStruct.hs`. Neither of those is online (and trust me that
+you don't want to try to understand my scratch notes anyway), but a
+cleaner summary is in the [Appendix](#appendix).
+
+The short version is that I needed a way to express the format of a
+packed data structure, similar to a C struct in some ways, but without
+any padding between fields, and more explicit about the exact size in
+bits of fields. I wanted this format to also be able to carry some
+documentation with it because it was meant to be able to express data
+formats for Bluetooth Low Energy, and so I had a need to present this
+format in a human-readable way and possibly a more general
+machine-readable way (such as JSON).
+This was a similar design goal to my Python creation, AnnotatedStruct,
+from nearly 2 years ago, but here I wanted the benefits of static
+typing when accessing these data structures.
+
+What complicated matters somewhat is that these data structures,
+rather than being used directly in Haskell to store things, were to be
+used with [Ivory][] to model the proper C code for reading and
+writing.
+
+What I eventually came up with used Haskell records with
+specially-crafted data types inside them, and then [GHC.Generics][] to
+iterate over these data types and inject into them some context
+information (a field accessor in Haskell by itself cannot have any
+information about 'where' in the record it is, whether in absolute
+terms or relative to any other field). Context information here meant
+things like a bit offset, a size, and a type representation.
+
+This was a little complicated to implement, but overall, not
+particularly daunting. The [GHC.Generics][] examples included generic
+serialization, which is a very similar problem in many ways, and I
+worked from this example and a JSON example.
+
+(Another note from some prior work: Do not attempt to do record access
+with `Data.Data` from `base`. It can get some meta-information, like
+the constructor itself, but only in such a generic way that you
+may call nothing specific to a record on it.)
+
+# Problems
+
+The problem that I ran into fairly quickly (but not quickly enough) is
+that what I had created had no good way to let me nest data formats
+inside each other. For instance, I had a 16-bit value which I used in
+several places, and that 16-bit value was treated in many places as 16
+individual bitfields with each bit representing a specific Boolean
+parameter unto itself. In other places, treating it as simply a
+single 16-bit integer was more meaningful - and this was related to it
+being used in several places, such that operations like copying from
+one place to another became meaningful.
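The two views of that 16-bit value can be sketched in plain Haskell. This is only an illustration of the "define once, view two ways" idea, not how `GenericStruct` worked; the names `Flags`, `Flag`, `Status`, `Command`, and the bit assignments are all invented for the example.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word16)

-- Hypothetical flag word: the same 16 bits, defined in one place.
newtype Flags = Flags Word16
  deriving (Eq, Show)

-- View 1: the whole value as a plain 16-bit integer.
rawFlags :: Flags -> Word16
rawFlags (Flags w) = w

-- View 2: individual bits as Boolean parameters.
-- These names and bit positions are made up for illustration.
data Flag = Enabled | LowBattery | Paired
  deriving (Eq, Show, Enum, Bounded)

hasFlag :: Flag -> Flags -> Bool
hasFlag f (Flags w) = testBit w (fromEnum f)

withFlag :: Flag -> Flags -> Flags
withFlag f (Flags w) = Flags (setBit w (fromEnum f))

-- Because the format is a type of its own, any record can embed it.
data Status = Status { statusFlags :: Flags, statusCount :: Word16 }

data Command = Command { commandFlags :: Flags }

-- Copying between structs uses the whole-word view; no per-bit work.
copyFlags :: Status -> Command
copyFlags s = Command (statusFlags s)
```

For example, `rawFlags (withFlag Paired (withFlag Enabled (Flags 0)))` sets bits 0 and 2 and so evaluates to `5`. The hard part described in this post is getting the same reuse once every field also has to carry its own offset and size.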
+
+I had no good way to express this. I could not define that format in
+one place, and then put it inside each struct that used it. I thought
+initially that implementing this would be a matter of just making
+certain structures recursive, and I was partly right, but I ran into
+such complications in the type system that I felt it was not worth
+it to proceed further.
+
+What I wrote yesterday when I ran into these serious snags was:
+
+- At this stage of complexity, I sort of wish I'd opted for
+  [Template Haskell][] instead. It would have absorbed the change
+  much better. [GHC.Generics][] required me to sort of bend the type
+  system. The problem there is that it had only so far to bend, while
+  with Template Haskell the whole Haskell language (more or less)
+  would be at my disposal, not just some slightly-pliable parts of its
+  type system. (Perhaps this is why Ivory does what it does.)
+- Idris may have handled this better too by virtue of its dependent
+  types, and for similar reasons.
+- `johnw` (Freenode IRC `#haskell` denizen &
+  [Galois Inc.](https://galois.com/) employee) mentioned a
+  possibly-viable approach based around an applicative expression of
+  data formats and not requiring things like Template Haskell or
+  possibly Generics. (See IRC logs from 2015-06-04; this was via PM.)
+- Another person mentioned that this sounded like a job for
+  [Lenses][lens], particularly their
+  [Iso](https://hackage.haskell.org/package/lens/docs/Control-Lens-Iso.html)
+  (isomorphism) type, which has different 'projections' of data.
+
+# What I did right
+
+I properly implemented a nice structure with GHC.Generics on top of
+Haskell records, and kept it fairly compact and strictly-typed. I
+started making use of it right away, and this meant errors would
+readily show up (generally as type errors at compile time) as I made
+changes.
+
+I kept the code clean and well-documented, and this helped me out
+substantially with writing the code, understanding it, and then
+understanding that much of it shouldn't have been written.
+
+I think that overall it was a good idea for me to treat the data
+format as a specification that could be turned into C code, into a
+JSON description, and into (eventually) a human-readable description.
+
+# What I did wrong, and should have done instead
+
+Overall: I tried very hard to solve the very specific, seemingly
+unique problem. This blocked my view of the real, more general problem.
+Despite active attempts to discern that general problem, I was fixated
+on specifics. Despite my preachy guideline elsewhere in my notes
+that, "Your problem is not a unique snowflake - someone else has
+studied it," I assumed my problem was a relatively unique snowflake.
+
+When I expressed the problem to other people, a number of them told me
+that this sounded like a job for the [Lens][lens] library in Haskell.
+On top of this, I had used Lenses before. While I had not used them
+enough to know for certain that Lens was the best solution here, I had
+used them enough to know that they were a likely first place to look.
+But I ignored this experience, and I ignored what other people told
+me.
+
+I suspect I ignored this because I was focusing too much on
+the specifics of the issue. This led me to believe that this problem
+was sufficiently unique and different that Lenses were not an approach
+I should even look at.
+
+Lenses might not be the proper approach, but I am almost certain that
+examining them would have helped me.
+
+I missed something crucial: that I would need to nest data
+formats and share definitions between them. This should have been
+obvious to me: this is a functional language, and composition (which
+is what this is) is essential to abstraction and reuse.
+
+I assumed that my solution would have to be tied to Haskell records.
+This was not a given. Further, I knew of three methods which created
+similar structures but did not rely on records: Lenses, Ivory structs,
+and Ivory bitdata. Records were an irrelevancy (even to the specifics
+I was fixated on) and I tightly coupled my solution to them. Records
+are not meant to compose, while some other structures are.
+
+# Short, general summary
+
+*(i.e. the part where I get really preachy about vague things)*
+
+- Foresee what else your problem may need to encompass. Perhaps it
+  only looks like a unique problem because you put too much weight on
+  the specifics, and you've missed the ways it resembles existing,
+  well-studied problems - perhaps even ones you are familiar with.
+
+- Perhaps you haven't missed anything notable. Still, knowing what
+  else it may need to encompass makes for better solutions, and may
+  prevent you from making design decisions early on which
+  fundamentally limit it in ways that matter later.
+
+- Unsurprisingly, ekmett probably solved your problem already. (Or
+  perhaps acowley did with [Vinyl][], or perhaps [compdata][] solves
+  it...)
+
+# Appendix {#appendix}
+
+My aim was to solve a few problems:
+
+ - Outputting a concrete representation of an entire type (for the
+   sake of inserting into JSON specs, for instance),
+ - Creating a correspondence between native Haskell types and Ivory's
+   specific types (which ended up not being so necessary),
+ - Packing and unpacking a struct value to and from memory
+   automatically (via Ivory),
+ - Unpacking and packing individual fields of a struct (also via
+   Ivory),
+ - Doing the above with the benefit of strict, static typing (i.e. not
+   relying on strings to access a field),
+ - Handling all of this with (in Ivory) an in-memory representation
+   with no padding or alignment concerns,
+ - Having a single specification of a type, including human-readable
+   descriptions.
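Two of these aims, the padding-free layout and the single documented specification, can be sketched with an ordinary list of field descriptions. This is a deliberately untyped illustration: it covers none of the statically-typed access or Ivory aims, and `FieldSpec`, `Placed`, `layout`, and the example fields are hypothetical names, not taken from `GenericStruct.hs`.

```haskell
import Data.List (mapAccumL)

-- Hypothetical description of one field: name, width in bits, and docs.
data FieldSpec = FieldSpec
  { fieldName :: String
  , fieldBits :: Int
  , fieldDoc  :: String
  } deriving (Show)

-- A field placed in the packed layout, with its index and bit offset.
data Placed = Placed
  { placedIndex  :: Int
  , placedOffset :: Int
  , placedSpec   :: FieldSpec
  } deriving (Show)

-- Lay fields out back to back: no padding and no alignment.
-- The accumulator is the bit offset of the next free bit.
layout :: [FieldSpec] -> [Placed]
layout specs = snd (mapAccumL step 0 (zip [0 ..] specs))
  where
    step off (i, s) = (off + fieldBits s, Placed i off s)

-- Total size of the packed structure in bits.
totalBits :: [FieldSpec] -> Int
totalBits = sum . map fieldBits

-- One human-readable line per field, generated from the same spec.
describe :: Placed -> String
describe p =
  fieldName s ++ " @ bit " ++ show (placedOffset p) ++ ", "
    ++ show (fieldBits s) ++ " bits: " ++ fieldDoc s
  where
    s = placedSpec p

-- Invented example format.
exampleSpec :: [FieldSpec]
exampleSpec =
  [ FieldSpec "version" 4 "Protocol version"
  , FieldSpec "flags" 16 "Sixteen Boolean parameters"
  , FieldSpec "level" 12 "Some measured level"
  ]
```

Here `map placedOffset (layout exampleSpec)` gives `[0,4,20]` and `totalBits exampleSpec` gives `32`. The same list could be rendered to JSON or to documentation, which is the "single specification" goal; what it cannot give is a typed accessor per field.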
+
+What I saw as the largest problem was that accessors for Haskell
+records have no accessible information on which field they access, or
+where that field is relative to anything else in the data
+structure. Thus, if I access a field of a record, I can have no
+information there about 'where' in the record it is unless I put that
+information there somehow. The pieces of information that I seemed to
+need in the field were the field's overall index in the record, and
+the field's overall memory offset.
+
+A simpler form of generics, `Data.Data`, allowed me to solve the first
+problem easily and produce a list of something like (TypeRep, name,
+size, position). However, I ran into problems when trying to find a
+way to insert context into the record somehow. The central issue is
+that `Data.Data` provides no way to do anything other than generic
+operations on a field, and those generic operations are fairly
+limited. I could find no way to make something like a typeclass and
+use typeclass methods to update those fields.
+
+GHC.Generics, on the other hand, made this fairly trivial. I could
+solve the first problem (albeit in a more complicated way), and I
+eventually reduced the other problems to a single need: to take the
+generic record type itself (in this case, in the form of a `Proxy` to
+it) and, given certain constraints on it, to create a generic
+constructor for this type.
+
+This proved to be fairly easy. Most of GHC.Generics will as readily
+traverse a `Proxy` of a type as a value of the type, given some changes
+which mostly amount to a lot of use of `fmap`. The `to` function in
+GHC.Generics (after I had cleared up some conceptual confusion) simply
+took the representation and removed the `Proxy` (or pushed it
+elsewhere), until it hit a certain innermost point at which one
+created an abstract representation of the constructor call itself, but
+this time with the proper data.
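A minimal sketch of that idea, building a whole record of context-carrying fields from nothing but its type, might look like the following. It is a reconstruction under assumptions, not the code from `GenericStruct.hs`: the `Field` wrapper, the `BitSized` and `GLayout` classes, `layoutOf`, and the `Header` record are all hypothetical, and only single-constructor records whose fields are all `Field`s are handled.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, ScopedTypeVariables, TypeOperators #-}
import Data.Proxy (Proxy (..))
import Data.Word (Word16, Word8)
import GHC.Generics

-- Hypothetical field wrapper: holds context (index, bit offset, width)
-- rather than a value. The type parameter records the field's type.
data Field a = Field
  { fieldIndex  :: Int
  , fieldOffset :: Int
  , fieldSize   :: Int
  } deriving (Show)

-- Width in bits of each supported field type (an assumption of this sketch).
class BitSized a where
  bitsOf :: Proxy a -> Int

instance BitSized Word8 where bitsOf _ = 8
instance BitSized Word16 where bitsOf _ = 16
instance BitSized Bool where bitsOf _ = 1

-- Build a generic representation while threading the state
-- (next field index, next bit offset) from left to right.
class GLayout f where
  gLayout :: (Int, Int) -> (f p, (Int, Int))

-- Metadata wrappers (datatype, constructor, selector) pass through.
instance GLayout f => GLayout (M1 i c f) where
  gLayout st = let (x, st') = gLayout st in (M1 x, st')

-- Products: lay out the left half, then continue with the right half.
instance (GLayout f, GLayout g) => GLayout (f :*: g) where
  gLayout st =
    let (l, st')  = gLayout st
        (r, st'') = gLayout st'
    in (l :*: r, st'')

-- The innermost point: create the field itself with its context filled in.
instance BitSized a => GLayout (K1 i (Field a)) where
  gLayout (i, off) =
    let n = bitsOf (Proxy :: Proxy a)
    in (K1 (Field i off n), (i + 1, off + n))

-- Construct a record from a Proxy of its type; 'to' converts the
-- generic representation back into the record.
layoutOf :: forall a. (Generic a, GLayout (Rep a)) => Proxy a -> a
layoutOf _ = to (fst (gLayout (0, 0)))

-- Invented example record.
data Header = Header
  { hdrVersion :: Field Word8
  , hdrFlags   :: Field Word16
  , hdrReady   :: Field Bool
  } deriving (Show, Generic)

headerLayout :: Header
headerLayout = layoutOf (Proxy :: Proxy Header)
```

With this, `fieldOffset (hdrFlags headerLayout)` is `8` and `fieldOffset (hdrReady headerLayout)` is `24`, and the accessors stay ordinary, statically-typed record fields. Nesting one such record inside another is exactly what this shape does not handle without further instances, which is the limitation described under Problems.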
+
+Most of the rest was just modifying the above to allow me to propagate
+context information such as index and memory offset, and dealing with
+the confusion of type families (which I ended up needing much less of
+than I initially thought).
+
+
+[GHC.Generics]: https://hackage.haskell.org/package/base/docs/GHC-Generics.html
+[Ivory]: https://hackage.haskell.org/package/ivory
+[Vinyl]: https://hackage.haskell.org/package/vinyl
+[compdata]: https://hackage.haskell.org/package/compdata
+[lens]: https://hackage.haskell.org/package/lens
+[Template Haskell]: https://wiki.haskell.org/Template_Haskell
+[HaskellEmbedded]: https://haskellembedded.github.io/