---
title: "Catalogue of My Stupidity: My Haskell 'GenericStruct' Nonsense"
author: Chris Hodapp
date: "2015-06-23"
tags:
- stupidity
- Technobabble
---
*(A note: I took these notes during my time at Urbanalta, intending
them to be a private reference for myself on how to learn from some
mistakes. I've tried to scrub the proprietary bits out and leave the
general things behind. I do reference some other notes that probably
will still stay private.)*
# Background
Some background: these are notes on a small Haskell module I wrote at
Urbanalta, which I called `GenericStruct`. Most of this post is very
Haskell-heavy and perhaps better suited to [HaskellEmbedded][], as
it's a very niche usage even within Haskell.
I talk about this much more extensively in my handwritten work notes,
circa 2015-05-05 to 2015-05-20, and in a source file
`GenericStruct.hs`. Neither of those is online (and trust me that
you don't want to try to understand my scratch notes anyway), but a
cleaner summary is in the [Appendix](#appendix).
The short version is that I needed a way to express the format of a
packed data structure, similar to a C struct in some ways, but without
any padding between fields, and more explicit about the exact size in
bits of fields. I wanted this format to also be able to carry some
documentation with it because it was meant to be able to express data
formats for Bluetooth Low Energy, and so I had a need to present this
format in a human-readable way and possibly a more general
machine-readable way (such as JSON). This was a similar design goal
to my Python creation, AnnotatedStruct, from nearly 2 years ago, but
here I wanted the benefits of static typing when accessing these
data structures.
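
As a rough illustration of the "packed, no padding, documented" part of that goal (and only that part: this sketch has none of the static typing), a format can be written as a plain list of fields, with every bit offset falling out as a running sum of widths. All of the names below (`FieldSpec`, `withOffsets`, `exampleFormat`, and so on) are made up for this post and are not from `GenericStruct.hs`.

```haskell
import Data.List (intercalate)

-- | One field of a packed structure. All names here are hypothetical.
data FieldSpec = FieldSpec
  { fieldName :: String  -- ^ identifier used in generated output
  , fieldBits :: Int     -- ^ exact width in bits
  , fieldDoc  :: String  -- ^ human-readable description
  }

-- | Pair each field with its bit offset. There is no padding, so each
-- offset is just the sum of the widths of the fields before it.
withOffsets :: [FieldSpec] -> [(Int, FieldSpec)]
withOffsets fs = zip (scanl (+) 0 (map fieldBits fs)) fs

-- | Total size of the packed structure in bits.
totalBits :: [FieldSpec] -> Int
totalBits = sum . map fieldBits

-- | Render one line per field for human readers.
describe :: [FieldSpec] -> String
describe fs = intercalate "\n"
  [ fieldName f ++ " @ bit " ++ show off ++ ", " ++ show (fieldBits f)
      ++ " bits: " ++ fieldDoc f
  | (off, f) <- withOffsets fs ]

-- | A made-up example format, not one from the original project.
exampleFormat :: [FieldSpec]
exampleFormat =
  [ FieldSpec "flags"   16 "sixteen Boolean parameters"
  , FieldSpec "counter" 12 "event counter"
  , FieldSpec "mode"     4 "operating mode"
  ]
```

Loading this in GHCi and running `putStrLn (describe exampleFormat)` prints one line per field. The weakness is that fields are looked up by string, which is exactly what I wanted static typing to avoid.
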
What complicated matters somewhat is that these data structures,
rather than being used directly in Haskell to store things, were to be
used with [Ivory][] to model the proper C code for reading and
writing.
What I eventually came up with used Haskell records with
specially-crafted data types inside them, and then [GHC.Generics][] to
iterate over these data types and inject into them some context
information (a field accessor in Haskell by itself cannot have any
information about 'where' in the record it is, whether in absolute
terms or relative to any other field). Context information here meant
things like a bit offset, a size, and a type representation.
This was a little complicated to implement, but overall, not
particularly daunting. The [GHC.Generics][] examples included generic
serialization which is a very similar problem in many ways, and I
followed from this example and a JSON example.
(Another note from some prior work: Do not attempt to do record access
with `Data.Data` from `base`. It can get some meta-information, like the
constructor itself, but only in a sufficiently generic way that you
may call nothing specific to a record on it.)
# Problems
The problem that I ran into fairly quickly (but not quickly enough) is
that what I had created had no good way to let me nest data formats
inside each other. For instance, I had a 16-bit value which I used in
several places, and that 16-bit value was treated in many places as 16
individual bitfields with each bit representing a specific Boolean
parameter unto itself. In other places, treating it as simply a
single 16-bit integer was more meaningful - and this was related to it
being used in several places, such that operations like copying from
one place to another became meaningful.
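
To make the two views concrete, here is a minimal sketch using only `Data.Bits` and `Data.Word` from `base`. The function names `toFlags` and `fromFlags` are hypothetical, and this is plain Haskell rather than the Ivory code the real thing had to generate.

```haskell
import Data.Bits (setBit, testBit)
import Data.Word (Word16)

-- | View the word as 16 individual Boolean flags, bit 0 first.
toFlags :: Word16 -> [Bool]
toFlags w = [ testBit w i | i <- [0 .. 15] ]

-- | Rebuild the word from its flags. Only the first 16 entries are used;
-- a shorter list leaves the remaining bits clear.
fromFlags :: [Bool] -> Word16
fromFlags bs = foldl setBit 0 [ i | (i, True) <- zip [0 .. 15] bs ]
```

Copying the value from one struct to another only needs the `Word16` view, while reading a single parameter only needs the flag view. What I lacked was a way to declare this once and reuse it in every format that contained it.
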
I had no good way to express this. I could not define that format in
one place, and then put it inside each struct that used it. I thought
initially that implementing this would be a matter of just making
certain structures recursive, and I was partly right, but I ran into
such complication in the type system that I felt like it was not worth
it to proceed further.
What I wrote yesterday when I ran into these serious snags was:
- At this stage of complexity, I sort of wish I'd opted for
[Template Haskell][] instead. It would have absorbed the change
much better. [GHC.Generics][] required me to sort of bend the type
system. The problem there is that it had only so far to bend, while
with Template Haskell the whole Haskell language (more or less)
would be at my disposal, not just some slightly-pliable parts of its
type system. (Perhaps this is why Ivory does what it does.)
- Idris may have handled this better too by virtue of its dependent
types, and for similar reasons.
- `johnw` (Freenode IRC `#haskell` denizen &
[Galois Inc.](https://galois.com/) employee) mentioned a
possibly-viable approach based around an applicative expression of
data formats and not requiring things like Template Haskell or
possibly Generics. (See IRC logs from 2015-06-04; this was via PM.)
- Another person mentioned that this sounded like a job for
[Lenses][lens], particularly their
[Iso](https://hackage.haskell.org/package/lens/docs/Control-Lens-Iso.html)
(isomorphism) type, which provides different 'projections' of the same data.
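
I never built the applicative version, so the following is only my guess at the general shape of such an approach, not what `johnw` actually described. The idea is that a format is a value that knows both its list of fields and how to decode itself, and formats combine with `<$>` and `<*>`. Everything here (`Format`, `field`, `nested`, `Status`, `Packet`) is a hypothetical name, and it decodes from an `Integer` in plain Haskell rather than generating Ivory.

```haskell
import Data.Bits (shiftR, (.&.))

-- | A format for values of type 'a': the fields it is made of, plus a
-- decoder that reads the value out of packed bits from a start offset.
data Format a = Format
  { fields :: [(String, Int)]      -- ^ (name, width in bits), in order
  , decode :: Integer -> Int -> a  -- ^ packed bits -> start offset -> value
  }

-- | Total width of a format in bits.
width :: Format a -> Int
width = sum . map snd . fields

instance Functor Format where
  fmap f (Format fs d) = Format fs (\bits off -> f (d bits off))

instance Applicative Format where
  pure x = Format [] (\_ _ -> x)
  -- The right-hand format starts where the left-hand one ends.
  ff <*> fa = Format (fields ff ++ fields fa)
    (\bits off -> decode ff bits off (decode fa bits (off + width ff)))

-- | A single unsigned field of the given width, least significant bit first.
field :: String -> Int -> Format Integer
field name n = Format [(name, n)]
  (\bits off -> (bits `shiftR` off) .&. (2 ^ n - 1))

-- | Reuse a format inside another one, prefixing its field names.
nested :: String -> Format a -> Format a
nested prefix f = f { fields = [ (prefix ++ "." ++ n, w) | (n, w) <- fields f ] }

-- | A 16-bit value defined once...
data Status = Status { ready :: Bool, errorCode :: Integer }
  deriving (Eq, Show)

status :: Format Status
status = Status <$> ((/= 0) <$> field "ready" 1) <*> field "errorCode" 15

-- | ...and used twice inside a larger format.
data Packet = Packet { current :: Status, previous :: Status, counter :: Integer }
  deriving (Eq, Show)

packet :: Format Packet
packet = Packet <$> nested "current" status
                <*> nested "previous" status
                <*> field "counter" 8
```

The point of the sketch is that nesting comes for free: `status` is an ordinary value, so putting it inside `packet` twice needs no extra machinery in the type system.
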
# What I did right
I properly implemented a nice structure with GHC.Generics over top of
Haskell records, and kept it fairly compact and strictly-typed. I
started making use of it right away, and this meant errors would
readily show up (generally as type errors at compile time) as I made
changes.
I kept the code clean and well-documented, and this helped me out
substantially with writing the code, understanding it, and then
understanding that much of it shouldn't have been written.
I think that overall it was a good idea for me to treat the data
format as a specification that could be turned into C code, into a
JSON description, and into (eventually) a human-readable description.
# What I did wrong, and should have done instead
Overall: I tried very hard to solve the very unique, very specific
problem. This blocked my view of the real, more general problem.
Despite active attempts to discern that general problem, I was fixated
on specifics. Despite my preachy guideline elsewhere in my notes
that, "Your problem is not a unique snowflake - someone else has
studied it," I assumed my problem was a relatively unique snowflake.
When I expressed the problem to other people, a number of them told me
that this sounded like a job for the [Lens][lens] library in Haskell.
On top of this, I had used Lenses before. While I had not used them
enough to know for certain that Lens was the best solution here, I had
used them enough to know that they were a likely first place to look.
But, I ignored this experience, and I ignored what other people told
me.
Why I ignored this, I suspect, is because I was focusing too much on
the specifics of the issue. This led me to believe that this problem
was sufficiently unique and different that Lenses were not an approach
I should even look at.
Lenses might not be the proper approach, but I am almost certain that
examining them would have helped me.
I missed something very crucial: That I would need to nest data
formats and share definitions between them. This should have been
obvious to me: this is a functional language, and composition (which
is what this is) is essential to abstraction and reuse.
I assumed that my solution would have to be tied to Haskell records.
This was not a given. Further, I knew of three methods which created
similar structures but did not rely on records: Lenses, Ivory structs,
and Ivory bitdata. Records were an irrelevancy (even to the specifics
I was fixated on) and I tightly coupled my solution to them. Records
are not meant to compose, while some other structures are.
# Short, general summary
*(i.e. the part where I get really preachy about vague things)*
- Foresee what else your problem may need to encompass. Perhaps it
only looks like a unique problem because you put too much weight on
the specifics, and you've missed the ways it resembles existing,
well-studied problems - perhaps even ones you are familiar with.
- Perhaps you haven't missed anything notable. Still, knowing what
else it may need to encompass makes for better solutions, and may
prevent you from making design decisions early on which
fundamentally limit it in ways that matter later.
- Unsurprisingly, ekmett probably solved your problem already. (Or
perhaps acowley did with [Vinyl][], or perhaps [compdata][] solves
it...)
# Appendix {#appendix}
My aim was to solve a few problems:
- Outputting a concrete representation of an entire type (for the
sake of inserting into JSON specs, for instance),
- Creating a correspondence between native Haskell types and Ivory's
specific types (which ended up not being so necessary),
- Packing and unpacking a struct value to and from memory
automatically (via Ivory),
- Unpacking and packing individual fields of a struct (also via
Ivory),
- Doing the above with the benefit of strict, static typing (i.e. not
relying on strings to access a field),
- Handling all of this with (in Ivory) an in-memory representation
with no padding or alignment concerns,
- Having a single specification of a type, including human-readable
descriptions.
What I saw as the largest problem is that accessors for Haskell
records have no accessible information on which field they access, or
where that field is relative to anything else in the data
structure. Thus, if I access a field of a record, I can have no
information there about 'where' in the record it is unless I put that
information there somehow. The pieces of information that I seemed to
need in the field were the field's overall index in the record, and
the field's overall memory offset.
A simpler form of generics, Data.Data, allowed me to solve the first
problem easily and produce a list of something like (TypeRep, name,
size, position). However, I ran into problems when trying to find a
way to insert context into the record somehow. The central issue is
that Data.Data provides no way to do anything other than generic
operations on a field, and those generic operations are fairly
limited. I could find no way to make something like a typeclass and
use typeclass methods to update those fields.
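
Here is a small sketch of the part `Data.Data` does handle, listing field names and type representations. The record `Sample` and the function `fieldInfo` are made up for illustration; `constrFields`, `toConstr`, `gmapQ` and `typeOf` are the real functions from `base`.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data (Data, constrFields, gmapQ, toConstr)
import Data.Typeable (TypeRep, typeOf)

-- | A made-up record standing in for a real data format.
data Sample = Sample { flags :: Int, counter :: Int, label :: String }
  deriving (Data)

-- | Field names of the value's constructor paired with each field's type.
-- The function handed to 'gmapQ' must work for *every* 'Data' instance,
-- which is why nothing specific to one field's type can be called in it.
fieldInfo :: Data a => a -> [(String, TypeRep)]
fieldInfo x = zip (constrFields (toConstr x)) (gmapQ typeOf x)

-- | A value to inspect; the contents do not matter, only the shape.
sample :: Sample
sample = Sample 0 0 ""
```

Reading the structure this way is easy. The trouble starts when you want to build or update a field using some typeclass of your own, because the `forall d. Data d => ...` argument gives you no way to say that each field also has an instance of that class.
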
GHC.Generics, on the other hand, made this fairly trivial. I could
solve the first problem (albeit in a more complicated way), and what I
eventually turned the other problems into was the need to take the
generic record type itself (in this case, in the form of a Proxy to
it), and given certain constraints on it, to create a generic
constructor for this type.
This proved to be fairly easy. Most of GHC.Generics will as readily
traverse a Proxy of a type as a value of the type, given some changes
which mostly amount to a lot of use of fmap. The 'to' function in
GHC.Generics (after I had cleared up some conceptual confusion) simply
took the representation and removed the Proxy (or pushed it
elsewhere), until it hit a certain innermost point in which one
created an abstract representation of the constructor call itself, but
this time with the proper data.
Most of the rest was just modifying the above to allow me to propagate
context information such as index and memory offset, and dealing with
the confusion of type families (which I ended up needing much less of
than I initially thought).
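
The following is a reconstruction of the general technique under my own assumptions, not the original `GenericStruct.hs`. It assumes each record field is a hypothetical `Field n` type whose width lives in a type-level natural (the original may well have encoded sizes differently), and it threads a running bit offset through the generic representation before handing the result to `to`. The class `GBuild` and the function `layoutFor` are likewise invented names.

```haskell
{-# LANGUAGE DataKinds, DeriveGeneric, FlexibleContexts, FlexibleInstances,
             KindSignatures, ScopedTypeVariables, TypeOperators #-}
import Data.Proxy (Proxy (..))
import GHC.Generics
import GHC.TypeLits (KnownNat, Nat, natVal)

-- | A field that is n bits wide. The only data it carries at run time
-- is the context injected into it: its bit offset within the record.
newtype Field (n :: Nat) = Field { bitOffset :: Int }
  deriving (Eq, Show)

-- | Width of a field, recovered from its type.
fieldBits :: forall n. KnownNat n => Field n -> Int
fieldBits _ = fromIntegral (natVal (Proxy :: Proxy n))

-- | Build a generic representation from nothing but a starting offset,
-- returning the offset at which the next field would begin.
class GBuild f where
  gbuild :: Int -> (f p, Int)

instance GBuild U1 where
  gbuild off = (U1, off)

-- A leaf: record the current offset, then advance by the field's width.
instance KnownNat n => GBuild (K1 i (Field n)) where
  gbuild off = (K1 (Field off), off + fromIntegral (natVal (Proxy :: Proxy n)))

-- A product: lay out the left part first, then the right part after it.
instance (GBuild f, GBuild g) => GBuild (f :*: g) where
  gbuild off = (l :*: r, end)
    where
      (l, mid) = gbuild off
      (r, end) = gbuild mid

-- Metadata wrappers (datatype, constructor, selector) pass straight through.
instance GBuild f => GBuild (M1 i c f) where
  gbuild off = let (x, end) = gbuild off in (M1 x, end)

-- | Construct a record from a Proxy of its type, with every field
-- filled in with its own offset.
layoutFor :: forall a. (Generic a, GBuild (Rep a)) => Proxy a -> a
layoutFor _ = to (fst (gbuild 0))

-- | A made-up packed format.
data Header = Header
  { flags   :: Field 16
  , counter :: Field 12
  , mode    :: Field 4
  } deriving (Show, Generic)

header :: Header
header = layoutFor (Proxy :: Proxy Header)
```

With this, `bitOffset (counter header)` is an ordinary, statically-typed record access that happens to know where `counter` sits in memory. It also shows the limitation from earlier in the post: a field that is itself a record of `Field`s has no `GBuild` instance, and adding one is where the type-level complications began for me.
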
[GHC.Generics]: https://hackage.haskell.org/package/base/docs/GHC-Generics.html
[Ivory]: https://hackage.haskell.org/package/ivory
[Vinyl]: https://hackage.haskell.org/package/vinyl
[compdata]: https://hackage.haskell.org/package/compdata
[lens]: https://hackage.haskell.org/package/lens
[Template Haskell]: https://wiki.haskell.org/Template_Haskell
[HaskellEmbedded]: https://haskellembedded.github.io/