Projections Explained (DDD EU 2020 talk summary)
Projections since 2009…
CQRS — split reading and writing (this brought opportunities)
Mostly a logical separation, not a physical one: physical separation is not what the pattern is about (but you can do it)
Event Sourcing + DDD — apply it on a bounded context level
You need to know the requirements from the consumers BEFORE building projections
An Event Store is a collection of streams (which are in turn sequences of heterogeneous events, partitioned by name or key)
Each aggregate instance has its own stream
Each event appears at a certain position/offset in an event stream (a monotonically increasing sequence number)
A subscription is needed in order to be notified when new events arrive
- Starts off at a known position in the ALL stream, or from a VERSION in the aggregate stream (this allows us to resume when we crash — that’s why projection is in control of the cursor)
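A minimal sketch of a catch-up subscription that owns its cursor. Everything here is an assumption made for illustration: `InMemoryEventStore`, `RecordedEvent`, `Subscription` and the event names are hypothetical stand-ins, not the API of any real event store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordedEvent:
    position: int  # offset in the ALL stream
    stream: str    # name of the stream the event belongs to
    type: str
    data: dict


class InMemoryEventStore:
    """Hypothetical stand-in for a real event store."""

    def __init__(self):
        self._all = []

    def append(self, stream, type, data):
        # The position is simply the index in the ALL stream.
        self._all.append(RecordedEvent(len(self._all), stream, type, data))

    def read_all(self, after):
        # Return every event positioned strictly after `after`.
        return list(self._all[after + 1:])


class Subscription:
    """The projection side keeps the cursor, so it decides where to resume."""

    def __init__(self, store, handler, checkpoint=-1):
        self.store = store
        self.handler = handler
        self.checkpoint = checkpoint  # last position that was handled

    def catch_up(self):
        for event in self.store.read_all(after=self.checkpoint):
            self.handler(event)
            self.checkpoint = event.position


store = InMemoryEventStore()
store.append("order-1", "OrderPlaced", {"total": 10})
store.append("order-2", "OrderPlaced", {"total": 20})
store.append("order-1", "OrderShipped", {})

seen = []
first = Subscription(store, seen.append)
first.catch_up()

# After a restart, a new subscription resumes from a saved checkpoint.
resumed = []
second = Subscription(store, resumed.append, checkpoint=1)
second.catch_up()
```

Because the checkpoint lives with the subscriber rather than the store, a restarted process only needs its last saved position to continue.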
What is a projection?
It is the act of transforming a stream of events
It is also the act of deriving state from a stream of events
…many meanings
More specifically, a projection is a transformation function that turns a stream of events into a certain data structure (derived state),
which can be kept in memory or stored in a storage engine (database) optimized for your read use case (querying). Don’t ignore the engine’s transaction support.
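As a sketch of that idea, a projection can be written as a plain fold over events. The event types (`StockReceived`, `StockShipped`, `PriceChanged`) and the stock-per-SKU read model are made up for this example.

```python
from functools import reduce


def apply(state, event):
    """Return the next state; events the read model does not care about are ignored."""
    if event["type"] == "StockReceived":
        return {**state, event["sku"]: state.get(event["sku"], 0) + event["qty"]}
    if event["type"] == "StockShipped":
        return {**state, event["sku"]: state.get(event["sku"], 0) - event["qty"]}
    return state


events = [
    {"type": "StockReceived", "sku": "A", "qty": 5},
    {"type": "StockShipped", "sku": "A", "qty": 2},
    {"type": "StockReceived", "sku": "B", "qty": 1},
    {"type": "PriceChanged", "sku": "A", "price": 10},  # not relevant to this read model
]

# Derived state: stock on hand per SKU, kept in memory.
stock_on_hand = reduce(apply, events, {})
```

Here the derived state stays in memory; swapping the dictionary for a database table changes where the state lives, not the shape of the function.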
Don’t always use the same stack for everything.
If your read models change for different reasons, build two projections (it depends)
They often get neglected — they do need some analysis and design!
What does this mean?
- What is the optimal data structure for this read model
- What events do we need
- Where do we store it
Read models (data structures) capture data that is:
- used to identify rows / values / documents
- returned to consumers
- filtered on
- needed to apply updates from future events
Writing Projections
- written with cohesion / as units
- focus on what, not how
- limit amount of boilerplate
- optimized for your platform/language
- don’t be conservative about the number of them (this one is mine)
Forms of projections:
f(state, event) -> state
f(event) -> statement[]
f(connection, event) -> unit, e.g. for calling different APIs
IHandle<TEvent> — using generic interfaces
handling using pattern matching
can be handled in an actor
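To make one of these forms concrete, here is a sketch of `f(event) -> statement[]` using Python's built-in `sqlite3` module as a stand-in store. The `customers` table, the event types and the helper names `to_statements` and `run` are assumptions for illustration.

```python
import sqlite3


def to_statements(event):
    """Pure function: describes WHAT should change as (sql, params) pairs."""
    if event["type"] == "CustomerRegistered":
        return [("INSERT INTO customers (id, name) VALUES (?, ?)",
                 (event["id"], event["name"]))]
    if event["type"] == "CustomerRenamed":
        return [("UPDATE customers SET name = ? WHERE id = ?",
                 (event["name"], event["id"]))]
    return []  # events this projection is not interested in


def run(conn, events):
    """Generic executor: the HOW lives here, shared by every projection."""
    for event in events:
        with conn:  # commits on success, rolls back on error
            for sql, params in to_statements(event):
                conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
run(conn, [
    {"type": "CustomerRegistered", "id": 1, "name": "Alice"},
    {"type": "CustomerRenamed", "id": 1, "name": "Alice B"},
    {"type": "CustomerDeactivated", "id": 1},
])
rows = conn.execute("SELECT id, name FROM customers").fetchall()
```

Returning statements instead of executing them keeps the projection free of connection handling, which is one way to limit boilerplate and to test the mapping without a database.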
Smells
- complexity
- dependencies (e.g. on another projection’s data structure): try to avoid these unless you can guarantee the order in which projections are built; even then it is not encouraged, nor fun
- repeated logic in different projections or same logic in the domain model and read model (don’t do this)
- big data structure — depends
Testing
You should test that a projection behaves as expected (specifications)
Preferably as integration tests
Nothing special (given these events -> I expect this state to be in the store)
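A "given these events, expect this state" test might look like the sketch below. It uses an in-memory SQLite database as a stand-in for the real store; the `orders` table, the event types and the helper `project_into_fresh_store` are hypothetical.

```python
import sqlite3


def apply(conn, event):
    """Hypothetical projection: one row per order with its latest status."""
    if event["type"] == "OrderPlaced":
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')",
                     (event["order_id"],))
    elif event["type"] == "OrderShipped":
        conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?",
                     (event["order_id"],))


def project_into_fresh_store(events):
    """Test helper: run the projection against an empty store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    for event in events:
        apply(conn, event)
    conn.commit()
    return conn


def test_shipped_order_is_listed_as_shipped():
    # Given these events ...
    given = [
        {"type": "OrderPlaced", "order_id": "o-1"},
        {"type": "OrderPlaced", "order_id": "o-2"},
        {"type": "OrderShipped", "order_id": "o-1"},
    ]
    conn = project_into_fresh_store(given)
    # ... expect this state in the store.
    rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
    assert rows == [("o-1", "shipped"), ("o-2", "placed")]


test_shipped_order_is_listed_as_shipped()
```

In a real integration test the same shape would run against the actual storage engine used in production, so that engine-specific behaviour is exercised too.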
MISC
Use lookup caches (in-memory or otherwise) if you need historical data while building projections
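A small sketch of such a lookup cache, assuming a hypothetical order summary read model that needs the customer's name while `OrderPlaced` only carries a customer id. The class and event names are invented for this example.

```python
class OrderSummaryProjection:
    def __init__(self):
        self.customer_names = {}  # lookup cache: customer_id -> latest known name
        self.rows = []            # the read model itself

    def handle(self, event):
        if event["type"] in ("CustomerRegistered", "CustomerRenamed"):
            # Remember historical data that later events will need.
            self.customer_names[event["customer_id"]] = event["name"]
        elif event["type"] == "OrderPlaced":
            name = self.customer_names.get(event["customer_id"], "unknown")
            self.rows.append({"order_id": event["order_id"], "customer_name": name})


projection = OrderSummaryProjection()
for event in [
    {"type": "CustomerRegistered", "customer_id": "c-1", "name": "Alice"},
    {"type": "OrderPlaced", "order_id": "o-1", "customer_id": "c-1"},
    {"type": "CustomerRenamed", "customer_id": "c-1", "name": "Alice B"},
    {"type": "OrderPlaced", "order_id": "o-2", "customer_id": "c-1"},
]:
    projection.handle(event)
```

The cache is owned and filled by this projection from the same event stream, so it does not introduce a dependency on another projection's data structure.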
If you have big read models, ask whether you really need all of that history: is the data temporal, and is it still relevant this year or this month? Answering that can help keep the volume of your data down. Try to partition by time (this is domain-specific).
You can be flexible about the way you deploy and run your projection processes, but keep them lightweight and single-threaded (there is little use in doing parallel writes)
You want them to be fault-tolerant and able to recover from failure and continue where they left off.
Try to isolate different projections in different processes (so they don’t block each other)
All in all, an informative talk and I can confirm many of the points he made based on my personal experience in dealing with projections.