Projections Explained (DDD EU 2020 talk summary)

Projections since 2009…

CQRS — split reading and writing (this brought opportunities)

Mostly a logical separation, not a physical one: physical separation is not what the pattern is about (but you can do it)

Event Sourcing + DDD — apply it on a bounded context level

You need to know the requirements from the consumers BEFORE building projections

An event store is a collection of streams (which are in turn collections/sequences of heterogeneous events, partitioned by name or key)

Each aggregate instance has its own stream

Each event appears at a certain position/offset (sequence) in an event stream (a monotonically increasing number)
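
To make the stream/position vocabulary concrete, here is a minimal in-memory sketch. It is not how any real event store is implemented; the names `RecordedEvent`, `InMemoryEventStore`, `read_stream` and `read_all` are made up for illustration, and positions are assumed to be simple 0-based counters.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RecordedEvent:
    stream: str           # stream name, e.g. one per aggregate instance
    version: int          # position within its own stream (0-based)
    position: int         # position in the global "all" stream (0-based)
    type: str             # event type; one stream may mix several types
    data: dict[str, Any]


class InMemoryEventStore:
    """Hypothetical toy store: a collection of named streams plus a global log."""

    def __init__(self) -> None:
        self._all: list[RecordedEvent] = []
        self._streams: dict[str, list[RecordedEvent]] = defaultdict(list)

    def append(self, stream: str, type: str, data: dict[str, Any]) -> RecordedEvent:
        event = RecordedEvent(
            stream=stream,
            version=len(self._streams[stream]),  # next number in this stream
            position=len(self._all),             # next number in the global log
            type=type,
            data=data,
        )
        self._streams[stream].append(event)
        self._all.append(event)
        return event

    def read_stream(self, stream: str, from_version: int = 0) -> list[RecordedEvent]:
        return self._streams.get(stream, [])[from_version:]

    def read_all(self, from_position: int = 0) -> list[RecordedEvent]:
        return self._all[from_position:]
```

Both counters only ever grow, which is what lets a reader say "give me everything from position N onwards".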

A subscription is needed in order to be notified when new events arrive

  • It starts at a known position in the ALL stream, or from a VERSION in the aggregate stream (this lets us resume after a crash, which is why the projection is in control of the cursor)
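
The resume-from-a-known-position idea can be sketched roughly as follows. This is an assumption-laden toy: the log is a plain Python list standing in for the ALL stream, and `CatchUpSubscription` and its `checkpoint` attribute are hypothetical names, not a real client API.

```python
from typing import Callable

Event = dict  # hypothetical event shape, e.g. {"type": "SomethingHappened"}


class CatchUpSubscription:
    """Reads a log from a stored checkpoint; the projection owns the cursor."""

    def __init__(
        self,
        log: list[Event],
        handler: Callable[[Event], None],
        checkpoint: int = 0,
    ) -> None:
        self.log = log
        self.handler = handler
        self.checkpoint = checkpoint  # position of the next event to process

    def poll(self) -> int:
        """Handle every event at or after the checkpoint; return how many ran."""
        handled = 0
        while self.checkpoint < len(self.log):
            self.handler(self.log[self.checkpoint])
            self.checkpoint += 1  # advance only after the handler succeeded
            handled += 1
        return handled
```

If the process dies, a new subscription created with the last saved `checkpoint` picks up exactly where the old one stopped. In a real system the checkpoint would have to be persisted, ideally together with the read model.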

What is a projection?

It is the act of transforming a stream of events

or the act of deriving state from a stream of events

…many meanings

More specifically, a projection is a transformation function that turns a stream of events into a certain data structure (derived state)

which can be kept in memory or written to a specific storage engine (database) that is optimized for your read use case (querying). Don’t ignore the transaction support of that engine.
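
As a small illustration of "a transformation function over a stream of events", the sketch below folds a list of events into an in-memory state. The event names and the `apply` function are invented for the example.

```python
from functools import reduce

# Hypothetical events for a single bank account stream.
events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]


def apply(state: dict, event: dict) -> dict:
    """One transformation step: returns the next state, never mutates the old one."""
    if event["type"] == "Deposited":
        return {**state, "balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {**state, "balance": state["balance"] - event["amount"]}
    return state  # events this read model does not care about are ignored


# The projection is the fold of `apply` over the stream, from an initial state.
balance_view = reduce(apply, events, {"balance": 0})
```

Here the derived state lives in memory; swapping the dictionary for rows in a database changes where the result is stored, not the shape of the idea.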

Don’t always use the same stack for everything.

If your read models change for different reasons, build two projections (it depends)

They often get neglected — they do need some analysis and design!

What does this mean?

  • What is the optimal data structure for this read model
  • What events do we need
  • Where do we store it

Read models (data structures) capture data that is:

  • used to identify rows / values / documents
  • returned to consumers
  • filtered on
  • needed to allow updates from future events
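
The four kinds of data listed above could show up in a single read model like this. The `OrderSummary` shape and the two handlers are hypothetical; the assumption is that a later "item removed" event carries only an item id, so the price has to be remembered.

```python
from dataclasses import dataclass, field


@dataclass
class OrderSummary:
    order_id: str              # identifies the row / document
    customer_name: str = ""    # returned to consumers
    total: int = 0             # returned to consumers (in cents)
    status: str = "open"       # filtered on, e.g. "all open orders"
    # Not shown to consumers: kept only so a later removal event,
    # assumed to carry just an item id, can still adjust the total.
    item_prices: dict[str, int] = field(default_factory=dict)


def on_item_added(row: OrderSummary, item_id: str, price: int) -> None:
    row.item_prices[item_id] = price
    row.total += price


def on_item_removed(row: OrderSummary, item_id: str) -> None:
    # Looks up the remembered price; unknown items change nothing.
    row.total -= row.item_prices.pop(item_id, 0)
```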

Writing Projections

  • written with cohesion / as units
  • focus on what, not how
  • limit amount of boilerplate
  • optimized for your platform/language
  • don’t be conservative about the number of them (this one is mine)

Forms of projections:

f(state, event) -> state

f(event) -> statement[]

f(connection, event) -> unit (e.g. for calling different APIs)

IHandle<TEvent> — using generic interfaces

handling using pattern matching

can be handled in an actor
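
One of the forms above, f(event) -> statement[], might look like the sketch below, using SQLite from the Python standard library. The table, the event names and the `project`/`run` helpers are assumptions for the example, not something shown in the talk.

```python
import sqlite3

Statement = tuple[str, tuple]  # (SQL text, parameters)


def project(event: dict) -> list[Statement]:
    """f(event) -> statement[]: decides what to write, not how to run it."""
    if event["type"] == "CustomerRegistered":
        return [(
            "INSERT INTO customers (id, name) VALUES (?, ?)",
            (event["id"], event["name"]),
        )]
    if event["type"] == "CustomerRenamed":
        return [(
            "UPDATE customers SET name = ? WHERE id = ?",
            (event["name"], event["id"]),
        )]
    return []  # events this projection does not know produce no statements


def run(conn: sqlite3.Connection, events: list[dict]) -> None:
    """Executes the statements of each event inside one transaction."""
    for event in events:
        with conn:  # commits on success, rolls back if an exception escapes
            for sql, params in project(event):
                conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT)")
run(conn, [
    {"type": "CustomerRegistered", "id": "c-1", "name": "Ada"},
    {"type": "CustomerRenamed", "id": "c-1", "name": "Ada Lovelace"},
])
```

Because `project` only returns data, it can be checked without a database, while `run` is the single place that knows about connections and transactions.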

Smells

  • complexity
  • dependencies (e.g. on another projection’s data structure): try to avoid these unless you can ensure the order of projection building, and even then it’s neither encouraged nor fun
  • repeated logic in different projections, or the same logic in the domain model and the read model (don’t do this)
  • big data structure — depends

Testing

You should test that the projection behaves as expected (specifications)

Preferably as integration tests

Nothing special (given these events -> I expect this state to be in the store)
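
A "given these events, expect this state in the store" test could be sketched like this. An in-memory SQLite database stands in for the real store, and the `open_tickets` projection is a made-up example; against a real storage engine the test shape would stay the same.

```python
import sqlite3


def project(conn: sqlite3.Connection, event: dict) -> None:
    """Hypothetical projection under test: open tickets per assignee."""
    if event["type"] == "TicketAssigned":
        conn.execute(
            "INSERT OR IGNORE INTO open_tickets (assignee, n) VALUES (?, 0)",
            (event["assignee"],),
        )
        conn.execute(
            "UPDATE open_tickets SET n = n + 1 WHERE assignee = ?",
            (event["assignee"],),
        )
    elif event["type"] == "TicketClosed":
        conn.execute(
            "UPDATE open_tickets SET n = n - 1 WHERE assignee = ?",
            (event["assignee"],),
        )


def read_model_after(events: list[dict]) -> list[tuple]:
    """Given: replay the events into a fresh store. Returns the stored rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE open_tickets (assignee TEXT PRIMARY KEY, n INTEGER NOT NULL)"
    )
    for event in events:
        project(conn, event)
    return conn.execute(
        "SELECT assignee, n FROM open_tickets ORDER BY assignee"
    ).fetchall()


def test_closed_tickets_are_not_counted() -> None:
    given = [
        {"type": "TicketAssigned", "assignee": "ann"},
        {"type": "TicketAssigned", "assignee": "ann"},
        {"type": "TicketAssigned", "assignee": "bob"},
        {"type": "TicketClosed", "assignee": "ann"},
    ]
    assert read_model_after(given) == [("ann", 1), ("bob", 1)]
```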

MISC

Use lookup caches (in-memory or otherwise) if you need historical data while building projections
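
A lookup cache in this sense can be as simple as a dictionary filled from earlier events. In the sketch below (all names hypothetical) the order event is assumed to carry only a customer id, so the projection remembers names it has seen before.

```python
customer_names: dict[str, str] = {}  # lookup cache, filled from earlier events
orders_view: list[dict] = []         # the read model itself


def handle(event: dict) -> None:
    if event["type"] == "CustomerRegistered":
        # Remember historical data that later events will need.
        customer_names[event["customer_id"]] = event["name"]
    elif event["type"] == "OrderPlaced":
        orders_view.append({
            "order_id": event["order_id"],
            # The order event itself does not carry the name.
            "customer_name": customer_names.get(event["customer_id"], "unknown"),
        })
```

An in-memory dictionary is lost on restart, so a rebuild has to replay the events that fill it, or the cache has to live somewhere durable.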

If you have big read models, ask whether you really need all of that history: is the data temporal, is it still relevant this year / this month … (this can help you keep the volume of your data down). Try to partition by time (this is domain-specific)
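
One possible way to partition a read model by time is to bucket it by month and drop buckets that are no longer relevant. Whether that is acceptable depends entirely on the domain; the event shape and the `drop_before` helper are assumptions for this sketch.

```python
from collections import defaultdict
from datetime import datetime

# Read model partitioned by month, keyed as "YYYY-MM".
sales_by_month: dict[str, int] = defaultdict(int)


def handle(event: dict) -> None:
    if event["type"] == "SaleRecorded":
        month = datetime.fromisoformat(event["at"]).strftime("%Y-%m")
        sales_by_month[month] += event["amount"]


def drop_before(month: str) -> None:
    """Remove partitions older than `month`; "YYYY-MM" keys sort chronologically."""
    for key in [k for k in sales_by_month if k < month]:
        del sales_by_month[key]
```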

You can be flexible about the way you deploy and run your projection processes, but keep them lightweight and single-threaded (there is little use in doing parallel writes)

You want them to be fault-tolerant and able to recover from failure and continue where they left off.

Try to isolate different projections in different processes (so they don’t block each other)

All in all, an informative talk and I can confirm many of the points he made based on my personal experience in dealing with projections.
