Projections Explained (DDD EU 2020 talk summary)

Anes Hasicic
Feb 5, 2020

Projections since 2009…

CQRS — split reading and writing (this brought opportunities)

This is mostly a logical separation, not a physical one. Physical separation is not what the pattern is about (but you can do it)

Event Sourcing + DDD — apply it on a bounded context level

You need to know the requirements from the consumers BEFORE building projections

Event Store is a collection of streams (which are in turn collections/sequences of heterogeneous events, partitioned by name or key)

Each aggregate instance has its own stream

Each event appears at a certain position/offset (sequence) in an event stream (a monotonically increasing number)
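To make the stream / position vocabulary concrete, here is a minimal in-memory sketch. It is not how any particular event store works: the names `RecordedEvent`, `InMemoryEventStore`, `read_all` and `read_stream` are my own, made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class RecordedEvent:
    stream: str             # stream name, e.g. one stream per aggregate instance
    version: int            # position within its own stream, starting at 0
    position: int           # position in the global ALL stream, starting at 0
    event_type: str
    data: dict[str, Any]


@dataclass
class InMemoryEventStore:
    """Hypothetical store: a global sequence plus one list per named stream."""
    all_events: list[RecordedEvent] = field(default_factory=list)
    streams: dict[str, list[RecordedEvent]] = field(default_factory=dict)

    def append(self, stream: str, event_type: str, data: dict[str, Any]) -> RecordedEvent:
        events = self.streams.setdefault(stream, [])
        # Both counters only ever grow, so positions are monotonically increasing.
        recorded = RecordedEvent(stream, len(events), len(self.all_events), event_type, data)
        events.append(recorded)
        self.all_events.append(recorded)
        return recorded

    def read_all(self, from_position: int = 0) -> list[RecordedEvent]:
        return self.all_events[from_position:]

    def read_stream(self, stream: str, from_version: int = 0) -> list[RecordedEvent]:
        return self.streams.get(stream, [])[from_version:]


store = InMemoryEventStore()
store.append("order-1", "OrderPlaced", {"total": 30})
store.append("order-2", "OrderPlaced", {"total": 12})
last = store.append("order-1", "OrderShipped", {})
```

Note that the same event has two numbers: its version inside `order-1` and its position in the ALL stream.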

A subscription is needed in order to be notified when new events arrive

  • It starts at a known position in the ALL stream, or from a VERSION in an aggregate stream. This lets us resume after a crash, which is why the projection is in control of the cursor
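A rough sketch of "the projection owns the cursor", assuming the ALL stream is just a Python list whose index plays the role of the position. `ALL_STREAM`, `Projection` and `catch_up` are hypothetical names, and the `limit` argument exists only to simulate a crash halfway through.

```python
from typing import Optional

# Hypothetical ALL stream: the list index stands in for the global position.
ALL_STREAM = [
    {"type": "OrderPlaced", "order_id": "o-1"},
    {"type": "OrderShipped", "order_id": "o-1"},
    {"type": "OrderPlaced", "order_id": "o-2"},
]


class Projection:
    """Counts placed orders and remembers the next position to read."""

    def __init__(self) -> None:
        self.checkpoint = 0   # next ALL-stream position to process
        self.placed = 0

    def handle(self, event: dict) -> None:
        if event["type"] == "OrderPlaced":
            self.placed += 1


def catch_up(stream: list, projection: Projection, limit: Optional[int] = None) -> None:
    """Feed events from the projection's checkpoint onward, advancing the cursor."""
    end = len(stream) if limit is None else min(len(stream), projection.checkpoint + limit)
    for position in range(projection.checkpoint, end):
        projection.handle(stream[position])
        projection.checkpoint = position + 1


projection = Projection()
catch_up(ALL_STREAM, projection, limit=2)   # pretend we crashed after two events
catch_up(ALL_STREAM, projection)            # resume from the stored checkpoint
```

Because the second call starts at the checkpoint, no event is handled twice. In a real system the checkpoint would have to be persisted, not kept in memory.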

What is a projection?

It is the act of transforming a stream of events

It is also the act of deriving state from a stream of events

…many meanings

More specifically, a projection is a transformation function that turns a stream of events into a certain data structure (derived state),

which can be kept in memory or in a storage engine (database) that is optimized for your read use case (querying). Don’t ignore the transaction support of that storage engine
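As an illustration of "transformation function to derived state", here is a small in-memory sketch that folds events into a dictionary. The event shapes (`ItemAdded`, `ItemRemoved`, `CartRenamed`) and the `apply` function are invented for the example.

```python
from functools import reduce


def apply(state: dict, event: dict) -> dict:
    """Return a new state derived from the old state and one event."""
    if event["type"] == "ItemAdded":
        return {**state, event["sku"]: state.get(event["sku"], 0) + event["qty"]}
    if event["type"] == "ItemRemoved":
        return {**state, event["sku"]: state.get(event["sku"], 0) - event["qty"]}
    return state  # events this read model does not care about are ignored


events = [
    {"type": "ItemAdded", "sku": "A", "qty": 3},
    {"type": "ItemAdded", "sku": "B", "qty": 1},
    {"type": "ItemRemoved", "sku": "A", "qty": 2},
    {"type": "CartRenamed", "name": "weekend"},
]

# The derived state is simply a left fold over the stream, starting from empty.
quantities = reduce(apply, events, {})
```

Here the derived state lives in memory. Swapping the dictionary for a database table changes where the state is stored, not the idea.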

Don’t always use the same stack for everything.

If your read models change for different reasons — build two projections (it depends)

They often get neglected — they do need some analysis and design!

What does this mean?

  • What is the optimal data structure for this read model
  • What events do we need
  • Where do we store it

Read models (data structures) capture data that is:

  • used to identify rows / values / documents
  • returned to consumers
  • filtered on
  • needed to allow updates from future events
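The four roles above can be seen on a single record. This is only a sketch with invented names (`OrderSummary`, `on_order_placed`, `on_customer_renamed`) and a plain dictionary standing in for the store.

```python
from dataclasses import dataclass


@dataclass
class OrderSummary:
    order_id: str        # identifies the row / document
    customer_name: str   # returned to consumers
    status: str          # filtered on, e.g. "open" vs "shipped"
    customer_id: str     # never shown, kept so future events can find this row


read_model: dict = {}    # order_id -> OrderSummary


def on_order_placed(event: dict) -> None:
    read_model[event["order_id"]] = OrderSummary(
        order_id=event["order_id"],
        customer_name=event["customer_name"],
        status="open",
        customer_id=event["customer_id"],
    )


def on_customer_renamed(event: dict) -> None:
    # Only possible because customer_id was captured when the row was created.
    for row in read_model.values():
        if row.customer_id == event["customer_id"]:
            row.customer_name = event["name"]


on_order_placed({"order_id": "o-1", "customer_id": "c-9", "customer_name": "Ann"})
on_customer_renamed({"customer_id": "c-9", "name": "Ann Smith"})
open_orders = [row for row in read_model.values() if row.status == "open"]
```

The `customer_id` field is the easy one to forget: no consumer asks for it, but without it the rename event could not update anything.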

Writing Projections

  • written with cohesion / as units
  • focus on what, not how
  • limit amount of boilerplate
  • optimized for your platform/language
  • don’t be conservative about the number of them (this one is mine)

Forms of projections:

f(state, event) -> state

f(event) -> statement[]

f(connection, event) -> unit — e.g. for calling different APIs

IHandle<TEvent> — using generic interfaces

handling using pattern matching

can be handled in an actor
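Here is a sketch of the `f(event) -> statement[]` form, with dispatch on the event type playing roughly the role of `IHandle<TEvent>`. I am assuming Python 3.9+, SQLite as the store, and invented event classes (`OrderPlaced`, `OrderCancelled`). `functools.singledispatch` is just one way to do the dispatch, not something prescribed by the talk.

```python
import sqlite3
from dataclasses import dataclass
from functools import singledispatch

Statement = tuple  # (SQL text, parameters)


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total: int


@dataclass(frozen=True)
class OrderCancelled:
    order_id: str


@singledispatch
def project(event: object) -> list:
    return []  # unknown events produce no statements


@project.register
def _(event: OrderPlaced) -> list:
    return [("INSERT INTO orders (id, total, status) VALUES (?, ?, 'open')",
             (event.order_id, event.total))]


@project.register
def _(event: OrderCancelled) -> list:
    return [("UPDATE orders SET status = 'cancelled' WHERE id = ?", (event.order_id,))]


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER, status TEXT)")
for event in [OrderPlaced("o-1", 30), OrderPlaced("o-2", 12), OrderCancelled("o-1")]:
    # The projection only says WHAT should change; executing it is generic.
    for sql, params in project(event):
        connection.execute(sql, params)
rows = connection.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
```

Returning statements instead of executing them keeps the handlers free of connection handling, which also makes them easy to unit test.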

Smells

  • complexity
  • dependencies (e.g. on another projection’s data structure) — try to avoid unless you can ensure the order of projection building, but it’s still not encouraged, nor fun
  • repeated logic in different projections or same logic in the domain model and read model (don’t do this)
  • big data structure — depends

Testing

You should test whether a projection behaves as expected (specifications)

Preferably as integration tests

Nothing special (given these events -> I expect this state to be in the store)
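A "given these events, expect this state" test could look roughly like this. I am using an in-memory SQLite database as the store so the example runs anywhere; a real integration test would target the actual storage engine. The `project` and `rows_after` helpers and the event shapes are made up.

```python
import sqlite3


def project(connection: sqlite3.Connection, event: dict) -> None:
    """Hypothetical projection under test, in the f(connection, event) form."""
    if event["type"] == "UserRegistered":
        connection.execute("INSERT INTO users (id, email) VALUES (?, ?)",
                           (event["id"], event["email"]))
    elif event["type"] == "EmailChanged":
        connection.execute("UPDATE users SET email = ? WHERE id = ?",
                           (event["email"], event["id"]))


def rows_after(events: list) -> list:
    """Run the events against a fresh store and return what ended up in it."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT)")
        for event in events:
            project(connection, event)
        return connection.execute("SELECT id, email FROM users ORDER BY id").fetchall()
    finally:
        connection.close()


def test_email_change_is_reflected() -> None:
    given = [
        {"type": "UserRegistered", "id": "u-1", "email": "old@example.com"},
        {"type": "EmailChanged", "id": "u-1", "email": "new@example.com"},
    ]
    assert rows_after(given) == [("u-1", "new@example.com")]


test_email_change_is_reflected()
```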

MISC

Use lookup caches (in-memory or otherwise) if you need historical data while building projections
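A tiny sketch of the lookup-cache idea, assuming `OrderPlaced` carries only a customer id while the read model wants the customer name. All names here are invented, and the cache is a plain dictionary.

```python
customer_names: dict = {}   # lookup cache, filled from earlier events
order_list: list = []       # the read model itself


def handle(event: dict) -> None:
    if event["type"] == "CustomerRegistered":
        # Not part of the read model, but remembered for later events.
        customer_names[event["customer_id"]] = event["name"]
    elif event["type"] == "OrderPlaced":
        order_list.append({
            "order_id": event["order_id"],
            "customer": customer_names.get(event["customer_id"], "unknown"),
        })


for event in [
    {"type": "CustomerRegistered", "customer_id": "c-1", "name": "Ann"},
    {"type": "OrderPlaced", "order_id": "o-1", "customer_id": "c-1"},
    {"type": "OrderPlaced", "order_id": "o-2", "customer_id": "c-404"},
]:
    handle(event)
```

An in-memory cache like this has to be rebuilt when the process restarts, so in practice it may need to be persisted or replayed alongside the projection.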

If you have big read models, ask whether you really need all of that history. Is the data temporal? Is it still relevant this year / this month? Answering that can help you keep the volume of your data down. Try to partition by time (this is domain-specific)
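One possible shape for time partitioning, assuming each event carries an `occurred_at` timestamp and that a month is the right granularity for the domain (both are assumptions). `partition_key`, `handle` and `drop_before` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import datetime

partitions: dict = defaultdict(list)   # "YYYY-MM" -> rows for that month


def partition_key(occurred_at: datetime) -> str:
    return occurred_at.strftime("%Y-%m")


def handle(event: dict) -> None:
    partitions[partition_key(event["occurred_at"])].append(event["order_id"])


def drop_before(key: str) -> None:
    """Discard whole partitions older than the given month."""
    # "YYYY-MM" strings sort chronologically, so plain string comparison works.
    for old in [k for k in partitions if k < key]:
        del partitions[old]


handle({"order_id": "o-1", "occurred_at": datetime(2019, 11, 3)})
handle({"order_id": "o-2", "occurred_at": datetime(2020, 1, 15)})
handle({"order_id": "o-3", "occurred_at": datetime(2020, 1, 28)})
drop_before("2020-01")
```

Dropping a whole partition is usually much cheaper than deleting individual rows, which is part of why this helps keep volume down.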

You can be flexible about the way you deploy and run your projection processes, but keep them lightweight and single-threaded (there is little use in doing parallel writes)

You want them to be fault-tolerant and able to recover from failure and continue where they left off.
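One way to get "continue where they left off" is to store the checkpoint in the same transaction as the read model update, so a crash can never leave the two out of sync. This is my own sketch using SQLite, with invented table names and a `fail_at` argument that only simulates a crash.

```python
import sqlite3
from typing import Optional

EVENTS = [("alice", 10), ("bob", 5), ("alice", 7)]   # list index = position


def open_store() -> sqlite3.Connection:
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE totals (customer TEXT PRIMARY KEY, amount INTEGER)")
    connection.execute("CREATE TABLE checkpoint (position INTEGER)")
    connection.execute("INSERT INTO checkpoint VALUES (0)")
    connection.commit()
    return connection


def run(connection: sqlite3.Connection, fail_at: Optional[int] = None) -> None:
    (start,) = connection.execute("SELECT position FROM checkpoint").fetchone()
    for position in range(start, len(EVENTS)):
        customer, amount = EVENTS[position]
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute("INSERT OR IGNORE INTO totals VALUES (?, 0)", (customer,))
            connection.execute("UPDATE totals SET amount = amount + ? WHERE customer = ?",
                               (amount, customer))
            if position == fail_at:
                raise RuntimeError("simulated crash before the checkpoint was saved")
            connection.execute("UPDATE checkpoint SET position = ?", (position + 1,))


store = open_store()
try:
    run(store, fail_at=1)      # the event at position 1 is rolled back entirely
except RuntimeError:
    pass
run(store)                     # resumes at position 1, nothing is applied twice
totals = store.execute("SELECT customer, amount FROM totals ORDER BY customer").fetchall()
```

If the read model and the checkpoint lived in separate transactions, the retried event would have been counted twice for `bob`.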

Try to isolate different projections in different processes (so they don’t block each other)

All in all, an informative talk, and I can confirm many of the points the speaker made based on my personal experience in dealing with projections.
