Read-Optimized, Eventually-Consistent Data Stores: In Practice

Jean-Philippe Daigle posted October 15, 2019

This is Part 2 of a series on how TripAdvisor uses event-driven architecture.

In Part 1, we introduced basic concepts and trade-offs; now we’ll dig into two different real-world use cases: followee feed-building and member profile search. Both of these use cases rely on maintaining an eventually-consistent, non-authoritative view of data managed by other services for the purposes of fast querying.

Followee / Follower Model

The Travel Feed on TripAdvisor is a collection of recent travel-related content, personalized for the member viewing it. It aggregates data from several sources, one of which is a chronological feed of content from members the user previously chose to follow.

Feed Building: Backing Data

The TripAdvisor site supports different types of content posts — the feed can consist of Link Posts, Videos, Trips, Reviews, Forum Posts, etc., each of which has a different data model, publishing rules, and so on. All these content types end up managed by different, independent services. Followee relationships themselves (user ‘Alice’ follows user ‘Bob’) are stored in yet another service.

In a world where data is strictly never replicated outside the individual service that manages it, building a chronological feed of the latest 500 posts by accounts I follow would essentially be a very expensive map-reduce operation:

  • First, fetch the set of followed members (the Followee set of IDs).
  • Then, divide and conquer: contact each of the services managing a piece of content (Link Posts, Shared Trips, Reviews, etc.), send it the Followee set, and ask for the last 500 posts it knows about published by a member in that set (the map operation).
  • Finally, run the reduce operation: apply a total ordering across the sub-result lists and take the top 500 by recency.

In the worst case, all 500 of the latest items by members I follow would be reviews, and all the data returned by every other contacted service would be thrown away! Clearly, we can’t do this.
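To make the cost concrete, the naive fan-out can be sketched as follows. This is a toy model, not our actual code: the in-memory `SERVICE_POSTS` table and `fetch_followees` below are hypothetical stand-ins for what would each be a network round-trip to an independent service.

```python
import heapq

# Hypothetical in-memory stand-ins for the independent content services;
# in production, each lookup below would be a network round-trip.
SERVICE_POSTS = {
    "reviews":    [("r1", "bob", 300), ("r2", "carol", 290)],
    "link_posts": [("l1", "bob", 310)],
    "videos":     [("v1", "dave", 280)],  # (post_id, author_id, timestamp)
}

def fetch_followees(member_id):
    # Stand-in for the service storing follow relationships.
    return {"bob", "carol"}

def latest_posts(service, followees, limit):
    # The "map" step: each service returns its newest posts by followed authors.
    posts = [p for p in SERVICE_POSTS[service] if p[1] in followees]
    return sorted(posts, key=lambda p: -p[2])[:limit]

def build_feed(member_id, limit=500):
    followees = fetch_followees(member_id)
    partials = [latest_posts(s, followees, limit) for s in SERVICE_POSTS]
    # The "reduce" step: a total order across the sub-result lists,
    # keeping only the newest `limit` items.
    merged = heapq.merge(*partials, key=lambda p: -p[2])
    return [post_id for post_id, _, _ in merged][:limit]
```

Note that every service must return up to `limit` items because, before the reduce step, we cannot know which service holds the newest posts — which is exactly the wasted work described above.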

A Read-Optimized Pool of Stub Data

Instead, we accept partial duplication of data for query purposes. A service known as SocialQuery maintains a database of lightweight stubs covering a window looking back over a (quite long) span of time: light placeholders recording the type, timestamp, author, and ID of all types of posts known in the system, across all services, along with a copy of all the followee relationships.

By storing a stub reference to every post known to any other content-management service, and indexing these stubs by timestamp up front, SocialQuery can answer a feed-building query in a single round-trip to its underlying data store: compute the list of the 500 latest posts whose author is in the set of accounts I follow, by simply joining posts with followees. This lets us quickly build a feed of ordered post IDs, which can then be passed up and hydrated for display by the requester, who applies further upstream filtering and fetches the actual referenced content from the authoritative service for each content type.
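As a minimal sketch of that single-round-trip join — using SQLite purely as a stand-in, since SocialQuery's actual data store and schema aren't described here — the stub tables and feed query might look like:

```python
import sqlite3

# Toy stub store: one row per post placeholder, one row per follow edge.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE posts (post_id TEXT, post_type TEXT, author_id TEXT, published_at INTEGER);
CREATE INDEX idx_posts_time ON posts (published_at);
CREATE TABLE followees (follower_id TEXT, followee_id TEXT);
""")
db.executemany("INSERT INTO posts VALUES (?,?,?,?)", [
    ("r1", "review", "bob", 300),
    ("l1", "link_post", "bob", 310),
    ("v1", "video", "dave", 280),
])
db.executemany("INSERT INTO followees VALUES (?,?)", [
    ("alice", "bob"),
])

# The whole feed is one join against the local store -- no fan-out.
feed = db.execute("""
    SELECT p.post_id, p.post_type
      FROM posts p
      JOIN followees f ON f.followee_id = p.author_id
     WHERE f.follower_id = ?
     ORDER BY p.published_at DESC
     LIMIT 500
""", ("alice",)).fetchall()
```

The result is just ordered (ID, type) pairs — the lightweight stubs — leaving hydration of the full post bodies to the authoritative services.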

The whole read-optimized store is kept up to date asynchronously through event broadcasting — in this case, topics on a Kafka message bus carrying events for each kind of CRUD action that can happen to any piece of content. All the various content-management services publish these events in a standardized format.
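A sketch of what such a standardized content event and its consumer-side application might look like — the field names and envelope below are illustrative assumptions, not TripAdvisor's actual event schema:

```python
import json

def make_event(action, post_type, post_id, author_id, published_at):
    # Producer side: every content service emits the same envelope shape
    # for each CRUD action. (Hypothetical schema for illustration.)
    assert action in ("create", "update", "delete")
    return json.dumps({
        "action": action,
        "post_type": post_type,
        "post_id": post_id,
        "author_id": author_id,
        "published_at": published_at,
    })

def apply_event(stubs, raw_event):
    # Consumer side: SocialQuery applies each event to its local stub store.
    event = json.loads(raw_event)
    key = (event["post_type"], event["post_id"])
    if event["action"] == "delete":
        stubs.pop(key, None)
    else:  # "create" and "update" both upsert the stub
        stubs[key] = (event["author_id"], event["published_at"])
```

Because the consumer only needs the placeholder fields (type, ID, author, timestamp), the replica stays lightweight no matter how rich each content type's full data model is.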

User Profile Search

One of the largest benefits of decoupling the production of an event from its consumption is that the model easily allows us to grow new uses for the existing event stream over time, without going back and adding complexity on the producer side. (In other words, allowing us to scale development teams without increasing communication overhead – as discussed in Part 1).

We saw above how an event feed representing all content CRUD actions was useful in asynchronously building a store of post stubs for feed-building. At the same time, a different team was tasked with building the profile search feature.

The requirements involved a relevance function that took into account:

  • the member’s number of contributions to the site, 
  • their number of followers, and 
  • whether the user running the search already followed them. (Under the assumption that an account you’ve already followed is typically more relevant to you.)

This is easy to build with Elasticsearch, which already powered all other search functionality on the site. The search index becomes just another denormalized, eventually-consistent, query-optimized replica of data that is authoritatively managed by specifically-targeted content-management services. We modeled each profile as an Elasticsearch document recording that member’s contribution count, follower count, and a keyword vector listing the IDs of all accounts the profile follows.
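A sketch of that document shape and how the three signals could combine into a score — the field names and weights below are illustrative placeholders, not the team's actual Elasticsearch mapping or production scoring formula (in practice the boost would be expressed in the Elasticsearch query itself):

```python
# Hypothetical profile document, mirroring the three relevance signals.
profile_doc = {
    "member_id": "bob",
    "contribution_count": 42,
    "follower_count": 1280,
    "followee_ids": ["carol", "dave"],  # keyword vector of followed accounts
}

def relevance(doc, searcher_followees):
    # Toy scoring: contributions and followers add to the score, and a
    # profile the searcher already follows gets a large boost. The
    # weights (2x, 10x) are arbitrary placeholders.
    score = doc["contribution_count"] + 2 * doc["follower_count"]
    if doc["member_id"] in searcher_followees:
        score *= 10  # accounts you already follow are more relevant to you
    return score
```

Storing the followee IDs as a keyword vector is what makes the "already followed" signal cheap to evaluate at query time, rather than requiring a call back to the follow-relationship service per result.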

Driving indexing off of the exact same event stream as the SocialQuery service discussed in the previous section, we built an event-driven document indexer that listened for content being published or deleted, and for profile follow actions:

  • Upon receiving any of these events, it refreshes its view of a profile: it rebuilds that profile’s ES document by querying the authoritative services for contribution counts, followees, and follower counts, then replaces the existing document in Elasticsearch.
  • When a member follows a profile, both the source and the target of the follow action are refreshed, keeping follower counts and followee ID keyword vectors consistent.
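The refresh logic above can be sketched as follows; the `services` client, event shapes, and `index` store are hypothetical stand-ins for the authoritative service APIs and the Elasticsearch index:

```python
def rebuild_document(member_id, services, index):
    # Re-query the authoritative services rather than trusting the event
    # payload, so the replica converges even if events arrive out of order.
    index[member_id] = {
        "member_id": member_id,
        "contribution_count": services.contribution_count(member_id),
        "follower_count": services.follower_count(member_id),
        "followee_ids": services.followees(member_id),
    }  # replaces any existing document wholesale

def handle_event(event, services, index):
    if event["type"] == "follow":
        # Both sides of a follow change: the source gains a followee,
        # the target gains a follower.
        rebuild_document(event["source_id"], services, index)
        rebuild_document(event["target_id"], services, index)
    else:  # content published or deleted: refresh the author's counts
        rebuild_document(event["author_id"], services, index)
```

Rebuilding the whole document on every event is deliberately simple: a full replace is idempotent, so redelivered or reordered events can’t leave the index in a state the authoritative services disagree with.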

The setup is tunable for latency vs. throughput (larger batches give higher throughput), and we ended up with profile search indices that are up to date within a few seconds of a member gaining a new follower or publishing a piece of content.

Next time, in Part 3, we’ll dig into how TripAdvisor uses Kafka internally for service collaboration.

Author’s Biography

Jean-Philippe joined TripAdvisor in 2013. Currently, he is the Technical Manager for Search and Discovery, the team responsible for building the search engine and event-messaging platforms. Prior to TripAdvisor he was an architect at Solace, designing event-driven distributed systems for the banking and high-frequency trading sectors. In his free time, he enjoys traveling with his wife and two daughters.