As the world’s largest travel site, TripAdvisor provides a platform for billions of users to research, book, and review their trips across the world. Our Experiences business provides travelers a collection of over 160,000 bookable experiences, from canoe rentals in the Everglades to entrance tickets to the Eiffel Tower. The fast-growing user and product spaces pose a significant challenge for us: how to match users with experiences that are relevant to them. In this blog post, we will explain how our newly developed ‘Recommended For You’ (RFY) model generates personalized recommendations on our website using users’ browsing history and deep learning. The model has already been tested in production and demonstrated lifts in user engagement and bookings.
As the number and variety of available experiences on TripAdvisor grew rapidly over the last couple of years, it became apparent that we needed to serve personalized recommendations to our users. A personalized website can increase user satisfaction significantly by providing travelers with an easy way to find experiences that are relevant to them. One typical personalized application is the RFY shelf on our “Things To Do” pages as shown below.
Since we believe in moving fast, we first started by deploying a personalized version of our existing item-based collaborative filtering model, which uses item-to-item cosine similarities based on page views. To make it personalized, we recommended items with the highest aggregated similarities to all items browsed by a user. We found that users liked to interact with the RFY shelf, which suggested that it was worth investing more time and engineering effort in building a more powerful machine learning model. In particular, we thought we could do better with supervised learning and high-quality item and user representations.
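To make the baseline concrete, here is a minimal sketch of item-to-item cosine similarity on binary page-view data, with recommendations ranked by similarity summed over the items a user browsed. The function names, the input layout (a dict of user id to viewed items), and the toy data are assumptions for illustration, not our production code.

```python
from collections import defaultdict
from math import sqrt


def item_cosine_similarities(page_views):
    """page_views: dict mapping a user id to the set of item ids they viewed."""
    # Invert the log: for each item, the set of users who viewed it.
    viewers = defaultdict(set)
    for user, items in page_views.items():
        for item in items:
            viewers[item].add(user)

    # Cosine similarity of two binary view vectors is
    # |shared viewers| / sqrt(|viewers of a| * |viewers of b|).
    sims = defaultdict(dict)
    items = sorted(viewers)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(viewers[a] & viewers[b])
            if overlap:
                score = overlap / sqrt(len(viewers[a]) * len(viewers[b]))
                sims[a][b] = score
                sims[b][a] = score
    return sims


def recommend(history, sims, k=3):
    """Rank unseen items by their similarity summed over the browsed items."""
    seen = set(history)
    scores = defaultdict(float)
    for item in history:
        for other, score in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += score
    # Highest aggregated similarity first; ties broken by item id.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


# Toy page-view log (hypothetical data).
page_views = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C"},
    "u4": {"D"},
}
sims = item_cosine_similarities(page_views)
print(recommend(["A"], sims))
print(recommend(["A", "B"], sims))
```

The pairwise loop is quadratic in the number of items, which is fine for a toy example but would need sparse matrix operations or approximate methods at catalog scale.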
We translated the business goal into a specific data science problem: given a user’s browsing history, predict the user’s next interest in an experience. We will describe here three major components of the model: data collection, entity embeddings, and the neural network architecture.
- Training Data Collection
To train a model in a supervised learning setting, we collected logs of page views and page actions of users who clicked the ‘check availability’ button or booked an experience. ‘Check availability’ is one of the steps in the booking process on TripAdvisor. Users have to click the ‘check availability’ button of an Experience before they add it to their cart and check out. We use this click as a signal of user interest.
Below is an illustration of a user’s journey on our site. Notice that in addition to bookable experiences, there are also a large number of ‘Point of Interest’ pages, where travelers can find useful information and reviews. For instance, ‘Camp Nou’ is a ‘Point of Interest’ page view in the example below. Our new RFY model takes those page views into consideration as well.
Note: If a user interacted with an item multiple times, we only keep the last visit in the sequence and sort based on the last-visit timestamp.
This sequence of four entities will be broken down into two samples in our training data:
(Exp A) -> Exp B (Check Availability)
(Exp A, Exp B, POI C) -> Exp D (Booking)
We include both ‘check availability’ and ‘booking’ samples in our training data, and we assign booking samples a higher weight. The weight is a hyper-parameter that we can tune.
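The sample construction described above can be sketched roughly as follows. The event format, the action names, and the booking weight of 2.0 are assumptions made for the example. Only the idea comes from the text: deduplicate the history to the last visit, cut one sample at each ‘check availability’ or ‘booking’ action, and weight bookings higher.

```python
BOOKING_WEIGHT = 2.0  # hypothetical value; in practice a tuned hyper-parameter
TARGET_ACTIONS = ("check_availability", "booking")


def build_samples(events, booking_weight=BOOKING_WEIGHT):
    """events: list of (entity_id, action) tuples in chronological order.

    Returns a list of (history, target, weight) training samples.
    """
    samples = []
    history = []  # entity ids ordered by last-visit time, oldest first
    for entity, action in events:
        if action in TARGET_ACTIONS:
            # The target itself is never part of its own input history.
            context = tuple(e for e in history if e != entity)
            if context:
                weight = booking_weight if action == "booking" else 1.0
                samples.append((context, entity, weight))
        # Keep only the last visit of each entity in the sequence.
        if entity in history:
            history.remove(entity)
        history.append(entity)
    return samples


# The four-entity journey from the example above.
journey = [
    ("Exp A", "view"),
    ("Exp B", "check_availability"),
    ("POI C", "view"),
    ("Exp D", "booking"),
]
for sample in build_samples(journey):
    print(sample)
```

In a real log, a booking is usually preceded by a view and a ‘check availability’ click on the same experience, so this sketch would emit more than one sample for that experience. How those are merged or weighted is a design choice the sketch leaves open.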
- Entity Embeddings
Word embedding is a popular technique for learning vector representations of words in Natural Language Processing. Since it was first introduced, it has been adapted to other domains, such as recommendation, as an approach for learning representations of items and users. In our model, we represent each entity as a 100-dimensional vector within the same embedding space. Our pool of entities contains Points of Interest (e.g. Eiffel Tower) and bookable Experiences (e.g. Eiffel Tower Priority Access Ticket with Host). We first pre-train general-purpose embeddings on our page view logs using the StarSpace package by Facebook AI Research. The pre-trained embeddings encode entities’ location and category information very well, as you can see from the table of embeddings’ cosine similarities below. Features built upon those embeddings are consumed by other downstream tasks, such as sort orders and landing page recommendations. For the RFY model, we initialize the embedding vectors with the pre-trained weights and then fine-tune them for the task. In our experiments, we found that this initialization scheme outperforms random initialization.
- RFY Model Architecture
The RFY model architecture follows the figure below. We aggregate a user’s browsing history by taking a recency-weighted average of the 100-dimensional item embeddings. The aggregate is then passed through two fully connected layers, ending in a softmax output over 64,000 classes, each of which corresponds to an experience that can be recommended.
We use an exponential recency-weighted average to aggregate the user’s browsing history.
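As a rough illustration of the layers that sit on top of the aggregated history vector, here is a dependency-free forward pass with toy sizes. The ReLU activation, the random weights, and the reading of "two fully connected layers" as one hidden layer plus one output layer are assumptions for the example. The real model uses 100-dimensional inputs and a 64,000-way softmax.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(logits):
    # Subtract the max logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(user_vector, hidden, output):
    """Aggregated history vector -> hidden layer -> class probabilities."""
    h = relu(dense(user_vector, *hidden))
    return softmax(dense(h, *output))


rng = random.Random(0)
EMBED_DIM, HIDDEN_DIM, NUM_EXPERIENCES = 4, 8, 5  # toy sizes only
hidden = init_layer(EMBED_DIM, HIDDEN_DIM, rng)
output = init_layer(HIDDEN_DIM, NUM_EXPERIENCES, rng)

probs = forward([0.2, -0.1, 0.4, 0.0], hidden, output)
print(len(probs), round(sum(probs), 6))
```

Each entry of `probs` plays the role of the predicted probability that the corresponding experience is the user's next interest.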
This aligns with our assumption that the most recent browsing data contributes most to the prediction of the next action. Our offline evaluation shows that the recency-weighted aggregation approach outperforms the naive average significantly. We also tried to plug in an LSTM layer to combine the embeddings, but we did not see any improvement in our offline metrics compared to the recency-weighted average approach. As a result, we chose the simpler architecture, which also gives us faster predictions in real time. We also noticed that we can achieve even better accuracy when we increase the number of neurons in the layer before the softmax. However, this slows down prediction significantly. The 512-neuron layer is our final compromise between speed and accuracy.
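One common way to write an exponential recency-weighted average is sketched below. The exact formula and decay rate we use may differ; the `decay` parameter and its default of 0.5 are assumptions for the example. The most recent item gets weight 1, the one before it `decay`, then `decay ** 2`, and so on, and the weights are normalized to sum to 1.

```python
def recency_weighted_average(embeddings, decay=0.5):
    """Average embedding vectors, weighting recent items more heavily.

    embeddings: list of equal-length vectors, ordered oldest to newest.
    decay: factor in (0, 1]; 1.0 reduces to the plain average.
    """
    if not embeddings:
        raise ValueError("at least one embedding is required")
    n = len(embeddings)
    # Newest item (index n - 1) gets weight 1, older items decay geometrically.
    weights = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(weights)
    dim = len(embeddings[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, embeddings)) / total
        for d in range(dim)
    ]


history = [[1.0, 0.0], [0.0, 1.0]]  # oldest first, toy 2-dimensional vectors
print(recency_weighted_average(history, decay=0.5))
print(recency_weighted_average(history, decay=1.0))
```

With `decay=0.5`, the newer vector contributes twice as much as the older one. Setting `decay=1.0` recovers the naive average that the recency-weighted version is compared against.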
- Offline Evaluation
We evaluated the new RFY model offline on a dataset containing two weeks of data following the time period used for training. Ranking metrics are calculated on the top recommendations we generated. For MRR (mean reciprocal rank), we retrieved the top 10 results from the recommender. The new RFY model beats our baseline item-to-item cosine similarity model by a large margin. We report here percentage improvements against the item-to-item baseline.
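For readers unfamiliar with the metric, MRR over the top 10 results can be computed as in the sketch below. The function name and the toy data are illustrative. Each held-out target contributes 1/rank if it appears in the top k recommendations and 0 otherwise.

```python
def mrr_at_k(recommendations, targets, k=10):
    """Mean reciprocal rank of each target within its top-k recommendation list.

    recommendations: list of ranked item lists, one per evaluation sample.
    targets: list of the true next items, aligned with `recommendations`.
    """
    if not targets:
        raise ValueError("at least one evaluation sample is required")
    total = 0.0
    for ranked, target in zip(recommendations, targets):
        for rank, item in enumerate(ranked[:k], start=1):
            if item == target:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(targets)


# Toy evaluation set: target found at rank 2, at rank 1, and not at all.
recs = [["a", "b", "c"], ["x", "y"], ["p", "q"]]
truth = ["b", "x", "z"]
print(mrr_at_k(recs, truth))
```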
- Online A/B test
We conducted an online A/B test to evaluate our system. Users in the treatment group saw the RFY shelf populated by the new RFY model, while users in the control group saw the RFY shelf populated by the cosine similarity model. We found a statistically significant improvement in our user engagement and conversion metrics.
- Example Output
Below is an example of personalized recommendations for a user who viewed several ‘food, wine & cooking class’ experiences in Paris. As we can see, the RFY model does a great job capturing the user’s preference for that specific category. This information can then be used in new locations that the user might explore.
- Our embedding pre-training phase runs monthly using the StarSpace package. In particular, we use the ‘PageSpace’ model type and train it with our page view logs.
- Our RFY model implementation is in PyTorch. We follow best practices for transfer learning, where we first freeze the pre-trained embedding weights for several epochs and then unfreeze them with a smaller learning rate for fine-tuning. Model training happens weekly on our own data science platform.
- Our model prediction is done in real time. Our microservice, built by our data science engineers, receives and preprocesses the input data, runs a forward pass through the PyTorch model, partitions and sorts based on the scores, and returns the result to the front end. Our service is optimized for speed to reduce real-time latency. We will do a follow-up blog post with the engineering implementation details.
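The "partition and sort" step in the serving path avoids fully sorting all class scores when only a handful of recommendations are needed. As a stand-in for whatever the service actually does, the sketch below uses the standard library's `heapq.nlargest`, which selects the k best scores and returns them in descending order. The function name and toy scores are assumptions.

```python
import heapq


def top_k_indices(scores, k):
    """Return the indices of the k highest scores, best first.

    Selecting k items with a heap costs roughly O(n log k), which is cheaper
    than sorting all n scores when k is much smaller than n.
    """
    best = heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])
    return [index for index, _ in best]


# Toy softmax output over five experiences.
scores = [0.10, 0.50, 0.20, 0.15, 0.05]
print(top_k_indices(scores, 3))
```

The returned indices would then be mapped back to experience ids before the response is sent to the front end.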
In this blog post, we introduced a machine learning model built by our data science team to generate personalized recommendations of Experiences on TripAdvisor. Compared to the more traditional item-based cosine similarity model, our newly developed model, which uses item embeddings and deep learning, shows significant improvements on both offline ranking metrics and online business metrics. To improve our personalization strategies and increase user satisfaction further, we plan to focus on the following areas in future iterations:
- Introduce more features, such as price range (budget-friendly vs. luxury experiences), category preference (kid-friendly vs. outdoor adventures), etc. This can help us construct better representations of our users.
- Incorporate more user actions from the entire TripAdvisor platform. Right now we are only using actions related to Points of Interest and Experiences. We plan to leverage users’ actions for hotels, rentals, flights, restaurants, etc.
- Make our personalization pipeline more sensitive to users’ contextual features, such as device, language, and whether travelers are in-destination or not.
Many thanks to our Rentals & Experiences Data Science, Data Science Platform Engineering, Revenue Management and Front End Engineering teams for contributions to this project! If you are interested in this type of work, feel free to comment under this post or reach out to us! We are hiring talented people for many teams as well!
Peimeng Sui is a data scientist on the Rentals & Experiences Data Science Team at TripAdvisor. He started his career at TripAdvisor as a summer data science intern in 2017 and joined as a full-time data scientist in 2018. During his first year at TripAdvisor, he has worked on various projects related to recommendation and personalization. Peimeng graduated from the New York University Center for Data Science with a master’s degree in Data Science.