This page provides links to the datasets. Our benchmark has two settings: 1) user-based separation and 2) time-based separation. The details of each setting are reported in the paper. Note that the Avocado dataset (used for Personalized Email Subject Generation) is not publicly available; however, we provide the code here, along with the sample IDs we used, to generate the dataset. Once you have access to the Avocado dataset, follow the instructions to generate it.

| Dataset | Train inputs | Train outputs | Validation inputs | Validation outputs | Test inputs |
| --- | --- | --- | --- | --- | --- |
| LaMP 1: Personalized Citation Identification | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
| LaMP 2: Personalized News Categorization | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
| LaMP 3: Personalized Product Rating | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
| LaMP 4: Personalized News Headline Generation | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
| LaMP 5: Personalized Scholarly Title Generation | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
| LaMP 6: Personalized Email Subject Generation | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
| LaMP 7: Personalized Tweet Paraphrasing | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based |
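
Because inputs and outputs are distributed as separate files, they have to be joined before training or evaluation. The following is a minimal sketch of that join, not an official loader. It assumes (this page does not state it) that each inputs file is a JSON list of records carrying an "id" key, and that each outputs file is a JSON object with a "golds" list of {"id", "output"} entries. The helper names `load_json` and `pair_inputs_outputs` and the file names in the usage comment are hypothetical.

```python
import json
from pathlib import Path


def load_json(path):
    """Read one downloaded JSON file from disk."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def pair_inputs_outputs(inputs, outputs):
    """Join input records to gold outputs on their shared id.

    Assumed layout (not confirmed by this page):
      inputs  -> list of dicts, each with an "id" key
      outputs -> dict with a "golds" list of {"id": ..., "output": ...}
    """
    gold_by_id = {g["id"]: g["output"] for g in outputs["golds"]}
    paired = []
    for record in inputs:
        # Records without a matching gold entry get output=None.
        paired.append({**record, "output": gold_by_id.get(record["id"])})
    return paired


# Usage with hypothetical local file names:
# train = pair_inputs_outputs(load_json("train_inputs.json"),
#                             load_json("train_outputs.json"))
```

Only inputs are listed for the test split, so test records are used as loaded, with no outputs to join.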