This page provides links to the datasets. Our benchmark has two settings: 1) user-based separation and 2) time-based separation. The details of each setting are reported in the paper. Note that the Avocado dataset, which underlies LaMP 6: Personalized Email Subject Generation, is not publicly available; however, we provide the code and the sample IDs we used to generate the dataset. Once you have access to the Avocado dataset, follow the instructions to generate it.

Notice: We deprecated the LaMP 2: Personalized News Categorization task and replaced it with a new task, LaMP 2: Personalized Movie Tagging.

Dataset | Train Inputs | Train Outputs | Validation Inputs | Validation Outputs | Test Inputs
--- | --- | --- | --- | --- | ---
LaMP 1: Personalized Citation Identification | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
Deprecated: LaMP 2: Personalized News Categorization | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
LaMP 2: Personalized Movie Tagging | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
LaMP 3: Personalized Product Rating | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
LaMP 4: Personalized News Headline Generation | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
LaMP 5: Personalized Scholarly Title Generation | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
LaMP 6: Personalized Email Subject Generation | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
LaMP 7: Personalized Tweet Paraphrasing | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based | User-based / Time-based
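For the train and validation splits, inputs and outputs are distributed as separate files, so they must be matched up before training or evaluation. The following is a minimal sketch of one way to do that. It assumes, without confirmation from this page, that each inputs file is a JSON list of records with "id" and "input" fields, and that each outputs file is a JSON object whose "golds" list holds records with "id" and "output" fields. The function names pair_inputs_with_outputs and load_split are hypothetical helpers, not part of any released LaMP code.

```python
import json
from typing import Any


def pair_inputs_with_outputs(
    inputs: list[dict[str, Any]], outputs: dict[str, Any]
) -> list[tuple[str, str, str]]:
    """Join each input record with its gold output via the shared "id" field.

    ASSUMED schema (not confirmed by the dataset page):
      inputs  -> [{"id": ..., "input": ..., ...}, ...]
      outputs -> {"golds": [{"id": ..., "output": ...}, ...], ...}
    """
    # Index gold outputs by id so each lookup is O(1).
    gold_by_id = {gold["id"]: gold["output"] for gold in outputs["golds"]}
    pairs = []
    for record in inputs:
        record_id = record["id"]
        if record_id not in gold_by_id:
            # Fail loudly instead of silently dropping unmatched examples.
            raise KeyError(f"no gold output for id {record_id!r}")
        pairs.append((record_id, record["input"], gold_by_id[record_id]))
    return pairs


def load_split(inputs_path: str, outputs_path: str) -> list[tuple[str, str, str]]:
    """Read an inputs file and an outputs file (hypothetical paths) and pair them."""
    with open(inputs_path, encoding="utf-8") as f:
        inputs = json.load(f)
    with open(outputs_path, encoding="utf-8") as f:
        outputs = json.load(f)
    return pair_inputs_with_outputs(inputs, outputs)
```

Matching by id rather than by position keeps the join correct even if the two files list examples in different orders. The test split ships inputs only, so this pairing applies to the train and validation splits.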