Examples of faces in TSFT dataset

The TSFT dataset contains 589* manually annotated face tubes of 94 subjects from popular TV series. It covers a diverse set of subjects (age, gender, race, sex) in challenging filming conditions (home, office, bar, inside a car, day, night, etc.). The average tube length is 55 frames and the average face size is 121 pixels square.

Statistics of TSFT dataset

It is a highly challenging dataset, as more than 32,000 faces (589 tubes averaging 55 frames each) appear in many different filming conditions and with unrestricted emotions. To illustrate the challenge qualitatively, the following figure shows false positive matches found by our method (see the reference below):

False positive matches by our method in TSFT dataset

Send a request to me if you are interested in obtaining our TV Series Face Tubes (TSFT) dataset.

Please cite the following paper if you use the TSFT dataset.

Latent Max-margin Metric Learning for Comparing Video Face Tubes
(Best paper award)
G. Sharma, P. Perez
Workshop on Biometrics
Computer Vision and Pattern Recognition (CVPR)

Boston, MA, USA, June 2015

@inproceedings{sharma2015latent,
    title = {Latent Max-margin Metric Learning for Comparing Video Face Tubes},
    author = {Gaurav Sharma and Patrick Perez},
    booktitle = {Computer Vision and Pattern Recognition (CVPR) Workshops},
    year = {2015}
}
  * The paper has a typo in Table 1: it reports 569 instead of 589 tubes.