Philip Anderson, Garry Elvin, Wai Lok Woo, Drummond Heckels (Northumbria University), Mark Warner (UCL)

Through the development of a child sexual abuse conversation (CSAC) dataset, this 9-month project will advance our understanding of how perpetrators of child sexual grooming engage with young people online through computer-mediated communication tools and platforms. Our work will consist of identifying, acquiring, sanitising, and anonymising data to build the dataset, and conducting an initial analysis across it. This will lay the groundwork for research into online grooming behaviours and, through future labelling of the dataset for language modelling, for the development of automated grooming detection tools. The dataset will, for the first time, give researchers access to real-world grooming conversations, enabling work on reactive and proactive mechanisms for limiting this behaviour across platforms. It will also underpin future research supporting education and training for young people, training and tools for digital forensic examiners, and support for police officers engaged in undercover operations involving child sexual offenders.