QUERY – REPHRAIN

Efficiently Querying User Sequences While Preventing Their Reconstruction

Grigorios Loukides, Huiping Chen – King’s College London

Sequential data capture useful information about individuals, such as the locations they visit, the products they purchase, or their genomic information. Such data are therefore often collected from individuals and disseminated widely, to support applications ranging from location-based services to marketing and bioinformatics.

The project aims to develop a novel methodology for protecting the privacy of users’ sequential data. The main idea is to construct a data structure that stores the sequences of one or more users and answers queries on these sequences efficiently, while guaranteeing that a malicious user (attacker) cannot infer the sequences by issuing queries. Such data structures could be used by individuals while interacting with internet services, or by organisations that collect data from many individuals and release them for querying or statistical analysis. The project has the following goals:

To design a novel type of data structure for sequential data, which will be able to support different types of queries, e.g., statistical counts and pattern matching queries;
To develop algorithms for constructing the data structure efficiently;
To enhance the data structure, so that it takes into account privacy preferences of individuals;
To implement and evaluate the data structure using real-world and synthetic datasets.
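
To make the threat concrete, the following minimal sketch shows why an unprotected index is a problem. It is not the data structure proposed by the project: `NaiveSequenceIndex`, `count`, `locate` and `reconstruct` are hypothetical names introduced only for illustration, and the index simply scans the stored strings. It answers the two query types mentioned above (statistical counts and pattern matching), and the `reconstruct` function shows how an attacker who can only ask count queries may still recover a complete stored sequence by extending a guess one symbol at a time.

```python
from typing import List, Tuple


class NaiveSequenceIndex:
    """Hypothetical, unprotected index over user sequences (illustration only)."""

    def __init__(self, sequences: List[str]) -> None:
        self._sequences = list(sequences)

    def count(self, pattern: str) -> int:
        """Statistical count query: how many stored sequences contain `pattern`."""
        return sum(1 for s in self._sequences if pattern in s)

    def locate(self, pattern: str) -> List[Tuple[int, int]]:
        """Pattern matching query: (sequence id, start offset) of each occurrence."""
        hits = []
        for seq_id, s in enumerate(self._sequences):
            start = s.find(pattern)
            while start != -1:
                hits.append((seq_id, start))
                start = s.find(pattern, start + 1)  # allow overlapping matches
        return hits


def reconstruct(index: NaiveSequenceIndex, alphabet: str, max_len: int = 1000) -> str:
    """Attack sketch: recover one stored sequence using count queries only.

    The guess is first extended to the right until no extension occurs in any
    stored sequence, then extended to the left in the same way.
    """
    guess = ""
    for extend in (lambda g, ch: g + ch, lambda g, ch: ch + g):
        grew = True
        while grew and len(guess) < max_len:
            grew = False
            for ch in alphabet:
                candidate = extend(guess, ch)
                if index.count(candidate) > 0:
                    guess = candidate
                    grew = True
                    break
    return guess


if __name__ == "__main__":
    index = NaiveSequenceIndex(["ACGTTGCA"])
    print(index.count("TTG"))          # number of sequences containing "TTG"
    print(index.locate("A"))           # all occurrences of "A"
    print(reconstruct(index, "ACGT"))  # the attacker's reconstruction
```

A privacy-preserving data structure of the kind the project targets would need to keep answering such queries usefully while making this sort of step-by-step reconstruction infeasible; how that is achieved is outside the scope of this sketch.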

The project will also develop software to evaluate the proposed algorithms on public real-world datasets (DNA and location sequences, activity logs, customer purchases).