REPHRAIN Publications

A list of the REPHRAIN Centre publications can be found below – please check back for regular updates.

January 2024

Supporting Small and Medium-Sized Enterprises in Using Privacy Enhancing Technologies

Prepared by Maria Bada, Steven Furnell, Jason R. C. Nurse and Jason Dymydiuk.

Abstract: Small and Medium-sized Enterprises (SMEs) are a critical element of the economy in many countries, as well as being embedded within key supply chains alongside larger organisations. Typical SMEs are data- and technology-dependent, but many are nonetheless ill-equipped to protect these areas. This study aims to investigate the extent to which SMEs currently understand and use Privacy Enhancing Technologies (PETs), and how they could be supported to do so more effectively given their potential constraints in terms of understanding, skills and capacity to act. This was studied via a mixed-method approach collecting qualitative and quantitative data. Survey responses from 239 participants were collected and 14 interviews conducted. Participants were SME owners as well as experts working with SMEs. The findings clearly demonstrate that SMEs generally tend not to think about privacy, and if they do so it is mainly because of risk, potentially after a cyber attack, with the main drivers for implementing privacy being the potential of being fined by regulators, reputational damage, the demands of customers, and legal or regulatory compliance. The main reasons for this lack of attention are a lack of skills and a perceived lack of necessity. On this basis, the findings were taken forward to inform the initial design of an SME Privacy Starter Pack, which aims to assist SMEs in understanding, in a simple and facilitated manner, that privacy and PETs are relevant to them and their industry.

Paper available for download here.

November 2023

Online Sextortion: Characteristics of offences from a decade of community reporting

Prepared by Matthew Edwards and Nick M. Hollely.

Abstract: Online sextortion is an organised form of blackmail which can have a serious financial and traumatic impact on its victims. Responding to a dearth of evidence about this crime, this study analyses patterns within a large dataset of over 23,000 anonymous victim reports, collected via an online support community. Using common responses within these reports, this study identifies the most typical patterns of offending, including the profile assumed by offenders, the platforms through which the offence is initiated and enabled, payment methods and amounts demanded, and the national origins of most offences. Analysis shows that the mix of social media and dating platforms being used to approach and communicate with victims is changing over time, but the tactics employed by offenders are remarkably standardised. Payment demands involved in the crime were previously centralised in a few key service providers, but are increasingly diversifying. The variety of platforms involved in online sextortion points towards enforcement and safeguarding challenges, motivating an analysis of common risk factors that can inform the design of broadly-applicable countermeasures.

Paper available for download here.

October 2023

Towards Human-Centric Endpoint Security

Prepared by Jenny Blessing, Partha Das Chowdhury, Maria Sameen, Ross Anderson, Joe Gardiner and Awais Rashid.

Abstract: In a survey of six widely used end-to-end encrypted messaging applications, we consider the post-compromise recovery process from the perspective of what security audit functions, if any, are in place to detect and recover from attacks. Our investigation reveals audit functions vary in the extent to which they rely on the end user. We argue developers should minimize dependence on users and view them as a residual, not primary, risk mitigation strategy. To provide robust communications security, E2EE applications need to avoid protocol designs that dump too much responsibility on naive users and instead make system components play an appropriate role.

Paper available for download here.

Privacy-enhanced AI Assistants based on Dialogues and Case Similarity

Prepared by Xiao Zhan, Stefan Sarkadi and José Such.

Abstract: Personal assistants (PAs) such as Amazon Alexa, Google Assistant and Apple Siri are now widespread. However, without adequate safeguards and controls their use may lead to privacy risks and violations. In this paper, we propose a model for privacy-enhancing PAs. The model is an interpretable AI architecture that combines 1) a dialogue mechanism for understanding the user and getting online feedback from them, with 2) a decision-making mechanism based on case-based reasoning considering both user and scenario similarity. We evaluate our model using real data about users’ privacy preferences, and compare its accuracy and demand for user involvement with both online machine learning and other, more interpretable, AI approaches. Our results show that our proposed architecture is more accurate and requires less intervention from the users than existing approaches.

Paper available for download here.

Illicit COVID-19 products online: A mixed-method approach for identifying and preventing online health risks

Prepared by Valeria Catalani, Honor D. Townshend, Mariya Prilutskaya, Robert P. Chilcott, Antonio Metastasio, Hani Banayoti, Tim McSweeney and Ornella Corazza.

Paper available for download here.

September 2023

‘Ought’ should not assume ‘Can’: Basic Capabilities in Cybersecurity to Ground Sen’s Capability Approach

Prepared by Partha Das Chowdhury and Karen Renaud.

Abstract: We inhabit a ‘digital first’ society, which is only viable if everyone, regardless of ability and capacity, is able to benefit from online offerings in a safe and secure way. However, disabled individuals, people living under oppressive regimes, elderly citizens and individuals fleeing conflict can be excluded, because they might not have the opportunity to implement cybersecurity hygiene measures. To reduce this potential exclusion, it is crucial to make all users’ situated realities focal variables in policy debates and provisioning efforts. This requires a validated set of basic minimum capabilities which reflect individuals’ diverse personal and social realities. In this paper, we report on a scoping literature review intended to reveal the state of play with respect to capabilities-related research in the cyber domain. We motivate our initial focus on the over 65s for this investigation. We used advice from online government cybersecurity advisories to arrive at a set of five recommended cybersecurity hygiene tasks. These fed into a survey with sixty senior citizens to elicit the barriers they could envisage someone of their age encountering, in acting upon cybersecurity hygiene advice. The final deliverable is a candidate list of basic capabilities (cybersecurity) for seniors. This allows us to start measuring security and privacy poverty, an essential step in recognising and mitigating exclusion, as well as informing threat modelling efforts.

Paper available for download here.

Decentralised Repeated Modular Squaring Service Revisited: Attack and Mitigation

Prepared by Aydin Abadi and Steven Murdoch.

Abstract: Repeated modular squaring plays a crucial role in various time-based cryptographic primitives, such as Time-Lock Puzzles and Verifiable Delay Functions. At ACM CCS 2021, Thyagarajan et al. introduced “OpenSquare”, a decentralised protocol that lets a client delegate the computation of repeated modular squaring to third-party servers while ensuring that these servers are compensated only if they deliver valid results. In this work, we unveil a significant vulnerability in OpenSquare, which enables servers to receive payments without fulfilling the delegated task. To tackle this issue, we present a series of mitigation measures.
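
As a rough illustration of the primitive being delegated (this is not OpenSquare itself, and the parameters are toy-sized rather than cryptographic), repeated modular squaring is simply t sequential squarings modulo n, with no known way to parallelise them:

```python
# Minimal sketch of repeated modular squaring, the workload OpenSquare
# lets clients outsource (toy parameters; real moduli are >= 2048 bits).
def repeated_squaring(x: int, t: int, n: int) -> int:
    """Compute x^(2^t) mod n by t sequential squarings."""
    for _ in range(t):
        x = x * x % n
    return x

n = 61 * 53                              # toy RSA-style modulus
print(repeated_squaring(5, 10, n))       # 5^(2^10) mod 3233
```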

Paper available for download here.

Delegated Time-Lock Puzzle

Prepared by Aydin Abadi, Dan Ristea and Steven Murdoch.

Abstract: Time-Lock Puzzles (TLPs) are cryptographic protocols that enable a client to lock a message in such a way that a server can only unlock it after a specific time period. However, existing TLPs have certain limitations: (i) they assume that both the client and server always possess sufficient computational resources and (ii) they solely focus on the lower time bound for finding a solution, disregarding the upper bound that guarantees a regular server can find a solution within a certain time frame. Additionally, existing TLPs designed to handle multiple puzzles either (a) entail high verification costs or (b) lack generality, requiring identical time intervals between consecutive solutions. To address these limitations, this paper introduces, for the first time, the concept of a “Delegated Time-Lock Puzzle” and presents a protocol called “Efficient Delegated Time Lock Puzzle” (ED-TLP) that realises this concept. ED-TLP allows the client and server to delegate their resource-demanding tasks to third-party helpers. It facilitates real-time verification of solution correctness and efficiently handles multiple puzzles with varying time intervals. ED-TLP ensures the delivery of solutions within predefined time limits by incorporating both an upper bound and a fair payment algorithm. We have implemented ED-TLP and conducted a comprehensive analysis of its overheads, demonstrating the efficiency of the construction.
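
To make the setting concrete, the sketch below shows the classic Rivest–Shamir–Wagner time-lock puzzle on which TLPs build: generating a puzzle is cheap for a client who knows the factorisation of n, while opening it takes T sequential squarings. This is the base primitive only, not ED-TLP’s delegation, verification or payment machinery, and all parameters are toy-sized.

```python
import math
import random

# Classic RSW time-lock puzzle (base primitive only; toy parameters).
p, q = 1009, 1013                   # secret primes; real moduli are far larger
n, phi = p * q, (p - 1) * (q - 1)
T = 10_000                          # sequential squarings needed to open
secret = 42

r = random.randrange(2, n)
while math.gcd(r, n) != 1:
    r = random.randrange(2, n)

# Puzzle generation is fast for the client, who knows phi(n):
mask = pow(r, pow(2, T, phi), n)    # equals r^(2^T) mod n by Euler's theorem
puzzle = (secret + mask) % n

# Solving requires T sequential squarings (no known shortcut without phi):
s = r
for _ in range(T):
    s = s * s % n
assert (puzzle - s) % n == secret
```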

Paper available for download here.

Timed Secret Sharing

Prepared by Alireza Kavousi, Aydin Abadi and Philipp Jovanovic.

Abstract: Secret sharing has been a promising tool in cryptographic schemes for decades. It allows a dealer to split a secret into a number of shares that carry no sensitive information on their own when treated individually, but yield the original secret when a sufficient number of them are combined. Existing schemes do not consider a guaranteed delay prior to secret reconstruction: they implicitly assume that once the dealer shares the secret, a sufficient number of shareholders will get together and recover the secret at will. This, however, may lead to security breaches when a timely reconstruction of the secret matters, as the early knowledge of a single revealed share is catastrophic assuming a threshold adversary.

This paper presents the notion of timed secret sharing (TSS), providing lower and upper time bounds for secret reconstruction with the use of time-based cryptography. Recent advances in the literature, including short-lived proofs [Asiacrypt 2022], enable us to realize an upper time bound, shown to be useful in breaking the public goods game, an inherent issue in secret sharing-based systems. Moreover, we establish an interesting trade-off between time and fault tolerance in a secret sharing scheme by having the dealer gradually release additional shares over time, offering another approach with the same goal. We propose several constructions that offer a range of security properties while maintaining practical efficiency. Our constructions leverage a variety of techniques and state-of-the-art primitives.
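
For readers unfamiliar with the underlying primitive, below is a minimal t-of-n Shamir secret sharing sketch; TSS augments this classical scheme with time bounds, which the sketch does not attempt to show. The field size and parameters are illustrative only.

```python
import random

# Minimal Shamir (t-of-n) secret sharing over a prime field -- the classical
# primitive that TSS builds on (toy field; no time bounds shown).
P = 2**61 - 1  # a Mersenne prime

def share(secret: int, t: int, n: int):
    """Sample a degree t-1 polynomial with constant term `secret`; shares are points."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any t shares."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num = den = 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

shares = share(1234, t=3, n=5)
assert reconstruct(shares[:3]) == 1234  # any 3 of the 5 shares suffice
```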

Paper available for download here.

Online reading habits can reveal personality traits: towards detecting psychological microtargeting

Prepared by Almog Simchon, Adam Sutton, Matthew Edwards and Stephan Lewandowsky.

Abstract: Building on big data from Reddit, we generated two computational text models: (i) predicting the personality of users from the text they have written and (ii) predicting the personality of users based on the text they have consumed. The second model is novel and without precedent in the literature. We recruited active Reddit users (N = 1,105) of fiction-writing communities. The participants completed a Big Five personality questionnaire and consented for their Reddit activity to be scraped and used to create a machine learning model. We trained a natural language processing model [Bidirectional Encoder Representations from Transformers (BERT)] to predict personality from produced text (average performance: r = 0.33). We then applied this model to a new set of Reddit users (N = 10,050), predicted their personality based on their produced text, and trained a second BERT model to predict their predicted-personality scores based on consumed text (average performance: r = 0.13). By doing so, we provide the first glimpse into the linguistic markers of personality-congruent consumed content.
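
A toy version of the modelling step might look like the following sketch: regress a trait score on text features and report Pearson’s r, as the paper does. The authors use fine-tuned BERT models; TF-IDF features and ridge regression are substituted here purely to keep the example self-contained, and the texts and scores are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from scipy.stats import pearsonr

# Toy stand-in for the pipeline: predict a Big Five trait from text,
# evaluated with Pearson's r (hypothetical texts and questionnaire scores).
texts = ["i love long rainy evenings with a book",
         "party tonight, everyone come!",
         "planned my week down to the hour",
         "who cares, let's just wing it"]
openness = [4.5, 3.0, 2.5, 3.8]

X = TfidfVectorizer().fit_transform(texts)
model = Ridge().fit(X, openness)
r, _ = pearsonr(model.predict(X), openness)
print(f"in-sample correlation r = {r:.2f}")
```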

Paper available for download here.

You Are What You Read: Inferring Personality From Consumed Textual Content

Prepared by Adam Sutton, Almog Simchon, Matthew Edwards and Stephan Lewandowsky.

Abstract: In this work we use consumed text to infer Big Five personality inventories using data we have collected from the social media platform Reddit. We test our models on two datasets, sampled from participants who consumed either fiction content (N = 913) or news content (N = 213). We show that state-of-the-art models from a similar task using authored text do not translate well to this task, with average correlations of r = .06 between the model’s predictions and ground-truth personality inventory dimensions. We propose an alternate method of generating average personality labels for each piece of text consumed, under which our model achieves correlations as high as r = .34 when predicting personality from the text being read.

Paper available for download here.

August 2023

Threat Models over Space and Time: A Case Study of E2EE Messaging Applications

Prepared by Partha Das Chowdhury, Maria Sameen, Jenny Blessing, Nicholas Boucher, Joseph Gardiner, Tom Burrows, Ross Anderson and Awais Rashid.

Abstract: Threat modelling is foundational to secure systems engineering and should be done in consideration of the context within which systems operate. On the other hand, the continuous evolution of both the technical sophistication of threats and the system attack surface is an inescapable reality. In this work, we explore the extent to which real-world systems engineering reflects the changing threat context. To this end we examine the desktop clients of six widely used end-to-end-encrypted mobile messaging applications to understand the extent to which they adjusted their threat model over space (when enabling clients on new platforms, such as desktop clients) and time (as new threats emerged). We experimented with short-lived adversarial access against these desktop clients and analyzed the results with respect to two popular threat elicitation frameworks, STRIDE and LINDDUN. The results demonstrate that system designers need both to recognise the threats in the evolving context within which systems operate and, more importantly, to mitigate them by rescoping trust boundaries so that those within the administrative boundary cannot violate security and privacy properties. Such a nuanced understanding of trust boundary scopes and their relationship with administrative boundaries allows for better administration of shared components, including securing them with safe defaults.

Paper available for download here.

Personal Identity Insurance: Coverage and Pricing in the U.S.

Prepared by Daniel Woods.

Abstract: Personal identity theft occurs when a criminal uses stolen personal identifiers to manipulate third parties into taking actions under the false belief they are communicating with the individual whose identity has been stolen. A typical example is the criminal taking a loan out under the stolen identity. A market for personal identity insurance has emerged to mitigate the associated harms. We extract 34 personal identity insurance products that were uniquely filed with regulators in the U.S. We conduct a content analysis on the policy wordings and actuarial tables. Analyzing the policy wordings reveals that personal identity theft causes a number of costs in terms of monitoring credit records, lost income and travel expenses, attorney fees, and even mental health counseling. Our analysis shows there are few exclusions related to moral hazard. This suggests identity theft is largely outside the control of individuals. We extract actuarial calculations, which reveal financial impacts ranging from a few hundred to a few thousand dollars. Finally, insurers provide support services that are believed to reduce out of pocket expenses by over 90 percent.

Paper available for download here.

Contextual Integrity for Augmentation-based Privacy Reasoning

Prepared by Gideon Ogunniye and Nadin Kökciyan.

Abstract: Privacy management in online systems is a complex task. Recently, contextual integrity theory has been introduced to model privacy, which considers the social contexts of users before making privacy decisions. However, building a practical application based on this theory is not straightforward. In this paper, we propose an agent-based framework for privacy policy reasoning that combines the power of ontologies together with argumentation techniques to resolve privacy conflicts. First, we propose an ontology that represents the contextual integrity theory. We then introduce an argumentation-based dialogue framework that can: (i) reason about contextual norms to resolve privacy conflicts among agents, and (ii) provide justifications to the agents during multi-party dialogues. We apply our approach to privacy scenarios in various contexts where each scenario has different challenges to address. We conclude with theoretical results to show the effectiveness of the framework.

Paper available for download here.

Home Alone? Exploring the geographies of digitally-mediated privacy practices at home during the COVID-19 pandemic

Prepared by Kim Cheetham and Ola Michalec.

Abstract: During the COVID-19 pandemic, digital technologies have enabled work, education, community activity, and access to healthcare to be situated within our homes. These emerging applications call for a renewed focus on the geographies of online privacy. Thus, this research aims to explore the geographies of digitally-mediated privacy practices at home during the COVID-19 lockdown through the method of qualitative in-depth interviews with the lay-users of the Internet. Using Social Practice Theory, the paper explores contextual, collective and spatial dimensions of privacy. In particular, the paper explores how increased use of digital technologies at home during the COVID-19 lockdown has reconfigured practices of self-disclosure, data-sharing and protection of private spaces. First, the paper argues that the use of new work tools, the re-purposing of work tools for social means, and the use of personal devices for work functions, have all affected people’s ability to maintain boundaries between their work and personal lives. Second, the paper uncovers how public health concerns during the pandemic mobilised the collective dimensions of privacy, countering the popular belief that privacy is an individualistic concern. Taken together, these findings point at reorienting digital geographies of privacy towards the people and spaces ‘behind’ the screen.

Paper available for download here.

July 2023

Ethical, political and epistemic implications of machine learning (mis)information classification: insights from an interdisciplinary collaboration

Prepared by Andrés Domínguez Hernández, Richard Owen, Dan Saattrup Nielsen and Ryan McConville.

Abstract: Machine learning (ML) classification models are becoming increasingly popular for tackling the sheer volume and speed of online misinformation. In building these models data scientists need to make assumptions about the legitimacy and authoritativeness of the sources of ‘truth’ employed for model training and testing. This has political, ethical and epistemic implications which are rarely addressed in technical papers. Despite (and due to) their reported high performance, ML-driven moderation systems have the potential to shape public debate and create downstream negative impacts. This article presents findings from a responsible innovation (RI) inflected collaboration between science and technology studies scholars and data scientists. Following an interactive co-ethnographic process, we identify a series of algorithmic contingencies—key moments during ML model development which could lead to different future outcomes, uncertainties and harmful effects. We conclude by offering recommendations on how to address the potential failures of ML tools for combating online misinformation.

Paper available for download here.

Co-creating a Transdisciplinary Map of Technology-mediated Harms, Risks and Vulnerabilities: Challenges, Ambivalences and Opportunities

Prepared by Andrés Domínguez Hernández, Kopo M. Ramokapane, Partha Das Chowdhury, Ola Michalec, Emily Johnstone, Emily Godwin, Alicia G. Cork and Awais Rashid.

Abstract: The phrase “online harms” has emerged in recent years out of a growing political willingness to address the ethical and social issues associated with the use of the Internet and digital technology at large. The broad landscape that surrounds online harms gathers a multitude of disciplinary, sectoral and organizational efforts while raising myriad challenges and opportunities for crossing entrenched boundaries. In this paper we draw lessons from a journey of co-creating a transdisciplinary knowledge infrastructure within a large research initiative animated by the online harms agenda. We begin with a reflection on the implications of mapping, taxonomizing and constructing knowledge infrastructures and a brief review of how online harm and adjacent themes have been theorized and classified in the literature to date. Grounded in our own experience of co-creating a map of online harms, we then argue that the map—and the process of mapping—perform three mutually constitutive functions, acting simultaneously as method, medium and provocation. We draw lessons from how an open-ended approach to mapping, despite not guaranteeing consensus, can foster productive debate and collaboration in ethically and politically fraught areas of research. We end with a call for CSCW research to surface and engage with the multiple temporalities, social lives and political sensibilities of knowledge infrastructures.

Paper available for download here.

June 2023

Social Explainability of AI: The Impact of Non-Technical Explanations on Trust

Prepared by Frens Kroeger, Bianca Slocombe, Isa Inuwa-Dutse, Baker Kagimu, Beate Grawemeyer and Umang Bhatt.

Abstract: In striving for explainable AI, it is not necessarily technical understanding that will maximise perceived transparency and trust. Most of us board planes with little understanding of how the plane works, and without knowing the pilot, because we put trust in the regulatory and authoritative systems that govern the people and processes. By providing knowledge of the governing ecosystem, industries like aviation and engineering have built stable trust with everyday people. This is known as “social explainability.” We extend this concept to AI systems using a series of “social” explanations designed with users (based on external certification of the system, data security and privacy). Core research questions are: Do social explanations, purely technical explanations, or a combination of the two, predict the greatest trust from users? Does this depend on the digital literacy of the user? An interaction between explanation type and digital literacy reveals that more technical information predicts higher trust from those with higher digital literacy, but those of lower digital literacy given purely technical explanations have the worst trust overall. For this group, social explainability works best. Overall, combined socio-technical explanations appear more successful in building trust than purely social explanations. As in other areas, social explainability may be a useful tool for building stable trust for non-experts in AI systems.

Paper available for download here.

Motivations for Collecting Digital NFT Fashion

Prepared by Alicia G. Cork, Adam Joinson, Laura G. E. Smith, David A. Ellis and Danaë Stanton Fraser.

Abstract: Non-fungible tokens (NFTs) allow individuals to demonstrate ownership of digital and physical assets. NFTs are scarce, unique, and authentic; three properties known to be key for determining perceived value. Whilst previous research has primarily focused on NFTs as a source of economic value, here we assess the psychological motivations of collectors of digital fashion NFTs. Specifically, NFTs related to digital fashion are particularly relevant to HCI researchers as they sit at the intersection between business, culture, and self-expression. Here, we survey 19 users of a digital avatar fashion company, Genies, to understand the gratifications users derive from collecting digital NFT fashion. Results demonstrate that the primary motivations for collecting fashion NFTs are self-expression and utility, and that motivations associated with value are secondary. We make design recommendations based on these results, indicating that developers should distinguish between expression-based motivations and value-based motivations.

Paper available for download here.

Earn While You Reveal: Private Set Intersection that Rewards Participants

Prepared by Aydin Abadi and Steven J. Murdoch.

Abstract: In Private Set Intersection protocols (PSIs), a non-empty result always reveals something about the private input sets of the parties. Moreover, in various variants of PSI, not all parties necessarily receive or are interested in the result. Nevertheless, to date, the literature has assumed that those parties who do not receive or are not interested in the result still contribute their private input sets to the PSI for free, although doing so would cost them their privacy. In this work, for the first time, we propose a multi-party PSI, called “Anesidora”, that rewards parties who contribute their private input sets to the protocol. Anesidora is efficient; it mainly relies on symmetric key primitives and its computation and communication complexities are linear with the number of parties and set cardinality. It remains secure even if the majority of parties are corrupted by active colluding adversaries.

Paper available for download here.

Profiling the vendors of COVID-19 related product on the Darknet: An observational study

Prepared by Valeria Catalani, Honor D. Townshend, Mariya Prilutskaya, Andres Roman-Urrestarazu, Robin van Kessel, Robert P. Chilcott, Hani Banayoti, Tim McSweeney and Ornella Corazza.

Abstract: In a time of unprecedented global change, the COVID-19 pandemic has led to a surge in demand for COVID-19 vaccines and related certifications. Mainly due to supply shortages, counterfeit vaccines, fake documentation, and alleged cures have been added to the illegal portfolios offered on darkweb marketplaces (DWMs), with important public health consequences. We aimed to profile key DWMs and vendors by presenting some in-depth case studies. A non-systematic search for COVID-19 products was performed across 118 DWMs. Levels of activity, credibility, content, COVID-19 product listings, and privacy protocols were among the features retrieved. Open web fora and other open web sources were also considered for further analysis of both functional and non-functional DWMs. Collected data refer to the period between January 2020 and October 2021. A total of 42 relevant listings sold by 24 vendors across eight DWMs were identified. Four of these markets were active and well-established at the time of the study, with good levels of credibility. COVID-19 products were listed alongside other marketplace content. Vendors had trusted profiles, communicated in English and accepted payments in cryptocurrencies (Monero or Bitcoin). Their geographical locations included the USA, Asia and Europe. While COVID-19 related goods were mostly available for regional supply, other listings were also shipped worldwide. Findings emerging from this study raise important questions about the health safety of certain DWM activities and encourage the development of targeted interventions to overcome such new and rapidly expanding public health threats.

Paper available for download here.

What values should online consent forms satisfy? A Scoping Review

Prepared by Karen Renaud and Paul van Schaik.

Abstract: Online users are presented with consent forms frequently, as they visit new websites. Such forms seek consent to collect, store and process a web user’s data. The forms contain a wide range of statements that attempt to persuade people to grant such consent. In this paper, we review the literature to determine what researchers say about the human values/needs online consent forms should satisfy. We carried out a scoping review of the literature on consent forms, in order to understand the research in this area. Our investigation revealed six distinct human values, and their associated value creators, that online consent forms ought to satisfy in order to support informed consent-related decision making. We conclude with a value-based model of online consent and a suggestion for future work to validate the proposed model.

Paper available for download here.

March 2023

The Decline of Third-Party Cookies in the AdTech Sector: Of Data Protection Law and Regulation (I)

Prepared by Asma Vranaki and Francesca Farmer.

Paper available for download here.

Will Admins Cope? Decentralized Moderation in the Fediverse

Prepared by Ishaku Hassan Anaobi, Aravindh Raman, Ignacio Castro, Haris Bin Zia, Dami Ibosiola and Gareth Tyson.

Abstract: As an alternative to Twitter and other centralized social networks, the Fediverse is growing in popularity. The recent, and polemical, takeover of Twitter by Elon Musk has exacerbated this trend. The Fediverse includes a growing number of decentralized social networks, such as Pleroma or Mastodon, that share the same subscription protocol (ActivityPub). Each of these decentralized social networks is composed of independent instances that are run by different administrators. Users, however, can interact with other users across the Fediverse regardless of the instance they are signed up to. The growing user base of the Fediverse creates key challenges for the administrators, who may experience a growing burden. In this paper, we explore how large that overhead is, and whether there are solutions to alleviate the burden. We study the overhead of moderation on the administrators. We observe a diversity of administrator strategies, with evidence that administrators on larger instances struggle to find sufficient resources. We then propose a tool, WatchGen, to semi-automate the process.

Paper available for download here.

Toxicity in the Decentralized Web and the Potential for Model Sharing

Prepared by Haris Bin Zia, Aravindh Raman, Ignacio Castro, Ishaku Hassan Anaobi, Emiliano de Cristofaro, Nishanth Sastry and Gareth Tyson.

Abstract: The “Decentralised Web” (DW) is an evolving concept, which encompasses technologies aimed at providing greater transparency and openness on the web. The DW relies on independent servers (aka instances) that mesh together in a peer-to-peer fashion to deliver a range of services (e.g. micro-blogs, image sharing, video streaming). However, toxic content moderation in this decentralised context is challenging. This is because there is no central entity that can define toxicity, nor a large central pool of data that can be used to build universal classifiers. It is therefore unsurprising that there have been several high-profile cases of the DW being misused to coordinate and disseminate harmful material. Using a dataset of 9.9M posts from 117K users on Pleroma (a popular DW microblogging service), we quantify the presence of toxic content. We find that toxic content is prevalent and spreads rapidly between instances. We show that automating per-instance content moderation is challenging due to the lack of sufficient training data available and the effort required in labelling. We therefore propose and evaluate ModPair, a model sharing system that effectively detects toxic content, attaining an average per-instance macro-F1 score of 0.89.
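
For clarity on the reported metric: macro-F1 averages the per-class F1 scores with equal weight, so a classifier cannot score well by ignoring the rarer toxic class. A minimal computation with invented labels:

```python
from sklearn.metrics import f1_score

# Macro-F1 averages the F1 of the "toxic" (1) and "non-toxic" (0) classes
# equally, regardless of class imbalance (toy labels for illustration).
y_true = [1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
print(f1_score(y_true, y_pred, average="macro"))
```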

Paper available for download here.

“I didn’t click”: What users say when reporting phishing

Prepared by Nikolas Pilavakis, Adam Jenkins, Nadin Kokciyan and Kami Vaniea.

Abstract: When people identify potentially malicious phishing emails, one option they have is to contact a help desk to report them and receive guidance. While there is a great deal of effort put into helping people identify such emails and to encourage users to report them, there is relatively little understanding of what people say or ask when contacting a help desk about such emails. In this work, we qualitatively analyze a random sample of 270 help desk phishing tickets collected across nine months. We find that when reporting or asking about phishing emails, users often discuss evidence they have observed or gathered, potential impacts they have identified, actions they have or have not taken, and questions they have. Some users also provide clear arguments both about why the email really is phishing and why the organization needs to take action about it.

Paper available for download here.

A Turning Point for Cyber Insurance

Prepared by Daniel Woods.

Abstract: For the first two decades, the cyber insurance market rewarded entrepreneurial insurers who embraced uncertainty (or ignorance) while offering innovative insurance products. The supply increased as insurers expanded into the new product. Applicants and brokers began to seek out those underwriters who had the lowest underwriting standards and price, which prevented informed insurers from applying their expertise. Ransomware shattered this equilibrium, creating space for insurers—both traditional carriers and start-ups—who can accurately price risk and nudge policyholders towards better security. Looking forward, we should expect technologists who can understand and measure cyber risk to thrive.

Paper available for download here.

A toolkit of dilemmas: Beyond debiasing and fairness formulas for responsible AI/ML

Prepared by Andrés Domínguez Hernández and Vassilis Galanos.

Abstract: Approaches to fair and ethical AI have recently fallen under the scrutiny of the emerging, chiefly qualitative, field of critical data studies, which places emphasis on such interventions’ lack of sensitivity to context and complex social phenomena. We employ some of these lessons to introduce a tripartite decision-making toolkit, informed by dilemmas encountered in the pursuit of responsible AI/ML. These are: (a) the opportunity dilemma between the availability of data shaping problem statements versus problem statements shaping data collection and processing; (b) the scale dilemma between scalability and contextualizability; and (c) the epistemic dilemma between pragmatic technical objectivism and reflexive relativism in acknowledging the social. This paper advocates for situated reasoning and creative engagement with the dilemmas surrounding responsible algorithmic/data-driven systems, going beyond the formulaic bias-elimination and ethics-operationalization narratives found in the fair-AI literature.

Paper available for download here.

February 2023

A survey on understanding and representing privacy requirements in the IoT

Prepared by Gideon Ogunniye and Nadin Kokciyan.

Abstract: People are interacting with online systems all the time. In order to use the services being provided, they give consent for their data to be collected. This approach requires too much human effort and is impractical for systems like Internet-of-Things (IoT) where human-device interactions can be large. Ideally, privacy assistants can help humans make privacy decisions while working in collaboration with them. In our work, we focus on the identification and representation of privacy requirements in IoT to help privacy assistants better understand their environment. In recent years, more focus has been on the technical aspects of privacy. However, the dynamic nature of privacy also requires a representation of social aspects (e.g., social trust). In this survey paper, we review the privacy requirements represented in existing IoT ontologies. We discuss how to extend these ontologies with new requirements to better capture privacy, and we introduce case studies to demonstrate the applicability of the novel requirements.

Paper available for download here.

Privacy-Enhancing Technology and Everyday Augmented Reality: Understanding Bystanders’ Varying Needs for Awareness and Consent

Prepared by Joseph O’Hagan, Pejman Saeghe, Jan Gugenheimer, Daniel Medeiros, Karola Marky, Mohamed Khamis and Mark McGill.

Abstract: Fundamental to Augmented Reality (AR) headsets is their capacity to visually and aurally sense the world around them, necessary to drive the positional tracking that makes rendering 3D spatial content possible. This requisite sensing also opens the door for more advanced AR-driven activities, such as augmented perception, volumetric capture and biometric identification – activities with the potential to expose bystanders to significant privacy risks. Existing Privacy-Enhancing Technologies (PETs) often safeguard against these risks at a low level, e.g., by instituting camera access controls. However, we argue that such PETs are incompatible with the need for always-on sensing given AR headsets’ intended everyday use. Through an online survey (N=102), we examine bystanders’ awareness of, and concerns regarding, potentially privacy infringing AR activities; the extent to which bystanders’ consent should be sought; and the level of granularity of information necessary to provide awareness of AR activities to bystanders. Our findings suggest that PETs should take into account the AR activity type, and relationship to bystanders, selectively facilitating awareness and consent. In this way, we can ensure bystanders feel their privacy is respected by everyday AR headsets, and avoid unnecessary rejection of these powerful devices by society.

Paper available for download here.

Differentially Private Shapley Values for Data Evaluation

Prepared by Lauren Watson, Rayna Andreeva, Hao-Tsung Yang and Rik Sarkar.

Abstract: The Shapley value has been proposed as a solution to many applications in machine learning, including for equitable valuation of data. Shapley values are computationally expensive and involve the entire dataset. The query for a point’s Shapley value can also compromise the statistical privacy of other data points. We observe that in machine learning problems such as empirical risk minimization, and in many learning algorithms (such as those with uniform stability), a diminishing returns property holds, where marginal benefit per data point decreases rapidly with data sample size. Based on this property, we propose a new stratified approximation method called the Layered Shapley Algorithm. We prove that this method operates on small (O(polylog(n))) random samples of data and small-sized (O(log n)) coalitions to achieve the results with guaranteed probabilistic accuracy, and can be modified to incorporate differential privacy. Experimental results show that the algorithm correctly identifies high-value data points that improve validation accuracy, and that the differentially private evaluations preserve approximate ranking of data.
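
To illustrate what is being approximated (not the paper’s algorithm), the sketch below estimates one data point’s Shapley value by plain permutation sampling over a toy utility function; the Layered Shapley Algorithm achieves this with far smaller stratified samples and admits a differentially private variant.

```python
import random
from statistics import mean

# Monte Carlo Shapley estimate via permutation sampling (toy utility
# function standing in for model validation accuracy).
data = list(range(8))

def utility(coalition):
    return min(len(coalition), 5) / 5

def shapley_estimate(i, rounds=2000):
    gains = []
    for _ in range(rounds):
        perm = random.sample(data, len(data))
        before = perm[:perm.index(i)]          # coalition preceding i
        gains.append(utility(before + [i]) - utility(before))
    return mean(gains)

print(shapley_estimate(0))  # ~0.125: 8 symmetric points share v(N)=1 equally
```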

Paper available for download here.

The Shapley Value in Machine Learning

Prepared by Benedek Rozemberczki, Lauren Watson, Peter Bayer, Hao-Tsung Yang, Oliver Kiss, Sebastian Nilsson and Rik Sarkar.

Abstract: Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning. In this paper, we first discuss fundamental concepts of cooperative game theory and axiomatic properties of the Shapley value. Then, we give an overview of the most important applications of the Shapley value in machine learning: feature selection, explainability, multi-agent reinforcement learning, ensemble pruning, and data valuation. We examine the most crucial limitations of the Shapley value and point out directions for future research.
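
For reference, the solution concept the survey covers assigns player i its marginal contribution averaged over all coalitions:

```latex
% Shapley value of player i in a cooperative game (N, v):
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
            \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,
            \bigl[ v(S \cup \{i\}) - v(S) \bigr]
```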

Paper available for download here.

December 2022

Implications of XR on Privacy, Security and Behaviour: Insights from Experts

Prepared by Melvin Abraham, Pejman Saeghe, Mark McGill and Mohamed Khamis.

Abstract: Extended-Reality (XR) devices are packed with sensors that allow tracking of users (e.g., behaviour, actions, eye-gaze) and their surroundings (e.g., people, places, objects). As a consequence, XR devices pose significant risks to privacy, security, and our ability to understand and influence the behaviour of users – risks that will be amplified by ever-increasing adoption. This necessitates addressing these concerns before XR becomes ubiquitous. We conducted three focus groups with thirteen XR experts from industry and academia interested in XR, security, and privacy, to investigate current and emerging issues relating to security, privacy, and influencing behaviour. We identified issues such as virtual threats leading to physical harm, missing opt-out methods, and amplifying bias through perceptual filters. From the results, we establish a collection of prescient challenges relating to security, privacy and behavioural manipulation within XR and present recommendations working towards developing future XR devices that better support security and privacy by default.

Paper available for download here.

October 2022

Stopping Silent Sneaks: Defending against Malicious Mixes with Topological Engineering

Prepared by Xinshu Ma, Florentin Rochet and Tariq Elahi.

Abstract: Mixnets provide strong meta-data privacy and recent academic research and industrial projects have made strides in making them more secure, performant, and scalable. In this paper, we focus our work on stratified Mixnets, a popular design with real-world adoption. We identify and measure significant impacts of practical aspects such as relay sampling and topology placement, network churn, and risks due to real-world usage patterns. We show that, due to the lack of incorporating these aspects in design decisions, Mixnets of this type are far more susceptible to user deanonymization than expected. In order to reason about and resolve these issues, we model Mixnets as a three-stage “Sample-Placement-Forward” pipeline and develop tools to analyze and evaluate design decisions. To address the identified gaps and weaknesses we propose Bow-Tie, a design that mitigates user deanonymization through a novel adaptation of Tor’s guard design, with an engineered guard layer and client guard-logic for stratified mixnets. We show that Bow-Tie has significantly higher user anonymity in the dynamic setting, where the Mixnet is used over a period of time, and is no worse in the static setting, where the user only sends a single message. We show the necessity of both the guard layer and client guard-logic in tandem, as well as their individual effect when incorporated into other reference designs. We develop and implement two tools, (1) a mixnet topology generator (MTG) and (2) a path simulator and security evaluator (routesim) that takes into account temporal dynamics and user behavior, to assist our analysis and empirical data collection. These tools are designed to help Mixnet designers assess the security and performance impact of their design decisions.
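
As a sketch of the stratified topology being studied (illustrative only; the paper’s tooling handles sampling and placement with far more care), relays are placed into fixed layers and each message path picks one relay per layer:

```python
import random

# Toy "Sample-Placement-Forward" pipeline for a stratified mixnet:
# sample relays, place them into fixed layers, then pick one hop per layer.
relays = [f"relay{i}" for i in range(12)]
random.shuffle(relays)                              # sample
layers = [relays[0:4], relays[4:8], relays[8:12]]   # placement into 3 strata

def sample_path(layers):
    """A message path traverses exactly one relay in each layer."""
    return [random.choice(layer) for layer in layers]  # forward

print(sample_path(layers))
```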

Paper available for download here.

Recurring Contingent Service Payment

Prepared by Aydin Abadi, Steven Murdoch and Thomas Zacharias.

Abstract: Fair exchange protocols let two mutually distrustful parties exchange digital data in a way that neither party can cheat. They have various applications such as the exchange of digital items, or the exchange of digital coins and digital services between a buyer and seller. At CCS 2017, two blockchain-based protocols were proposed to support the fair exchange of digital coins and a certain service; namely, “proofs of retrievability” (PoR). In this work, we identify two notable issues of these protocols, (1) waste of the seller’s resources, and (2) real-time information leakage. To rectify these issues, we formally define and propose a blockchain-based generic construction called “recurring contingent service payment” (RC-S-P). RC-S-P lets a fair exchange of digital coins and verifiable service occur periodically while ensuring that the buyer cannot waste the seller’s resources, and the parties’ privacy is preserved. It supports arbitrary verifiable services, such as PoR, or verifiable computation and imposes low on-chain overheads. Also, we present a concrete efficient instantiation of RC-S-P when the verifiable service is PoR. The instantiation is called “recurring contingent PoR payment” (RC-PoR-P). We have implemented RC-PoR-P and analysed its cost. When it deals with a 4-GB outsourced file, a verifier can check a proof in 90 milliseconds, and a dispute between prover and verifier is resolved in 0.1 milliseconds.

Paper available for download here.

Payment with Dispute Resolution: A Protocol for Reimbursing Frauds’ Victims

Prepared by Aydin Abadi and Steven Murdoch.

Abstract: An “Authorised Push Payment” (APP) fraud refers to the case where fraudsters deceive a victim to make payments to bank accounts controlled by them. The total amount of money stolen via APP frauds is swiftly growing. Although regulators have provided guidelines to improve victims’ protection, the guidelines are vague and the victims are not receiving sufficient protection. To facilitate victims’ reimbursement, in this work, we propose a protocol called “Payment with Dispute Resolution” (PwDR) and formally define it. The protocol lets an honest victim prove its innocence to a third-party dispute resolver while preserving the protocol participants’ privacy. It makes black-box use of a standard online banking system. We evaluate its asymptotic cost and runtime via a prototype implementation. Our evaluation indicates that the protocol is efficient. It imposes only O(1) overheads to the customer and bank. Also, it takes a dispute resolver 0.09 milliseconds to settle a dispute between the two parties.

Paper available for download here.

Glass-Vault: A Generic Transparent Privacy-preserving Exposure Notification Analytics Platform

Prepared by Lorenzo Martinico, Aydin Abadi, Thomas Zacharias and Thomas Win.

Abstract: The highly transmissible COVID-19 disease is a serious threat to people’s health and life. To automate tracing those who have been in close physical contact with newly infected people and/or to analyse tracing-related data, researchers have proposed various ad-hoc programs that require being executed on users’ smartphones. Nevertheless, the existing solutions have two primary limitations: (1) lack of generality: for each type of analytic task, a certain kind of data needs to be sent to an analyst; (2) lack of transparency: parties who provide data to an analyst are not necessarily infected individuals; therefore, infected individuals’ data can be shared with others (e.g., the analyst) without their fine-grained and direct consent. In this work, we present Glass-Vault, a protocol that addresses both limitations simultaneously. It allows an analyst to run authorised programs over the collected data of infectious users, without learning the input data. Glass-Vault relies on a new variant of generic Functional Encryption that we propose in this work. This new variant, called DD-Steel, offers these two additional properties: dynamic and decentralised. We illustrate the security of both Glass-Vault and DD-Steel in the Universal Composability setting. Glass-Vault is the first UC-secure protocol that allows analysing the data of Exposure Notification users in a privacy-preserving manner. As a sample application, we indicate how it can be used to generate “infection heatmaps”.

Paper available for download here.

A Forward Secure Efficient Two-Factor Authentication Protocol

Prepared by Steven Murdoch and Aydin Abadi.

Abstract: Two-factor authentication (2FA) schemes that rely on a combination of knowledge factors (e.g., PIN) and device possession have gained popularity. Some of these schemes remain secure even against strong adversaries that (a) observe the traffic between a client and server, and (b) have physical access to the client’s device, or its PIN, or breach the server. However, these solutions have several shortcomings; namely, they (i) require a client to remember multiple secret values to prove its identity, (ii) involve several modular exponentiations, and (iii) are in the non-standard random oracle model. In this work, we present a 2FA protocol that resists such a strong adversary while addressing the above shortcomings. Our protocol requires a client to remember only a single secret value/PIN, does not involve any modular exponentiations, and is in the standard model. It is the first one that offers these features without using trusted chipsets. This protocol also imposes up to 40% lower communication overhead than the state-of-the-art solutions do.

Paper available for download here.

September 2022

From Utility to Capability: A New Paradigm to Conceptualise and Develop Inclusive PETs

Prepared by Partha Das Chowdhury, Andrés Domínguez Hernández, Kopo Marvin Ramokapane & Awais Rashid.

Abstract: The wider adoption of Privacy Enhancing Technologies (PETs) has relied on usability studies – which focus mainly on an assessment of how a specified group of users interface, in particular contexts, with the technical properties of a system. While human-centred efforts in usability aim to achieve important technical improvements and drive technology adoption, a focus on the usability of PETs alone is not enough. PETs development and adoption requires a broadening of focus to adequately capture the specific needs of individuals, particularly of vulnerable individuals and/or individuals in marginalized populations. We argue for a departure, from the utilitarian evaluation of surface features aimed at maximizing adoption, towards a bottom-up evaluation of what real opportunities humans have to use a particular system. We delineate a new paradigm for the way PETs are conceived and developed. To that end, we propose that Amartya Sen’s capability approach offers a foundation for the comprehensive evaluation of the opportunities individuals have based on their personal and environmental circumstances which can, in turn, inform the evolution of PETs. This includes considerations of vulnerability, age, education, physical and mental ability, language barriers, gender, and access to technology, as well as freedom from oppression, among many other important contextual factors.

Paper available for download here.

July 2022

Taking Situation-Based Privacy Decisions: Privacy Assistants Working with Humans

Prepared by Nadin Kokciyan & Pinar Yolum

Abstract: Privacy on the Web is typically managed by giving consent to individual Websites for various aspects of data usage. This paradigm requires too much human effort and thus is impractical for Internet of Things (IoT) applications where humans interact with many new devices on a daily basis. Ideally, software privacy assistants can help by making privacy decisions in different situations on behalf of the users. To realize this, we propose an agent-based model for a privacy assistant. The model identifies the contexts that a situation implies and computes the trustworthiness of these contexts. Contrary to traditional trust models that capture trust in an entity by observing a large number of interactions, our proposed model can assess the trustworthiness even if the user has not interacted with the particular device before. Moreover, our model can decide which situations are inherently ambiguous and thus can request the human to make the decision. We evaluate various aspects of the model using a real-life data set and report adjustments that are needed to serve different types of users well.

Paper available for download here.

June 2022

Safeguarding Privacy in the Age of Everyday XR

Prepared by Pejman Saeghe, Mark McGill & Mohamed Khamis

Abstract: The commercialisation of extended reality (XR) devices provides new capabilities for its user, such as the ability to continuously capture their surroundings. This introduces novel privacy risks and challenges for XR users and bystanders alike. In this position paper, we use an established taxonomy of privacy to highlight its limitations when dealing with everyday XR. Our aim is to highlight a need for an update in our collective understanding of privacy risks imposed by everyday XR technology.

Paper available for download here.

MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset

Prepared by Dan S. Nielsen & Ryan McConville

Abstract: Misinformation is becoming increasingly prevalent on social media and in news articles. It has become so widespread that we require algorithmic assistance utilising machine learning to detect such content. Training these machine learning models requires datasets of sufficient scale, diversity and quality. However, datasets in the field of automatic misinformation detection are predominantly monolingual, include a limited amount of modalities and are not of sufficient scale and quality. Addressing this, we develop a data collection and linking system (MuMiN-trawl) to build a public misinformation graph dataset (MuMiN), containing rich social media data (tweets, replies, users, images, articles, hashtags) spanning 21 million tweets belonging to 26 thousand Twitter threads, each of which has been semantically linked to 13 thousand fact-checked claims across dozens of topics, events and domains, in 41 different languages, spanning more than a decade. The dataset is made available as a heterogeneous graph via a Python package (mumin). We provide baseline results for two node classification tasks related to the veracity of a claim involving social media, and demonstrate that these are challenging tasks, with the highest macro-average F1-scores being 62.55% and 61.45% for the two tasks, respectively. The MuMiN ecosystem is available at https://mumin-dataset.github.io/, including the data, documentation, tutorials and leaderboards.
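
Loading the dataset through the accompanying Python package looks roughly like the following (API as described in the project documentation at the time of writing; a Twitter API bearer token is needed to rehydrate tweet content, so treat this as a sketch and consult the project site for current usage):

```python
from mumin import MuminDataset  # pip install mumin

# Sketch of compiling the MuMiN graph locally; "small" is one of the
# published dataset sizes, and the token placeholder must be replaced.
dataset = MuminDataset(twitter_bearer_token="<YOUR-TOKEN>", size="small")
dataset.compile()   # downloads the graph and rehydrates tweet nodes
print(dataset)
```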

Paper available for download here.

January 2022

Multi-party Updatable Delegated Private Set Intersection

Prepared by Aydin Abadi, Changyu Dong, Steven Murdoch and Sotirios Terzis

Abstract: With the growth of cloud computing, the need arises for Private Set Intersection protocols (PSI) that can let parties outsource the storage of their private sets and securely delegate PSI computation to a cloud server. The existing delegated PSIs have two major limitations; namely, they cannot support (1) efficient updates on outsourced sets and (2) efficient PSI among multiple clients. This paper presents “Feather”, the first lightweight delegated PSI that addresses both limitations simultaneously. It lets clients independently prepare and upload their private sets to the cloud once, then delegate the computation an unlimited number of times. We implemented Feather and compared its costs with the state-of-the-art delegated PSIs. The evaluation shows that Feather is more efficient computationally, in both update and PSI computation phases.

Paper available for download here.

December 2021

A Consumer Law Perspective on the Commercialization of Data

Prepared by Mateja Durovic and Franciszek Lech

Abstract: Commercialization of consumers’ personal data in the digital economy poses serious, both conceptual and practical, challenges to the traditional approach of European Union (EU) Consumer Law. This article argues that mass-spread, automated, algorithmic decision-making casts doubt on the foundational paradigm of EU consumer law: consent and autonomy. Moreover, it poses threats of discrimination and undermining of consumer privacy. It is argued that the recent legislative reaction by the EU Commission, in the form of the ‘New Deal for Consumers’, was a step in the right direction, but fell short due to its continued reliance on consent and autonomy, and its failure to adequately protect consumers from indirect discrimination. It is posited that a focus on creating a contracting landscape where the consumer may be properly informed in material respects is required, which in turn necessitates blending the approaches of competition, consumer protection and data protection laws.

Paper available for download here.

October 2021

Building a Privacy Testbed: Use Cases and Design Considerations

Prepared by Joseph Gardiner, Partha Das Chowdhury, Jacob Halsey, Mohammad Tahaei, Tariq Elahi and Awais Rashid.

Abstract: Mobile application (app) developers are often ill-equipped to understand the privacy implications of their products and services, especially with the common practice of using third-party libraries to provide critical functionality. To add to the complexity, most mobile applications interact with the “cloud”—not only the platform provider’s ecosystem (such as Apple or Google) but also with third-party servers (as a consequence of library use). This presents a hazy view of the privacy impact for a particular app. Therefore, we take a significant step to address this challenge and propose a testbed with the ability to systematically evaluate and understand the privacy behavior of client server applications in a network environment across a large number of hosts. We reflect on our experiences of successfully deploying two mass market applications on the initial versions of our proposed testbed. Standardization across cloud implementations and exposed end points of closed source binaries are key for transparent evaluation of privacy features.

Paper available for download here.

September 2021

A Privacy Testbed for IT Professionals: Use Cases and Design Considerations

Prepared by Joseph Gardiner, Mohammad Tahaei, Jacob Halsey, Tariq Elahi and Awais Rashid

Abstract: We propose a testbed to assist IT professionals in evaluating privacy properties of software systems. The goal of the testbed, currently under construction, is to help IT professionals systematically evaluate and understand the privacy behaviour of applications. We first provide three use cases to support developers and privacy engineers and then describe key design considerations for the testbed.

Paper available for download here.

August 2021

Polynomial Representation Is Tricky: Maliciously Secure Private Set Intersections Revisited

Prepared by Aydin Abadi, Steven Murdoch, Thomas Zacharias

Abstract: Private Set Intersection protocols (PSIs) allow parties to compute the intersection of their private sets, such that nothing about the sets’ elements beyond the intersection is revealed. PSIs have a variety of applications, primarily in efficiently supporting data sharing in a privacy-preserving manner. At Eurocrypt 2019, Ghosh and Nilges proposed three efficient PSIs based on the polynomial representation of sets and proved their security against active adversaries. In this work, we show that these three PSIs are susceptible to several serious attacks. The attacks let an adversary (1) learn the correct intersection while making its victim believe that the intersection is empty, (2) learn a certain element of its victim’s set beyond the intersection, and (3) delete multiple elements of its victim’s input set. We explain why the proofs did not identify these attacks and propose a set of mitigations.
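
The sketch below shows only the set representation these protocols share: a set is encoded as the monic polynomial whose roots are its elements, so the common roots of two sets’ polynomials are exactly the intersection. The blinding and randomisation steps that the identified attacks exploit are deliberately omitted, and the field size is a toy choice.

```python
# Encoding a set as the monic polynomial whose roots are its elements
# (coefficients stored lowest degree first, over a toy prime field).
P = 2**31 - 1

def mul_root(coeffs, a):
    """Multiply a polynomial by (x - a)."""
    new = [(-a * coeffs[0]) % P]
    for i in range(1, len(coeffs)):
        new.append((coeffs[i - 1] - a * coeffs[i]) % P)
    new.append(coeffs[-1])
    return new

def eval_poly(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

A, B = {3, 7, 11}, {7, 11, 20}
pB = [1]
for b in B:
    pB = mul_root(pB, b)

# An element lies in the intersection iff it is a root of both polynomials:
print({e for e in A if eval_poly(pB, e) == 0})   # elements 7 and 11
```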

Paper available for download here.

March 2021

Towards Data Scientific Investigations: A Comprehensive Data Science Framework and Case Study for Investigating Organized Crime and Serving the Public Interest

Prepared by Erik van de Sandt, Arthur van Bunningen, Jarmo van Lenthe, John Fokker

Abstract: Big Data problems thwart the effectiveness of today’s organized crime investigations. A frequently proposed solution is the introduction of ‘smart’ data science technologies to process raw data into factual evidence. This transition to – what we call – data scientific investigations is nothing less than a paradigm shift for law enforcement agencies, and cannot be done alone. Yet a common language for data scientific investigations is so far missing. This white paper therefore presents guiding principles and best practices for data scientific investigations of organized crime, developed and put into practice by operational experts over several years, while connecting to existing law enforcement and industry standards. The associated framework is called CSAE (pronounced as ‘see-say’): a comprehensive framework that consists of a business process, methodology, policy agenda and public interest philosophy for data scientific operations.

Paper available for download here.