The use of AI to target regional, culturally significant language groups

This report focuses on recent advancements in artificial intelligence and how they can be used to cause harm to regional, culturally significant (RCS) communities.

Published July 2023

This report asserts that Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT), the model family behind ChatGPT, are being used by malicious actors to target RCS communities in two major ways:

  • Allowing non-regional language speakers to convincingly impersonate members of the community.
  • Giving non-regional language speakers greater access to the intrinsic customs and norms of RCS communities, and with it a deeper understanding they can exploit.

New Zealand is home to many RCS languages, including Te Reo Māori and a range of other Asia-Pacific languages. Historically, these communities may not have been targeted with high-quality, high-volume malicious content (e.g., phishing emails, phone scams and disinformation).

RCS communities that are not accustomed to encountering malicious content in their own language may be more susceptible to malicious content generated by Generative AI.

Artificial Intelligence (AI) systems are designed to perform tasks that typically require human intelligence, such as visual perception, speech recognition and decision making. Generative AI is a subfield of AI focused on creating systems capable of generating new content or completing tasks without significant guidance from humans. The most popular current example is ChatGPT (https://openai.com/blog/chatgpt), which is built on a Large Language Model (LLM). These systems use methods that allow them to learn complex patterns at a deeper level than earlier approaches. Because of this, they can receive and generate content that simulates a genuine conversation and appears to capture nuance in its outputs.
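To make the generation step concrete, the sketch below shows how a single prompt is turned into newly generated text through a hosted LLM service. It is a minimal illustration only: the openai Python package (pre-1.0 interface), the gpt-3.5-turbo model name and the placeholder API key are assumptions about one particular service as it existed in mid-2023, not findings of this report.

    # Minimal sketch: send a prompt to a hosted LLM and print the reply.
    # Assumes the pre-1.0 "openai" Python package and a valid API key;
    # the model name and prompt are illustrative only.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder, not a real credential

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            # Unlike a lookup-based tool, the model composes its answer
            # token by token, so the reply is newly generated text.
            {"role": "user", "content": "Introduce yourself in Te Reo Māori."}
        ],
    )

    print(response["choices"][0]["message"]["content"])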

Recent advancements in Generative AI have increased access to RCS languages and, by extension, to their speaker populations. While LLMs can help bridge communication gaps for many communities across the world, this unprecedented level of accessibility also exposes communities to sophisticated cyber-enabled scams and fraud. This risk should be weighed if active efforts are made to “teach” LLMs regional languages used in New Zealand.

LLMs and how they differ from traditional translators

GPT and other LLMs can be trained on many languages, allowing them to apply subtle grammar rules and vocabulary choices and to avoid the artifacts that native speakers can occasionally pick up on. Traditional online translators cannot create material themselves; their results are assembled from pre-configured mappings between languages. This means material produced by an online translator can contain elements that appear odd or wrong to native speakers, as illustrated below.
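The difference is easiest to see with a toy example. The phrase-table “translator” below is a deliberately simplified stand-in for lookup-based translation; the two Māori entries are illustrative, not a real lexicon. Anything outside its fixed mappings passes through untouched, producing exactly the seams a native speaker notices, whereas an LLM generates the whole sentence and leaves no such seam.

    # Toy phrase-table "translator": output is limited to fixed,
    # pre-configured mappings. The English -> Te Reo Māori entries
    # are illustrative only.
    PHRASE_TABLE = {
        "hello": "kia ora",
        "thank you": "ngā mihi",
    }

    def phrase_translate(text: str) -> str:
        out = text.lower()
        for source, target in PHRASE_TABLE.items():
            out = out.replace(source, target)
        return out  # words outside the table pass through unchanged

    print(phrase_translate("Hello, thank you for your patience"))
    # -> "kia ora, ngā mihi for your patience"
    # The untranslated remainder is the kind of artifact a native
    # speaker would immediately notice.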

History of RCS languages and their relevance to scams and fraud

According to CERT NZ data, RCS languages have historically seen few scam and phishing campaigns. This is partly due to the limited sophistication of translation tools and a limited understanding of the culture. The prohibitive cost of entry to these communities (for example, paying human translators) has likely made targeting them not worthwhile. This has been the case in New Zealand, where scams and phishing in languages like Te Reo Māori and Pacific Island languages make up a small minority of incidents reported to CERT NZ. Recent LLM technology could increase both the frequency and the quality of scams targeting these communities. If these communities are not familiar with malicious messaging, such attacks could be highly effective.

Even with safeguards, LLMs can be used to create malicious messages

Many language models have been trained not to give unethical advice. However, carefully crafted inputs can bypass the safeguards in place, producing output with potentially malicious applications. For example, an attacker can ask an LLM to create a phishing email without ever describing it as a phishing email. There are also ways to “trick” the system into producing content it does not recognise as malicious, for example by framing the request in a seemingly legitimate context. The sketch below illustrates why such indirection works.
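The toy filter below blocks prompts by keyword. It is a deliberate oversimplification: real LLM safety systems are far more sophisticated than a blocklist, and the blocked terms here are assumptions chosen for illustration. The weakness it demonstrates is the one described above: a filter keyed to explicit terms never sees a paraphrase.

    # Toy keyword safeguard: refuses prompts containing blocked terms.
    # Real LLM safeguards are far more sophisticated; this only
    # illustrates why paraphrased requests can slip through.
    BLOCKED_TERMS = {"phishing", "scam", "malware"}

    def naive_safeguard(prompt: str) -> bool:
        """Return True if the prompt would be refused."""
        return any(term in prompt.lower() for term in BLOCKED_TERMS)

    print(naive_safeguard("Write a phishing email"))   # True: refused
    print(naive_safeguard(
        "Write an urgent email from a bank asking the reader "
        "to confirm their account details"))           # False: slips through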

While GPT is currently the most popular LLM, other powerful models are also widely accessible, either because they are open source or because they have been leaked. These will continue to improve with or without safeguards and could be used for malicious purposes. Open-source models can also be independently trained on RCS languages to improve their capabilities even further.

Recommendations

CERT NZ encourages those making decisions about training AI models such as LLMs to weigh the relevant risks, including granting malicious actors an increased ability to take advantage of RCS groups. There are likely to be other risks to consider as well.

There is strong justification for education and awareness-building targeted at RCS language users in New Zealand, to build resilience to scams, fraud, misinformation and disinformation. Ideally, this resilience can be built before these demographics are targeted with this technology.