HK Businesswire

Training LLMs to self-detoxify their language

by David Lee
14 April 2025
in Science

As we mature from childhood, our vocabulary, as well as the ways we use it, grows, and our experiences become richer, allowing us to think, reason, and interact with others with specificity and intention. Accordingly, our word choices evolve to align with our personal values, ethics, cultural norms, and views. Over time, most of us develop an internal “guide” that enables us to learn context behind conversation; it also frequently directs us away from sharing information and sentiments that are, or could be, harmful or inappropriate. As it turns out, large language models (LLMs), which are trained on extensive, public datasets and therefore often have biases and toxic language baked in, can gain a similar capacity to moderate their own language.

A new method from MIT, the MIT-IBM Watson AI Lab, and IBM Research, called self-disciplined autoregressive sampling (SASA), allows LLMs to detoxify their own outputs without sacrificing fluency. Unlike other detoxifying methods, this decoding algorithm learns a boundary between toxic and nontoxic subspaces within the LLM’s own internal representation, without altering the parameters of the model and without the need for retraining or an external reward model. Then, during inference, the algorithm assesses the toxicity value of the partially generated phrase: the tokens (words) already generated and accepted, along with each potential new token that could reasonably be chosen, are evaluated for proximity to the classifier boundary. Next, it selects a word option that places the phrase in the nontoxic space, ultimately offering a fast and efficient way to generate less-toxic language.

“We wanted to find out a way with any existing language model [that], during the generation process, the decoding can be subject to some human values; the example here we are taking is toxicity,” says the study’s lead author Ching-Yun “Irene” Ko PhD ’24, a former graduate intern with the MIT-IBM Watson AI Lab and a current research scientist at IBM’s Thomas J. Watson Research Center in New York.

Ko’s co-authors include Luca Daniel, professor in the MIT Department of Electrical Engineering and Computer Science (EECS), a member of the MIT-IBM Watson AI Lab, and Ko’s graduate advisor; and several members of the MIT-IBM Watson AI Lab and/or IBM Research: Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, and Tejaswini Pedapati. The work will be presented at the International Conference on Learning Representations.

Finding the “guardrails”

The training resources behind LLMs almost always include content collected from public spaces like the internet and other readily available datasets. As such, curse words and bullying or unpalatable language are a component, although some of it is in the context of literary works. It then follows that LLMs can innately produce, or be tricked into generating, dangerous and/or biased content, which often contains disagreeable words or hateful language, even from innocuous prompts. Further, it’s been found that they can learn and amplify language that’s not preferred or even detrimental for many applications and downstream tasks, leading to the need for mitigation or correction strategies.

There are many ways to achieve robust language generation that’s fair and value-aligned. Some methods retrain the LLM with a sanitized dataset, which is costly, takes time, and may alter the LLM’s performance; others employ external reward models during decoding, with sampling or beam search, which take longer to run and require more memory. In the case of SASA, Ko, Daniel, and the IBM Research team developed a method that leverages the autoregressive nature of LLMs and, using a decoding-based strategy during the LLM’s inference, gradually steers the generation, one token at a time, away from unsavory or undesired outputs and toward better language.

The research group achieved this by building a linear classifier that operates on the learned subspace from the LLM’s embedding. When LLMs are trained, words with similar meanings are placed closely together in vector space and further away from dissimilar words; the researchers hypothesized that an LLM’s embedding would therefore also capture contextual information, which could be used for detoxification. The researchers used datasets that contained sets of a prompt (first half of a sentence or thought), a response (the completion of that sentence), and human-attributed annotation, like toxic or nontoxic, preferred or not preferred, with continuous labels from 0 to 1 denoting increasing toxicity. A Bayes-optimal classifier was then applied to learn and figuratively draw a line between the binary subspaces within the sentence embeddings, represented by positive values (nontoxic space) and negative numbers (toxic space). The SASA system then works by re-weighting the sampling probabilities of the newest potential token based on its value and the generated phrase’s distance to the classifier, with the goal of remaining close to the original sampling distribution.

To illustrate, if a user is generating a potential token #12 in a sentence, the LLM will look over its full vocabulary for a reasonable word, based on the 11 words that came before it, and, using top-k and top-p filtering, it will narrow the choice to roughly 10 tokens to select from. SASA then evaluates each of those tokens in the partially completed sentence for its proximity to the classifier (i.e., the value of tokens 1-11, plus each potential token 12). Tokens that produce sentences in the positive space are encouraged, while those in the negative space are penalized. Additionally, the further away from the classifier, the stronger the impact.

“The goal is to change the autoregressive sampling process by re-weighting the probability of good tokens. If the next token is likely to be toxic given the context, then we are going to reduce the sampling probability for those prone to be toxic tokens,” says Ko.
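
The token #12 example can be pictured with a small sketch. It is not the authors' code: the exponential form of the re-weighting, the strength parameter `beta`, the function names, and the random toy data are all assumptions made for illustration. Setting `beta` to 0 leaves the original sampling distribution unchanged, and larger values push harder toward the nontoxic side.

```python
import numpy as np

def top_k_candidates(logits, k):
    """Indices of the k highest-scoring tokens, best first."""
    return np.argsort(logits)[-k:][::-1]

def reweight(logits, candidate_ids, margins, beta):
    """Tilt candidate probabilities by their classifier margins.

    margins[i] > 0 means context + candidate i falls on the nontoxic side
    of the boundary; beta (an assumed knob) sets how strongly that counts.
    """
    shifted = logits[candidate_ids] + beta * np.asarray(margins, dtype=float)
    shifted = shifted - shifted.max()      # subtract the max for numerical stability
    probs = np.exp(shifted)
    return probs / probs.sum()

# Toy data standing in for a real model (all values are made up).
rng = np.random.default_rng(0)
logits = rng.normal(size=50)               # next-token scores over a 50-word vocabulary
candidates = top_k_candidates(logits, k=10)
embeddings = rng.normal(size=(10, 8))      # embedding of context + each candidate
w, b = rng.normal(size=8), 0.0             # hypothetical linear boundary
margins = embeddings @ w + b               # signed score per candidate

original = reweight(logits, candidates, margins, beta=0.0)
steered = reweight(logits, candidates, margins, beta=2.0)
print("expected margin before:", float(original @ margins))
print("expected margin after: ", float(steered @ margins))
```

In this sketch, candidates with positive margins gain probability and those with negative margins lose it, and the effect grows with the size of the margin, which mirrors the description above.
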
The researchers chose to do it this way “because the things we say, whether it’s benign or not, is subject to the context.”

Tamping down toxicity for value matching

The researchers evaluated their method against several baseline interventions with three LLMs of increasing size; all were autoregressive transformers: GPT2-Large, Llama2-7b, and Llama 3.1-8b-Instruct, with 762 million, 7 billion, and 8 billion parameters, respectively. For each prompt, the LLM was tasked with completing the sentence or phrase 25 times, and PerspectiveAPI scored each completion from 0 to 1, with anything over 0.5 being toxic. The team looked at two metrics: the average maximum toxicity score over the 25 generations for all the prompts, and the toxic rate, which was the probability of producing at least one toxic phrase over 25 generations. Reduced fluency (and therefore increased perplexity) was also analyzed. SASA was tested on completing the RealToxicityPrompts (RPT), BOLD, and AttaQ datasets, which contained naturally occurring English sentence prompts.

The researchers ramped up the complexity of their trials for detoxification by SASA, beginning with nontoxic prompts from the RPT dataset, looking for harmful sentence completions. Then, they escalated to more challenging prompts from RPT that were more likely to produce concerning results, and also applied SASA to the instruction-tuned model to assess whether their technique could further reduce unwanted outputs. They also used the BOLD and AttaQ benchmarks to examine the general applicability of SASA in detoxification. With the BOLD dataset, the researchers further looked for gender bias in language generations and tried to achieve a balanced toxic rate between the genders.
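
The two toxicity metrics described in this section can be computed from a table of scores, as in the rough sketch below. The function name and the tiny score table are hypothetical, and the scores would in practice come from a toxicity scorer such as PerspectiveAPI rather than being typed in by hand.

```python
import numpy as np

def toxicity_metrics(scores, threshold=0.5):
    """Compute the two metrics from per-generation toxicity scores.

    scores has shape (num_prompts, num_generations), with values in [0, 1].
    Returns (average maximum toxicity, toxic rate).
    """
    scores = np.asarray(scores, dtype=float)
    max_per_prompt = scores.max(axis=1)                      # worst completion for each prompt
    avg_max_toxicity = float(max_per_prompt.mean())
    # A prompt counts as toxic if at least one completion scores over the threshold.
    toxic_rate = float((max_per_prompt > threshold).mean())
    return avg_max_toxicity, toxic_rate

# Made-up scores: 2 prompts, 3 generations each (the study used 25).
example = [[0.1, 0.7, 0.2],
           [0.3, 0.4, 0.2]]
print(toxicity_metrics(example))
```

Here the first prompt has one completion over 0.5 and the second has none, so half of the prompts count toward the toxic rate.
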
Lastly, the team looked at runtime, memory usage, and how SASA could be combined with word filtering to achieve healthy and/or helpful language generation.

“If we think about how human beings think and react in the world, we do see bad things, so it’s not about allowing the language model to see only the good things. It’s about understanding the full spectrum, both good and bad,” says Ko, “and choosing to uphold our values when we speak and act.”

Overall, SASA achieved significant toxic language generation reductions, performing on par with RAD, a state-of-the-art external reward model technique. However, it was universally observed that stronger detoxification accompanied a decrease in fluency. Before intervention, the LLMs produced more toxic responses for female-labeled prompts than for male-labeled ones; however, SASA was also able to significantly cut down harmful responses, making them more equalized. Similarly, word filtering on top of SASA did markedly lower toxicity levels, but it also hindered the ability of the LLM to respond coherently.

A great aspect of this work is that it’s a well-defined, constrained optimization problem, says Ko, meaning that the balance between open language generation that sounds natural and the need to reduce unwanted language can be achieved and tuned.

Further, Ko says, SASA could work well for multiple attributes in the future: “For human beings, we have multiple human values. We don’t want to say toxic things, but we also want to be truthful, helpful, and loyal … If you were to fine-tune a model for all of these values, it would require more computational resources and, of course, additional training.” On account of the lightweight manner of SASA, it could easily be applied in these circumstances: “If you want to work with multiple values, it’s simply checking the generation’s position in multiple subspaces. It only adds marginal overhead in terms of the compute and parameters,” says Ko, leading to more positive, fair, and principle-aligned language.

This work was supported, in part, by the MIT-IBM Watson AI Lab and the National Science Foundation.
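
Ko's remark about checking a generation's position in multiple subspaces could look something like the sketch below. This is a guess at one possible shape, not something the researchers have published: the function name, the weighted sum, and the toy boundaries are all hypothetical.

```python
import numpy as np

def combined_margin(embedding, classifiers, weights=None):
    """Weighted sum of signed scores from several linear boundaries.

    classifiers is a list of (w, b) pairs, one per value (for example
    toxicity, truthfulness); weights are assumed per-value strengths.
    """
    if weights is None:
        weights = [1.0] * len(classifiers)
    return sum(lam * float(embedding @ w + b)
               for lam, (w, b) in zip(weights, classifiers))

# Two made-up boundaries in a 2-dimensional embedding space.
classifiers = [(np.array([1.0, 0.0]), 0.0),
               (np.array([0.0, 1.0]), -0.5)]
sentence = np.array([1.0, 0.0])
print(combined_margin(sentence, classifiers))
```

Each extra value adds only one dot product per candidate in this sketch, which is consistent with the marginal overhead Ko describes.
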

Tags: Science

© 2025 by HKBusinesswire.com
