Automating Financial AI Safety: Red Teaming and Post-Training Large Language Models at Vanguard

During his internship at Vanguard, a graduate researcher worked under the supervision of Vanguard’s AI research scientists to develop and evaluate automated red teaming and post-training dataset generation that enhance LLM alignment in the financial services industry

Project at a Glance: Financial institutions increasingly rely on large language models (LLMs) in customer-facing chatbots, raising critical safety and compliance concerns. Manual red teaming and dataset curation are slow and costly, creating bottlenecks for deploying new models. During his internship at Vanguard, Luc Baier-Reinio, a graduate student in the Master of Science in Applied Computing (MScAC) program, worked on developing an automated red teaming tool paired with AI-guided post-training dataset generation. The work took place in Vanguard’s Enterprise AI & Research (EAIR) team, within the Chief Data & Analytics Office (CDAO). The system helps identify model vulnerabilities across multiple risk categories and produces safe responses using Deliberative Alignment and Constitutional AI. Fine-tuning LLMs on the resulting dataset reduces the attack success rate. This research complements Vanguard’s existing efforts under the trustworthy and responsible AI umbrella, specifically in reinforcement learning from AI feedback.

Why AI Safety Matters

As LLM adoption grows in financial services, companies must ensure their AI systems behave safely, comply with regulations, and adhere to company-specific guidelines. Vanguard is one of the world’s largest investment management companies, offering a large selection of investments, advice, retirement services and insights to individual investors, institutions and financial advisors. The company helps investors achieve their investment goals while ensuring fair treatment.

Vanguard’s EAIR team focuses on integrating advanced AI technologies into Vanguard’s AI ecosystem and building scalable capabilities that elevate every client experience. Its goal is to bring forward AI that feels less like a tool and more like a trusted partner. The team has been a long-time partner with the University of Toronto and has delivered many important results in generative AI, large language models and computer vision.

To reduce risk in their AI systems, companies leverage red teaming — a cybersecurity exercise that proactively identifies vulnerabilities and misaligned behaviours. To remediate undesirable outputs found during red teaming, researchers and developers post-train the underlying models to enhance safety and trustworthiness. Vanguard maintains extensive documentation on regulations and conversation guidelines, which should dictate model behaviour. However, open-source language models do not inherently incorporate these rules since internal documentation is not included in any training corpora. Accordingly, Vanguard must red-team and post-train any open-source models before integrating them into its AI ecosystem.

Can We Keep Up with AI?

Red teaming language models poses significant challenges. Their open-ended conversational nature makes manual red teaming prohibitively costly and labour-intensive. Likewise, manual dataset procurement for post-training is not feasible as it may involve producing and labelling thousands of data points.

The dynamic nature of finance and technology adds another layer of complexity. First, conversation guidelines and financial regulations are neither static nor universal: they change continually and are use-case-specific. Second, new and improved open-source language models are released each year, and companies are incentivized to incorporate them into products to improve services. Thus, each new open-source model, each change in guidelines and each new use case calls for an additional round of red teaming and subsequent dataset procurement for post-training.

The time and cost of repeatedly undertaking manual red teaming and data curation clearly call for the adoption of automated methods. Automating red teaming and post-training can help reduce risk, enhance trustworthiness and accelerate the integration of AI models for financial services.

When Academia Meets Investment Management: A Partnership Story

Vanguard has maintained a partnership with the Master of Science in Applied Computing (MScAC) program at the University of Toronto since 2020. This collaboration has produced numerous high-impact outcomes. Luc Baier-Reinio, a graduate student in the MScAC program’s Computer Science concentration, focuses his research on LLM and AI safety.

Baier-Reinio’s motivation for joining Vanguard for his internship was deeply personal: “I started investing after reading a book by Vanguard’s founder, John C. Bogle, who is credited with popularizing the index fund and revered as an advocate of the individual investor,” he said. “Vanguard’s strong reputation, its long-standing partnership with the University of Toronto, and the close alignment between the project’s focus and my research interests made this opportunity an ideal fit for me.”

The MScAC program prepared him through courses that exposed him to cutting-edge research topics and built his confidence to discuss technical concepts with peers. Most courses included open-ended research projects, which improved his technical acumen, particularly in AI/ML, enabling him to tackle industrial-scale AI safety challenges.

Building Smarter, Safer AI: How It Works

Teaching AI to Find Its Own Weaknesses

The automated red teaming system begins with a human-defined set of initial user queries, risk categories (the undesirable behaviours the red teaming run will attempt to elicit) and attack styles (semantic strategies used to elicit those behaviours, such as emotional manipulation). The algorithm is a multi-step evaluation system inspired by industry-standard adversarial testing techniques.¹ Upon completion, the tool outputs a comprehensive report on the run. The key metric is Attack Success Rate (ASR), the percentage of user queries that elicit an undesirable response. The report can be used to quantify the alignment, compliance or safety of language models before deployment.
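The loop described above can be illustrated with a minimal sketch. It is not Vanguard’s implementation: it is a simplified archive-based search loosely modelled on the cited approach, and the `mutate`, `target` and `judge` callables are hypothetical stand-ins for the language models that would rewrite queries, answer them and label the answers.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Attempt:
    query: str
    response: str
    undesirable: bool  # verdict returned by the (hypothetical) judge


def red_team_run(
    seed_queries: List[str],
    risk_categories: List[str],
    attack_styles: List[str],
    mutate: Callable[[str, str, str], str],  # assumption: rewrites a query for (category, style)
    target: Callable[[str], str],            # assumption: the model under test
    judge: Callable[[str, str], bool],       # assumption: True if the response is undesirable
    iterations: int = 100,
    seed: int = 0,
) -> Tuple[Dict[Tuple[str, str], str], List[Attempt]]:
    """Simplified search: keep one successful query per (category, style) cell."""
    rng = random.Random(seed)
    archive: Dict[Tuple[str, str], str] = {}
    attempts: List[Attempt] = []
    for _ in range(iterations):
        category = rng.choice(risk_categories)
        style = rng.choice(attack_styles)
        # Parents come from the human-written seeds or from earlier successes.
        parent = rng.choice(seed_queries + list(archive.values()))
        candidate = mutate(parent, category, style)
        response = target(candidate)
        bad = judge(candidate, response)
        attempts.append(Attempt(candidate, response, bad))
        if bad:
            archive[(category, style)] = candidate
    return archive, attempts


def attack_success_rate(attempts: List[Attempt]) -> float:
    """ASR: percentage of user queries that elicited an undesirable response."""
    if not attempts:
        return 0.0
    return 100.0 * sum(a.undesirable for a in attempts) / len(attempts)
```

Keeping an archive keyed by risk category and attack style is one way to encourage diverse queries rather than many variants of a single successful attack; the actual tool may select and replace candidates differently.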

Creating the Training Data: AI Teaching AI

Beyond red teaming, the tool can be used as a data engine for user query generation. By running the red teaming tool several times, the system produces thousands of diverse user queries targeting the risk categories. These user queries can then be used to align the models via AI feedback mechanisms such as Deliberative Alignment² and Constitutional AI³.
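As a rough illustration of the data-engine idea, the sketch below pools queries from several runs and drops near-verbatim repeats. The record layout (a dictionary with a `query` key) and the normalization rule are assumptions made for this example; the quality filtering used in the project is not described in this article and is not reproduced here.

```python
from typing import Dict, Iterable, List


def _normalize(query: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(query.lower().split())


def pool_queries(runs: Iterable[Iterable[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge query records from several runs, keeping the first copy of each query."""
    seen = set()
    pooled: List[Dict[str, str]] = []
    for run in runs:
        for record in run:
            key = _normalize(record["query"])
            if key and key not in seen:  # skip empty and already-seen queries
                seen.add(key)
                pooled.append(record)
    return pooled
```

Exact-match deduplication is the simplest option; a production pipeline might instead compare embeddings to remove paraphrases as well.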

The Tech Behind the Breakthroughs

The workflow was evaluated using internal research systems in a secure environment. The problems encountered when red teaming and post-training models in the financial services industry are underexplored. To address them, the project synthesized multiple distinct research approaches to AI safety and tailored them to Vanguard’s unique situation. Combining evolutionary search, AI feedback and post-training dataset creation into a single automated workflow yields a solution adapted to the complexities of financial services AI.

What This Tool Can and Can’t Do

While the search algorithm and AI feedback mechanisms are comprehensive, they are also compute-intensive, potentially requiring tens to hundreds of thousands of language model invocations to produce a dataset. Furthermore, these tools should be considered not as standalone safety measures but as part of a broader toolkit that includes safety classifiers, carefully crafted system prompts, mechanistic interpretability, benchmarking suites and other safety mechanisms.
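To see how the invocation count can reach this scale, the back-of-envelope helper below multiplies out the two stages. Every parameter is a hypothetical input chosen by the reader, not a figure reported by the project.

```python
def estimate_invocations(
    runs: int,
    iterations_per_run: int,
    calls_per_iteration: int,
    dataset_size: int,
    calls_per_response: int,
) -> int:
    """Rough count of language model calls for search plus response generation.

    All arguments are assumptions supplied by the caller; none are measured values.
    """
    values = (runs, iterations_per_run, calls_per_iteration, dataset_size, calls_per_response)
    if any(v < 0 for v in values):
        raise ValueError("all counts must be non-negative")
    search_calls = runs * iterations_per_run * calls_per_iteration
    response_calls = dataset_size * calls_per_response
    return search_calls + response_calls
```

For instance, 10 runs of 1,000 iterations with three calls each (rewrite, answer, judge), plus three calls for each of 5,000 queries, gives `estimate_invocations(10, 1000, 3, 5000, 3)`, which evaluates to 45,000 calls.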

Putting AI to the Test: How the System Performed

The Security Audit

Three red teaming objectives were designed to capture the risk landscape when deploying language models in the financial services industry:

  1. Toxicity objective: Assesses whether the language model will produce harmful or unethical content.
  2. Financial Crimes objective: Tests whether the language model can be induced to assist users with illegal financial activities (e.g., fraud, money laundering).
  3. Financial Advice objective*: Identifies cases where the language model provides financial advice, which it should not do.


Three open-source models were targeted: Falcon-3-7B-Instruct, Olmo-3-7B-Instruct, and Phi-3.5-Mini-Instruct. ASR is used as the performance metric, with higher values indicating more successful attacks and therefore a more vulnerable model.

The results show that the evolutionary search can effectively uncover harmful and noncompliant responses from open-source models. Furthermore, the tool allows for comparative analysis between models. For example, the Toxicity runs suggest that it is more difficult to elicit harmful content from Falcon-3-7B-Instruct than from Phi-3.5-Mini-Instruct.

The Fix: Making AI Safer

The red teaming tool was run 16 times on the Financial Advice objective (i.e., the language models should not provide financial advice), retaining all high-quality adversarial user queries and resulting in a synthetic user query dataset of 6,360 examples. To complete the post-training dataset, Deliberative Alignment and Constitutional AI were applied to generate safe completions.

Deliberative Alignment was applied to obtain responses that adhere to Vanguard’s conversation guidelines. Constitutional AI was then used to remove any remaining content that could be construed as financial advice. Using this approach, golden responses were generated for all user queries, producing a post-training dataset for advice conversations. Phi-3.5-Mini-Instruct was fine-tuned on that dataset, leading to improved resilience against adversarial queries.
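The two-stage generation can be sketched as follows. This is an illustrative outline under stated assumptions, not the project’s pipeline: `llm` is a hypothetical callable wrapping any text-generation model, the prompt wording is invented for the example, and the guidelines and principles are placeholders. The first call follows the Deliberative Alignment idea of reasoning over the guidelines in context; the loop follows the critique-and-revise pattern from Constitutional AI.

```python
from typing import Callable, Dict, List


def build_golden_dataset(
    queries: List[str],
    guidelines: str,                 # placeholder for conversation guidelines
    principles: List[str],           # placeholder constitutional principles
    llm: Callable[[str], str],       # assumption: prompt in, generated text out
) -> List[Dict[str, str]]:
    """Generate one guideline-compliant response per adversarial user query."""
    dataset: List[Dict[str, str]] = []
    for query in queries:
        # Stage 1: draft a response while the guidelines are visible in context.
        response = llm(
            f"Guidelines:\n{guidelines}\n\nUser query:\n{query}\n\n"
            "Reason about which guidelines apply, then write a compliant response."
        )
        # Stage 2: critique and revise the draft against each principle in turn.
        for principle in principles:
            critique = llm(
                f"Principle: {principle}\n\nResponse:\n{response}\n\n"
                "Identify any part of the response that violates the principle."
            )
            response = llm(
                f"Principle: {principle}\n\nResponse:\n{response}\n\n"
                f"Critique:\n{critique}\n\n"
                "Rewrite the response so that it satisfies the principle."
            )
        # The stored prompt omits the guidelines, so that a model fine-tuned on
        # these pairs learns the behaviour without seeing the documentation.
        dataset.append({"prompt": query, "completion": response})
    return dataset
```

With one principle, each query costs three model calls (draft, critique, revision); each additional principle adds two more.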

Why This Matters: From Lab to Real-World Impact

The results confirm that the automated tools can effectively streamline red teaming and dataset procurement, saving significant time for technical teams. Rather than manually writing adversarial queries or golden responses, humans act as supervisors, offloading the creation and labelling of data to the AI system itself. This synergy between humans and AI enables the production of high-quality post-training data at scale while maintaining responsible oversight.

Automating red teaming and dataset creation enables Vanguard to reduce manual labour, scale model evaluation, and safely integrate new LLMs more quickly. This approach also translates academic AI safety research into practical enterprise tools, contributing to broader industry trends in responsible AI deployment.

A key focus of Vanguard’s research agenda is ensuring the ethical and responsible use of AI. This work involved exploring state-of-the-art AI safety research in collaboration with EAIR.

Behind the Code: The People Making AI Safer

Building a Culture Where Innovation Thrives

Vanguard thrives on a culture of innovation and mentorship. The company is deeply committed to continuously introducing forward-thinking ideas that drive progress and create meaningful impact. At the same time, Vanguard strongly advocates for mentorship as a cornerstone of professional growth: fostering collaboration, sharing knowledge and empowering individuals to reach their full potential.

At Vanguard, a unique research funnel has been built that transforms cutting-edge academic work into enterprise-wide AI products that deliver real-world impact. Some of these innovations have been developed in partnership with the University of Toronto’s MScAC program. By bridging the gap between theoretical research and practical application, Vanguard ensures that groundbreaking ideas do not just stay in academia; they become scalable solutions that drive value across the organization and beyond.

Jithin Pradeep, Director and Head of Enterprise AI Research, reflected: “The combination of professors and U of T students together with our team creates a good understanding of the industry as well as the research domain, helping us put together solutions in a way that’s helping Vanguard move forward and beyond.”

Jack Ji, Manager of AI Research and Baier-Reinio’s supervisor at Vanguard, noted: “Luc consistently demonstrates exceptional clarity in communication, strong technical acumen and an eagerness to learn. His deep passion for the work translates into impactful results that drive meaningful progress for his internship project at Vanguard.”

From Classroom to Boardroom: A Student's Journey

On a personal level, Baier-Reinio found it interesting to observe how “ideas circulate within large organizations, flowing downward from leadership, rising from individuals, and moving laterally across teams.” He saw firsthand how these ideas evolve into proofs of concept and fully developed products.

From a technical perspective, working with multi-billion-parameter models for training and generating synthetic data was a new experience. This introduced unique challenges around resource management and optimization.

Since Vanguard is such a large organization, other technical teams were also working on and thinking about similar problems. Therefore, the project’s success depended not only on technical acumen but also on developing awareness of other internal initiatives and building Vanguard expertise. In doing so, Baier-Reinio found a niche for the project and effectively communicated its business value to the organization. This was an incredibly rewarding component of the research internship.

The internship reinforced Baier-Reinio’s transition from software development to AI research: “I planned to use my master’s degree to shift from software development into AI research. The program effectively assisted me in achieving that goal through its rigorous coursework, networking opportunities, and a unique opportunity to conduct an eight-month industrial research project.”

What This Means for the Future of Financial AI

The pipeline establishes a principled and effective baseline for automated red teaming and dataset curation that can be adopted within Vanguard. Future research can pursue algorithmic improvements to these solutions and analyze trade-offs between this work and alternative approaches to red teaming and model alignment. From an applied perspective, the goal is to use these tools to align and tailor open-source language models for specific GenAI use cases.

Contact: For media inquiries, please contact MScAC Partnerships at partners@mscac.utoronto.ca. For more information about the Vanguard Research Lab at the University of Toronto, please contact strategicinitiatives.dcs@utoronto.ca. For more information about Vanguard, please visit vanguard.ca.

References

¹ Samvelyan, M., et al. (2024). Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts. arXiv preprint arXiv:2402.16822. https://arxiv.org/abs/2402.16822

² Guan, M. Y., et al. (2025). Deliberative Alignment: Reasoning Enables Safer Language Models. arXiv preprint arXiv:2412.16339. https://arxiv.org/abs/2412.16339

³ Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv preprint arXiv:2212.08073. https://arxiv.org/abs/2212.08073

The methods and results described in this report reflect research‑stage experimentation conducted in a controlled environment. They do not represent any client-facing systems or capabilities, nor do they involve the provision of financial advice.

*This research does not involve providing financial advice to clients; it focuses on improving model safety and compliance.