Sanofi and Graduate Researcher Advance Multi-View Motion Analysis for Clinical Trials

Why tracking how patients walk and move could drastically improve some clinical trials — and how Sanofi and a grad student developed a solution without a million-dollar lab

Project at a Glance: What works perfectly in controlled laboratory settings often fails in real-world conditions. Sanofi mentored graduate student Natalie Won during her internship to develop an advanced multi-view human pose estimation system that overcomes the limitations of both single-camera and traditional multi-camera approaches. Under the guidance of her Sanofi industry supervisor, Won implemented spatial and temporal transformers trained on synthetically generated multi-view 3D data. This collaborative innovation demonstrates improved accuracy in 3D motion estimation and tracking in uncontrolled settings, reducing the need for expensive, time-consuming data collection in complex lab environments.

When Precision Meets Practicality: The Motion Tracking Challenge

Human movement tells a powerful story in clinical research. Gait patterns can reveal early signs of neurological conditions. Subtle changes in posture might indicate treatment efficacy. Small variations in joint mobility could signal disease progression. But capturing these movements accurately outside controlled laboratory settings remains one of the most persistent technical challenges in healthcare.

Traditional motion capture systems, currently the gold standard in biomechanics labs, require multiple synchronized cameras, reflective markers placed on the body, and carefully calibrated environments. They’re accurate but impractical for real-world clinical trials where patients need to move naturally in diverse settings. Single-camera systems offer convenience but struggle with a fundamental problem: depth ambiguity. A single camera can’t reliably determine whether someone’s arm is extended forward or simply appears closer due to the camera angle. When a limb becomes occluded (for example, hidden behind the body or another object), monocular systems often fail completely.

Multi-view systems theoretically solve these problems by combining perspectives from multiple cameras to triangulate positions in three-dimensional space. Yet they bring their own complications: complex setup requirements, calibration needs, and most critically, a scarcity of training data. Capturing multi-view video sequences of human movement is expensive and time-consuming, creating a bottleneck for machine learning development.

Sanofi’s Digital R&D team in Toronto confronted this challenge directly. As pharmaceutical development increasingly relies on digital biomarkers — objective, quantifiable physiological measures collected through wearables, smartphones, and sensors — the ability to accurately track human movement becomes essential for clinical trials. Whether assessing mobility improvements or measuring rehabilitation progress after surgery, reliable 3D pose estimation could transform how clinical endpoints are measured.

Academic Expertise Meets Pharmaceutical Innovation

Natalie Won brought specialized training in computer vision and healthcare AI to her MScAC internship at Sanofi. A graduate student in the Master of Science in Applied Computing program, Won had cultivated a deep interest in how machine learning could address practical clinical challenges.

“Having previously worked at the intersection of AI and healthcare, I knew I wanted to dive deeper into the field, making the MScAC program a natural next step,” Won said. “The interdisciplinary nature of the curriculum, combined with the ability to tailor coursework to my specific interests, provided the strong foundation I needed to contribute meaningfully to the internship project.”

The partnership was established through the MScAC internship program, with Won (Artificial Intelligence in Healthcare concentration) working under Mena Kamel, computational scientist lead in Sanofi’s Data for New Technologies team, and academic supervisor Professor Rahul G. Krishnan from the University of Toronto. The collaboration positioned Won within Sanofi’s AI Centre of Excellence — a multidisciplinary team that brings together data scientists, engineers, and clinical researchers to develop AI-driven solutions across the pharmaceutical value chain.

“At Sanofi, we deeply value academic partnerships, particularly with programs like MScAC, because the students they produce bring a rare combination of curiosity, theoretical depth, and an eagerness to apply their skills where it matters,” said Kamel. “We came into this collaboration looking for someone who shared our passion for using AI to directly improve human lives. That alignment of ambition and purpose is what makes these partnerships so powerful.”

Based at Sanofi’s Toronto office with a hybrid work arrangement, Won enjoyed direct access to cross-functional expertise spanning clinical research, data science, and engineering.

Building a Synthetic Data-Driven Solution

Developed during Won’s internship at Sanofi, this work adopted a “lifting” approach to multi-view pose estimation. Rather than directly estimating 3D poses from multi-view video, the system leverages well-established 2D pose estimation models that process each camera view independently, identifying joint locations in 2D. These per-view 2D joint sequences are then “lifted” into a single 3D pose estimate.

A unique aspect of the solution lies in how the model processes and integrates information across views and time. Drawing on transformer architectures, specifically DSTFormer (Dual Spatial-Temporal Transformer), the system extracts spatial- and temporal-aware embeddings for each camera view. View-specific embeddings are passed through a fusion transformer that consolidates information across all camera angles. This approach resolves ambiguities and occlusions by leveraging complementary perspectives without requiring camera calibration.
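To make the idea of fusing view-specific embeddings more concrete, here is a minimal sketch of attention applied across the camera-view axis. It is not the project's fusion transformer: the function names are hypothetical, the embeddings are made-up numbers, and the queries, keys, and values use identity projections where a trained model would use learned weights, multiple heads, and stacked layers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_over_views(view_embeddings):
    """Single-head self-attention across camera views.

    view_embeddings: list of V vectors (one per camera), each of length D.
    Assumption: queries, keys, and values are the embeddings themselves
    (identity projections), so nothing here is trained.
    Returns (attended, weights): a V x D list and a V x V list.
    """
    dim = len(view_embeddings[0])
    scale = 1.0 / math.sqrt(dim)
    attended, weights = [], []
    for query in view_embeddings:
        # Scaled dot-product similarity between this view and every view.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in view_embeddings]
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of all view embeddings for this query view.
        attended.append([sum(w * value[d] for w, value in zip(row, view_embeddings))
                         for d in range(dim)])
    return attended, weights

def fuse_views(view_embeddings):
    """Mean-pool the attended per-view vectors into one fused vector."""
    attended, _ = attention_over_views(view_embeddings)
    count = len(attended)
    return [sum(vec[d] for vec in attended) / count
            for d in range(len(attended[0]))]

# Hypothetical embeddings of one joint at one time step from three cameras.
# The third camera contributes an all-zero vector, standing in for a view
# where the joint is occluded.
views = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]
fused = fuse_views(views)
print(fused)
```

Because the fusion operates on embeddings rather than on geometric rays, nothing in this sketch needs camera intrinsics or extrinsics, which is the property the calibration-free claim rests on.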

A key enabler of this project was the approach to generating training data. Rather than requiring thousands of hours of real multi-view video capture, the team developed a synthetic dataset from 3D skeleton data. By applying rigid transformations and perspective projection to existing 3D pose sequences, the system generates realistic multi-view 2D pose sequences that maintain perfect ground truth correspondence. This synthetic data approach addresses a fundamental bottleneck in multi-view pose estimation, enabling diverse training scenarios without the time and expense of physical data collection.
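The synthetic-data idea can be illustrated with a small sketch: take a 3D joint sequence, apply a rigid transformation for each virtual camera, and project the result through a pinhole camera model. Everything specific here is an assumption made for illustration, including the function names, the three-joint toy skeleton, the camera yaw angles, the 4-metre subject distance, and the focal length; the article does not describe how the team sampled camera poses.

```python
import math

def rotate_y(point, angle_rad):
    """Rotate a 3D point about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x + s * z, y, -s * x + c * z)

def to_camera(point, yaw_rad, distance):
    """Rigid transform into a virtual camera frame.

    Rotates the point by the camera yaw, then shifts it `distance`
    units along the optical (z) axis so the subject sits in front
    of the camera.
    """
    x, y, z = rotate_y(point, yaw_rad)
    return (x, y, z + distance)

def project(point_cam, focal_length):
    """Pinhole perspective projection: (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (focal_length * x / z, focal_length * y / z)

def synthesize_views(sequence_3d, yaws_deg, distance=4.0, focal_length=1000.0):
    """Return one 2D joint sequence per virtual camera.

    sequence_3d: list of frames, each a list of (x, y, z) joints.
    Output is indexed as views[camera][frame][joint] -> (u, v).
    """
    views = []
    for yaw in yaws_deg:
        yaw_rad = math.radians(yaw)
        views.append([
            [project(to_camera(joint, yaw_rad, distance), focal_length)
             for joint in frame]
            for frame in sequence_3d
        ])
    return views

# Toy sequence: two frames of a hip, knee, and ankle (metres, made-up values).
walk = [
    [(0.0, 0.9, 0.0), (0.05, 0.5, 0.10), (0.05, 0.1, 0.05)],
    [(0.0, 0.9, 0.1), (0.05, 0.5, 0.25), (0.05, 0.1, 0.20)],
]
views = synthesize_views(walk, yaws_deg=[0, 45, 90])
print(views[1][0])  # 2D joints seen by the 45-degree camera in frame 0
```

Because each 2D sequence is derived from the same 3D skeleton, the 3D ground truth for every generated view is known exactly, which is what lets a model train without any physically captured multi-view footage.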

Transforming Clinical Motion Analysis: The Results

Validation on benchmark datasets shows that the system achieves 50.9 per cent and 49.5 per cent reductions in mean per-joint position error (MPJPE) and mean per-joint velocity error (MPJVE), respectively, compared to the leading multi-view pose estimation model. This represents progress toward achieving lab-grade motion capture accuracy with reduced infrastructure requirements. The lifting architecture offers particular advantages for clinical deployment. Because it relies on 2D pose estimation as an intermediate step, the system can leverage ongoing improvements in monocular pose estimation, a rapidly advancing field with large-scale datasets and extensive research investment.
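For readers unfamiliar with the two metrics, the sketch below computes them the way they are commonly defined: MPJPE as the average Euclidean distance between predicted and ground-truth joints, and MPJVE as the same average taken over frame-to-frame joint displacements. The function names and the toy sequences are illustrative assumptions, and the article does not state which alignment or units the benchmark evaluation used.

```python
import math

def mpjpe(pred, truth):
    """Mean per-joint position error.

    pred, truth: lists of frames, each a list of (x, y, z) joints.
    Averages the Euclidean distance over every joint in every frame.
    """
    distances = [math.dist(p, t)
                 for pred_frame, truth_frame in zip(pred, truth)
                 for p, t in zip(pred_frame, truth_frame)]
    return sum(distances) / len(distances)

def velocities(sequence):
    """Per-joint displacement between consecutive frames."""
    return [
        [tuple(b - a for a, b in zip(joint_prev, joint_next))
         for joint_prev, joint_next in zip(frame_prev, frame_next)]
        for frame_prev, frame_next in zip(sequence, sequence[1:])
    ]

def mpjve(pred, truth):
    """Mean per-joint velocity error: MPJPE applied to frame differences."""
    return mpjpe(velocities(pred), velocities(truth))

def relative_reduction(baseline_error, new_error):
    """Percentage reduction of new_error relative to baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Toy ground truth: three frames, two joints (made-up values).
truth = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
# Prediction with a constant (3, 4, 0) offset on every joint.
pred = [[(x + 3.0, y + 4.0, z) for (x, y, z) in frame] for frame in truth]

print(mpjpe(pred, truth))
print(mpjve(pred, truth))
print(relative_reduction(10.0, 5.0))
```

The toy prediction is offset by a constant vector, so its positions are wrong while its motion matches the ground truth. That is why the two metrics are reported separately: one captures where the joints are, the other how smoothly and accurately they move.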

For healthcare applications, this represents meaningful progress. Digital biomarkers extracted from 3D pose estimation can be measured more frequently and objectively than traditional clinical assessments.

“This research has given us solid experience building models for enhanced human pose estimation. It will help us capture multi-view datasets in any environment with just regular phones — no markers, no calibration needed,” Kamel said. “Traditional approaches require professional labs, markers attached to patients, and meticulously calibrated cameras. This approach removes those barriers, making data collection simpler and more efficient.”

“Natalie’s work at Sanofi tackles a real bottleneck in bringing AI-driven motion analysis out of the lab and into practical clinical settings. By combining synthetic data generation with transformer-based multi-view fusion, she’s shown that high-quality 3D pose estimation doesn’t require million-dollar infrastructure. This kind of industry-academic collaboration is exactly how research translates into real-world impact to accelerate drug development,” said Krishnan.

A Culture of Innovation and Scientific Rigour

Sanofi’s Digital R&D team operates at the intersection of pharmaceutical development and cutting-edge AI research. The Toronto-based AI Centre of Excellence serves as an innovation engine for Sanofi globally, breaking down complex business problems across the entire pharmaceutical value chain.

“Innovation at Sanofi means continuous learning and taking bold, thoughtful risks. That’s the environment we try to create for our interns, too,” Kamel said. “The work is always connected to something real and useful in our current projects, and where possible, we encourage academic publication. It’s how we build our presence in new areas of AI innovation and contribute back to the broader research community.”

“This internship was a true deep-dive into the research process, which naturally came with its share of challenges. Learning to pivot and adapt when obstacles arose was crucial, and the guidance from my industry mentor, Mena Kamel, proved invaluable in advancing the research,” Won said. “Beyond the technical work itself, the mentorship and company culture I experienced made it clear that I wanted to continue my journey at Sanofi.”

What This Means for Digital Biomarkers

This project demonstrates that multi-view pose estimation can move from research labs with significant hardware requirements to uncontrolled, in-the-wild settings with minimal hardware. As digital biomarkers become standard components of clinical evidence, the ability to measure movement accurately and continuously could reshape how efficacy is demonstrated.

The methodology’s adaptability matters particularly as personalized medicine advances. Different conditions require different movement metrics, and a system that can rapidly adapt to new motion analysis requirements through synthetic data generation provides research flexibility that traditional approaches cannot match.

“We absolutely intend to continue our collaboration with U of T and the MScAC program. We’re committed to pushing further in movement understanding from video data, deriving insights that simply weren’t possible to capture before. Each project grows our research footprint and experience in the motion understanding domain,” Kamel said.

By the Numbers

  • Mean per-joint position error (MPJPE): 50.9 per cent reduction compared to the leading multi-view 3D human pose estimation (HPE) model
  • Mean per-joint velocity error (MPJVE): 49.5 per cent reduction compared to the leading multi-view 3D HPE model
  • Architecture: Spatial and temporal transformer embeddings with fusion pose transformer
  • Training approach: Synthetic multi-view 2D sequence generation from 3D skeletons
  • Partner: Sanofi AI Centre of Excellence, Toronto
  • Applications: Gait analysis, motion tracking, clinical trial digital biomarkers
  • Supervisors: Mena Kamel (Sanofi), Professor Rahul G. Krishnan (University of Toronto)

 

Contact: For media inquiries, please contact MScAC Partnerships at partners@mscac.utoronto.ca. For more information about Sanofi’s AI Centre of Excellence, visit sanofi.com.