
Introduction
Intelligent agents are task-oriented software that uses artificial intelligence (AI) to achieve well-defined goals. In its most basic form, AI agency can be implemented in traditional systems that can perform only specific tasks under defined conditions. Increasingly agentic AI systems, by contrast, are envisioned to be fully able to learn from their environment, make decisions, and perform tasks independently. Rather than mapping explicit inputs to predetermined outputs, they will receive instructions, develop an approach, and complete tasks, producing dynamic outputs.
The integration of AI agents into medical research and development (R&D) has ushered in a transformative era, promising unprecedented efficiency, accuracy, and scalability. However, while much of the discourse around AI in healthcare emphasizes the processes these technologies optimize, there is a notable gap in understanding how data — the lifeblood of AI systems — is prepared, structured, processed, and stored to enable these advancements. This paper explores the critical role of data engineering in the deployment of AI agents, focusing on the medical R&D domain, and outlines actionable strategies for navigating this complex landscape.
Examples of Agentic AI in Medicine
In healthcare, AI agents are increasingly deployed to apply preset rules to large datasets in pursuit of specific, predefined outcomes. These agents can analyze patient data such as test results, medical histories, and symptoms to suggest a potential diagnosis, recommend further tests, and suggest treatments, drawing on insights from vast datasets and the medical literature. Radiology offers one example: AI agents act as medical assistants that help detect anomalies in imaging with high precision, improving diagnostic accuracy.
Agents can also help optimize the day-to-day management of a medical practice. For example, agents can use the results of the analyses described above to facilitate patient scheduling, generate medication reminders, increase patient engagement, support billing inquiries, and simplify patient access.
Additionally, agents’ ability to adapt to real-time data, provide dynamic insights, and support decision-making is revolutionizing medical R&D.
The Data Imperative in Medical R&D
Medical research and development is inherently data-intensive, requiring vast volumes of structured and unstructured data to drive innovation. From genomics research to clinical trials, the need for robust data is paramount. For instance, genomics platforms must process very large datasets across temporal and spatial dimensions, with storage needs ranging from high-performance computing (HPC) systems for immediate analysis to long-term cold storage archives for reanalysis years later. The complexity of managing such data underscores the importance of well-engineered infrastructure and of the quality and accessibility of the underlying data.
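The storage lifecycle described above can be sketched as a simple tiering rule. In the following Python example, recently accessed datasets stay on fast storage near HPC resources, older datasets move to general-purpose storage, and rarely used datasets go to a cold archive. The tier names, the `Dataset` and `choose_tier` names, and the 30-day and 365-day thresholds are hypothetical assumptions; an actual policy would follow an organization's retention, cost, and reanalysis requirements.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class Tier(Enum):
    """Hypothetical storage tiers, from fastest/most expensive to slowest/cheapest."""
    HOT = "fast storage attached to HPC systems"
    WARM = "general-purpose storage"
    COLD = "long-term cold archive"


# Hypothetical age thresholds (assumptions for illustration only).
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


@dataclass
class Dataset:
    name: str
    last_accessed: date


def choose_tier(ds: Dataset, today: date) -> Tier:
    """Pick a tier based on how long ago the dataset was last accessed."""
    age = today - ds.last_accessed
    if age <= HOT_WINDOW:
        return Tier.HOT
    if age <= WARM_WINDOW:
        return Tier.WARM
    return Tier.COLD


if __name__ == "__main__":
    today = date(2025, 1, 15)
    for ds in [
        Dataset("sequencing_run_recent", date(2025, 1, 10)),
        Dataset("cohort_study_2024", date(2024, 6, 1)),
        Dataset("genome_archive_2019", date(2019, 3, 5)),
    ]:
        # Prints HOT, WARM, and COLD for the three datasets, respectively.
        print(f"{ds.name}: {choose_tier(ds, today).name}")
```

Keying the rule on last access rather than creation date keeps a dataset that is being actively reanalyzed on fast storage, even if it was generated years ago.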
Despite the promise of AI agents, challenges in data engineering include:
- Data Quality: Ensuring accuracy, removing duplicates, and standardizing formats are essential for reliable AI analysis.
- Real-Time Processing: Technologies such as edge computing bring computation closer to data sources, but limited edge computational capacity and unpredictable network performance remain difficult to overcome.
- Scalability: AI agents must handle vast amounts of data from diverse sources, necessitating scalable storage and processing solutions.
- Privacy and Compliance: Adhering to stringent regulations while leveraging AI for data analysis is a critical concern.
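As a minimal illustration of the data quality challenge, the following Python sketch standardizes a few fields in lab-result records and then removes the duplicates that standardization exposes. The record layout, the field names (`patient_id`, `test`, `date`, `value`), and the accepted date formats are hypothetical assumptions; a production pipeline would typically rely on established data quality tooling and source-specific rules.

```python
from datetime import datetime

# Hypothetical date formats observed across source systems (assumption).
# Order matters: the first format that parses successfully wins.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def standardize_date(raw: str) -> str:
    """Parse a date in any known format and return it in ISO 8601 form (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def standardize(record: dict) -> dict:
    """Normalize identifiers, test names, dates, and numeric values."""
    return {
        "patient_id": record["patient_id"].strip().upper(),
        "test": record["test"].strip().lower(),
        "date": standardize_date(record["date"]),
        "value": float(record["value"]),
    }


def clean(records: list[dict]) -> list[dict]:
    """Standardize each record, then drop exact duplicates (keeping the first occurrence)."""
    seen = set()
    cleaned = []
    for record in records:
        std = standardize(record)
        key = (std["patient_id"], std["test"], std["date"], std["value"])
        if key not in seen:
            seen.add(key)
            cleaned.append(std)
    return cleaned


if __name__ == "__main__":
    raw_records = [
        {"patient_id": " p001 ", "test": "HbA1c", "date": "2025-01-03", "value": "6.1"},
        {"patient_id": "P001", "test": "hba1c ", "date": "01/03/2025", "value": "6.1"},
        {"patient_id": "P002", "test": "HbA1c", "date": "03-Jan-2025", "value": "5.4"},
    ]
    # The first two records standardize to identical values, so two records remain.
    for rec in clean(raw_records):
        print(rec)
```

Standardizing before deduplicating is deliberate: the first two records look different as raw text and would survive a naive duplicate check. Note also that a value such as 01/03/2025 is ambiguous between month-first and day-first conventions, so in practice the date format should be determined from the source system rather than guessed.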
The Role of Data Engineering
Data engineering activities and practices are foundational to unlocking the potential of AI agents in medical R&D. By addressing the complexities of data preparation, structuring, processing, and storage, data engineering ensures that AI agents can operate efficiently, accurately, and at scale. Key contributions include:
- Ensuring Data Quality and Reliability: AI agents rely on high-quality data to deliver accurate insights and predictions. Data engineering practices such as data cleansing, deduplication, and standardization ensure that datasets are reliable and fit for use.
- Building Scalable Data Architectures: Medical R&D generates vast amounts of structured and unstructured data, requiring scalable storage and processing solutions. Data engineering activities must focus on creating architectures that can grow alongside increasing demands.
- Enabling Real-Time Data Processing: Medical research often involves time-sensitive data, such as patient monitoring during clinical trials or disease surveillance. Data engineering approaches such as edge and fog computing, which process data close to where it is generated, enable near-real-time data processing, which is critical for AI agents to provide timely insights.
- Supporting Advanced Analytics: AI agents use predictive and diagnostic analytics to uncover patterns, forecast outcomes, and provide actionable insights. Machine learning models can help forecast disease outcomes and detect anomalies in medical data. Using AI agents to identify patterns in medical imaging and genomic data, and to describe those patterns in a standardized way, can aid in early detection of disease.
- Facilitating Secure and Compliant Data Sharing: Medical R&D often involves collaboration across organizations, which requires secure and transparent data sharing. Data engineering technologies that enable such sharing help protect privacy and foster trust among stakeholders. Robust security protocols and adherence to regulatory requirements (e.g., privacy regulations) allow AI agents to process sensitive medical data without compromising its confidentiality or integrity.
- Optimizing Data Accessibility: AI agents need seamless access to diverse datasets to perform effectively. API-based data access simplifies integration and enables real-time collaboration among research teams. Additionally, data cataloging tools support search and exploration, allowing researchers to quickly locate and use relevant datasets.
- Governing Data and AI Agents: We know from years of experience that data governance is central to effective data management. Coordinating data quality, data platforms, standards, metadata, analytics, and data sharing is key to leveraging good, consistent data. Such governance should extend to AI, including agents, so that decisions, especially automated ones, are sound.
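The edge-computing approach to real-time processing can be sketched in a few lines: a device keeps a short rolling window of recent readings, flags values that deviate sharply from that window, and forwards only the flagged readings upstream, reducing dependence on limited bandwidth and unpredictable networks. This is a hypothetical illustration using a simple rolling z-score; the class name `EdgeAnomalyFilter`, the window size, the threshold of three standard deviations, and the sigma floor are assumptions, not clinically validated parameters.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeAnomalyFilter:
    """Hypothetical edge-side filter: flag readings that deviate sharply from recent history.

    Only flagged readings would be forwarded to the central system, so most
    routine data never has to cross an unreliable network.
    """

    def __init__(self, window: int = 20, threshold: float = 3.0,
                 min_history: int = 5, sigma_floor: float = 1.0):
        self.readings = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold            # z-score above which a reading is flagged
        self.min_history = min_history        # readings needed before flagging starts
        self.sigma_floor = sigma_floor        # avoids dividing by ~0 on flat signals

    def process(self, value: float) -> bool:
        """Return True if `value` should be forwarded upstream as anomalous."""
        flagged = False
        if len(self.readings) >= self.min_history:
            mu = mean(self.readings)
            sigma = max(pstdev(self.readings), self.sigma_floor)
            flagged = abs(value - mu) / sigma > self.threshold
        self.readings.append(value)  # every reading, flagged or not, joins the window
        return flagged


if __name__ == "__main__":
    heart_rate = [72, 74, 71, 73, 72, 75, 73, 140, 74, 72]  # illustrative values, not real data
    edge = EdgeAnomalyFilter()
    forwarded = [(i, bpm) for i, bpm in enumerate(heart_rate) if edge.process(bpm)]
    print(forwarded)  # [(7, 140)]
```

In this sketch, flagged readings are still added to the window, which temporarily inflates the standard deviation and can mask a second anomaly that follows closely; excluding flagged values from the window is a common alternative.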
Conclusion
AI agents are revolutionizing medical research and development by automating complex tasks, providing real-time insights, and improving decision-making. However, their success depends on the underlying data infrastructure, and deliberate data preparation, storage, and processing strategies make data engineering a critical backbone of AI agent deployment. By ensuring data quality, scalability, real-time processing, security, and accessibility, data engineering unlocks the full potential of AI agents to transform healthcare research. As organizations increasingly adopt AI-driven solutions, investing in robust data engineering practices will be critical to achieving breakthroughs in medical research and improving patient outcomes.
Authors
Lori Wordsworth: Principal Information Systems Engineer at the MITRE Corporation who worked on system engineering and business improvement efforts. Ms. Wordsworth researched the design of decision support mechanisms and frameworks that could be implemented via AI or other learning tools to understand and optimize decision-making pathways.
Adrienne Chen-Young: Group Leader and Principal Information Systems Engineer at the MITRE Corporation who has worked on data architecture, metadata and data modeling, interoperability, and lifecycle planning efforts across multiple agencies.
Mike Fleckenstein: Chief Data Strategist at the MITRE Corporation who has guided enterprise-level data programs and projects across multiple agencies. Mr. Fleckenstein is a published author and regular conference speaker.
Approved for Public Release; Distribution Unlimited. Public Release Case Number PR_25-00398-1. The authors’ affiliation with The MITRE Corporation is provided for identification purposes only and is not intended to convey or imply MITRE’s concurrence with, or support for, the positions, opinions, or viewpoints expressed by the author. ©2025 THE MITRE CORPORATION. ALL RIGHTS RESERVED.
