Legal Issues for Data Professionals: Protecting Against AI Risks in NDAs

This column explores how to protect against the risks of AI in legal confidentiality and non-disclosure agreements (NDAs). When companies are considering entering into a business relationship or engaging in preliminary discussions about a contemplated agreement, they often create a term sheet that outlines the key business and legal provisions to be included in the final agreement. In either case, they enter into an NDA to protect against the disclosure of confidential information during the preliminary discussions. In addition, the confidentiality and non-disclosure provisions agreed to in the NDA are often carried over into the final agreement.

Because of the nature of AI, it is important that confidential information specifically include data that a company either owns or controls. This column uses the concept of “control of data” to address the uncertain legal status of what “ownership” means with respect to data, and to capture data that a company has the right to use and on which it can impose obligations and restrictions governing other parties’ use. This often includes data licensed to the company by a third party. Such licenses can provide exclusive or semi-exclusive use of the data by the receiving company. To oversimplify, the company has proprietary rights in the data.

Overview of AI Risks 

In today’s legal environment, there is a risk that confidential data will be used in AI in a manner that is detrimental to the company. The risks are two-fold. First, there is a risk that the company’s data will be used as training data for the other company’s AI systems. Second, there is a risk that using an AI system will disclose data that should remain confidential. In addition, while the following use of AI does not present the same risks, the NDA should permit the party receiving the other party’s confidential information to use AI to analyze that information. This use does not present the same risks because it employs AI to conduct evaluations that are currently done by other means, and it does not involve using the data as training data. In other words, AI is a new tool for an old purpose. This point is important because of how companies are using AI as part of the normal conduct of their business.

AI will allow a sophisticated analysis of the confidential information as part of the preliminary steps in considering the terms of the final agreement. AI analysis can be used to suggest provisions for the final agreement, including terms that may not be covered by a term sheet, or, at an earlier stage, it may be used in creating the term sheet itself. Further, it allows a company to compare different proposals from different companies or to compare and contrast the risks and benefits of different proposed agreements. It is important to include the generative AI prompts used by a company as part of its confidential information for NDA purposes, because the prompts themselves may disclose confidential information.

Because NDAs are legal agreements, it is important that they contain AI-related definitions to add precision to the AI rights and obligations in the main text of the agreement. The definitions should reflect the difference between the legal definition of AI and the computer science definition of AI. An NDA addressing AI risks should contain robust definitions in order to look ahead and cover both the rapid evolution of AI and the emerging judicial decisions regarding the use of data as training data in the context of potential copyright infringement. An NDA allows companies to agree to restrictions and rights in a private contract while the law continues to evolve.

AI-Related Definitions for Use in NDAs  

Below are suggested definitions for AI-related terms for use in NDAs. As a matter of legal drafting, there are advantages to using precise definitions as opposed to including the subject matter of definitions in the main text of the agreement.  

“AI Systems” means any existing or future system that operates on probabilistic and pattern-based intelligence with respect to real or virtual environments, including software, models, services or systems that utilize generative AI, non-generative AI, predictive AI, machine learning, deep learning, neural networks, and other types of artificial intelligence designed to ingest data, learn and adapt from data, generate outputs, conduct analytics, perform classifications, or make decisions, predictions, recommendations or forecasts, such as large language models, small language models, large foundation models, transformers, diffusion models, and agentic AI platforms.  

“Train” means to train, fine-tune, calibrate, and refine an AI system (including through supervised or unsupervised learning), reduce hallucinations, create or enrich datasets for use by an AI system, generate embeddings (for example, translating data and information into a mathematical format and spatial proximity for AI operations) or other data representations, and validate the performance, accuracy, and outputs of an AI system.

“Training Prohibitions” means using confidential information for one or more of the following prohibited activities: modeling, simulation, abstraction, encoding, extraction, and transforming confidential information into AI formats; developing vectors or weights that can be used as inputs to or outputs from an AI system; and submitting confidential information to any publicly available or externally hosted AI systems. 

“AI Provider” means a person or entity who operates and/or trains a party’s AI system. 

“Disclose” means to make physical delivery, provide electronic versions, provide physical or remote inspection, or make accessible through the disclosing party’s IT assets. 

“IT Assets” means a party’s information technology and communications technology, including the following and each instance thereof: (a) computer hardware and software (including virtual machines) on the party’s premises, in a hosted environment and/or used through outsourcing; (b) software otherwise used or accessed on a subscription basis or as Software-as-a-Service, Platform-as-a-Service or Infrastructure-as-a-Service; and (c) end-user devices and software issued by a party, or acquired by a representative, and used in connection with the party’s business, including desktops, laptops, and mobile devices.    

Overview of Legal Provisions  

First, the party disclosing confidential information should state whether the information (or any part of it) was created with or without the use of AI. If AI was used, the parties to the NDA should decide how much information about that use should be provided.

Second, the receiving party should agree that it will not use the disclosing party’s confidential information to train the receiving party’s AI. See the definition of “Train” above. The NDA should also cover the receiving party’s AI providers when that party uses a third-party service provider to operate the AI system that it uses.

Third, the receiving party should agree that it will not engage in any training prohibitions with respect to the disclosing party’s confidential information.   

Fourth, the agreement should provide that each party to the NDA may use its own AI system to analyze the other party’s confidential information. This use should be only for the purposes of the NDA.

Fifth, at the end of the NDA term, the parties should return, destroy, or delete the other party’s confidential information. A complication in today’s environment is that confidential information is often stored across a complex set of data storage locations in an IT system. See the definition of “IT Assets” above. Because not all data can be deleted from all IT assets, the NDA should provide that data that cannot be deleted will be isolated within the IT assets and protected against unauthorized use or access.

Conclusion   

The use of AI in today’s business and legal environment presents risks of unauthorized disclosure of confidential information. NDAs should be updated to expressly address these risks rather than relying on after-the-fact arguments.

William A. Tanenbaum

William A. Tanenbaum is a data, technology, privacy, and IP lawyer, and a partner in the 100-year old New York law firm Moses Singer. Who’s Who Legal says Bill is a “go-to expert” on “the management of and protection of data across a variety of sectors.” It named him “one of the leading names” in AI and data, and ranked him as one of the international "Thought Leaders in Data." Chambers, America’s Leading Lawyers for Business, says Bill has “notable expertise in cybersecurity, data law, and IP,” has a “solid national reputation,” and “brings extremely high integrity, a deep intellect, fearlessness, and a practical, real-world mindset to every problem.” Bill is a member of the DAMA Speakers Bureau and the Past President of the International Technology Law Association. He is a graduate of Brown University (Phi Beta Kappa), Cornell Law School, and the Bob Bondurant School of High-Performance Driving. Follow William on LinkedIn.
