Open Document Alliance

We're building an open standard library for agreements: machine-readable contracts that remain human-first, portable across tools, and built to last.

Agreements are the foundation of how we work together. Every partnership, every employment relationship, every software license, every purchase: all of them rest on some form of agreement. These documents define obligations, establish trust, and enable cooperation at scale. And yet, the way we create, store, and manage agreements hasn't kept pace with how we work.

Most contracts today live as PDFs: fixed page layouts, or sometimes just scanned images of text, that carry little structure a machine can use. They sit in email attachments, file servers, and contract management systems, waiting for humans to manually extract the information that matters. When did this agreement start? When does it end? What are the payment terms? Who are the parties? Answering these questions often means opening the document and searching through pages of legal prose.

We think there's a better way. At the Open Document Alliance, we're working on an open standard library for agreements: a format that's human-readable first, machine-readable by design, and portable across any tool or platform. This is early work and very much in progress, but we want to share our thinking and invite collaboration from anyone who cares about this problem.

The problem with agreements today

Consider what happens when a company wants to understand its contract portfolio. Maybe they're preparing for an audit, negotiating a renewal, or assessing risk exposure. Someone (often an expensive someone) needs to open each document, read through it, and manually extract the relevant details into a spreadsheet or database.

This process is slow and error-prone, and it doesn't scale. Large organizations might have thousands of active agreements. Even with contract management software, the underlying documents remain opaque. The software can track metadata that humans enter, but it can't reliably parse the agreements themselves.

AI is starting to help here. Modern language models can extract information from unstructured text with impressive accuracy. But they're working against the grain, inferring structure from documents that were never designed to be structured. Every extraction is a best guess. Edge cases abound. And when the stakes are high (as they often are with legal documents), "usually correct" isn't good enough.

The deeper problem is that PDF and Word are presentation formats. They specify how text should look on a page, not what the text means. A heading that says "Term and Termination" is just styled text. There's nothing in the file format that says "this section contains duration and ending conditions." The semantic meaning lives only in the minds of readers.

What if agreements were structured from the start?

Imagine an agreement format where the structure is explicit. Parties are identified in a standard block with names, roles, and contact information. The effective date and term duration are machine-readable fields, not sentences to be parsed. Clauses have unique identifiers that can be referenced across documents. Obligations are marked as such, with clear conditions and deadlines.

This doesn't mean agreements become less human-readable; quite the opposite. A well-structured document can still render beautifully as a traditional contract. But underneath the presentation layer, the data is organized in a way that any software can understand.

Think of the difference between a scanned paper form and a web form. Both collect the same information, but only one can be processed automatically. Structured agreements are like web forms for contracts: they capture information in a way that flows naturally into downstream systems.

Design principles for an open agreement format

We've been thinking carefully about what makes a good agreement format. Here are the principles guiding our work:

First, human-readable by default. Agreements are fundamentally human documents. They need to be read, understood, and negotiated by people, often people without technical backgrounds. Any format we create must produce documents that feel natural to read, with familiar structures and clear language. The machine-readable layer should enhance the human experience, not complicate it.

Second, semantic structure over presentation. Instead of specifying fonts and margins, the format should capture meaning. What are the parties agreeing to? What conditions apply? What happens if someone breaches? The presentation can be customized for different contexts (formal PDFs for signatures, simplified views for quick reference, accessible versions for screen readers), but the underlying structure remains consistent.

Third, extensibility for different domains. A software license has different requirements than a lease agreement, which differs from an employment contract. The format needs a core schema that applies universally (parties, dates, signatures) plus extension mechanisms for industry-specific elements. Real estate agreements might include property descriptions. Healthcare contracts might require HIPAA provisions. The format should accommodate these variations without becoming unwieldy.

Fourth, explicit provenance and history. Agreements evolve. They get negotiated, amended, renewed, and terminated. A good format captures this history: who proposed which changes, when signatures were added, how the current version relates to previous ones. This isn't just bookkeeping; it's essential context for understanding what an agreement actually means.

Fifth, open specification and governance. This might be the most important principle. For an agreement format to become a true standard, it can't be controlled by any single vendor. The specification must be public, developed collaboratively, and maintained by a neutral organization. Anyone should be able to build tools that read and write this format without licensing fees or permission. This is the same principle that makes open document standards work, and why they outlast proprietary alternatives.
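To make the third principle a little more concrete, here is a minimal sketch of how a universal core and domain extensions might fit together. Everything in it is an assumption for illustration: the block names, the `extensions` key, and the registry of required fields are hypothetical, not part of any published specification.

```python
# Hypothetical core blocks that every agreement would carry.
CORE_REQUIRED = ("header", "parties", "terms", "body")

# Hypothetical extension registry: namespace -> fields that extension requires.
EXTENSIONS = {
    "real_estate": ("property_description",),
    "healthcare": ("hipaa_provisions",),
}

def validate(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    errors = [f"missing core block: {name}" for name in CORE_REQUIRED if name not in doc]
    for namespace, payload in doc.get("extensions", {}).items():
        required = EXTENSIONS.get(namespace)
        if required is None:
            errors.append(f"unknown extension: {namespace}")
            continue
        errors += [f"{namespace}: missing {f}" for f in required if f not in payload]
    return errors

lease = {
    "header": {"id": "agr-0042", "type": "lease"},
    "parties": [],
    "terms": {},
    "body": [],
    "extensions": {"real_estate": {"property_description": "Unit 4, 12 High Street"}},
}
print(validate(lease))
```

The point of the design is that a tool which knows nothing about real estate can still read the core blocks, while a domain-aware tool can check the extension as well.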

Core elements of the schema

While we're still refining the details, the basic structure is taking shape. An agreement in this format would include several standard blocks:

The header block identifies the agreement itself: a unique identifier, the type of agreement, version information, and the governing jurisdiction. This metadata applies to the document as a whole.

The parties block lists everyone involved in the agreement, along with their roles. A simple contract might have two parties (a service provider and a client). More complex arrangements might include guarantors, beneficiaries, or authorized representatives. Each party has structured contact information and signing authority.

The terms block captures the temporal aspects: effective date, duration, renewal conditions, and termination provisions. These are some of the most commonly queried fields in any contract management system, so having them in a standard location with machine-readable values is especially valuable.

The body contains the actual substance of the agreement, organized into clauses. Each clause has a unique identifier, a type classification, and its content. Some clauses might be simple prose. Others might contain structured obligations with specific conditions, deadlines, and responsible parties. The format supports both, allowing authors to add structure where it's valuable without forcing everything into rigid templates.

The signatures block records who signed, when, and how. This integrates naturally with e-signature workflows while also supporting traditional wet signatures through attestation records. Importantly, the signature data can be cryptographically verified, providing strong guarantees about document integrity.

Finally, an attachments block handles exhibits, schedules, and other referenced documents. These might be other structured agreements, images, data files, or any other content that's incorporated by reference.
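As a rough illustration of the blocks described above, here is how they might look as plain data types. This is a sketch under our own assumptions: every class and field name is hypothetical, and the real schema is still being refined.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import Optional

# All names below are hypothetical; the specification has not fixed them.

@dataclass
class Header:
    agreement_id: str
    agreement_type: str
    version: str
    jurisdiction: str

@dataclass
class Party:
    name: str
    role: str
    email: str
    can_sign: bool = True

@dataclass
class Terms:
    effective_date: date
    duration_months: int
    auto_renew: bool = False
    termination_notice_days: Optional[int] = None

@dataclass
class Clause:
    clause_id: str
    clause_type: str
    text: str
    # Optional structure for clauses that carry an obligation.
    obligation: Optional[dict] = None

@dataclass
class Signature:
    party_name: str
    signed_on: date
    method: str  # e.g. "e-signature" or "wet-ink attestation"

@dataclass
class Attachment:
    title: str
    media_type: str
    reference: str

@dataclass
class Agreement:
    header: Header
    parties: list[Party]
    terms: Terms
    body: list[Clause]
    signatures: list[Signature] = field(default_factory=list)
    attachments: list[Attachment] = field(default_factory=list)

agreement = Agreement(
    header=Header("agr-0001", "services", "1.0", "England and Wales"),
    parties=[
        Party("Acme Ltd", "service provider", "legal@acme.example"),
        Party("Beta LLC", "client", "contracts@beta.example"),
    ],
    terms=Terms(date(2025, 1, 1), duration_months=12, auto_renew=True,
                termination_notice_days=30),
    body=[
        Clause("c-1", "payment", "Client pays within 30 days of invoice.",
               obligation={"party": "client", "due_days": 30}),
    ],
)
```

Note that the payment clause keeps its prose and adds structure alongside it, which is the "both, where valuable" approach described for the body block.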

Why portability matters

One of our core goals is ensuring agreements can move freely between tools and platforms. This matters for several reasons.

Organizations change software all the time. The contract management system you use today might not be the one you use in five years. If your agreements are stored in a proprietary format, migration becomes painful or even impossible. Open formats mean your contracts travel with you, regardless of which vendors you choose.

Different stakeholders need different tools. Legal teams might work in specialized contract management platforms. Finance teams might need agreements integrated with their ERP systems. Executives might want dashboards and summaries. When the format is open, each team can use the tools that work best for them, all operating on the same underlying data.

Interoperability enables automation. When agreements have consistent structure, you can build reliable workflows: automatically extract renewal dates for calendaring, flag agreements approaching expiration, route documents for appropriate approvals, generate compliance reports. These automations become trustworthy because they're working with structured data, not guessing at unstructured text.
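For example, flagging agreements that are approaching expiration could be a few lines of code once the dates are real fields. The sketch below assumes hypothetical `effective_date` and `duration_days` fields; actual field names and duration units are not yet settled.

```python
from datetime import date, timedelta

def end_date(effective: date, duration_days: int) -> date:
    return effective + timedelta(days=duration_days)

def expiring_soon(agreements: list[dict], today: date, window_days: int = 60) -> list[str]:
    """Return ids of agreements that end within window_days of today."""
    flagged = []
    for a in agreements:
        ends = end_date(date.fromisoformat(a["effective_date"]), a["duration_days"])
        if today <= ends <= today + timedelta(days=window_days):
            flagged.append(a["id"])
    return flagged

# Hypothetical portfolio with machine-readable term fields.
portfolio = [
    {"id": "agr-0001", "effective_date": "2024-03-01", "duration_days": 365},
    {"id": "agr-0002", "effective_date": "2024-09-01", "duration_days": 365},
]
print(expiring_soon(portfolio, today=date(2025, 2, 1)))  # ['agr-0001']
```

No text extraction is involved, so the result depends only on the data and the rule, not on how a particular contract happens to be worded.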

The format should have clean export paths to common formats. PDF generation for official versions. HTML for web viewing. JSON for API integrations. Markdown for version control systems. Each export captures the content appropriately for its medium while the canonical structured version remains the source of truth.
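As a small sketch of what two of these export paths might look like, the functions below render the same canonical data as JSON and as Markdown. The document layout and the function names `to_json` and `to_markdown` are our own assumptions for illustration.

```python
import json

# Hypothetical canonical document, reduced to a few fields.
agreement = {
    "header": {"id": "agr-0001", "type": "services", "version": "1.0"},
    "parties": [
        {"name": "Acme Ltd", "role": "service provider"},
        {"name": "Beta LLC", "role": "client"},
    ],
    "body": [
        {"id": "c-1", "type": "payment",
         "text": "Client pays within 30 days of invoice."},
    ],
}

def to_json(doc: dict) -> str:
    # Sorted keys give stable output, which keeps diffs small.
    return json.dumps(doc, indent=2, sort_keys=True)

def to_markdown(doc: dict) -> str:
    h = doc["header"]
    lines = [f"# {h['type'].title()} agreement {h['id']} (v{h['version']})",
             "", "## Parties", ""]
    lines += [f"- {p['name']} ({p['role']})" for p in doc["parties"]]
    for clause in doc["body"]:
        lines += ["", f"## {clause['id']}: {clause['type']}", "", clause["text"]]
    return "\n".join(lines) + "\n"

print(to_markdown(agreement))
```

Both exports are derived views. Edits would be made to the canonical structured version and the exports regenerated, rather than the other way around.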

Compatibility with existing workflows

We're not trying to replace the entire legal technology ecosystem. We're trying to make it work better. That means the format needs to integrate with how people actually work today.

E-signature platforms are a critical piece of this puzzle. DocuSign, Adobe Sign, and similar services have become standard for executing agreements. Our format should work seamlessly with these platforms: export a signing-ready PDF, execute through any e-signature provider, and then import the signed version back with signature data intact.

Contract lifecycle management (CLM) systems are another key integration point. These platforms manage the entire agreement process: drafting, negotiation, approval, execution, storage, and renewal. A structured format makes CLM systems more powerful by giving them reliable access to contract data, rather than depending on manual entry or imperfect text extraction.

Document generation tools should be able to produce structured agreements from templates. Instead of merging data into a Word template that produces opaque output, these tools could generate properly structured documents where the merged data remains queryable.

Legal research and compliance systems could leverage structured agreements for better analysis. Which contracts contain arbitration clauses? What's our exposure to a specific jurisdiction's regulations? These questions become much easier to answer when agreements are structured consistently.

The role of verification and trust

Agreements aren't just data; they're commitments that carry legal weight. This creates special requirements around authenticity and integrity. As we explore in "why trust services matter in the age of AI," verification is becoming essential infrastructure for digital documents.

The format should support cryptographic signatures that prove a document hasn't been tampered with since signing. This goes beyond e-signature platform attestation: the document itself can be verified independently, without relying on any particular service's records.
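One building block for independent verification is a stable digest of the document's content. The sketch below is only the hashing half of the story: it assumes a canonical form based on sorted-key JSON (our assumption, not a decided design), and a real signature would sign this digest using a public-key scheme from a cryptography library, which is not shown here.

```python
import copy
import hashlib
import json

def canonical_bytes(doc: dict) -> bytes:
    # Assumed canonical form: sorted keys, no extra whitespace, UTF-8.
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def digest(doc: dict) -> str:
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()

# Hypothetical signed document and the digest recorded at signing time.
signed = {
    "header": {"id": "agr-0001", "version": "1.0"},
    "body": [{"id": "c-1", "text": "Client pays within 30 days of invoice."}],
}
recorded = digest(signed)

# Any later change to the content produces a different digest.
tampered = copy.deepcopy(signed)
tampered["body"][0]["text"] = "Client pays within 90 days of invoice."

print(digest(signed) == recorded)    # True
print(digest(tampered) == recorded)  # False
```

Because the digest is computed from the document alone, anyone holding the file can recompute it without consulting a signing platform's records.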

Timestamping matters too. When was this version created? When was it signed? Trusted timestamps from neutral third parties can serve as evidence in legal proceedings, subject to the rules of the relevant jurisdiction, helping resolve disputes about when agreements were formed.

Provenance tracking through the document's history should be tamper-evident. If someone modifies the revision history, that modification should be detectable. This creates an audit trail that stakeholders can trust.
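One common way to make a history tamper-evident is a hash chain, where each record commits to the one before it. The following is a minimal sketch of that general technique, not a description of how the format will do it; the record layout and function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def entry_hash(prev_hash: str, entry: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "entry": entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(history: list, entry: dict) -> None:
    prev = history[-1]["hash"] if history else GENESIS
    history.append({"entry": entry, "prev": prev, "hash": entry_hash(prev, entry)})

def verify(history: list) -> bool:
    prev = GENESIS
    for record in history:
        if record["prev"] != prev or record["hash"] != entry_hash(prev, record["entry"]):
            return False
        prev = record["hash"]
    return True

history = []
append(history, {"action": "proposed", "by": "Acme Ltd", "version": "0.1"})
append(history, {"action": "amended", "by": "Beta LLC", "version": "0.2"})
append(history, {"action": "signed", "by": "Acme Ltd", "version": "1.0"})
print(verify(history))  # True

history[1]["entry"]["by"] = "Someone Else"  # rewrite the past
print(verify(history))  # False
```

A chain on its own only detects edits that leave later hashes untouched. To catch someone who recomputes the whole chain, the latest hash would also need to be anchored somewhere they cannot change, for example in a signature or a trusted timestamp.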

These verification capabilities align naturally with emerging standards for verifiable credentials and decentralized identity. An agreement could reference verified credentials for the signing parties, providing strong assurance about who actually agreed to what.

How AI tools benefit from structure

Artificial intelligence is transforming how we work with documents, and agreements are no exception. AI assistants can help draft contracts, identify risks, compare terms across documents, and answer questions about obligations. But these capabilities work much better with structured data.

Consider a simple question: "What are the payment terms in this contract?" With a PDF, an AI system has to read the entire document, identify the relevant section, and extract the information, all while guessing at formatting conventions and handling edge cases. With a structured agreement, the payment terms are in a known location with labeled fields. The AI can access them directly and reliably.

Risk analysis becomes more systematic. Instead of asking an AI to review a contract and flag concerns (a task where false negatives could be costly), you can define specific checks: Does this agreement have unusual liability limits? Are there non-standard termination provisions? Does the jurisdiction match our compliance requirements? Structured data makes these checks deterministic.
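Checks like these could be written as ordinary rules over the structured fields. The sketch below uses hypothetical field names and made-up policy thresholds purely for illustration.

```python
# Hypothetical policy values; a real organization would set its own.
ALLOWED_JURISDICTIONS = {"England and Wales", "New York"}
MIN_LIABILITY_CAP = 100_000
MIN_NOTICE_DAYS = 30

def run_checks(doc: dict) -> list[str]:
    """Return policy findings for one agreement; the same input always gives the same output."""
    findings = []
    cap = doc.get("liability_cap")
    if cap is None:
        findings.append("no liability cap stated")
    elif cap < MIN_LIABILITY_CAP:
        findings.append(f"liability cap {cap} is below the policy minimum {MIN_LIABILITY_CAP}")
    notice = doc["terms"].get("termination_notice_days")
    if notice is None or notice < MIN_NOTICE_DAYS:
        findings.append("termination notice is missing or shorter than policy")
    jurisdiction = doc["header"]["jurisdiction"]
    if jurisdiction not in ALLOWED_JURISDICTIONS:
        findings.append(f"jurisdiction {jurisdiction!r} is not on the approved list")
    return findings

vendor_agreement = {
    "header": {"id": "agr-0007", "jurisdiction": "Delaware"},
    "terms": {"termination_notice_days": 30},
    "liability_cap": 50_000,
}
for finding in run_checks(vendor_agreement):
    print(finding)
```

An AI review can still sit on top of this for the prose clauses, but the rule-based layer gives a floor of checks that never silently miss a field.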

Comparison and benchmarking improve dramatically. How do this vendor's terms compare to the industry standard? Are we getting better pricing than in similar agreements we've signed? These analyses require understanding the content of many documents. Structure makes that understanding reliable at scale.

And as AI systems become more capable, structured agreements give them better raw material. Language models trained on structured contracts can learn not just legal language, but legal semantics. They can understand that a limitation of liability clause has different implications than an indemnification clause, because the structure makes those distinctions explicit.

Building the ecosystem

A format is only as valuable as the ecosystem around it. Without tools that make adoption practical, even the best specification will languish. We're thinking carefully about what's needed. As we discuss in "why tech ecosystems matter," shared foundations create compounding progress.

Reference implementations in multiple languages will lower the barrier to adoption. Developers shouldn't have to implement the specification from scratch. They should have well-tested libraries they can integrate into their applications.

Conversion tools will help with migration. Most organizations have existing contracts in Word or PDF format. Tools that can intelligently convert these to structured agreements, even if they require human review, will make adoption practical for organizations with large contract portfolios.

Validation tools will ensure conformance. When a tool claims to produce agreements in this format, there should be test suites that verify compliance. This prevents fragmentation where different implementations interpret the specification differently.

Editor integrations will support human workflows. Most contracts are still drafted by people in word processors. Plugins that help authors create structured agreements within familiar tools will be essential for adoption.

Documentation and examples will help developers understand what's possible. Clear specifications are necessary but not sufficient. People need to see how the format works in practice, with real-world examples covering common agreement types.

A path toward adoption

We're realistic about the challenge of establishing a new standard. Network effects are powerful: formats become valuable when many people use them, but people only adopt formats that are already valuable. Breaking this chicken-and-egg problem requires a thoughtful approach.

We're starting with the specification itself, getting the fundamentals right before pushing for broad adoption. This means engaging with legal technologists, contract managers, and developers to understand requirements and refine the design.

Pilot implementations will come next. We're looking for partners (legal tech platforms, procurement systems, contract management vendors) who see value in structured agreements and want to help prove out the format. These early adopters will surface practical issues and demonstrate real-world benefits.

Conformance testing will build confidence. Once the specification stabilizes, we'll publish test suites that implementations can run to verify compliance. This creates trust that tools claiming to support the format actually interoperate correctly.

Community governance will ensure the format evolves appropriately. No single organization should control such an important standard. We're committed to open development processes and, eventually, to transferring stewardship to a neutral foundation.

Why we're sharing this early

This work is genuinely in progress. We don't have all the answers, and the specification will certainly evolve as we learn more. So why share now?

Partly, we want feedback. If you work with agreements (as a lawyer, a developer, a contract manager, or a business owner), your perspective is valuable. What have we missed? What do we have wrong? What would make this format useful for your work?

Partly, we want collaborators. Building a successful standard requires diverse input and broad support. If this vision resonates with you, we'd love to work together. Whether that means contributing to the specification, building tooling, or piloting implementations, there are many ways to help.

And partly, we believe in building in the open. The Open Document Alliance exists to advance open standards, and that philosophy extends to how we work. Sharing early, getting feedback, and iterating publicly produce better results than designing in isolation. This is what open information is all about.

Agreements that work everywhere

The vision is simple: agreements that you can create in any tool, sign through any platform, store in any system, and access forever. Agreements where the important information is structured and queryable, not buried in prose. Agreements that work seamlessly with AI assistants, automation workflows, and compliance systems.

This isn't just about efficiency, though the efficiency gains are real. It's about ensuring that agreements, these foundational documents of human cooperation, remain accessible and useful as technology evolves. A contract signed today should still be readable and meaningful in fifty years, regardless of which software exists then.

Open formats make this possible. When the specification is public and the tooling is open source, your agreements aren't dependent on any vendor's continued existence or goodwill. They become truly yours: portable, durable, and future-proof.

We're excited to be working on this problem. If you are too, we'd love to hear from you.
