ClearlyDefined at SOSS Fusion 2024: a collaborative solution to Open Source license compliance
This past month, the Open Source Security Foundation (OpenSSF) hosted SOSS Fusion in Atlanta, an event that brought together a diverse community of leaders and innovators from across the digital security spectrum. The conference, held on October 22-23, explored themes central to today’s technological landscape: AI security, diversity in technology, and public policy for Open Source software. Industry thought leaders like Bruce Schneier, Marten Mickos, and Cory Doctorow delivered keynotes, setting the tone for a conference that emphasized collaboration and community in creating a secure digital future.
Amidst these pressing topics, the Open Source Initiative in collaboration with GitHub and SAP presented ClearlyDefined—an innovative project aimed at simplifying software license compliance and metadata management. Presented by Nick Vidal of the Open Source Initiative, along with E. Lynette Rayle from GitHub and Qing Tomlinson from SAP, the session highlighted how ClearlyDefined is transforming the way organizations handle licensing compliance for Open Source components.
What is ClearlyDefined?
ClearlyDefined is a project with a powerful vision: to create a global crowdsourced database of license metadata for every software component ever published. This ambitious mission seeks to help organizations of all sizes easily manage compliance by providing accurate, up-to-date metadata for Open Source components. By offering a single, reliable source for license information, ClearlyDefined enables organizations to work together rather than in isolation, collectively contributing to the metadata that keeps Open Source software compliant and accessible.
The problem: redundant and inconsistent license management
In today’s Open Source ecosystem, managing software licenses has become a significant challenge. Many organizations face the repetitive task of identifying, correcting, and maintaining accurate licensing data. When one component has missing or incorrect metadata, dozens—or even hundreds—of organizations using that component may duplicate efforts to resolve the same issue. ClearlyDefined aims to eliminate redundancy by enabling a collaborative approach.
The solution: crowdsourcing compliance with ClearlyDefined
ClearlyDefined provides an API and user-friendly interface that make it easy to access and contribute license metadata. By aggregating and standardizing licensing data, ClearlyDefined offers a powerful solution for organizations to enhance SBOMs (Software Bill of Materials) and license information without the need for extensive re-scanning and data correction. At the conference, Nick demonstrated how developers can quickly retrieve license data for popular libraries using a simple API call, making license compliance seamless and scalable.
In addition, organizations that encounter incomplete or incorrect metadata can easily update it through ClearlyDefined’s platform, creating a feedback loop that benefits the entire Open Source community. This crowdsourcing approach means that once an organization fixes a licensing issue, that data becomes available to all, fostering efficiency and accuracy.
Key components of ClearlyDefined’s platform
1. API and User Interface: Users can access ClearlyDefined data through an API or the website, making it simple for developers to integrate license checks directly into their workflows.
2. Human curation and community collaboration: To ensure high data quality, ClearlyDefined employs a curation workflow. When metadata requires updates, community members can submit corrections that go through a human review process, ensuring accuracy and reliability.
3. Integration with popular package managers: ClearlyDefined supports various package managers, including npm and pypi, and has recently expanded to support Conda, a popular choice among data science and AI developers.
Real-world use cases: GitHub and SAP’s adoption of ClearlyDefined
During the presentation, representatives from GitHub and SAP shared how ClearlyDefined has impacted their organizations.
– GitHub: ClearlyDefined’s licensing data powers GitHub’s compliance solutions, allowing GitHub to manage millions of licenses with ease. Lynette shared how they initially onboarded over 17 million licenses through ClearlyDefined, a number that has since grown to over 40 million. This database enables GitHub to provide accurate compliance information to users, significantly reducing the resources required to maintain licensing accuracy. Lynette showcased the harvesting process and the curation process. More details about how GitHub is using ClearlyDefined is available here.
– SAP: Qing discussed how ClearlyDefined’s approach has streamlined SAP’s Open Source compliance efforts. By using ClearlyDefined’s data, SAP reduced the time spent on license reviews and improved the quality of metadata available for compliance checks. SAP’s internal harvesting service integrates with ClearlyDefined, ensuring that critical license metadata is consistently available and accurate. SAP has contributed to the ClearlyDefined project and most notably, together with Microsoft, has optimized the database schema and reduced the database operational cost by more than 90%. More details about how SAP is using ClearlyDefined is available here.
Why ClearlyDefined matters
ClearlyDefined is a community-driven initiative with a vision to address one of Open Source’s biggest challenges: ensuring accurate and accessible licensing metadata. By centralizing and standardizing this data, ClearlyDefined not only reduces redundant work but also fosters a collaborative approach to license compliance.
The platform’s Open Source nature and integration with existing package managers and APIs make it accessible and scalable for organizations of all sizes. As more contributors join the effort, ClearlyDefined continues to grow, strengthening the Open Source community’s commitment to compliance, security, and transparency.
Join the ClearlyDefined community
ClearlyDefined is always open to new contributors. With weekly developer meetings, an open governance model, and continuous collaboration with OpenSSF and other Open Source organizations, ClearlyDefined provides numerous ways to get involved. For anyone interested in shaping the future of license compliance and data quality in Open Source, ClearlyDefined offers an exciting opportunity to make a tangible impact.
At SOSS Fusion, ClearlyDefined’s presentation showcased how an open, collaborative approach to license compliance can benefit the entire digital ecosystem, embodying the very spirit of the conference: working together toward a secure, inclusive, and sustainable digital future.
Download slides and see summarized presentation transcript below.
ClearlyDefined presentation transcript
Hello, folks, good morning! Let’s start by introducing ClearlyDefined, an exciting project. My name is Nick Vidal, and I work with the Open Source Initiative. With me today are Lynette Rayle from GitHub and Qing Tomlinson from SAP, and we’re all very excited to be here.
Introduction to ClearlyDefined’s mission
So, what’s the mission of ClearlyDefined? Our mission is ambitious—we aim to crowdsource a global database of license metadata for every software component ever published. This would benefit everyone in the Open Source ecosystem.
The problem ClearlyDefined addresses
There’s a critical problem in the Open Source space: compliance and managing SBOMs (Software Bill of Materials) at scale. Many organizations struggle with missing or incorrect licensing metadata for software components. When multiple organizations use a component with incomplete or wrong license metadata, they each have to solve it individually. ClearlyDefined offers a solution where, instead of every organization doing redundant work, we can collectively work on fixing these issues once and make the corrected data available to all.
ClearlyDefined’s solution
ClearlyDefined enables organizations to access license metadata through a simple API. This reduces the need for repeated license scanning and helps with SBOM generation at scale. When issues arise with a component’s license metadata, organizations can contribute fixes that benefit the entire community.
Getting started with ClearlyDefined
To use ClearlyDefined, you can access its API directly from your terminal. For example, let’s say you’re working with a JavaScript library like Lodash. By calling the API, you can get all license metadata for a specific version of Lodash at your fingertips.
Once you incorporate this licensing metadata into your workflow, you may notice some metadata that needs updating. You can curate that data and contribute it back, so everyone benefits. ClearlyDefined also provides a user-friendly interface for this, making it easier to contribute.
Open Source and community contributions
ClearlyDefined is an Open Source initiative, hosted on GitHub, supporting various package managers (e.g., npm, pypi). We work to promote best practices and integrate with other tools. Recently, we’ve expanded our scope to support non-SPDX licenses and Conda, a package manager often used in data science projects.
Integration with other tools
ClearlyDefined integrates with GUAC, an OpenSSF project that consumes ClearlyDefined data. This integration broadens the reach and utility of ClearlyDefined’s licensing information.
Case studies and community impact
I’d like to hand it over to Lynette from GitHub, who will talk about how GitHub uses ClearlyDefined and why it’s critical for license compliance.
GitHub’s use of ClearlyDefined
Hello, I’m Lynette, a developer at GitHub working on license compliance solutions. ClearlyDefined has become a key part of our workflows. Knowing the licenses of our dependencies is crucial, as legal compliance requires correct attributions. By using ClearlyDefined, we’ve streamlined our process and now manage over 40 million licenses. We also run our own harvester to contribute back to ClearlyDefined and scale our operations.
SAP’s adoption of ClearlyDefined
Hi, my name is Qing. At SAP, we co-innovate and collaborate with Open Source, ensuring a clean, well-maintained software pool. ClearlyDefined has streamlined our license review process, reducing time spent on scanning and enhancing data quality. SAP’s journey with ClearlyDefined began in 2018, and since then, we’ve implemented large-scale automation for our Open Source compliance and continuously contribute curated data back to the community.
Community and governance
ClearlyDefined thrives on community involvement. We recently elected members to our Steering and Outreach Committees to support the platform and encourage new contributors. Our weekly developer meetings and active Discord channel provide opportunities to engage, share knowledge, and collaborate.
Q&A highlights
- PURLs as Package Identifiers: We’re exploring support for PURLs as an internal coordinate system.
- Data Quality Issues: Data quality is our top priority. We plan to implement routines to scan for common issues, ensuring accurate metadata across the platform.
Thank you all for joining us today. If you’re interested in contributing, please reach out and become part of this collaborative community.
Tags: clearlydefined, Events, licenses, News
Leave a Reply