Reimagining data for Open Source AI: A call to action

Artificial intelligence (AI) is changing the world at a remarkable pace, with Open Source AI playing a pivotal role in shaping its trajectory. Yet, as AI advances, a fundamental challenge emerges: How do we create a data ecosystem that is not only robust but also equitable and sustainable? The …

Shared by voicesofopensource January 23, 2025

Open Data and Open Source AI: Charting a course to get more of both

While working to define Open Source AI, we realized that data governance is an unresolved issue. The Open Source Initiative organized a workshop to discuss data sharing and governance for AI training. The critical question posed to attendees was “How can we best govern and share data to power …

Shared by voicesofopensource November 18, 2024

Why datasets built on public domain might not be enough for AI

There is tension between copyright law and the large datasets needed to train large language models. Common Corpus is a dataset that uses only text from copyright-expired sources to avoid those legal issues. It’s a useful achievement, paving the way for research without immediate risk of lawsuits. I also fear …

Shared by voicesofopensource May 7, 2024