Open Source AI – Weekly update August 26
Week 34 summary
Share your thoughts about draft v0.0.9
As we move toward the release of the first-ever Open Source AI Definition in October at All Things Open, the publication of the 0.0.9 draft brings us one step closer to realizing this goal.
- OSAID 0.0.9 draft definition is live!
- Changelog includes:
- New Feature: Clarified Open Source Models and Weights
- Added a new paragraph under “What is Open Source AI” to define “system” as including both models and weights.
- Clarified that all components of a larger system must meet the standard.
- Updated paragraph after the “share” bullet to emphasize this point.
- New Section: Open Source Models and Open Source Weights
- Added descriptions of components for both models and weights in machine learning systems.
- Edited subsequent paragraphs to eliminate redundancy.
- Training Data: Defined as a Benefit, Not a Requirement
- Defined open, public, and unshareable non-public training data.
- Explained the role of training data in studying AI systems and understanding biases.
- Emphasized extra requirements for data to advance openness, especially in privacy-first areas like healthcare.
- Separation of Checklist
- The Checklist is now a separate document from the main Definition.
- Fully aligned Checklist content with the Model Openness Framework (MOF).
- Terminology Changes
- Replaced “Model” with “Weights” under “Preferred form to make modifications” for consistency.
- Explicit Reference to Recipients of the Four Freedoms
- Added specific references to developers, deployers, and end users of AI systems.
- Credits and References
- Incorporated credit to the Free Software Definition.
- Added references to conditions of availability of components, referencing the Open Source Definition.
- Initial reactions on the forum:
- @shujisado praises the updates in version 0.0.9, particularly the decision to separate the checklist from the main document, which clarifies the intent behind OSAID. He also supports the separation of “code” and “weights,” noting that in Japan, “code” clearly falls under copyright, making this distinction logical. He acknowledges revisions in the checklist that consider the importance of complete datasets, even though he disagrees with making datasets mandatory.
- Comments on the draft on HackMD
- @Joshua Gay adds that instead of narrowing the focus to machine-learning systems, the emphasis should be on “parameters” as a whole since weights are just one type of parameter. He suggests a rewrite that highlights making model parameters, such as weights and other settings, available under OSI-approved terms, with examples across various AI models.
- He further suggests using broader language that covers more AI systems instead of narrower terminology. Specifically, he proposes replacing “Open Source models and Open Source weights” with “Open Source models and Open Source parameters,” and using “AI systems” instead of “machine learning systems.” Additionally, he recommends redefining an AI model to include architecture, parameters like weights and decision boundaries, and inference code, while referring to AI parameters as configuration settings that produce outputs from inputs.
- @shujisado notes that the last paragraph, titled “Open Source models and Open Source weights,” actually explains “AI model” and “AI weights,” so the title and content do not match. He adds that these terms are not used elsewhere in the definition.
- Under “Preferred form to make modifications to machine-learning systems”, @shujisado suggests some grammatical corrections.
- Next steps
- The OSI has recently presented at the following events:
- Hong Kong for AI_dev, August 21-23
- Beijing for Open Source Congress, August 25-27
- Iterate Drafts: Continue refining drafts with feedback from the worldwide roadshow, considering new dissenting opinions.
- Review Licenses: Decide on the best approach for reviewing new licenses for datasets, documentation, and model parameters.
- Enhance FAQ: Continue improving the FAQ to address emerging questions.
- Post-Stable Release Plan: Establish a process for reviewing and updating future versions of the Open Source AI Definition.
- Get involved:
- Join the forum and share your opinion.
- Leave a comment on the draft v0.0.9 with precise feedback.
- Follow the weekly recaps and subscribe to our monthly newsletter.
- Join the town hall meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions, and share your thoughts. The next is on September 6.
- Join the workshops and scheduled conferences.
Explaining the concept of Data information
- @Kjetilk points out the legal distinction between using copyrighted works for AI training (reproduction) and incorporating them into publishable datasets, questioning the fairness of allowing exploitative models without compensation while potentially banning those that benefit society.
- @shujisado clarifies that compensation for copyrighted works used in AI training is possible for both open source and closed models, distinguishing it from “royalty,” and notes that Japan’s copyright law exempts such uses for machine learning.
- @Kjetilk reiterates the relevance of “royalty” for compensation in closed, non-published models, suggesting it makes sense under copyright law if required, but if not, it could benefit science and the arts.
Open Source AI Definition Town Hall
Tags: ai, Deep Dive: AI, News