Open Source AI – Weekly update August 26

Week 34 summary 

Share your thoughts about draft v0.0.9

As we move toward the release of the first-ever Open Source AI Definition in October at All Things Open, the publication of the 0.0.9 draft brings us one step closer to realizing this goal.

  • OSAID 0.0.9 draft definition is live
  • Changelog includes:
    • New Feature: Clarified Open Source Models and Weights
      • Added a new paragraph under “What is Open Source AI” to define “system” as including both models and weights.
      • Clarified that all components of a larger system must meet the standard.
      • Updated paragraph after the “share” bullet to emphasize this point.  
    • New Section: Open Source Models and Open Source Weights
      • Added descriptions of components for both models and weights in machine learning systems.
      • Edited subsequent paragraphs to eliminate redundancy.
    • Training Data: Defined as a Benefit, Not a Requirement
      • Defined open, public, and unshareable non-public training data.
      • Explained the role of training data in studying AI systems and understanding biases.
      • Emphasized extra requirements for data to advance openness, especially in private-first areas like healthcare.
    • Separation of Checklist
      • The Checklist is now a separate document from the main Definition.
      • Fully aligned Checklist content with the Model Openness Framework (MOF).
    • Terminology Changes
      • Replaced “Model” with “Weights” under “Preferred form to make modifications” for consistency.
    • Explicit Reference to Recipients of the Four Freedoms
      • Added specific references to developers, deployers, and end users of AI systems.
    • Credits and References
      • Incorporated credit to the Free Software Definition.
      • Added references to conditions of availability of components, referencing the Open Source Definition.
  • Initial reactions on the forum: 
    • @shujisado praises the updates in version 0.0.9, particularly the decision to separate the checklist from the main document, which clarifies the intent behind OSAID. He also supports the separation of “code” and “weights,” noting that in Japan, “code” clearly falls under copyright, making this distinction logical. He acknowledges revisions in the checklist that consider the importance of complete datasets, even though he disagrees with making datasets mandatory. 
  • Comments on the draft on HackMD
    • @Joshua Gay adds that instead of narrowing the focus to machine-learning systems, the emphasis should be on “parameters” as a whole since weights are just one type of parameter. He suggests a rewrite that highlights making model parameters, such as weights and other settings, available under OSI-approved terms, with examples across various AI models.
      • He further suggests using broader language that covers more AI systems instead of narrower terminology. Specifically, he proposes replacing “Open Source models and Open Source weights” with “Open Source models and Open Source parameters,” and using “AI systems” instead of “machine learning systems.” Additionally, he recommends redefining an AI model to include architecture, parameters like weights and decision boundaries, and inference code, while referring to AI parameters as configuration settings that produce outputs from inputs.
    • Under “Open Source models and Open Source weights”, @shujisado notes that the last paragraph of that section actually explains “AI model” and “AI weights,” so the title and content do not match, and adds that these terms are not used elsewhere in the definition.
    • Under “Preferred form to make modifications to machine-learning systems”, @shujisado suggests some grammatical corrections.
  • Next steps
    • The OSI has recently presented at the following events: 
    • Iterate Drafts: Continue refining drafts with feedback from the worldwide roadshow, considering new dissenting opinions.
    • Review Licenses: Decide on the best approach for reviewing new licenses for datasets, documentation, and model parameters.
    • Enhance FAQ: Continue improving the FAQ to address emerging questions.
    • Post-Stable Release Plan: Establish a process for reviewing and updating future versions of the Open Source AI Definition.

Explaining the concept of Data information

  • @Kjetilk points out the legal distinction between using copyrighted works for AI training (reproduction) and incorporating them into publishable datasets, questioning the fairness of allowing exploitative models without compensation while potentially banning those that benefit society.
  • @shujisado clarifies that compensation for copyrighted works used in AI training is possible for both open source and closed models, distinguishing it from “royalty,” and notes that Japan’s copyright law exempts such uses for machine learning.
    • @Kjetilk reiterates the relevance of “royalty” for compensation in closed, non-published models, suggesting it makes sense under copyright law if required, but if not, it could benefit science and the arts.

Open Source AI Definition Town Hall

  • The slides and recording from the town hall meeting held on August 23, 2024 are available here.
  • The next town hall meeting will be held on September 6th. Sign up for the event here.

Click Here to View Original Source (opensource.org)

Shared by: voicesofopensource
