Open Source AI Definition – Weekly update September 9

Week 36 summary 

Draft v0.0.9 of the Open Source AI Definition is available for comments

  • @Shamar agrees with @thesteve0 and emphasizes that an AI system consists of two parts: a virtual machine (the architecture) and the weights (the executable software). He argues that weights, while important, are not sufficient to study or fully understand an AI model. For a system to be truly Open Source, it must provide all the data needed to recreate an exact copy of the model, including any random values used during training. Without this, the system should not be labeled Open Source, even if its weights are released under an open source license. Shamar suggests calling such systems "freeware" instead, and ensuring that the Open Source AI Definition stays aligned with the Open Source Definition.
  • @jberkus questions whether creating an exact copy of an AI system is truly possible, even with access to all the training data, or if slight differences would always exist.
  • @shujisado explains that under Japan's copyright law, AI training on publicly available copyrighted works is permissible, but sharing the datasets created for training requires explicit permission from the copyright holders. He notes that although AI training within legal limits may be allowed in many jurisdictions, making all training data freely available is unlikely to be possible. He adds that the current Open Source AI Definition strikes a reasonable balance given how intellectual property rights differ across the world, but suggests that more specific wording could make this clearer.

Share your thoughts about draft v0.0.9

  • @marianataglio suggests including hardware specifications, training time, and carbon footprint in the Open Source AI Definition to improve transparency. She believes this would enhance reproducibility, accessibility, and collaboration, while helping practitioners estimate computational costs and optimize models for more efficient training.

Open Source AI Definition Town Hall – September 6, 2024

Welcome diverse approaches to training data within a unified Open Source AI Definition

Explaining the concept of Data information

  • @Senficon highlights a concern from the open science community. EU copyright law allows protected content to be reproduced for research, but it restricts making the research corpus available to third parties. The restriction is meant to protect rights holders' revenue, yet it limits research reproducibility and open access.
  • @kjetilk agrees with the observation but questions the assumption that making content publicly available would significantly harm rights holders’ revenue. He believes such policies should be based on solid evidence from extensive research.

Original source: opensource.org

Shared by: voicesofopensource
