Open Source AI Definition – Weekly update September 9
Week 36 summary
Draft v.0.0.9 of the Open Source AI Definition is available for comments
- @Shamar agrees with @thesteve0 and emphasizes that AI systems consist of two parts: a virtual machine (architecture) and the weights (the executable software). He argues that while weights are important, they are not sufficient to study or fully understand an AI model. For a system to be truly Open Source, it must provide all the data used to recreate an exact copy of the model, including random values used during the process. Without this, the system should not be labeled Open Source, even if the weights are available under an open-source license. Shamar suggests calling such systems “freeware” instead and ensuring the Open Source AI Definition aligns with the Open Source Definition.
- @jberkus questions whether creating an exact copy of an AI system is truly possible, even with access to all the training data, or if slight differences would always exist.
- @shujisado explains that under Japan’s copyright law, AI training on publicly available copyrighted works is permissible, but sharing the datasets created during training requires explicit permission from copyright holders. He notes that while AI training within legal limits may be allowed in many jurisdictions, making all training data freely available is unlikely. He adds that the current Open Source AI Definition strikes a reasonable balance given global intellectual property rights but suggests that more specific language might help clarify this further.
Share your thoughts about draft v0.0.9
- @marianataglio suggests including hardware specifications, training time, and carbon footprint in the Open Source AI Definition to improve transparency. She believes this would enhance reproducibility, accessibility, and collaboration, while helping practitioners estimate computational costs and optimize models for more efficient training.
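To make the kind of disclosure in this suggestion concrete, here is a minimal sketch of how a training run's energy use and carbon footprint could be estimated from reported hardware and training time. Every field name and the `TrainingRunReport` class are hypothetical and not part of the draft definition. The estimate uses the common approximation of device power × time × data-center overhead (PUE) × grid carbon intensity, and the example figures are placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class TrainingRunReport:
    """Hypothetical disclosure record; all fields are illustrative assumptions."""
    hardware: str                       # free-text description of the accelerators
    num_devices: int                    # number of accelerators used
    avg_power_watts_per_device: float   # average draw per device, in watts
    training_hours: float               # wall-clock training time, in hours
    pue: float                          # data-center power usage effectiveness (>= 1.0)
    grid_kg_co2e_per_kwh: float         # carbon intensity of the local grid

    def energy_kwh(self) -> float:
        # watts * hours = watt-hours; divide by 1000 for kWh, scale by PUE overhead
        watt_hours = (
            self.num_devices
            * self.avg_power_watts_per_device
            * self.training_hours
        )
        return watt_hours * self.pue / 1000.0

    def emissions_kg_co2e(self) -> float:
        # energy consumed multiplied by the grid's carbon intensity
        return self.energy_kwh() * self.grid_kg_co2e_per_kwh


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the arithmetic.
    report = TrainingRunReport(
        hardware="8x generic accelerator",
        num_devices=8,
        avg_power_watts_per_device=300.0,
        training_hours=100.0,
        pue=1.2,
        grid_kg_co2e_per_kwh=0.4,
    )
    print(f"Estimated energy: {report.energy_kwh():.1f} kWh")
    print(f"Estimated emissions: {report.emissions_kg_co2e():.1f} kg CO2e")
```

A record like this would let practitioners compare the computational cost of reproducing a model across hardware setups, though real reporting would need measured power draw rather than assumed averages.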
Open Source AI Definition Town Hall – September 6, 2024
- The fifteenth edition of our town hall meetings was held on the 6th of September. If you missed it, the recording and slides can be found here.
Welcome diverse approaches to training data within a unified Open Source AI Definition
- @Alek_Tarkowski agrees with @arandal on the importance of situating Open Source AI within broader open movements like open data. He suggests cooperation with organizations like Creative Commons should go beyond licensing standards to include data governance, which remains an undeveloped area.
- @Alek_Tarkowski finds the idea of requiring source data to follow Open Source licenses conceptually interesting, likening it to “upstream copyleft,” but notes traditional copyleft frameworks may not suit AI development.
- @arandal clarifies that the proposal is an evolution of software freedom principles, not a direct extension of traditional copyleft, similar to how AGPL addressed gaps left by earlier licenses. They further mention that discussions on these approaches are ongoing across various organizations, though formal publications are limited.
Explaining the concept of Data information
- @Senficon highlights a concern from the open science community: while EU copyright law allows reproductions of protected content for research, it restricts making the research corpus available to third parties. This restriction, which is intended to protect rights holders’ revenue, limits research reproducibility and open access.
- @kjetilk agrees with the observation but questions the assumption that making content publicly available would significantly harm rights holders’ revenue. He believes such policies should be based on solid evidence from extensive research.
Tags: News