Contributions of Open Source to AI: a panel discussion at CPDP-ai conference
I participated as a panelist at the CPDP-ai 2024 conference in Brussels last week where we discussed the significant contributions of Open Source to AI and highlighted the specific properties that differentiate Open Source AI from proprietary solutions. Representing the Open Source Initiative (OSI), the globally recognized non-profit that defines the term Open Source, I emphasized the longstanding principle of granting users full agency and control over technology, which has been proven to deliver extensive social benefits.
Below is a glimpse at the questions and answers posed to me and my fellow panelists:
Question: Stefano, please explain what the contribution to AI from Open Source is, and if there are specific properties of Open Source AI that make a difference for the users and for the people who are confronted with its results.
Response: The Definition of Open Source Software has existed for over 25 years; That doesn’t apply to AI. The Open Source Definition for software provides a stable north star for all participants in the digital ecosystem, from small and large companies to citizens and governments.
The basic principle of the Open Source Definition is to grant to the users of any technology full agency and control over the technology itself. This means that users of Open Source technologies have self-sovereignty of the technical solutions.
The Open Source Definition has demonstrated that massive social benefits accrue when you remove the barriers to learning, using, sharing and improving software systems. There is ample evidence that giving users agency, control and self-sovereignty of their technical choices produces a viable ecosystem based on permissionless innovation. Multiple studies by the EU Commission and Harvard researchers have assigned significant economic value to Open Source Software, all based on that single, clear, understood and approved Definition from 26 years ago.
For AI, and especially the most recent machine learning solutions, it’s less clear how society can maintain self-sovereignty of the technology and how to achieve permissionless innovation. Despite the fact that many people talk about Open Source AI, including the AI Act, there is no shared understanding of what that means, yet!
The Open Source Initiative is concluding a global, multi-stakeholder co-design process to find an unequivocal definition of Open Source AI, and we’re heading towards the conclusion of this process with a vastly increased knowledge of the AI machine learning space. The current draft of the Open Source AI Definition recognizes that in order to study, use, share and modify AI, one needs to refer to an AI system, not a single individual component. The global process has identified the components required for society to maintain control of the technology and these are:
- Detailed information about the dataset used to train the system and the code so that a skilled person can train a system with similar capabilities
- All the libraries and tools used to run training and inference
- The model architecture and the parameters, like weights and biases
Having unrestricted access to all these elements is what makes an AI an Open Source AI.
We’re in the final stretch of the process, starting to gather support for the current draft of the definition.
The most controversial part of the discussion is the role of data in the training. To answer your question about the power of big foreign tech companies, putting aside the hardware requirements, the data is where the fight is. There seem to be two views of the world on data when it comes to AI: One thinks that text and data mining is basically strip mining humanity and all accumulation of data without consent of the rights holders must be made illegal. Another view of the world is that text and data mining for the purpose of training Open Source AI is probably the only antidote to the superpowers of large corporations. These camps haven’t found a common position yet. Japan seems to have made up its mind already, legalizing unrestricted text and data mining. We’ll see where the lawsuits in the US will go, if they ever get to a decision in court or, as I suspect, they will be settled out of court.
In any case, data, competence and to some extent hardware, are the levers to control the development of AI.
Open Source has been leveling the playing field of technologies. We know from past experience with Open Source software that giving people unrestricted access to the means of digital production enables tremendous economic value. This worked in Europe as well as in China. We think that Open Source AI can have the same effect of generating value while leaving control of the technology in the hands of society.
Question: Big tech companies are important for the development of AI. Apart from the purely technological impacts, there is also economic importance. The European Commission has been very concerned about the Digital Single Market recently, and has initiated legislation such as DSA and DMA to improve competition and market access. Will these instruments be sufficient in view of AI roll-out, thinking also of the recently adopted AI Act? Or will additional attention need to be paid?
Response: Open is the best antidote to the concentration of power. That said, I see these legislations as the sticks, very necessary. I’d love us to think also about carrots. We don’t want to repeat the mistakes of the past with the early years of the internet. Open Source software was equally available in the US and Europe but despite that, the few European champions of Open Source haven’t grown big enough to have a global impact. And some of the biggest EU companies aren’t exactly friendly with Open Source either.
Chinese companies have taken a different approach. But in Europe we have talents, and we have an attractive quality of life so we can get even more talents. Finding money is never an issue. We need to remove the disincentives to grow our companies bigger, widen the access to the internal EU market and support their international expansion, too.
For example, we need to review European Regulation 1025, on standardization to accommodate for Open Source. 1025 Regulation was written at a time when Open Source was considered a “business model” and information and communication technology standards were about voltages in a wire. Today, Open Source is between 80% and 90% of all software and “digital elements” comprise some part of every modern product. Even hardware solutions are dominated by “digital elements.” As such, the approach taken by 1025 is out of date and most likely needs a root-and-branch rethink to properly apply to the world today and the world we anticipate tomorrow.
We need to make sure that the standardization rules required by the Cyber Resilience Act are written together with Open Source champions so the rules don’t favor exclusively the cartel of European patent holders who try to seek rent instead of innovating. Europe has all the means to be at the center of AI innovation; It embodies the right values of diversity and collaboration.
Closing remarks: We think that Open Source is the best antidote to fight market concentration in AI. Data is where the concentration of power is happening now and it’s in the hands of massive corporations: not only Google, Meta, Amazon, Reddit but also Sony, Warner, Netflix, Getty Images, Adobe … All these companies have already gained access to massive amounts of data, legally. These companies basically own our data, legally: Our pictures, the graph of our circles of friends, all the books and movies…
There is a risk that if we don’t write policies that allow text and data mining in exchange of a real Open Source AI (one that society can fully control) then we risk leaving the most powerful AI systems in the hands of the oligopoly who can afford trading money for access to data.
Leave a Reply