Jean-Pierre Lorre: Voices of the Open Source AI Definition
The Open Source Initiative (OSI) is running a blog series to introduce some of the people who have been actively involved in the Open Source AI Definition (OSAID) co-design process. The co-design methodology allows for the integration of diverging perspectives into one just, cohesive and feasible standard. Support and contribution from a significant and broad group of stakeholders is imperative to the Open Source process and is proven to bring diverse issues to light, deliver swift outputs and garner community buy-in.
This series features the voices of the volunteers who have helped shape and are shaping the Definition.
Meet Jean-Pierre Lorre
What’s your background related to Open Source and AI?
I’ve been using Open Source technologies since the very beginning of my career and have been directly involved in Open Source projects for around 20 years.
I graduated in artificial intelligence engineering in 1985. Since then I have worked in a number of applied AI research structures in fields such as medical image processing, industrial plant supervision, speech recognition and natural language processing. My knowledge covers both symbolic AI methods and techniques and deep learning.
I currently lead a team of around fifteen AI researchers at LINAGORA, an Open Source company.
What motivated you to join this co-design process to define Open Source AI?
The team I lead is heavily involved in the development of LLM generative models, which we want to distribute under an open license. I realized that the term Open Source AI was not defined and that the definition we had at LINAGORA was not the same as the one adopted by our competitors.
As the OSI is the leading organization for defining Open Source and there was a project underway to define the term Open Source AI, I decided to join it.
Can you describe your experience participating in this process? What did you most enjoy about it and what were some of the challenges you faced?
I participated in two ways: firstly, to provide input for the definition currently being drafted; and secondly, to evaluate LLM models with regard to the definition (I contributed to Bloom, Falcon and Mistral).
For the first item, my main difficulty was keeping up with the meandering discussions, which were very active. I didn’t manage to do so completely, but I was able to appreciate the summaries provided from time to time, which enabled me to follow the overall thread.
The second difficulty concerns the evaluation of the models: the aim of the exercise was to evaluate the consistency of OSAID version 0.8 on models that currently claim to be “Open Source.” Implementing the definition involves looking for information that is sometimes non-existent and sometimes difficult to find.
Why do you think AI should be Open Source?
Artificial intelligence models are expected to play a very important role in our professional lives, but also in our everyday lives. In this respect, transparency is essential to enable people to check the properties of the models. The models must also be accessible to as many people as possible, to avoid widening the inequalities between those who have the means to develop them and those who would otherwise remain on the sidelines of this innovation. Similarly, it must be possible to adapt them for different uses without needing authorization.
The Open Source approach makes it possible to create a community such as the one created by LINAGORA, OpenLLM-Europe. This is a way for small players to come together to build the critical mass needed not only to develop models but also to disseminate them. Such an approach, which may be compared to that associated with the digital commons, is a guarantee of sovereignty because it allows knowledge and governance to be shared.
In short, they are the fruit of work based on data collected from as many people as possible, so they must remain accessible to as wide an audience as possible.
What do you think is the role of data in Open Source AI?
Data provides the basis for training models. It is therefore the pool of information from which the knowledge displayed by the model, and the applications deduced from it, will be drawn. In the case of an open model, disseminating as many elements as possible to qualify this data is a means of transparency that facilitates the study of the model’s properties; indeed, this data is likely to include biases related to culture, gender, ethnic origin, skin color and so on. It also makes it easier to modify the model and its outputs.
Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?
Yes, we initially thought that providing the training data was a sine qua non condition for designing truly Open Source models. Our basic assumption was that the model could be seen as a work derived from the data and that, therefore, the license attached to the data, in particular any non-commercial clause, had an impact on the license of the model. As the discussions progressed, we realized that this condition was very restrictive and severely limited the possibility of developing models.
Our current analysis is that the condition defined in version 0.8 of the OSAID is sufficient to provide the guarantees of transparency required by the four freedoms, in particular the freedom to study the model, which underlies the question of data access. With regard to data, it stipulates that “sufficiently detailed information about the data used to train the system, so that a skilled person can recreate a substantially equivalent system using the same or similar data” must be provided. Even if we can agree that this condition seems difficult to satisfy without providing the datasets themselves, other avenues may be envisaged, in particular the provision of synthetic data. This information should make it possible to carry out almost all studies of the model.
What do you think the primary benefit will be once there is a clear definition of Open Source AI?
Having such a definition with clear, implementable rules will provide model suppliers with a concrete framework for producing models that comply with the ethics of the Open Source movement.
A collateral benefit will be to help sort the wheat from the chaff, in particular by detecting attempts at “Open Source washing.” This definition is therefore a structuring element for a company such as LINAGORA, which wants to build a sustainable business model around the provision of value-added AI services.
It should also be noted that such a definition is necessary for regulations such as the European AI Act, which defines exceptions for Open Source generative models. Such legislative constructions cannot rest on a fuzzy basis.
What do you think are the next steps for the community involved in Open Source AI?
The next steps for the community concern, firstly, the definition of a certification process that formalizes a model’s conformity; this process could be accompanied by tools to automate it.
In a second phase, it may also be useful to provide templates of AI models that comply with the definition, as well as best-practice guides to help model designers.
How to get involved
The OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:
- Join the working groups: be part of a team to evaluate various models against the OSAID.
- Join the forum: support and comment on the drafts, and record your approval or concerns in new and existing threads.
- Comment on the latest draft: provide feedback on the latest draft document directly.
- Follow the weekly recaps: subscribe to our newsletter and blog to be kept up-to-date.
- Join the town hall meetings: participate in the online public town hall meetings to learn more and ask questions.
- Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.