Ezequiel Lanza: Voices of the Open Source AI Definition
The Open Source Initiative (OSI) is running a blog series to introduce some of the people who have been actively involved in the Open Source AI Definition (OSAID) co-design process. The co-design methodology allows for the integration of diverging perspectives into one just, cohesive and feasible standard. Support and contribution from a significant and broad group of stakeholders is imperative to the Open Source process and is proven to bring diverse issues to light, deliver swift outputs and garner community buy-in.
This series features the voices of the volunteers who have helped shape and are shaping the Definition.
Meet Ezequiel Lanza
What’s your background related to Open Source and AI?
I’ve been working in AI for more than 10 years (Yes, before ChatGPT!). With a background in engineering, I’ve consistently focused on building and supporting AI applications, particularly in machine learning and data science. Over the years, I’ve contributed to and collaborated on various projects. A few years ago, I decided to pursue a master’s in data science to deepen my theoretical knowledge and further enhance my skills. Open Source has also been a significant part of my work; the frameworks, tools and community have continually drawn me in, making me an active participant in this evolving conversation for years.
What motivated you to join this co-design process to define Open Source AI?
AI owes much of its progress to Open Source, and Open Source remains essential for continued innovation. My experience in both AI and Open Source spans many years, and I believe this co-design process offers a unique chance to contribute meaningfully. It’s not just about sharing my insights but also about learning from other professionals across AI and different disciplines. This collective knowledge and these diverse perspectives make the initiative a truly powerful and enriching way to shape the future of Open Source AI together.
Can you describe your experience participating in this process? What did you most enjoy about it, and what were some of the challenges you faced?
Participating in this process has been both rewarding and challenging. I’ve particularly enjoyed engaging with diverse groups and hearing different perspectives. The in-person events, such as All Things Open in Raleigh in 2023, have been valuable for fostering direct collaboration and building relationships. However, balancing these meetings with my work duties has been challenging. Coordinating schedules and managing time effectively to attend all the relevant discussions can be demanding. Despite these challenges, the insights and progress have made the effort worthwhile.
Why do you think AI should be Open Source?
We often say AI is everywhere, and while that’s partially true, I believe AI will be everywhere, significantly impacting our lives. However, AI’s full potential can only be realized if it is open and accessible to everyone. Open Source AI should also foster innovation by enabling developers and researchers from all backgrounds to contribute to and improve existing models, frameworks and tools, and by allowing freedom of expression. Without open access, involvement in AI can be costly, limiting participation to a few large companies. Open Source AI should aim to democratize access, allowing small businesses, startups and individuals to leverage powerful tools that might otherwise be out of reach due to cost or proprietary barriers.
What do you think is the role of data in Open Source AI?
Data is essential for any AI system. Initially, from my ML-biased perspective, I saw open and accessible datasets as crucial for effective ML development. However, I’ve reevaluated this view, considering how to adapt the system while staying true to Open Source principles. As AI models, particularly generative AI models like LLMs, become increasingly complex, I’ve come to place more value on the models themselves. Generative AI, for instance, requires vast amounts of data, and gaining access to that data can be a significant challenge.
This insight has led me to consider what I, whether as a researcher, developer or user, truly need from a model to use or investigate it effectively. While understanding the data used in training is important, having access to specific datasets may not always be necessary. In approaches like federated learning, the model itself can be highly valuable while the data stays private, though understanding the nature of the data remains important. For LLMs, techniques such as fine-tuning, retrieval-augmented generation (RAG) and retrieval-augmented fine-tuning (RAFT) emphasize the benefits of accessing the model rather than the original dataset, providing substantial advantages to the community.
Sharing model architecture and weights is crucial, and data security can be maintained through methods like model introspection and fine-tuning, reducing the need for extensive dataset sharing.
Data is undoubtedly a critical component. However, if the essence of Open Source AI lies in ensuring transparency, then the focus should be on how data is used in training models. Documenting which datasets were used and how the data was handled is essential. This transparency helps the community understand the origins of the data, assess potential biases and ensure the responsible use of data in model development. While sharing the exact datasets may not always be necessary, providing clear information about data sources and usage practices is crucial for maintaining trust and integrity in Open Source AI.
Has your personal definition of Open Source AI changed along the way? What new perspectives or ideas did you encounter while participating in the co-design process?
Of course, it changed and evolved – that’s what a thought process is about. I’d be stubborn if I never changed my perspective along the way. I’ve often questioned even the most fundamental concepts I’ve relied on for years, avoiding easy or lazy assumptions. This thorough process has been essential in refining my understanding of Open Source AI. Engaging in meaningful exchanges with others has shown me the importance of practical definitions that can be implemented in real-world scenarios. While striving for an ideal, flawless definition is tempting, I’ve found that embracing a pragmatic approach is ultimately more beneficial.
What do you think the primary benefit will be once there is a clear definition of Open Source AI?
As I see it, the Open Source AI Definition will support the field’s growth, and it will be a first big step. The primary benefit of having a clear definition of Open Source AI will be increased clarity and consistency in the field. This will enhance collaboration by setting clear standards and expectations for researchers, developers and organizations. It will also improve transparency by ensuring that AI models and tools genuinely follow Open Source principles, fostering trust in their development and sharing.
A clear definition will create standardized practices and guidelines, making it easier to evaluate and compare different Open Source AI projects.
What do you think are the next steps for the community involved in Open Source AI?
The next steps for the community should start with setting up a certification process for AI models to ensure they meet certain standards. This could include tools to help automate the process. After that, it would be helpful to offer templates and best practice guides for AI models. This will support model designers in creating high-quality, compliant systems and make the development process smoother and more consistent.
How to get involved
The OSAID co-design process is open to everyone interested in collaborating. There are many ways to get involved:
- Join the forum: share your comments on the drafts.
- Leave comments on the latest draft: provide precise feedback on the text of the latest draft.
- Follow the weekly recaps: subscribe to our monthly newsletter and blog to be kept up-to-date.
- Join the town hall meetings: we’re increasing the frequency to weekly meetings where you can learn more, ask questions and share your thoughts.
- Join the workshops and scheduled conferences: meet the OSI and other participants at in-person events around the world.