Ginkgo Bioworks Holdings, Inc., which is building the platform for cell programming and biosecurity, has announced two new offerings that will make it easier for pharmaceutical and biotech companies to develop new medicines, building on the partnership announced with Google Cloud last year.
The first offering is among the first of its kind in the industry: a protein large language model (LLM), built in collaboration with Google Cloud Consulting, that will give individual researchers and enterprise companies the ability to develop medicines with insights from Ginkgo’s private data. The second is Ginkgo’s model API, a tool designed to bring biological AI models directly to machine learning scientists. The API is publicly available on Ginkgo’s website today, and enterprise companies will soon be able to access the protein LLM through Google Cloud’s Vertex AI Model Garden.
Jason Kelly, CEO of Ginkgo Bioworks, said, “I’m excited to see how the community builds on top of these models and our API. AA-0 is the first model we have released trained on Ginkgo’s proprietary data – we’re opening it to data scientists and bioinformaticians so they can build new models and applications on top. We believe that the low cost for tokens and our other customer-friendly terms, like the lack of royalties, and our commitment to not re-use customer data, will allow users to build tools like iterative protein design programs that call our protein generation API or to use our embedding API to compute features for a clustering algorithm.”
Protein LLM for individual researchers and enterprise companies: Built on Vertex AI in collaboration with Google Cloud Consulting and trained on Ginkgo’s extensive proprietary dataset, this and future LLMs empower companies to generate novel insights and accelerate the discovery of new therapeutics. By harnessing the power of AI to analyze and understand complex protein structures and interactions, researchers and enterprises can streamline their research pipelines, optimize lead identification, and ultimately bring life-saving medicines to market faster and more efficiently. Building on models that learn from Ginkgo’s private data, unavailable to the public, can enable companies to unlock hidden patterns and potential therapeutic targets that would otherwise remain elusive.
Open API for scientists and researchers: With this programmer-friendly, ultra-low-cost API, Ginkgo is making its internally developed AI tools available to anyone. The interface provides an easy and scalable way to access sophisticated models trained on protein and DNA data, starting with its first release: a machine learning model trained on a proprietary Ginkgo dataset.
Chris Sakalosky, vice president of Strategic Industries, Google Cloud, commented, “Ginkgo’s new protein LLM and open API mark a major step forward in making advanced AI tools accessible for drug discovery and biological research. By leveraging Google Cloud’s infrastructure and AI capabilities, Ginkgo is empowering both enterprises and individual scientists to accelerate their work and drive innovation in the life sciences. Ginkgo is leading the way in democratizing access to cutting-edge AI models, to increase value to pharma companies using Ginkgo’s platform and to ultimately help people live healthier lives.”
Ginkgo has a multitude of models under development, spanning machine learning methods like language modeling and diffusion for conditional design. Ginkgo’s first protein language model release will support two use-cases:
Generation via masked language modeling: given a sequence of amino acids with one or more <mask> tokens, the model fills in the masked positions to complete the sequence.
Embedding calculation: the model computes its final hidden layer for an input sequence, yielding representations that can be used in downstream tasks. To begin, Ginkgo’s model returns the mean-pooled representation across the length axis.
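The two use-cases above can be illustrated with a minimal local sketch. It does not call Ginkgo’s API: the helper names `mask_positions` and `mean_pool`, the example sequence, and the toy hidden states are all hypothetical, chosen only to show what a masked input looks like and what mean-pooling across the length axis means.

```python
MASK = "<mask>"  # mask token as written in the announcement


def mask_positions(sequence, positions):
    """Replace the residues at the given 0-based positions with <mask> tokens."""
    tokens = list(sequence)
    for i in positions:
        tokens[i] = MASK
    return "".join(tokens)


def mean_pool(hidden_states):
    """Average per-residue vectors, shaped (length, dim), over the length axis."""
    length = len(hidden_states)
    return [sum(column) / length for column in zip(*hidden_states)]


# Hypothetical 8-residue sequence with two positions hidden from the model.
masked = mask_positions("MKTAYIAK", [2, 5])
print(masked)  # MK<mask>AY<mask>AK

# Toy hidden states for a 3-residue protein with 2-dimensional vectors.
hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
embedding = mean_pool(hidden)
print(embedding)  # [3.0, 4.0]
```

Mean-pooling yields one fixed-size vector per protein regardless of sequence length, which is what makes the embedding convenient as input to downstream tasks such as clustering.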
Over the next year, Ginkgo will roll out more models and expand the API’s capabilities, building a robust suite of tools that will enable you to solve complex problems in drug discovery, synthetic biology, genomics, and more using the latest machine learning methods. Access the portal today and be among the first to explore our new API.
Ankit Gupta, General Manager of Ginkgo AI, said, “Flexibility is everything. Alongside our first proprietary model, which leverages unique datasets from Ginkgo, you’ll also have access to publicly available models like ESM2. This means you can explore and experiment with different approaches, all through a single streamlined platform. We’re also deeply committed to making advanced machine learning tools accessible, which is why our API comes with competitive pricing and a free tier. We’ve structured our costs to make it easy for you to jump in, experiment, and get predictions without worrying about high fees. Our initial models will have a free tier and our introductory pricing is approximately $0.18 per million tokens. This means for a protein with around 500 amino acids, users should be able to get predictions on 2000 sequences for roughly 10 cents. In the age of generative biology, with engineers designing thousands to millions of sequences at a time, we hope to enable them with enormous amounts of computational scale.”
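The pricing example in the quote can be checked with a short calculation. This is a rough sketch that assumes one token per amino acid and ignores the free tier; neither assumption is confirmed by the announcement, and the function name `estimate_cost` is hypothetical.

```python
PRICE_PER_MILLION_TOKENS = 0.18  # introductory price in USD, as quoted


def estimate_cost(num_sequences, residues_per_sequence,
                  price_per_million=PRICE_PER_MILLION_TOKENS):
    """Estimate cost in USD, assuming one token per amino acid."""
    total_tokens = num_sequences * residues_per_sequence
    return total_tokens / 1_000_000 * price_per_million


# 2000 sequences of about 500 amino acids each is 1,000,000 tokens.
cost = estimate_cost(2000, 500)
print(f"${cost:.2f}")  # $0.18
```

Under these assumptions the estimate comes to about 18 cents, the same order of magnitude as the “roughly 10 cents” in the quote. The announcement does not explain the difference, which may reflect the free tier, tokenization details, or rounding.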