Following on from Kesha Williams’ key takeaways from OpenAI DevDay 2023, Amber Israelsen is here to share how both developers and non-developers can get started with all the great things OpenAI announced.
I think we can all agree: today’s generative AI tools like ChatGPT are simply amazing. But just when you thought your mind was sufficiently blown, OpenAI went and dropped a whole slew of updates and new features at their first-ever DevDay, held in San Francisco on November 6, 2023. If you thought that AI was cool before, just wait ‘til you see what’s next.
In this article, we’ll first look at a suite of ChatGPT enhancements, designed to make everyday tasks, learning, and leisure better than ever before (cue the cute puppy photos). Then, for the developers in the audience, we’ll meet GPT-4 Turbo and other developer-centric products that promise to streamline development and open up a universe of new possibilities.
Get ready to get hooked.
Can I build my own custom version of ChatGPT? Yes, with GPTs.
One of the coolest things to roll out for non-developers is a way to create tailored versions of ChatGPT to better suit your day-to-day needs. These are called GPTs. If you’re one of those people who has a long list of useful prompts that you just copy-and-paste over and over again into ChatGPT (or even if you aren’t), then GPTs are for you.
Not only can you create your very own, but you can also share them with others. Whether you need help understanding the rules of a board game, need a creative writing coach, or want to channel your inner chef, there are several GPTs you can try to get a sense of how they work.
The Sous Chef GPT, for example, will give you recipes based on foods you love and ingredients you have.
If you want to build your own, you don’t need to write code or have any deep technical expertise. All you need is GPT Builder. It will guide you through creating your GPT in a conversational way. Just answer its questions and let the magic begin.
You can also integrate GPTs with external data through APIs, and use them to interact with the real world. Imagine a travel agent GPT, for example: perhaps it hooks into a database of travel itineraries, connects to a user’s calendar, and also integrates with a payment system to actually book travel.
And even better? You can potentially monetize your GPT, with the soon-to-be-launched GPT Store. Verified builders can publish their GPTs to the store, where they become searchable in a variety of categories. OpenAI will also highlight GPTs that really dazzle, theoretically making it possible to go viral with your GPT.
For a detailed step-by-step guide on how to make and test a GPT, check out this article: “How to create custom GPTs in ChatGPT.”
Can I use images and speech with ChatGPT?
We’ve come a long way since the “early days” of ChatGPT (which was less than a year ago…can you believe that?). In that world, input and output were text-only. But now, we can work with images (thanks to DALL-E 3, baked right into ChatGPT) and voice (through the mobile apps).
Uploading and generating images in ChatGPT
Using ChatGPT 4 (with the ChatGPT Plus account), you can now upload images and have a conversation about them, just as you’ve been able to do with text. Get help identifying things, get instructions, or use pictures to solve problems.
Want some help making a recipe that someone posted on Instagram?
It can also generate new images for you, right from within the ChatGPT interface. This functionality was previously only available from a separate site for DALL-E, but now you have no reason to leave the comfort of ChatGPT.
Interacting with ChatGPT through voice and audio
If you’re using the ChatGPT mobile app (available for iOS and Android), then you can interact using your voice. Just tap to start talking, speak your prompt, and stop recording; ChatGPT will then (quite accurately, I might add) transcribe your words so you can submit them and receive an answer.
How can I do a real-time internet search with ChatGPT 4?
In days past, it was only possible to “get out to the web” using the “Browse with Bing” feature in ChatGPT. With Bing, rather than relying on its (dated) training data, ChatGPT would do an actual web search, pull the results back, run them through the magic of the GPT model, and then give you a human-friendly response.
With the latest version of ChatGPT 4, it just knows when you need an up-to-date result and it’ll automatically do the web search for you.
Also, OpenAI has committed to not letting training data get so old (as in September 2021 old, the cutoff date for the original model). At the moment, the GPT-4 model uses data through April 2023. So with newer training data and the built-in internet searches, hopefully you’ll never be at a loss when prompting for recent events, trends and data. (If you need help with prompt engineering overall, use this course to get you started.)
What are the benefits of the new GPT-4 Turbo model?
ChatGPT got a lot of love with the latest release, but DevDay also had some exciting news for developers out there who might spend more time in the APIs and Playground. The new GPT-4 Turbo model is more capable than “regular” GPT-4, and supports a context window of 128K tokens, the equivalent of roughly 300 pages of text in a single prompt.
Even more impressive? Because of performance optimization, OpenAI is able to offer all of this at a more affordable price than the GPT-4 model (a 3x lower price for input tokens and a 2x lower price for output tokens).
Capabilities around function calling have improved, giving you the ability to call multiple functions in a single message. For example, if you want to get the weather for Miami, Madrid and Marrakech, you can do that with just a single trip to the model. And—to the applause of developers everywhere—GPT-4 Turbo now supports JSON mode, ensuring that you get responses as valid JSON.
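Here’s a minimal sketch of that weather scenario, assuming the v1.x `openai` Python package. The `get_weather` tool schema is a hypothetical function we describe to the model, and the API call only fires if a key is configured:

```python
# Sketch: asking GPT-4 Turbo about three cities in one request.
# The model can return multiple tool calls in a single response message.
import json
import os

# Tool schema the model sees; get_weather is a hypothetical function of ours.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user",
             "content": "What's the weather in Miami, Madrid, and Marrakech?"}]

if os.getenv("OPENAI_API_KEY"):  # only call the API if a key is configured
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # GPT-4 Turbo preview model
        messages=messages,
        tools=tools,
    )
    # A single assistant message can carry several tool calls (one per city).
    for call in response.choices[0].message.tool_calls or []:
        print(call.function.name, json.loads(call.function.arguments))
```

Your code would then run `get_weather` for each returned call and send the results back to the model for a final answer.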
Finally, with the addition of a new seed parameter, you can get more deterministic outputs. While it was possible to control some of the variability in responses before using parameters like temperature, the seed parameter gives you even more assurance that you’ll get (mostly) the same response every time. This can be invaluable with things like debugging and writing unit tests.
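A sketch of JSON mode and the `seed` parameter together, again assuming the v1.x `openai` Python package (note that JSON mode requires you to mention JSON somewhere in your messages):

```python
# Sketch: requesting guaranteed-valid JSON plus (mostly) reproducible output.
import os

request = {
    "model": "gpt-4-1106-preview",               # GPT-4 Turbo preview model
    "response_format": {"type": "json_object"},  # JSON mode
    "seed": 42,            # same seed -> (mostly) the same response each time
    "temperature": 0,
    "messages": [
        {"role": "system",
         "content": "Reply in JSON with keys 'city' and 'population'."},
        {"role": "user", "content": "Tell me about Madrid."},
    ],
}

if os.getenv("OPENAI_API_KEY"):  # only call the API if a key is configured
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(**request)
    print(response.choices[0].message.content)  # parseable JSON string
    # response.system_fingerprint identifies the backend configuration; if it
    # changes between calls, outputs may differ even with a fixed seed.
```

For unit tests, you’d pin the seed and compare responses, checking `system_fingerprint` to explain any drift.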
What does the new Assistants API do?
With the release of the new Assistants API, you can now build agent-like experiences in your applications. Think of an agent like a personal assistant (hence the name of the API). It obviously has the power of the model, and all the information that comes with it. But the assistant can also interact with you, understand what you’re looking for, make decisions, and get information from other places (like uploaded files or external functions).
An assistant gets these added capabilities by being able to call other OpenAI tools, including:
Code Interpreter: This tool allows an assistant to write and run Python code inside a sandboxed environment. It can also generate graphs and charts, and process a variety of files.
Retrieval: The Retrieval tool augments the assistant with knowledge from files that you provide.
Function Calling: Using function calling, you can describe custom functions or external APIs to the assistant. The assistant can then call those functions by outputting a relevant JSON object with appropriate arguments.
Obviously, this can all be done through the API, but if you want to try it without writing code, you can also get to it through the Assistants Playground, shown below.
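To make that concrete, here’s a sketch of the assistant lifecycle (create an assistant, start a thread, run it, poll, read messages), assuming the beta Assistants endpoints in the v1.x `openai` Python package; the tutor persona and question are illustrative:

```python
# Sketch: an assistant with Code Interpreter answering a math question.
import os
import time

# Spec for the assistant; the name and instructions are illustrative.
assistant_spec = {
    "name": "Math Tutor",
    "instructions": "Write and run Python code to answer math questions.",
    "tools": [{"type": "code_interpreter"}],
    "model": "gpt-4-1106-preview",
}

if os.getenv("OPENAI_API_KEY"):  # only call the API if a key is configured
    from openai import OpenAI
    client = OpenAI()

    assistant = client.beta.assistants.create(**assistant_spec)

    # A thread holds the conversation; messages are added to it over time.
    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user",
        content="What is the sum of the first 50 primes?")

    # A run executes the assistant on the thread, asynchronously.
    run = client.beta.threads.runs.create(
        thread_id=thread.id, assistant_id=assistant.id)
    while run.status not in ("completed", "failed", "expired"):
        time.sleep(1)  # poll until the run finishes
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id, run_id=run.id)

    for msg in client.beta.threads.messages.list(thread_id=thread.id):
        print(msg.role, msg.content[0].text.value)
```

The thread abstraction is the big win here: the API keeps conversation state for you, so you no longer have to resend the whole history on every call.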
How can I work with vision, images and speech using the APIs?
Earlier, we saw that ChatGPT can now handle vision and voice. As you would expect, these new modalities have also been baked into the APIs.
Vision with GPT-4 Turbo: Using the Chat Completions API, GPT-4 Turbo can now accept images as inputs.
DALL-E 3: With the Images API, specifying “dall-e-3” as the model, you can now integrate all the goodness of DALL-E 3 image generation into your own apps.
Text-to-Speech: If you need to generate human-like speech from text in your app, you can now do that with the text-to-speech (TTS) API and model.
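All three modalities can be sketched in a few calls with the v1.x `openai` Python package; the image URL, prompts, and output filename below are placeholders:

```python
# Sketch: vision input, DALL-E 3 generation, and text-to-speech.
import os

# Vision: images are passed as content parts alongside text in a chat message.
vision_messages = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url",
         "image_url": {"url": "https://example.com/photo.jpg"}},  # placeholder
    ],
}]

if os.getenv("OPENAI_API_KEY"):  # only call the API if a key is configured
    from openai import OpenAI
    client = OpenAI()

    # 1) Vision with GPT-4 Turbo
    seen = client.chat.completions.create(
        model="gpt-4-vision-preview",
        messages=vision_messages,
        max_tokens=300)
    print(seen.choices[0].message.content)

    # 2) DALL-E 3 image generation
    image = client.images.generate(
        model="dall-e-3",
        prompt="A watercolor of a lighthouse at dawn",
        n=1)
    print(image.data[0].url)  # URL of the generated image

    # 3) Text-to-speech
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy", input="Hello from the TTS API!")
    speech.stream_to_file("hello.mp3")  # writes an MP3 locally
```

Note the vision message format: `content` becomes a list of typed parts rather than a plain string, which is what lets text and images travel in the same message.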
Can I fine-tune and create custom models?
Fine-tuning is the process of taking a pre-trained model—like one of OpenAI’s GPT models—that has already learned general patterns from a massive dataset, then training it further on a smaller, domain-specific dataset. Fine-tuning is currently available for GPT-3.5 models, through the API and the fine-tuning console.
Because GPT-4 is so much more capable than GPT-3.5, it seems that a lot more work will be required to achieve meaningful results from fine-tuning GPT-4. OpenAI is creating an experimental access program for those interested.
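The GPT-3.5 flow has two steps: upload training data as JSONL (one chat transcript per line), then start a job. A sketch with the v1.x `openai` Python package, using a single illustrative training example:

```python
# Sketch: uploading training data and starting a GPT-3.5 fine-tuning job.
import json
import os

# Each training example is one chat transcript; this one is illustrative.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer in pirate speak."},
        {"role": "user", "content": "Where is the treasure?"},
        {"role": "assistant", "content": "Arr, 'tis buried on yonder isle!"},
    ]},
]

# Write the examples out in JSONL format, one JSON object per line.
with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

if os.getenv("OPENAI_API_KEY"):  # only call the API if a key is configured
    from openai import OpenAI
    client = OpenAI()
    upload = client.files.create(
        file=open("train.jsonl", "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=upload.id, model="gpt-3.5-turbo")
    print(job.id, job.status)  # poll this job until it completes
```

When the job finishes, the response includes the name of your fine-tuned model, which you then pass as the `model` parameter in normal Chat Completions calls.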
If you find that you need even more customization than what you can get with fine-tuning, then you can apply to the newly-launched Custom Models program. This program is aimed at organizations with extremely large proprietary datasets (and extremely deep pockets), and will give them access to a dedicated group of researchers at OpenAI to train custom models.
What about copyright protection with OpenAI?
Generative AI tools have introduced all kinds of gray areas in the legal arena, including issues related to copyright. Luckily, OpenAI is stepping in to give you some peace of mind. With the new Copyright Shield, OpenAI has committed to defending its customers against legal claims related to copyright infringement (including covering any costs incurred). This protection applies to generally available features of ChatGPT Enterprise, as well as the developer platform.
So there you have it—a quick tour of all the goodness announced at DevDay. From custom GPTs to Assistants, image generation to a crazy-capable model in GPT-4 Turbo, there’s something new for everyone to experiment with. Give it a go and see what you can do!
If you want to dig deeper in this space, you might also enjoy these additional resources: