Artificial Intelligence | Educational Science

Gemini AI: What do we know about Google's answer to ChatGPT?

Google DeepMind has released a rival to ChatGPT, named Gemini, and it can understand and generate multiple types of media including images, videos, audio, and text.

Most artificial intelligence (AI) tools understand and generate only one type of content. For example, OpenAI's ChatGPT "reads" and creates only text. But Gemini can generate multiple types of output based on any form of input, Google said in a blog post.

The three versions of Gemini 1.0 are Gemini Ultra, the largest version; Gemini Pro, which is being rolled out into Google's digital services; and Gemini Nano, designed to be used on devices like smartphones.

According to DeepMind's technical report on the chatbot, Gemini Ultra beat GPT-4 and other leading AI models in 30 of 32 key academic benchmarks used in AI research and development. These include high school exams and tests on morality and law.

Specifically, Gemini Ultra won out in nine image comprehension benchmarks, six video understanding tests, five speech recognition and translation tests, and 10 of 12 text and reasoning benchmarks. The two in which Gemini Ultra failed to beat GPT-4 were in common-sense reasoning, according to the report.
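As a quick check on how the per-category figures add up to the headline number, the tally can be sketched in a few lines of Python. The category totals for image, video and speech are an assumption inferred from the 30-of-32 figure (the article gives only the wins for those categories), and the dictionary layout is purely illustrative.

```python
# Benchmark counts as reported in the article:
# (wins for Gemini Ultra, benchmarks in the category).
# Assumption: the totals for the first three categories equal the wins,
# which is inferred from the 30-of-32 headline figure, not stated directly.
results = {
    "image comprehension": (9, 9),
    "video understanding": (6, 6),
    "speech recognition and translation": (5, 5),
    "text and reasoning": (10, 12),
}

wins = sum(w for w, _ in results.values())
total = sum(t for _, t in results.values())
print(f"Gemini Ultra led on {wins} of {total} benchmarks")
```

Under that assumption the sums come to 30 wins out of 32 benchmarks, matching the figure quoted from the technical report.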

Related: AI is transforming every aspect of science. Here's how.

Building models that process multiple forms of media is hard because biases in the training data are likely to be amplified, performance tends to drop significantly, and models tend to overfit, meaning they perform well when tested against the training data but poorly when exposed to new input.
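To make the idea of overfitting concrete, here is a minimal sketch in Python. It has nothing to do with how Gemini itself was built: it uses a toy "memorizing" model (a one-nearest-neighbor classifier) on made-up noisy data, and every name in it (`make_data`, `predict_1nn`, `accuracy`) is hypothetical and chosen only for illustration.

```python
import random

def make_data(n, rng, noise=0.2):
    """Points x in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label  # simulate noisy, unreliable training labels
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Memorizing model: return the label of the closest training point."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

def accuracy(train, data):
    """Fraction of points in `data` that the memorizing model labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in data)
    return correct / len(data)

rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_data(200, rng)
test = make_data(200, rng)

train_acc = accuracy(train, train)  # each training point is its own nearest neighbor
test_acc = accuracy(train, test)    # fresh points expose the memorized noise
print(f"train accuracy: {train_acc:.2f}")
print(f"test accuracy:  {test_acc:.2f}")
```

Because the model simply memorizes its training points, noise included, it scores perfectly on data it has already seen and noticeably worse on fresh data drawn the same way. That gap between training and test performance is what overfitting refers to.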

Multimodal training also normally involves training different components of a model separately, each on a single type of medium, and then stitching these components together. But Gemini was trained jointly across text, image, audio and video data at the same time. Scientists sourced this data from web documents, books and code.

Scientists trained Gemini by curating the training data and incorporating human supervision in the feedback process.

The team deployed servers across multiple data centers on a much grander scale than previous AI training efforts and relied on thousands of Google's AI accelerator chips, known as tensor processing units (TPUs).