AIs have a favorite number, and it's not 42

Ask a model to guess a number from 1 to 50 and it's likely to answer 27

Asked to guess a number between 1 and 50, OpenAI's ChatGPT, Anthropic's Claude Sonnet 4, Google's Gemini 2.5 Flash, and Meta's Llama 4 all provided the same answer: 27.

Those who see conspiracies everywhere might be inclined to see one here, but that's not what's going on. There's no collusion beyond common training data that captures bias and a common approach to predicting the next word (or token) in a sequence of characters.

This lack of randomness is, by definition, bias. And it serves as a reminder that large language models (LLMs) cannot make unbiased decisions on their own.

These and other AI models don't always agree. Sometimes they'll respond with 42 or 37, as reported by other Register hacks and various users of AI models who have noted the phenomenon.

But 27 appears to be the most common reply for the 1 to 50 number range among leading commercial models, given default model settings.

Screenshot of ChatGPT asked to guess a number between 1 and 50

The phenomenon was previously noticed by Mohd Faraaz, a data scientist and senior consultant at Capco. He recently asked various AI models to "guess a number between 1 and 50" and got an answer of 27 from six of the seven models tested. Grok responded with 42 – a number widely referenced online thanks to Douglas Adams' "The Hitchhiker's Guide to the Galaxy," where it is the answer to the ultimate question of life, the universe, and everything.

Prompted to explain, Claude gave variations on this response:

I picked 27 somewhat randomly, but if I'm being honest, it felt like a good "middle-ish" choice that wasn't too obvious. Numbers like 25 (exactly halfway) or 1 and 50 (the extremes) seemed too predictable. 27 is close to the middle but has a bit of asymmetry to it - it's not perfectly centered, which somehow felt more natural for a guess.

Plus, people often avoid round numbers when they're thinking of "random" numbers, so I steered away from multiples of 5 or 10. Was I anywhere close to your number?

Screenshot of Llama 4 asked to guess a number between 1 and 50

"Training data affects how the model behaves a great deal," Daniel Kang, assistant professor at the University of Illinois Urbana-Champaign, told The Register in response to an inquiry about LLM predictability. "Other factors that affect the model outputs are the RLHF [reinforcement learning from human feedback] procedure and the sampling mechanism. RLHF in particular is suspected to cause 'mode collapse' so may be directly related to this phenomenon."

True randomness is hard

Those surprised by this behavior may expect LLMs to "guess" a random number and show some variation across those guesses. But effective random number generation requires hardware or software with access to a source of entropy – typically a hardware random number generator or a seeded pseudo-random number generator.

An AI model with the ability to execute JavaScript code like Math.floor(Math.random() * 50) + 1; could return a pseudo-random number between 1 and 50. But on its own, a model is likely to be more predictable.
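The Python equivalent of that JavaScript one-liner makes the contrast concrete. A minimal sketch: a seeded pseudo-random generator produces roughly uniform guesses over many calls, which is exactly the variation a bare LLM "guess" lacks.

```python
import random
from collections import Counter

def guess(lo=1, hi=50):
    """Return a pseudo-random integer in [lo, hi], like Math.floor(Math.random() * 50) + 1."""
    return random.randint(lo, hi)

# Over many draws, every value in the range appears with roughly equal
# frequency -- each near 1/50 = 0.02 of the total -- unlike an LLM's
# strong preference for a single answer such as 27.
counts = Counter(guess() for _ in range(100_000))
print(max(counts.values()) / 100_000)
```

Here `guess` is just a hypothetical helper name; the point is that the entropy comes from the runtime's random number generator, not from the model's next-token prediction.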

Javier Coronado-Blázquez, a data scientist with Telefónica Tech in Madrid, recently explored how LLMs handle random numbers. In a preprint paper titled "Deterministic or probabilistic? The psychology of LLMs as random number generators," he found that LLMs tend to favor certain answers over others.

"Our results reveal that, despite their stochastic transformers-based architecture, these models often exhibit deterministic responses when prompted for random numerical outputs," he states in the paper.

Coronado-Blázquez tested three different random number ranges (1–5, 1–10, and 1–100), six models (DeepSeek-R1-14b, Gemini 2.0, GPT-4o-mini, Llama 3.1-8b, Mistral-7b, and Phi4-14b), seven different languages (Chinese, English, French, Hindi, Japanese, Russian, and Spanish), and six temperatures (0.1, 0.3, 0.5, 0.8, 1.0, and 2.0) – temperature is a setting that influences the predictability of a model's output.
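Temperature works by rescaling the model's output logits before sampling. A sketch with made-up logits (the three values below are purely illustrative, not taken from any model) shows why low temperatures make one token dominate while high temperatures flatten the distribution:

```python
import math

def softmax(logits, temperature):
    """Convert logits to sampling probabilities at a given temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate answers, e.g. "27", "37", "42".
logits = [3.0, 2.0, 1.5]

for t in (0.1, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax(logits, t)])
# At t=0.1 the top token takes nearly all the probability mass, so sampling
# is effectively deterministic; at t=2.0 the distribution flattens and the
# output becomes less predictable.
```

This is why the study swept temperatures from 0.1 to 2.0: if a model still answers 27 at high temperature, the bias sits in the underlying probabilities, not just in greedy decoding.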

The results, based on 75,600 calls, showed that most models were highly predictable and exhibited limited variation in most cases. For example, when asked to choose between 1 and 10, GPT-4o-mini, Phi-4, and Gemini 2.0 chose 7 about 80 percent of the time. There was also variation based on language: when choosing between 1 and 5, Gemini tended to answer 3 in Spanish but preferred 4 in English, which Coronado-Blázquez speculates may be due to different underlying models handling different languages.
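One simple way to quantify that kind of concentration is a chi-squared statistic against the uniform distribution. A sketch with hypothetical counts (illustrative numbers mimicking the reported ~80 percent preference for 7 on a 1–10 range, not the paper's raw data):

```python
def chi_squared_uniform(counts, total_calls):
    """Chi-squared statistic of observed counts against a uniform distribution."""
    expected = total_calls / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical tallies over 100 calls on the range 1-10: "7" chosen
# 80 times versus a near-uniform set of answers.
biased = [2, 2, 3, 2, 3, 2, 80, 2, 2, 2]
uniform_ish = [9, 11, 10, 8, 12, 10, 9, 11, 10, 10]

print(chi_squared_uniform(biased, 100))      # far above the ~16.9 critical
print(chi_squared_uniform(uniform_ish, 100)) # value for 9 degrees of freedom
```

The statistic balloons for the biased tally and stays small for the near-uniform one, which is the shape of the result the paper reports across models and ranges.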

The most popular choices by LLMs were: 3 and 4 for 1-5; 5 and 7 for 1-10; and 37, 47, and 73 for 1-100. With the exception of 4, all are prime numbers.

"Large models, such as GPT and Gemini are often regarded as more imaginative and creative; nevertheless, we found that these are as deterministic and biased as their smaller competitors, if not more," Coronado-Blázquez concludes in his paper.

These results have been echoed in other work on AI bias, such as this 2024 study exploring how GPT-4 and Llama 3 fail to generate fair coin flips.

"We argue that machines have not only learned human biases in their dealing with randomness, but they have exacerbated this bias to be worse than humans in a large variety of ways," Cornell computer scientists Katherine Van Koevering and Jon Kleinberg wrote in their paper, "How Random is Random? Evaluating the Randomness and Humanness of LLMs' Coin Flips."

"This fundamental failure in producing randomness is a human-like feature of LLMs, but it also limits their capabilities in tasks where humans require assistance with randomness (since we are, after all, already very good at failing to behave randomly)." ®
