by meetpateltech on 9/25/25, 5:20 PM with 279 comments
by davidmckayv on 9/25/25, 6:28 PM
I've been running into it consistently: responses that just stop mid-sentence, not because of token limits or content filters, but because of what appears to be a bug in how the model signals completion. It's been documented on their GitHub and dev forums for months as a P2 issue.
The frustrating part is that when you compare a complete Gemini response to Claude or GPT-4, the quality is often quite good. But reliability matters more than peak performance. I'd rather work with a model that consistently delivers complete (if slightly less brilliant) responses than one that gives me half-thoughts I have to constantly prompt to continue.
It's a shame because Google clearly has the underlying tech. But until they fix these basic conversation flow issues, Gemini will keep feeling broken compared to the competition, regardless of how it performs on benchmarks.
https://github.com/googleapis/js-genai/issues/707
https://discuss.ai.google.dev/t/gemini-2-5-pro-incomplete-re...
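A minimal sketch of how one might flag this in client code, assuming the google-genai Python SDK (the linked issue is against js-genai, but the response shape is analogous); the model string and prompt here are just placeholders:

  # Sketch only: flag replies whose finish reason isn't a clean STOP.
  # Assumes GEMINI_API_KEY (or GOOGLE_API_KEY) is set in the environment.
  from google import genai
  from google.genai import types

  client = genai.Client()
  response = client.models.generate_content(
      model="gemini-2.5-flash",
      contents="Summarize the plot of Hamlet in three sentences.",
  )

  candidate = response.candidates[0]
  # A healthy completion ends with STOP; MAX_TOKENS, SAFETY, or a missing
  # value suggests the reply was cut off or never properly terminated.
  if candidate.finish_reason != types.FinishReason.STOP:
      print(f"Possibly truncated: finish_reason={candidate.finish_reason}")
  else:
      print(response.text)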
by simonw on 9/25/25, 6:52 PM
export LLM_GEMINI_KEY='...'
uvx --isolated --with llm-gemini llm -m gemini-flash-lite-latest 'An epic poem about frogs at war with ducks'
Release notes: https://github.com/simonw/llm-gemini/releases/tag/0.26
Pelicans: https://github.com/simonw/llm-gemini/issues/104#issuecomment...
by herpderperator on 9/25/25, 10:22 PM
by ashwindharne on 9/25/25, 6:06 PM
It's a delicate balance, because these Gemini models sometimes feel downright lobotomized compared to Claude or GPT-5.
by newfocogi on 9/25/25, 5:46 PM
Both models show improved intelligence on the Artificial Analysis index with lower end-to-end response time, plus 24% to 50% better output token efficiency (resulting in lower cost).
Gemini 2.5 Flash-Lite improvements include better instruction following, reduced verbosity, stronger multimodal & translation capabilities. Gemini 2.5 Flash improvements include better agentic tool use and more token-efficient reasoning.
Model strings: gemini-2.5-flash-lite-preview-09-2025 and gemini-2.5-flash-preview-09-2025
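A rough sketch of how one might eyeball the claimed token savings against the stable alias, assuming the google-genai Python SDK; the prompt is invented and the comparison is purely illustrative:

  # Compare output token counts between the stable alias and the new preview.
  # Assumes GEMINI_API_KEY (or GOOGLE_API_KEY) is set in the environment.
  from google import genai

  client = genai.Client()
  prompt = "Explain the difference between TCP and UDP in one short paragraph."

  for model in ("gemini-2.5-flash", "gemini-2.5-flash-preview-09-2025"):
      response = client.models.generate_content(model=model, contents=prompt)
      usage = response.usage_metadata
      print(f"{model}: {usage.candidates_token_count} output tokens "
            f"({usage.total_token_count} total)")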
by zitterbewegung on 9/25/25, 6:12 PM
by aeon_ai on 9/25/25, 6:04 PM
Something that distinguishes between a completely new pre-training process/architecture and standard RLHF cycles/optimizations.
by minimaxir on 9/25/25, 6:07 PM
by Liwink on 9/25/25, 5:52 PM
From OpenRouter last week:
* xAI: Grok Code Fast 1: 1.15T
* Anthropic: Claude Sonnet 4: 586B
* Google: Gemini 2.5 Flash: 325B
* Sonoma Sky Alpha: 227B
* Google: Gemini 2.0 Flash: 187B
* DeepSeek: DeepSeek V3.1 (free): 180B
* xAI: Grok 4 Fast (free): 158B
* OpenAI: GPT-4.1 Mini: 157B
* DeepSeek: DeepSeek V3 0324: 142B
by Hobadee on 9/25/25, 11:27 PM
It is HORRENDOUS when compared to other models.
I hear a bunch of other people talking about how great Gemini is, but I've never seen it.
The responses are usually either incorrect, way too long (essays when I wanted summaries), or just... not... good. I will ask the exact same question to both Gemini and ChatGPT (free), and GPT will give a great answer while the Gemini answer is trash.
Am I missing something?
by fzimmermann89 on 9/25/25, 8:21 PM
by stephen_cagle on 9/25/25, 10:15 PM
by boomer_joe on 9/26/25, 2:26 AM
Would like to know whether Flash exhibits these issues as well.
by OGEnthusiast on 9/25/25, 5:47 PM
by ikgn on 9/26/25, 9:09 AM
by ImPrajyoth on 9/25/25, 6:03 PM
by tardyp on 9/25/25, 6:02 PM
by phartenfeller on 9/25/25, 9:31 PM
by dcchambers on 9/25/25, 6:58 PM
This industry desperately needs a Steve Jobs to bring some sanity to the marketing.
by DoctorOetker on 9/25/25, 11:30 PM
I would like to try a small computer->human "upload" experiment; basic multilingual understanding without pronunciation knowledge would be very sad.
I intend to make a sort of computer reflexive game where I want to compare different upload strategies (with/without analog or classic error-correcting codes, empirical spaced-repetition constants, an ML predictor of which parameters I'm forgetting / losing resolution on).
by rasz on 9/26/25, 1:53 AM
It kept finding those fatal flaws and starting to explain them, only to then slowly finish with "oh yes, this works as intended".
by ahmedfromtunis on 9/25/25, 6:46 PM
by artur_makly on 9/25/25, 10:46 PM
Pricing per 1M tokens (input / output):
Gemini 2.5 Flash Preview: $0.30 / $2.50
Grok 4 Fast: $0.20 / $0.50
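A toy back-of-the-envelope comparison using those per-1M-token rates; the 200k-input / 50k-output workload is made up purely for illustration:

  # Cost = input_tokens/1M * input_price + output_tokens/1M * output_price
  PRICES = {
      "gemini-2.5-flash-preview": (0.30, 2.50),  # $/1M input, $/1M output
      "grok-4-fast": (0.20, 0.50),
  }
  input_tokens, output_tokens = 200_000, 50_000

  for model, (in_price, out_price) in PRICES.items():
      cost = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
      print(f"{model}: ${cost:.3f}")  # ~$0.185 vs ~$0.065 for this workload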
by whinvik on 9/26/25, 2:14 PM
However, it's hampered by the max output token limit: Gemini is at 65K while GPT-5 mini is at 128K. Both have similar costs as well, so apart from the 1M context limit, GPT-5 mini is better in every way.
by rafaelero on 9/26/25, 5:18 AM
by strangescript on 9/26/25, 3:22 AM
by dgemm on 9/26/25, 4:26 PM
by grej on 9/26/25, 3:08 AM
by pier25 on 9/25/25, 9:16 PM
by Moosdijk on 9/26/25, 8:45 AM
The way I have come to perceive AI is that it's better at reassuring/reaffirming people's beliefs and ideas than at being an actual source of truth.
That would not be an issue if it was actually marketed as such, but seeing the "guided learning" function fail time and again makes me think we should be a lot more critical of what we're being told by tech enthusiasts/companies about AI.
by ChildOfChaos on 9/25/25, 6:06 PM
by rldjbpin on 9/26/25, 1:28 PM
At least for us, the bottleneck is the amount of retries/waiting needed to max out how many requests we can make in parallel.
[1] https://cloud.google.com/vertex-ai/generative-ai/docs/dynami...
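A minimal sketch of the retry/backoff loop this implies, assuming the google-genai Python SDK; the error attribute and the backoff constants are my assumptions, not anything from the linked docs:

  # Back off and retry on quota (HTTP 429) errors before giving up.
  import random
  import time

  from google import genai
  from google.genai import errors

  client = genai.Client()  # assumes an API key in the environment

  def generate_with_retry(prompt: str, model: str = "gemini-2.5-flash",
                          max_attempts: int = 5) -> str:
      for attempt in range(max_attempts):
          try:
              return client.models.generate_content(model=model, contents=prompt).text
          except errors.APIError as exc:
              # Assumption: quota exhaustion surfaces with code 429.
              if getattr(exc, "code", None) != 429 or attempt == max_attempts - 1:
                  raise
              time.sleep(2 ** attempt + random.random())  # exponential backoff + jitter
      raise RuntimeError("unreachable")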
by modeless on 9/25/25, 6:56 PM
by maxdo on 9/25/25, 11:52 PM
by user3939382 on 9/26/25, 1:26 AM
by guybedo on 9/26/25, 12:21 AM
Here's a summary of this discussion with the new version: https://extraakt.com/extraakts/the-great-llm-versioning-deba...
by Fiahil on 9/25/25, 6:10 PM
by brap on 9/25/25, 6:05 PM
Flash is super fast, gets straight to the point.
Pro takes ages to even respond, then starts yapping endlessly, usually confuses itself in the process and ends up with a wrong answer.
by scosman on 9/25/25, 5:45 PM
Anthropic learned this lesson. Google, DeepSeek, Kimi, OpenAI and others keep repeating it. This feels like Gemini_2.5_final_FINAL_FINAL_v2.
by lysecret on 9/27/25, 2:43 PM
by sreekanth850 on 9/26/25, 3:17 AM
by simianwords on 9/25/25, 7:28 PM
by agluszak on 9/25/25, 8:30 PM
by thrownawayohman on 9/25/25, 9:02 PM
by jama211 on 9/25/25, 7:06 PM
by bogtog on 9/25/25, 6:38 PM
Typo in the first sentence? "... improving the efficiency." Gemini 2.5 Pro says this is perfectly good phrasing, whereas ChatGPT and Claude recognize that it's awkward or just incorrect. Hmm...