Ask HN: OpenAI models vs. Gemini 2.5 Pro for coding and SWE
In your experience, which is the better assistant for SWE and software-systems questions and for long, complex reasoning: OpenAI's models (any of them) or Gemini 2.5 Pro?
I'm debating whether there's any point in paying for ChatGPT vs. paying for Gemini 2.5 Pro (or even just using its free version).
I get the feeling that most HNers prefer the latter; however, I think OpenAI outscores Gemini for coding on LiveBench.
I've been using Gemini 2.5 Pro, Claude 3.7 Sonnet, and GPT-4.1 recently, and here are my thoughts.
Regarding context windows, Gemini currently offers 1M tokens (reportedly increasing to 2M soon), GPT-4.1 also handles 1M tokens, and Claude provides 200k. When I tested them on large code files (around 3-4k lines), Gemini 2.5 Pro and Claude 3.7 Sonnet performed quite similarly, both handling the large context well and providing good solutions.
However, my impression was that GPT-4.1 didn't perform quite as well. While GPT-4.1 is certainly capable, I feel Gemini has a slight edge in this area right now. Based on this, I'd lean towards Gemini 2.5 Pro for extremely large contexts needing high-quality results, GPT-4.1 for backend logic, and Claude 3.7 for UI tasks, where I found it particularly effective.
I'm not sure it's easy to say one is better than the other. I've used ChatGPT Pro, and it's good. I've also used Gemini, and it's also good. Claude is surprisingly good as well. And I've recently been using Q-cli, which was extremely easy to integrate into my Neovim/Tmux workflow.
Purely from a code quality perspective, they're all about the same, and they all generate code that rarely works the first time. At least that's my experience, and it depends heavily on the language. For instance, Q-cli with Rust seems to generate better output for me than Gemini with Rust. And ChatGPT with JS gives me way better code than Claude with JS.
I honestly think that, in the current market, it's not really a question of which is better, but of which is the right tool for your workflow and language.
It’s tricky. o3 is better (usually) but much, much lazier IME. You probably have to pay for Pro.