Can I just say that Google AI Studio with latest Gemini is stunningly, amazingly, game changingly impressive.
It leaves Claude's and ChatGPT's coding looking like it's from a different century. It's hard to believe these changes are coming in the span of weeks and months. Last month I could not believe how good Claude was. Today I'm not sure how I could continue programming without Google Gemini in my toolkit.
Gemini AI Studio is such a giant leap ahead in programming I have to pinch myself when I'm using it.
I'm really surprised more people haven't caught on. Claude can one-shot small stuff of similar complexity, but as soon as you start to really push the model into longer, more involved use cases, Gemini pulls way ahead. The context handling is so impressive that, in addition to using it for coding agents, I use Gemini as a beta reader for a fairly long manuscript (~85k words), and it absolutely nails it, providing in seconds a high-level report comparable to what a solid human beta reader would provide.
It is absolutely the greatest golden age in programming ever - all these infinitely wealthy companies spending bajillions competing on who can make the best programming companion.
Apart from the apologising. It's silly when the AI apologises with ever more sincere apologies. There should be no apologies from AIs.
You're absolutely right! My mistake. I'll be careful about apologizing too much in the future.
Companion or replacement?
They would replace the entire software department, until the AI introduces a bug because of the endless changes to your JavaScript framework; then they would hire humans again to fix it.
We are literally creating the solution to our own problem.
... Or saboteur. :p
And Gemini is free.
Well, as with many of Google's services, you pay with your data.
Pay-as-you-go with Gemini does not snort your data for their own purposes (allegedly...).
Undoubtedly, but a significant positive aspect is the democratization of this technology, which enables access for people who could not afford it (if not productively, that is).
I asked it to make some changes to the code it wrote, but it kept pumping out the same code with more and more comments to justify itself. After the third attempt I realized I could have done it myself in less time.
How do you use it exactly? Does it integrate with any IDEs?
There is Gemini Code Assist.
https://developers.google.com/gemini-code-assist/docs/overvi...
JetBrains AI recently added (beta) access to Gemini 2.5 Pro, and there are of course plugins like Continue.dev that provide access to pretty much anything with an API.
Zed supports it out of the box.
It always is for the first week. Then you find out that the last 10% matters a lot more than the other 90%. And finally they turn off the high-compute version and you're left with a brain-dead model that loses to a 32B local model half the time.
Is it just me or did they turn off reasoning mode in free Gemini Pro this week?
It's pretty useful as long as you hold it back from writing code too early, or too generally, or sometimes at all. It's a chronic over-writer of code, too. Ignoring most of what it attempts to write and using it to explore the design space without ever getting bogged down in code and other implementation details is great though.
I've been doing something that's new to me but is going to be all over the training data (a subscription service using Stripe), and I have often been able to pivot the planned design of different aspects before writing a single line of code, because I can get all the data it already has regurgitated in the context of my particular tech stack and use case.
They rolled out a new model a week ago which has a "bug" where in long chats it forgets to emit the tokens required for the UI to detect that it's reasoning. You can remind it that it needs to emit these tokens, which helps, or accept that it will sometimes fail to do it. I don't notice a deterioration in performance because it is still reasoning (you can tell by the nature of the output), it's just that those tokens aren't in <think> tags or whatever's required by the UI to display it as such.
I think reasoning in AI Studio is gated by load; around the same time I wasn't seeing much reasoning in AI Studio, I was also getting Vertex "service overloaded" errors pretty frequently on my agents.
Is this distinct from using Gemini 2.5 Pro? If not, this doesn’t match my experience — I’ve been getting a lot of poorly designed TypeScript with an excess of very low quality comments.
Really? I get goofy random substitutions, sometimes from foreign languages. It also doesn't do well on my mini-tests of "can you write modern Svelte without inserting React" and "can you fix a borrow-checking issue in Rust with lifetimes, not Arc/Cell slop".
That doesn't mean it's worse than the others, just not much better. I haven't found anything that worked better than o1-preview so far. How are you using it?
Absolutely agree. I really pushed it last week with a screenshot of a very abstract visualisation we'd done in a Miro board, for which we couldn't find a library that did exactly what we wanted, so we turned to Gemini.
Essentially we were hoping to tie that to data inputs and have a system to regularly output the visualisation but with dynamic values. I bet my colleague it would one shot it: it did.
What I've also found is that even a sloppy prompt somehow still reads my mind about what to do, even when I've expressed myself poorly.
Conversely, I've really found myself rejecting suggestions from ChatGPT, even o4-mini-high. It's just doing so much random crap I didn't ask for, and the code is… let's say not as "Gemini" as I'd prefer.
In one of Stephen Boyd's lectures on convex optimization, he has some quip like "if your optimization problem is computationally intractable, you could try really hard to improve the algorithm, or you could just go on vacation for a few weeks and by the time you get back, computers will be fast enough to solve it."
I feel like that's actually true now with LLMs -- if some query I write doesn't get one-shotted, I don't bother with a galaxy-brain prompt; I just shelve it 'til next month and the next big OpenAI/Anthropic/Google model will usually crush it.
Google may be getting AI to write good SQL, but they aren’t getting it to write good blog posts.
The short answer: use a semantic layer.
It's the cleanest way to give the right context and the best place to pull a human in the loop.
A human can validate and create all the important metrics (e.g. what does "monthly active users" really mean), and then an LLM can use that metric definition whenever asked for MAU.
With a semantic layer, you get the added benefit of writing queries in JSON instead of raw SQL. LLMs are much more consistent at writing a small JSON query than hundreds of lines of SQL (see the sketch at the end of this comment).
We[0] use cube[1] for this. It's the best open source semantic layer, but there are a couple of closed-source options too.
My last company wrote a post on this in 2021[2]. Looks like the acquirer stopped paying for the blog hosting, but the HN post is still up.
0 - https://www.definite.app/
1 - https://cube.dev/
2 - https://news.ycombinator.com/item?id=25930190
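To make the JSON-vs-SQL point concrete, here's a rough sketch of the shape of such a query and the SQL a layer might compile it to. The measure and table names are hypothetical, and the JSON follows the general style of cube-like semantic layers rather than any exact schema:

  -- What the LLM writes against the semantic layer (hypothetical, cube-style):
  --   { "measures": ["users.monthly_active"],
  --     "timeDimensions": [{ "dimension": "events.event_time",
  --                          "dateRange": "last 30 days" }] }
  --
  -- The layer, not the LLM, owns the vetted MAU definition and compiles to SQL like:
  SELECT COUNT(DISTINCT e.user_id) AS monthly_active_users
  FROM events e
  WHERE e.event_time >= CURRENT_DATE - INTERVAL '30 days';

The human-reviewed metric definition lives in the layer, so the LLM only ever has to produce the small JSON on top.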
Mother of God. I can write JSON instead of a language designed for querying. What is the advantage? If I’m going to move up an abstraction layer, why not give me natural language? Lots of things turn a limited natural language grammar into SQL for you. What is JSON going to {do: for: {me}}?
Sorry, I couldn't parse that. You didn't quote your keys
> you get the added benefit of writing queries in JSON instead of raw SQL.
I’m sorry, I can’t. The tail is wagging the dog.
dang, can you delete my account and scrub my history? I’m serious.
You move all the tools to debug and inspect slow queries, in a completely unsupported JSON environment, with prompts not to make up column names. And this is progress?
The JSON compiles to SQL. Have you used a semantic layer? You might have a different opinion if you tried one.
As someone who actually wrote a JSON to (limited) SQL transpiler at $DAYJOB, as much fun as I had designing and implementing that thing and for as many problems it solved immediately, 'tail wagging the dog' is the perfect description.
This may be the best comment on Hacker News ever.
You're right, it's a bit ridiculous. This is a perfect time to use XML instead of JSON.
>you get the added benefit of writing queries in JSON instead of raw SQL
You should have written your comment in JSON instead of raw English.
You still need someone to build the semantic layer; why not use text2sql or something similar for that?
Is it too late to rescue the phrase "one-shotted" or is it already too far gone, like "AI" and "agent"?
Every once in a while I try AI, since everyone and their mother has told me to, so I comply.
My recent endeavour was with Gemini 2.5:
- Write me a simple todo app on cloudflare with auth0 authentication.
- Here's a simple todo on cloudflare. We import the @auth0-cloudflare and...
- Does that @auth0-cloudflare exist?
- Oh, it doesn't. I can give you a walkthrough on how to set up an account on auth0. Would you like me to?
- Yes, please.
- Here. I'm going to write the walkthrough in a document... (proceeds to create an empty document)
- That seems to be an empty document.
- Oh, my bad. I'll produce it once more. (proceeds to create another empty document)
- Seems like your md parsing library is broken, can you write it in chat instead?
- Yes... (your Gemini trial has expired, would you like to pay $100 to continue?)
It's difficult to assess how typical your experience is; I tried your initial prompt (`Write me a simple todo app on cloudflare with auth0 authentication.` on gemini-2.5-pro-preview-05-06) and didn't get any mention of @auth0-cloudflare, although I can't verify whether the answer works as-is:
https://pastebin.com/yfg0Zn0u
The worst part is not even being trolled by the AI roundabout. The worst part is the gaslighting by people who then go on to imply that I'm dumb for not being able to "guide" the model "towards the solution", whatever the fuck that means. And this is after telling me the model is so smart it just knows what I want.
Claude and Gemini are pretty decent at providing a small, tight function definition with well-defined parameters and output, but give them anything big and they start losing shit left and right.
All the vibecoding sessions I've seen have been pretty dead-easy stuff with a lot of boilerplate; maybe I'm weird for just not writing a lot of boilerplate and relying on well-built, expressive abstractions.
Remember, if AI couldn't solve your problem, you were probably using the wrong model. Did you try with o5-selfsuck-20250523-512B?
It's brilliant because you can always shift the blame onto the user. Wrong prompt, wrong model, should have used an agent and run 3 models in parallel, etc.
Meanwhile we get claims that it's as capable as a junior programmer, and CEOs believe them.
For the problems where it would matter the most, these tools seem to help the least. The hardest problem domains don't have just one schema to worry about; they have hundreds. If you need to spin up a personal blog or todo-list tracker, I have no doubt that Google et al. can take you exactly where you want to go.
And then add in the ambiguity of the business terms / the intention behind the query. There's still a big need for something like a semantic layer or ontology to sit between the business and the data, and at least right now that stuff hasn't been automated away yet (it should be, though).
Malloy [1] has a semantic layer [2]... and Model Context Protocol (MCP) support is being added through Publisher [3]. Something to keep an eye on. Seems like a great fit for LLMs.
[1] https://www.malloydata.dev/ [2] https://docs.malloydata.dev/documentation/user_guides/malloy... [3] https://github.com/malloydata/publisher
> We will cover state-of-the-art [...] how we approach techniques that allows the system to offer virtually certified correct answers.
I don't need AI to generate perfect SQL, because I am never going to trust the output enough to copy/paste it — the risk of subtle semantic errors is too high, even if the code validates.
Instead, I find it helpful for AI to suggest approaches — after which I will manually craft the SQL, starting from scratch.
Explain that to the average manager or junior engineer, both of whom don't care about your desire to build well but not fast.
It’s not true that I want to build “well but not fast” — I’m trying to add value, and both speed and reliability matter. My productivity is high and I don’t have trouble articulating why; my approach has generally (though not universally) been well received by management and colleagues.
> So now that we brought down prod for a day the new rule is no AI sql without three humans signing off on any queries.
If that’s the scenario, I would be asking why the testing pipeline didn’t catch this rather than why was the AI SQL wrong.
Because the testing pipeline isn't the real database.
Anyone who knows a database well can bring it down with an innocent-looking statement that no one else will blink at.
Because the testing pipeline was generated by AI, and code-reviewed by AI, reading a PR description generated by AI.
Really? In my experience it's been pretty good (using Pydantic)! I read it over before I execute it, but it's never done anything malicious.
I don't trust myself to craft a prompt in natural language which completely specifies my intent as codified with the precision of a programming language.
I also tend to turn to AI for advising me on difficult use cases, and most of the time it's for production code rather than one-offs. The easy cases, I just write myself because it's more mental effort to review code for subtle errors than it is to write it.
Hopefully your trust in yourself is warranted
I embrace my fallibility, and enthusiastically pursue testing, code reviews, staging environments, and so on to minimize the mistakes that make it through to production.
It seems to me that this skeptical mindset is consonant with handling AI output with care.
You'd rather trust in AI than yourself?
In writing good SQL code? I definitely would.
AI is not going to replace the senior SQL expert with 20 years of battle experience in the short term, but it will support me, someone who last dug into SQL 15 years ago and needs to get a working SQL query in a project. And AI usually does a better job than me copy-pasting googled code in between quickly browsing through tutorials.
What’s the eventual goal of text to sql?
Is it to build a copilot for a data analyst or to get business insight without going through an analyst?
If it’s the latter - then imho no amount of text to sql sophistication will solve the problem because it’s impossible for a non analyst to understand if the sql is correct or sufficient.
These don’t seem like text2sql problems:
> Why did we hit only 80% of our daily ecommerce transactions yesterday?
> Why is customer acquisition cost trending up?
> Why was the campaign in NYC worse than the same in SF?
> These don’t seem like text2sql problems:
Correct, but I would propose two things to add to your analysis:
1. Natural language text is a universal input to LLM systems
2. text2sql forms the foundation of retrieving the information that can help answer these higher-level questions
And so in my mind, the goals for text2sql might be a copilot (near-term), but the long-term is to have a good foundation for automating text2sql calls, comparing results, and pulling them into a larger workflow precisely to help answer the kinds of questions you're proposing.
There's clearly much work needed to achieve that goal.
Yeah, I agree with this - good text2sql is essential, but it's just one part of a larger stack that will actually get there. Seems possible, though.
To be fair, these don't look like SQL problems either. SQL answers "what", not "why" questions. The goal of text2sql is to free up analyst time to get through "what" much faster and, possibly, focus on "why" questions.
My observation is that it's the latter, but I agree the results fall short of expectations. Business will often want last-minute changes in reporting, don't get what they want at the right time because of a lack of analysts, and hope that having "infinite speed" will solve the problem.
But of course the real issue is that if your report metrics change at the last minute, you're unlikely to get a good report. That's a symptom of not thinking much about your metrics.
Also, reports and analyses generally take time because the underlying data are messy, lots of business knowledge is encoded "out of band", and the data infrastructure is poor. The smarter analytics leaders will use the AI push to invest in the foundations.
Any algorithm that a human would follow can be built and tested. If you have 10 analysts, you have 10 different skill levels, with differing understandings of the database and business context. So automation gives you a platform to achieve a floor of skill and knowledge: the humans can now be "at least this good or better". A new analyst instantly gets better, faster.
I assume a useful goal would be to guide development of the system in coordination with experts, test it, have the AI explain all trade offs, potential bugs, sense check it against expected results etc.
Taste is hard to automate. Real insight is hard to automate. But a domain expert who isn’t an “analyst” can go extremely far with well designed automation and a sense of what rational results should look like. Obviously the state of the art isn’t perfect but you asked about goals, so those would be my goals.
But “text to sql” isn’t an algorithm.
The processes the people want the SQL for are likely filled with algorithms. An exec wants info in a known domain: set up a text-to-SQL system with lots of context and testing to generate queries. If they think they have something good, get an expert to test and productionise it.
“Thank you for your request. Can you walk me through the steps you’d use to do this manually? What things would you watch out for? What kind of number ranges are reasonable? I can propose an algorithm and you tell me if that’s correct. The admins have set up guidelines on how to reason about customer and purchase data. Is the following consistent with your expectations?”
This is the same fallacy as low-code/no-code. If you have to check a precise algorithm, you’re effectively coding, and you need a language with the same precision as a programming language.
Only if you want production-ready output. To get execs able to self-serve enough, this works fine. Look, you don't see value until it's perfect. Good; other people do. I see your fallacy and raise you a false dichotomy.
I find Gemini excellent for SQL. I wouldn't consider myself an expert in many things, but in SQL and database design I'd consider myself close. I like writing queries and doing the architecture, and that's where it's exceptionally helpful. The massive context length combined with pointed questions means I can just dump the entire DDL and ask "what am I missing?". It really is an excellent tool for helping with things like sanity checks and catching dumb errors on complex databases.
This is on how to write good SELECTs, not SQL in general. AI is good enough to also create schemas from a spec, migrate, explore databases, do testing, etc., which this article does not touch upon.
Every time I've fed in more than 5 migration files and asked Claude to make multiple changes across those files, it fails; it does very badly in almost all cases, even on fairly basic schemas. I actually don't think LLMs grok complex migration files or SQL that well at all.
Well that's a great startup idea if you're familiar with the domain.
No mention of knowing anything about the tables, versions or relational structure? Are we just assuming that's already given to the AI?
If a lot of the value in a company is the software and over time a handful of AI companies start writing all the software, who really ends up owning all the value of the company?
That’s easy. None of the value is in the software. The only value is in customers that use the software.
I have done this using the OpenAI 4o model. I had to pass in a prompt with business-specific instructions, industry jargon, and descriptions of tables, including foreign keys. Then it would generate even complex join queries and return data. In my case, I was more interested in providing results to users not knowledgeable about SQL, but the SQL was displayed for information.
Can't believe I'm seeing something from Google involving shoes but it isn't named gShoe.
Out of all the AI tools and models I've tried, the most disappointing is the Gemini built into BigQuery. Despite having well-named columns with good descriptions, it consistently gets nowhere close to solving the problem.
Having written more SQL than any other programming language by now, every time I've tried to use AI to write the query for me, I'd spend way more time getting the output right than if I'd just written it myself.
As a quick aside, there's one thing I wish SQL had that would make writing queries so much faster. At work we're using a DSL that has one operator that automatically generates joins from foreign-key columns, just like

  credit.CLIENT->NAME

and you get the clients table automatically joined into the query. Having to write ten to twenty joins for every query is by far the worst thing; everything else about writing SQL is not that bad.

I'd like there to be a function or macro for a bunch of joins, say
You could make it visible to the DB rather than just a macro, so it could optimise it by caching etc. Sort of like a view, but on demand.

Sounds like a CTE?
That's one of the features of EdgeQL:

https://docs.geldata.com/learn/edgeql#select-objects

Although I think a good enough language server / IDE could automatically insert the join when you typed `credit.CLIENT->NAME`
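For readers unfamiliar with the pattern, the arrow operator is just shorthand for the join SQL makes you spell out. A hypothetical expansion, assuming a CLIENT foreign key on the credit table pointing at a clients table:

  -- credit.CLIENT->NAME desugars to something like:
  SELECT clients.name
  FROM credit
  JOIN clients ON clients.id = credit.client_id;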
(Shameless plug) writing the same joins over and over (and refactoring when you update stuff) was one of my biggest boilerplate annoyances with SQL - I’ve tried to fix that while still keeping the rest of SQL in https://trilogydata.dev/
Yeah, we're doing something similar under the hood at AstroBee. It's way, way, way easier to handle joins this way.
IMO any hope of really leveraging LLMs in this context needs this, plus human review of additions to a shared ontology/semantic layer, so most of the nuanced stuff is expressed simply and reviewed by engineering before the business goes wild with it.
Having proper constraints and foreign keys that are clear is generally all that's needed, in my experience. Are you sure your tables have well-defined constraints, so that the AI can be absolutely 100% sure how everything links up? SQL is very precise, but only if you're utilizing constraints and foreign-key definitions well.
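A minimal sketch of the kind of DDL that leaves no ambiguity about how things link up (table and column names made up for illustration):

  CREATE TABLE products (
    id       BIGINT PRIMARY KEY,
    category TEXT NOT NULL
  );

  CREATE TABLE orders (
    id         BIGINT PRIMARY KEY,
    -- The explicit foreign key tells both the planner and the model
    -- exactly how orders relate to products:
    product_id BIGINT NOT NULL REFERENCES products (id),
    quantity   INT NOT NULL CHECK (quantity > 0)
  );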
It’s BigQuery, so it likely won’t have any of these.
BigQuery supports all those SQL things I mentioned.
I’m just saying it’s likely they aren’t using them. But clearly you should if you want LLMs to do anything useful.
In real life I find using AI for SQL dangerous. It allows people who don't know what they're doing to write queries that can significantly impact servers. In my world, databases are relatively big for most developers, but not huge.
Sometimes when I want to fine-tune a query, I challenge the AI to provide a better solution: I give it the already-optimized query and ask for better. I have never gotten a better answer, sometimes because the AI is hallucinating and sometimes because the changes it proposes don't work in a way that is beneficial. It is like an idiot parrot telling you what it overheard in the brothel: good info if it is a war brothel frequented by enemy officers in 1916, but not these days.
It should never be at the point where some random person can impact a server.
That's what read replicas with read-only access are for. Production DB servers should not be open to random queries and usage by people; that's only for the app to use.
The strategy I've used with these people is to let them prototype with AI and then have them hand over their work to me where I can then make it significantly more efficient. The nice thing is that their poor performing version acts as a reference for validating the output of my queries.
> I give it the already optimized query and I ask for better. I never got a better answer..
This was my experience as well. However, I have observed that things have been improving in this regard. Newer LLMs do perform much better, and I suspect they will continue to get better over time.
I've been working on highly optimized code that heavily uses CPU intrinsics. A year ago, no chance; 6 months ago, a helpful reference; today, it's a good starting point. That is an insane pace of improvement.
Mate, IME programmers who don't know what they're doing just do it anyway, then look to blame someone/something else if things turn to custard.
AI is just increasing the frequency of things turning to custard :)
AI is most effective as an accountability sink
> It allows people that don't know what they do to write queries that can significantly impact servers.
At least for the only OLAP DB I use often - Amazon Redshift - that's a solved problem with Workload Management (WLM) queues. You can restrict those users' ability to consume too many resources.
For queries used for OLTP, I usually try to keep them relatively simple. If there is a reason for read queries that consume a lot of resources, those go to read replicas when strong consistency isn't required.
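As a hedged illustration of the routing side (the queue name here is hypothetical, whatever your admin configured), Redshift lets a session opt into a restricted WLM queue directly from SQL:

  -- Route this session's queries into a WLM queue with its own
  -- memory, concurrency, and timeout limits:
  SET query_group TO 'adhoc_analysts';
  SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;
  RESET query_group;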
o3 has yet to fail me on complex, multi-table queries. Not a fan of BigQuery’s Gemini integration.
AI text to regex solutions would be incredibly handy.
This comment appears frequently and always surprises me. Do people just... not know regex? It seems so foreign to me.
It's not like it's some obscure thing, it's absolutely ubiquitous.
Relatively speaking it's not very complicated, it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's funny to joke that it looks like line noise, but really, there is not a lot to learn to understand 90% of the expressions people actually write.
It takes far longer to tell an AI what you want than to write a regex yourself.
I know regex. But I use it so sparingly that every time I need it I've forgotten again the character for a word boundary, or the character for whitespace, or the exact incantation for a negative lookahead. Is it >!? Who knows.
A shortcut to type in natural language and get something I can validate in seconds is really useful.
How do you validate it if you don't know the syntax? Or are you saying that looking up syntax -> semantics is significantly quicker than semantics -> syntax? Which I don't find to be the case. What takes time is grokking the semantics in context, which you have to do in both cases.
https://regex101.com/
That doesn’t answer the question. By “validate”, I mean “prove to yourself that the regular expression is correct”. Much like with program code, you can’t do that by only testing it. You need to understand what the expression actually says.
Notice that site has a very usable reference list you can consult for all those details the GP forgets.
IME it's not just longer, but also more difficult to tell the LLM precisely what you want than to write it yourself if you need a somewhat novel RegExp, which won't be all over the training data.
I needed one to do something with Markdown which was a very internal BigCo thing to need to do, something I'd never have written without weird requirements in play. It wasn't that tricky, but going back trying to get LLMs to replicate it after the fact from the same description I was working from, they were hopeless. I need to dig that out again and try it on the latest models.
There's often a bunch of edge cases that people overlook. And you also get quadratic behaviour for some fairly 'simple' looking regexes that few people seem aware of.
I was using Perl in the late 90s for sysadmin stuff, have written web scrapers in Python, and have a solid history with regex. That being said, AI can still write really complex lookbehind/lookahead/nested extraction code MUCH faster and with fewer bugs than me, because regex is easy to make small mistakes with even when proficient.
Regex, especially non-standard (and non-regular) extensions, can be pretty tricky to grok.
http://alf.nu/RegexGolf?world=regex&level=r00
The first language I used to solve real problems was Perl, where regex is a first-class citizen. In Python, less so; most of my Python scripts don't use it. I love regex but know several developers who avoid it like the plague. You don't know what you don't know, and there's nothing wrong with that. LLMs are super helpful for getting up to speed on stuff.
I use regex as an alternative to wildcards in various apps like Notepad++ and VS Code. The format is different in each app, and the syntax is somewhat different. I have to research it each time, and complex regex is a nightmare.
Which is why I would ask an AI to build it if it could.
I personally didn’t really understand how to write regex until I understood “regular languages” properly, then it was obvious.
I’ve found that the vast majority of programmers today do not have any foundation in formal languages and/or the theory of computation (something that 10 years ago was pretty common to assume).
It used to be pretty safe to assume that everyone from perl hackers to computer science theorists understood regex pretty well, but I’ve found it’s increasingly a rare skill. While it used to be common for all programmers to understand these things, even people with a CS background view that as some annoying course they forgot as soon as the exam was over.
It's something you use so sparingly, and usually so far apart, that it never sticks around.
A cheat sheet is just a web search away.
So is an LLM.
Since you know so much regex, why don't you write a regex HTML parser /s
"Text to SQL", "text to regex", "text to shell", etc. will never fundamentally work because the reason we have computer languages is to express specific requirements with no ambiguity.
With an AI prompt you'll have to do the same thing, just more verbosely.
You will have to do what every programmer hates, write a full formal specification in English.
This is pretty simple in any foundation model: provide a well-commented schema and ask for the query.
Step 1: Your schema has thousands of tables and there aren't many comments.
Step 2...
Use AI to generate the comments, of course.
Exactly. Add any documentation you have about the app for more context, too.
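Concretely, "well commented" can be as cheap as comments that live in the database itself, so any schema dump carries them along to the model. In Postgres, for example (the wording here is made up):

  COMMENT ON TABLE orders IS 'One row per checkout; excludes abandoned carts';
  COMMENT ON COLUMN orders.status IS 'pending | paid | refunded | cancelled';
  COMMENT ON COLUMN orders.amount_cents IS 'Grand total in cents, tax included';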
The smolagents library is also pretty nice for doing the scaffolding around the model. Text-to-SQL seems simple in demos, but making it work in real-life, complex cases is very hard: https://medium.com/thoughts-on-machine-learning/build-a-text...
There are two kinds of people using AI to generate SQL: those who say it's already solved and those who say it'll be impossible to ever solve.
I agree. There's really no magic to it any more. The table create DDL commands are a very precise description of the tables, so almost nothing more is ever needed. You can just describe in detail what query you need, and any decent LLM can do it just fine.
From "Show HN: We open sourced our entire text-to-SQL product" (2024) https://news.ycombinator.com/item?id=40456236 :
> awesome-Text2SQL: https://github.com/eosphoros-ai/Awesome-Text2SQL
> Awesome-code-llm > Benchmarks > Text to SQL: https://github.com/codefuse-ai/Awesome-Code-LLM#text-to-sql
> Even with a high-quality model, there is still some level of non-determinism or unpredictability involved in LLM-driven SQL generation. To address this we have found that non-AI approaches like query parsing or doing a dry run of the generated SQL complements model-based workflows well. We can get a clear, deterministic signal if the LLM has missed something crucial, which we then pass back to the model for a second pass. When provided an example of a mistake and some guidance, models can typically address what they got wrong.
Sounds like a bunch of bespoke non-AI work is being done to make up for LLM limitations that point-blank can't be resolved.
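For what it's worth, the dry-run signal the quoted passage describes doesn't require much bespoke machinery. In Postgres, for example, EXPLAIN parses and plans a generated query without executing it (a sketch, not Google's actual pipeline; the query is made up):

  -- Planning without executing surfaces hallucinated tables or columns
  -- as deterministic errors:
  EXPLAIN
  SELECT customer_id, COUNT(*)
  FROM orders
  WHERE created_at >= DATE '2025-01-01'
  GROUP BY customer_id;
  -- Errors like "relation does not exist" or "column ... does not exist"
  -- can be fed straight back to the model for a second pass.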
All this LLM written SQL stuff sounds great until you realize if you don’t really know SQL you won’t be able to debug or fix any broken SQL an LLM generates.
Thus, this is mainly just a tool for true experts to do less work and still get paid the same, not a tool for beginners to rise to the level of experts.
It depends. Sometimes you can just feed back the broken SQL with "that didn't return any rows, can you fix it?" and it comes up with something that works. Or "you're looking at the wrong entity, look at this table instead", or whatever, without knowing how to write competent SQL.
Obviously, being able to at least read a bit of SQL and understanding the basic idea of relational databases helps loads.
Have you not actually used LLMs? Just copy in the errors and away it goes.
Error goes away but it gives the wrong result.
If LLMs are so wonderful we can just read from B+ Tree storage engines directly. SQL, ORMs, Query Planners... all bloat.
"Given the prompt "I have a database schema that contains products and orders. Write a SQL query that shows the number of orders for shoes""
How on earth is this an AI job?
In the example you describe, there are several technical things in nearly natural language, and you mention two things that would be a drop-down in a GUI. For starters, this assumes you know what SQL is and your data layout, or "schema".
Regardless of using AI, you need to understand the base technology.
SQL is not intractable for queries once you have worked out the relationships. The relationship complexity will be the same for an AI prompt too, no matter how cool you feel.
Your AI might find a customer 1:M shoes relationship, or it might not. I suggest that anything beyond a model of a couple of tables will go horribly wrong.
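For reference, the query the quoted prompt is asking for is about as entry-level as SQL gets once the schema is known (table and column names assumed here):

  SELECT COUNT(*) AS shoe_orders
  FROM orders o
  JOIN products p ON p.id = o.product_id
  WHERE p.category = 'shoes';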
If you know SQL, then yeah! But if you don't know SQL, using an AI to write a few queries & debug them is a great way to learn it.
I'm pretty comfortable with SQL, but I still found it a fabulous tool recently. I have a SQL database which describes a tree of some ~600k events. Each event is in a session (via session_id). Most events have a parent event, and trees of events can involve multiple sessions.
I wanted to add two derived columns to my events table: for each event, the root event of that event's tree and the root event within its session. I had code in TypeScript to do it, but unsurprisingly it was pretty slow. Well, it turns out you can write a recursive SQL query which traverses the graph and populates those columns. I had no idea that was even possible.
ChatGPT managed it pretty well, though I ended up making a bunch of tweaks to the query it suggested to simplify it. I learned a bunch of SQL in the process, and that was cool! Obviously I could have read the SQL documentation and figured it out myself, but it was faster and easier using ChatGPT. Writing SQL queries is a fantastic use case for LLMs.
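For the curious, the shape of that recursive query looks roughly like this. This is a sketch assuming Postgres, an events(id, parent_id, session_id) table, and that the two derived columns already exist; the edge case where a session resumes deeper in the tree is glossed over:

  WITH RECURSIVE roots AS (
    -- Anchor: parentless events are the root of their own tree and session.
    SELECT id, session_id, id AS root_id, id AS session_root_id
    FROM events
    WHERE parent_id IS NULL

    UNION ALL

    -- Recurse: children inherit the tree root; the session root resets
    -- whenever a child lives in a different session than its parent.
    SELECT e.id, e.session_id, r.root_id,
           CASE WHEN e.session_id = r.session_id
                THEN r.session_root_id ELSE e.id END
    FROM events e
    JOIN roots r ON e.parent_id = r.id
  )
  UPDATE events
  SET root_id = roots.root_id,
      session_root_id = roots.session_root_id
  FROM roots
  WHERE events.id = roots.id;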