Instant SQL for results as you type in DuckDB UI
80 comments · April 24, 2025 · motoboi
ergest
There’s an extension for that https://github.com/ywelsch/duckdb-psql
carlineng
I just watched the author of this feature and blog post give a talk at the DataCouncil conference in Oakland, and it is obvious what a huge amount of craft, ingenuity, and care went into building it. Congratulations to Hamilton and the MotherDuck team for an awesome launch!
ryguyrg
Woohoo! Glad you noticed that. Hamilton is amazing.
wodenokoto
Is that talk available online?
carlineng
Not yet, but I believe the DataCouncil staff recorded it and will post it to their YouTube channel sometime in the next few weeks: https://www.youtube.com/@DataCouncil/videos
jakozaur
It would be even better if SQL had pipe syntax. SQL is amazing, but its ordering isn’t intuitive, and only CTEs provide a reliable way to preview intermediate results. With pipes, each step could clearly show intermediate outputs.
Example:
FROM orders
|> WHERE order_date >= '2024-01-01'
|> AGGREGATE SUM(order_amount) AS total_spent GROUP BY customer_id
|> WHERE total_spent > 1000
|> INNER JOIN customers USING(customer_id)
|> CALL ENRICH.APOLLO(EMAIL > customers.email)
|> AGGREGATE COUNT(*) high_value_customer GROUP BY company.country
tstack
The PRQL[1] syntax is built around pipelines and works pretty well.
I added a similar "get results as you type" feature to the SQLite integration in the Logfile Navigator (lnav)[2]. When entering PRQL queries, the preview will show the results for the current and previous stages of the pipeline. When you move the cursor around, the previews update accordingly. I was waiting years for something like PRQL to implement this since doing it with regular SQL requires more knowledge of the syntax and I didn't want to go down that path.
[1] - https://prql-lang.org [2] - https://lnav.org/2024/03/29/prql-support.html
mritchie712
there's a PRQL extension for duckdb:
https://community-extensions.duckdb.org/extensions/prql.html
hamilton
Obviously one advantage of SQL is everyone knows it. But conceptually, I agree. I think [1]Malloy is also doing some really fantastic work in this area.
This is one of the reasons I'm excited about DuckDB's upcoming [2]PEG parser. If they can pull it off, we could have alternative dialects that run on DuckDB.
[1] https://www.malloydata.dev/ [2] https://duckdb.org/2024/11/22/runtime-extensible-parsers.htm...
metadata
Google SQL has it now:
https://cloud.google.com/blog/products/data-analytics/simpli...
It's pretty neat:
FROM mydataset.Produce
|> WHERE sales > 0
|> AGGREGATE SUM(sales) AS total_sales, COUNT(*) AS num_sales
GROUP BY item;
Edit: formatting
crooked-v
I suspect you'll like PRQL: https://github.com/PRQL/prql
wodenokoto
I haven’t tested but I believe there’s a prql extension for duckdb
ryguyrg
In DuckDB UI and MotherDuck.
Awesome video of feature: https://youtu.be/aFDUlyeMBc8
Disclaimer: I’m a co-founder at MotherDuck.
rancar2
Thanks for sharing this update with the world and including it on the local ui too.
Feature request: enable tuning of when Instant SQL is run and displayed. The erroring out with flashing updates at nearly every keystroke while expanding a query is distracting for me personally (my brain goes into troubleshooting vs. thinking mode). I like the feature (so I will keep it on by default), but I'd like to have a few modes for it depending on my working context (specifically, tuning of update frequency at separator characters [space, comma], end of statement [semicolon/newline], and injections [paste/autocomplete]).
hamilton
Great feedback! Thanks. We agree w/ the red errors. It's not helpful when it feels like your editor is screaming at you.
theLiminator
Curious if there has been any thought given to open sourcing the UI? Of course there's no obligation to though!
hamilton
We do have plans. It's a question of effort, not business / philosophy.
jpambrun
I really like DuckDB's notebooks for exploration, and this feature makes them even more awesome. But the fact that I can't share, export, or commit them into a git repo feels extremely limiting. It's neat-ish that it dogfoods and stores them in a DuckDB database. It even seems to store historical versions, but I can't really do anything with it.
hamilton
Definitely something we want too! (I'm the author / lead for the UI)
XCSme
I hope this doesn't work with DELETE queries.
falcor84
Maybe in the next version they could also implement support for DROP, with autocorrect for the nearest (not yet dropped) table name.
munk-a
Or, for extra fun, it autocompletes to DROP TRIGGER and just drops a single random trigger from your database. It'll help counter automation fears by ensuring your DBAs get to have a wonderful weekend on payroll where, very much in the Easter spirit, they can hunt through the DB looking for the one thing that should be there but isn't!
falcor84
Wow, that's perhaps the most nefarious version of chaos engineering I've ever heard of. Kudos!
clgeoio
LLM powered queries that run in Agent mode so it can answer questions of your data before you know what to ask.
XCSme
That's actually not a bad idea, to have LLM autocomplete when you write queries, especially if you first add a comment at the top saying what you want to achieve:
-- Select all orders for users registered in the last year, and compute average earnings per user
SELECT ...
krferriter
DELETED 0 rows. Did you mean `where 1=1`? (click accept to re-run with new where clause)
matsonj
for clarity: Instant SQL won't automatically run queries that write or delete data or metadata. It only runs queries that read data.
d0100
And it's a happy coincidence that json_serialize_sql doesn't work with anything but SELECT queries.
worldsayshi
Can't it just run inside a transaction that isn't committed?
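To make the begin/preview/rollback idea concrete, here's a minimal sketch using Python's built-in sqlite3 purely as a stand-in — this is an illustration of the general pattern, not how the DuckDB UI actually works (per matsonj above, Instant SQL simply never runs write queries):

```python
import sqlite3

# Illustration only: run a destructive statement inside an explicit
# transaction, inspect the result, then roll back so nothing persists.
conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 250.0)])

conn.execute("BEGIN")
conn.execute("DELETE FROM orders WHERE amount > 100")
preview = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.execute("ROLLBACK")  # undo the delete

remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(preview, remaining)  # 1 2
```

The catch with this approach in a real product is that the "preview" transaction still takes locks and does real work, which is presumably part of why a read-only policy is the simpler guarantee.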
crmi
Young bobby tables at it again
ayhanfuat
CTE inspection is amazing. I spend too much time doing that manually.
hamilton
Me too (author of the post here). In fact, I was watching a seasoned data engineer at MotherDuck show me how they would attempt to debug a regex in a CTE. As a longtime SQL user, I felt the pain immediately; haven't we all been there before? Instant SQL followed from that.
RobinL
Agree, definitely amazing feature. In the Python API you can get somewhere close with this kind of thing:
input_data = duckdb.sql("SELECT * FROM read_parquet('...')")
step_1 = duckdb.sql("SELECT ... FROM input_data JOIN ...")
step_2 = duckdb.sql("SELECT ... FROM step_1")
final = duckdb.sql("SELECT ... FROM step_2;")
almosthere
Wow, I used DuckDB in my last job, and have to say it was impressive for its speed. Now it's more useful than ever.
mritchie712
a fun function in duckdb (which I think they're using here) is `json_serialize_sql`. It returns a JSON AST of the SQL
SELECT json_serialize_sql('SELECT 2');
[
{
"json_serialize_sql('SELECT 2')": {
"error": false,
"statements": [
{
"node": {
"type": "SELECT_NODE",
"modifiers": [],
"cte_map": {
"map": []
},
"select_list": [
{
"class": "CONSTANT",
"type": "VALUE_CONSTANT",
"alias": "",
"query_location": 7,
"value": {
"type": {
"id": "INTEGER",
"type_info": null
},
"is_null": false,
"value": 2
}
}
],
"from_table": {
"type": "EMPTY",
"alias": "",
"sample": null,
"query_location": 18446744073709551615
},
"where_clause": null,
"group_expressions": [],
"group_sets": [],
"aggregate_handling": "STANDARD_HANDLING",
"having": null,
"sample": null,
"qualify": null
},
"named_param_map": []
}
]
}
}
]
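For a sense of what that output enables: below is a small sketch (the walker is hypothetical, not a DuckDB API; the AST fragment is trimmed from the example above) that collects every query_location from the serialized tree — these offsets are what let an editor map AST nodes back to positions in the SQL text:

```python
import json

# Trimmed fragment of the AST printed above.
ast = json.loads("""
{
  "error": false,
  "statements": [
    {"node": {"type": "SELECT_NODE",
              "select_list": [
                {"class": "CONSTANT", "type": "VALUE_CONSTANT",
                 "query_location": 7,
                 "value": {"type": {"id": "INTEGER"}, "value": 2}}]}}
  ]
}
""")

# Recursively yield every query_location found anywhere in the tree.
def locations(node):
    if isinstance(node, dict):
        for key, val in node.items():
            if key == "query_location" and isinstance(val, int):
                yield val
            else:
                yield from locations(val)
    elif isinstance(node, list):
        for item in node:
            yield from locations(item)

print(sorted(locations(ast)))  # [7]
```

(Offset 7 is exactly where the `2` sits in `SELECT 2`.)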
hamilton
Indeed, we are! We worked with DuckDB Labs to add the query_location information, which we're also enriching with the tokenizer to draw a path through the AST to the cursor location. I've been wanting to do this since forever, and now that we have it, there's actually a long tail of inspection / debugging / enrichment features we can add to our SQL editor.
krferriter
This is a very cool feature. I don't know how useful it is or how I'd use it right now but I think I am going to get into some benchmarking and performance tweaking soon and this could be handy.
xdkyx
Does it work as fast with more complicated queries with joins/havings and large tables?
crazygringo
Edit: never mind, thanks for the replies! I had missed the part where it showed visualizing subqueries, which is what I wanted but didn't think it did. This looks very helpful indeed!
Noumenon72
The article says it does subqueries:
> Getting the AST is a big step forward, but we still need a way to take your cursor position in the editor and map it to a path through this AST. Otherwise, we can’t know which part of the query you're interested in previewing. So we built some simple tools that pair DuckDB’s parser with its tokenizer to enrich the parse tree, which we then use to pinpoint the start and end of all nodes, clauses, and select statements. This cursor-to-AST mapping enables us to show you a preview of exactly the SELECT statement you're working on, no matter where it appears in a complex query.
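The quoted mechanism can be sketched as a toy: give each node a character span, then pick the deepest node whose span contains the cursor (all names here are hypothetical, not the actual implementation):

```python
from dataclasses import dataclass, field

# Toy cursor-to-AST mapping: each node carries start/end offsets into the
# SQL text; the deepest node whose span contains the cursor wins.
@dataclass
class Node:
    kind: str
    start: int
    end: int
    children: list = field(default_factory=list)

def node_at(node, cursor):
    if not (node.start <= cursor < node.end):
        return None
    for child in node.children:
        hit = node_at(child, cursor)
        if hit is not None:
            return hit  # prefer the innermost match
    return node

# SELECT * FROM (SELECT x FROM t) sub  -- subquery spans offsets 14-31
tree = Node("SELECT", 0, 36, [Node("SUBQUERY", 14, 31)])
print(node_at(tree, 20).kind)  # SUBQUERY
print(node_at(tree, 2).kind)   # SELECT
```

With the cursor inside the subquery's span, the inner SELECT is what gets previewed; anywhere else, the outer one is.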
geysersam
> What would be helpful would be to be able to visualize intermediate results -- if my cursor is inside of a subquery, show me the results of that subquery.
But that's exactly what they show in the blog post??
hamilton
You should read the post! This is what the feature does.
hk1337
First time seeing FROM at the top of a query, and I'm not sure how I feel about it. It seems useful, but I'm so used to select...from.
I'm assuming it's more of a user preference, like commas in front of the field instead of after it?
hamilton
You can use any variation of DuckDB valid syntax that you want! I prefer to put from first just because I think it's better, but Instant SQL works with traditional select __ from __ queries.
ltbarcly3
Yes, it comes from a desire to impose intuition from other contexts onto something, instead of building intuition with that thing.
SQL is a declarative language. The ordering of the statements was carefully thought through.
I will say it's harmless, though: the clauses don't have any dependency in terms of meaning, so it's fine to allow them to be reordered. But that's true of lots and lots of things in programming, and just having a convention is usually better than allowing anything.
For example, you could totally allow this to be legal:
def
for x in whatever:
print(x)
print_whatever(whatever):
There's nothing ambiguous about it, but why? If you are used to seeing it one way, it just makes it more confusing to read; and if you aren't used to seeing it the normal way, you should at least somewhat master something before you try to improve it through cosmetic tweaks. I think you see this all the time: people try to impose their own comfort onto things for no actual improvement.
whstl
No, it comes from wanting to make autocompletion easier and to make variable scoping/method ordering make sense within LINQ. It is an actual improvement in this regard.
LINQ popularized it and others followed. It does what it says.
Btw: saying that "people try to impose their own comfort" is uncalled for.
ltbarcly3
In that case you are just objectively incorrect: you can build a far, far more efficient autocomplete in the standard query order. I will guess something like half as many keystrokes to type the same select and from clauses. You are imagining a very naive autocomplete that can only guess columns after it knows the tables, but in reality you can guess most of the columns, including the first one, the tables, and the aliases. Names in dbs are incredibly sparse, and duplicate names don't make autocomplete less effective.
If you are right about why they did it, it's even dumber than my reason: they changed a language grammar to enable a much worse solution to the same problem.
null
DuckDB is missing a killer feature by not having a pipe syntax like Kusto or Google's pipe query syntax.
Why is it a killer feature? First of all, LLMs complete text from left to right. That alone is a killer feature.
But for us meatboxes with less compute power, pipe syntax allows (much better) code completion.
Pipe syntax is delightful to work with and makes going back to SQL a real bummer (please insert meme of Katy Perry kissing the earth here).