SQL cells

Querying Data with SQL Cells

We've added a schema browser to your Data Connection details -- check it out!

There are two ways to write SQL in Hex: against a Data Connection, or against a dataframe. Together, these two options let you execute some very powerful workflows.

Data Connection SQL

This is the SQL option you are probably most familiar with. Once you have configured a Data Connection, you can query it though a SQL cell. If your Org has set up Shared Data Connections, these will also be available to choose from.

First, add a new SQL cell and select a data source:

You can then write SQL directly in the cell, using the SQL dialect of the selected connection:

Running an uncached SQL Cell will run that query against your database. To avoid accidentally running expensive queries every time you re-run a project, you can cache query results.

If you know you'll be running a lot of queries while developing, it can be smart to add a temporary LIMIT clause to potentially large queries.

SQL Cells have an integrated preview which shows up to 10 rows.

Query results are returned as dataframes. The name of the dataframe is easily changed by clicking the grey "Returned as dataframe" title and editing. The dataframe is automatically re-named and does not require the SQL cell be re-executed to take effect.

Query Datatypes

We return query results as pandas dataframes, which requires that some database-unique column types be transformed into pandas supported datatypes. We try to keep the conversion between database column types to pandas type as consistent as possible between database types.

Generally, the following mappings are applied:

database column type

pandas column type

int

int

int (with null values)

float

float

float

decimal

object

string

object

Some notable differences between database column types and pandas data types:

  • Any int type column which contains nulls will be forced to a float type in the output dataframe. This is due to how pandas handles

  • Depending on the database, Decimal-type columns are returned with varying default number of decimals places (PostgrSQL: 18, BigQuery: 9, Snowflake: 1)

Dataframe SQL

Querying a pandas dataframe using SQL

Hex also lets you write SQL against any dataframe in the current project, including the results of previous SQL Cells. Dataframe SQL queries execute right in Hex using DuckDB, so they don't incur any usage costs on your data connections.

Dataframe SQL executes in-memory, which makes it really fast for most queries, but means it can't stack up to a cloud data warehouse for very large queries. If your query processes a lot of data and seems to be running slow, be mindful of whether or not it has to be Dataframe SQL.

To use Dataframe SQL, add a SQL Cell and select "Dataframes" as the data source.

You can then reference any dataframe in the FROM or JOIN clauses, using the dataframe name where you would normally use the SQL table name. Dataframe SQL uses a syntax very similar to PostgreSQL, and the full SQL reference can be found in the DuckDB docs.

Dataframe indexes are not returned or accessible in a Dataframe SQL query. This also applies to multi-index dataframes, and although queries will not error, you can lose important nested context if you aren't aware of this. If you need to preserve an index, we recommend calling reset_index() on the dataframe before querying it with SQL.

You can join across dataframes, but you cannot directly join across dataframes and database tables. If you want to join a dataframe and a db table, you'll need to first run a data connection SQL query to get that table as a dataframe in your project. Once everything is a dataframe, you can join the two together.

Caching is not available for Dataframe SQL. This is because Dataframe SQL doesn't hit your database, so caching wouldn't protect you from any accidental spend.

Also, it's because Dataframe SQL runs in-memory and tends to be very quick :)

Why use Connection vs Dataframe SQL?

This decision is often made for you by the location of data. If data is in a database or warehouse— Use Connection SQL. If it's in a dataframe, well then use Dataframe SQL.

However, you can build clever and complex things by combining these two in more fluid ways, from "Chained SQL" workflows that help with subqueries to cross-database joins. In general, we hope that it Just Works, but this matrix may be helpful when trying to decide which feature to use:

Target Use

Connection SQL

Dataframe SQL

Can talk to SQL databases

Can use database-specific functions

Can query Python dataframes and variables

Returns results as a dataframe

Can join data across sources

Works with extremely large datasets*

Speed on small — large datasets

🏎️**

🚀

Results can be cached

* To be clear, we're talking of upwards of millions of rows / tens of GB here. ** We don't know how fast your DB is, but we hope it feels like a Formula 1 🙂.

Schema Browser

Can't remember available table names, column names, or column data types? You can browse your database and dataframe schemas using the schema browser, available in the left side bar with your Data Connections. From there, you can explore the database schema, tables, and columns available by clicking the > button for the desired connection. Available dataframes can also be inspected in the Schema browser.

A table level view in the Schema Browser

You can also click the "Browse" button on any SQL cell to jump to the current data source in the schema browser.

Initial load time for full schema details can take a few seconds. We cache schema results so that future references to the database details are immediately available.

Refreshing the Cache

If you need to update the cached schema information, click the icon next to the data connection name in the schema browser and select refresh.