There are two ways to write SQL in Hex: against a Data Connection, or against a dataframe. Together, these two options let you execute some very powerful workflows.
This is the SQL option you are probably most familiar with. Once you have configured a Data Connection, you can query it through a SQL cell. If your Org has set up Shared Data Connections, these will also be available to choose from.
First, add a new SQL cell and select a data source:
You can then write SQL directly in the cell, using the SQL dialect of the selected connection:
SQL Cells have an integrated preview which shows up to 10 rows.
Query results are returned as dataframes. To rename the returned dataframe, click the grey "Returned as dataframe" title and edit it. The rename takes effect immediately and does not require re-executing the SQL cell.
We return query results as pandas dataframes, which requires transforming some database-specific column types into pandas-supported datatypes. We try to keep the conversion from database column types to pandas types as consistent as possible across databases.
Generally, the following mappings are applied:

| database column type | pandas column type |
| --- | --- |
| int (with null values) | float |
Some notable differences between database column types and pandas data types:

- An int type column which contains nulls will be forced to a float type in the output dataframe. This is due to how pandas handles missing values (NaN is a float).
- Depending on the database, Decimal-type columns are returned with a varying default number of decimal places (PostgreSQL: 18, BigQuery: 9, Snowflake: 1).
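The int-to-float coercion is easy to see in plain pandas, outside of Hex entirely. A minimal sketch (pandas' nullable Int64 dtype at the end is a general pandas feature, not a Hex setting):

```python
import pandas as pd

# An integer column with no nulls keeps an integer dtype...
ints = pd.Series([1, 2, 3])
print(ints.dtype)  # int64

# ...but introducing a null forces the column to float,
# because pandas represents missing values as NaN, a float.
with_null = pd.Series([1, 2, None])
print(with_null.dtype)  # float64

# pandas' nullable Int64 extension dtype can hold nulls
# without coercing the column to float:
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)  # Int64
```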
Hex also lets you write SQL against any dataframe in the current project, including the results of previous SQL Cells. Dataframe SQL queries execute right in Hex using DuckDB, so they don't incur any usage costs on your data connections.
To use Dataframe SQL, add a SQL Cell and select "Dataframes" as the data source.
You can then reference any dataframe in the FROM or JOIN clauses, using the dataframe name where you would normally use the SQL table name. Dataframe SQL uses a syntax very similar to PostgreSQL, and the full SQL reference can be found in the DuckDB docs.
Dataframe indexes are not returned or accessible in a Dataframe SQL query. This also applies to multi-index dataframes: queries will not error, but you can silently lose important nested context. If you need to preserve an index, we recommend calling reset_index() on the dataframe before querying it with SQL.
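A quick sketch of that workaround in plain pandas (the `user_id` index here is a hypothetical example):

```python
import pandas as pd

# A dataframe whose meaningful key lives in the index,
# where Dataframe SQL cannot see it:
df = pd.DataFrame(
    {"visits": [10, 20]},
    index=pd.Index(["a", "b"], name="user_id"),
)

# reset_index() moves the index into an ordinary column,
# making it queryable like any other column.
df_flat = df.reset_index()
print(list(df_flat.columns))  # ['user_id', 'visits']
```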
You can join across dataframes, but you cannot directly join a dataframe with a database table. If you want to join the two, first run a Data Connection SQL query to pull that table into your project as a dataframe. Once everything is a dataframe, you can join them together.
This decision is often made for you by where your data lives: if it's in a database or warehouse, use Connection SQL; if it's already in a dataframe, use Dataframe SQL.
However, you can build clever and complex things by combining these two in more fluid ways, from "Chained SQL" workflows that help with subqueries to cross-database joins. In general, we hope that it Just Works, but this matrix may be helpful when trying to decide which feature to use:
Can talk to SQL databases
Can use database-specific functions
Can query Python dataframes and variables
Returns results as a dataframe
Can join data across sources
Works with extremely large datasets*
Speed on small — large datasets
Results can be cached
* To be clear, we're talking upwards of millions of rows / tens of GB here.
** We don't know how fast your DB is, but we hope it feels like a Formula 1 🙂.
Can't remember available table names, column names, or column data types? You can browse your database and dataframe schemas using the schema browser, available in the left sidebar with your Data Connections. From there, you can explore the schemas, tables, and columns available by clicking the > button for the desired connection. Available dataframes can also be inspected in the schema browser.
You can also click the "Browse" button on any SQL cell to jump to the current data source in the schema browser.
If you need to update the cached schema information, click the icon next to the data connection name in the schema browser and select refresh.