Querying Data in Staged Files

Databend allows you to directly query data in the files stored in one of the following locations without loading them into a table:

User stage, internal stage, or external stage.
Bucket or container created in your object storage, such as Amazon S3, Google Cloud Storage, and Microsoft Azure.
Remote server.

During this process, the schema is automatically detected, just like with the infer_schema function. This feature can be particularly useful for inspecting or viewing the contents of staged files, whether it's before or after loading data.

note

This feature is currently only available for the Parquet file format.

Syntax and Parameters

SELECT <columns> FROM
{@<stage_name>[/<path>] | '<uri>'} [(
  [ PATTERN => '<regex_pattern>']
  [ FILE_FORMAT => '<format_name>']
  [ FILES => ( 'file_name' [ , 'file_name' ... ] ) ]
  [ ENDPOINT_URL => <'url'> ]
  [ AWS_KEY_ID => <'aws_key_id'> ]
  [ AWS_KEY_SECRET => <'aws_key_secret'> ]
  [ ACCESS_KEY_ID => <'access_key_id'> ]
  [ ACCESS_KEY_SECRET => <'access_key_secret'> ]
  [ SECRET_ACCESS_KEY => <'secret_access_key'> ]
  [ SESSION_TOKEN => <'session_token'> ]
  [ REGION => <'region'> ]
  [ ENABLE_VIRTUAL_HOST_STYLE => true|false ]
)]

FILE_FORMAT

The file format must be one of the following:

Built-in file format, see Input & Output File Formats.
Named file format created by CREATE FILE FORMAT.

PATTERN

A PCRE2-based regular expression pattern string, enclosed in single quotes, specifying the file names to match. Click here to see an example. For PCRE2 syntax, see http://www.pcre.org/current/doc/html/pcre2syntax.html.

FILES

Specifies one or a list of file names (separated by commas).

Others

To query data files in a bucket or container, provide necessary connection information with the following parameters:

ENDPOINT_URL
AWS_KEY_ID
AWS_SECRET_KEY
ACCESS_KEY_ID
ACCESS_KEY_SECRET
SECRET_ACCESS_KEY
SESSION_TOKEN
REGION
ENABLE_VIRTUAL_HOST_STYLE

They are explained in Create Stage.

Examples

Example 1: Querying Data in a Parquet File

This example shows how to query data in a Parquet file stored in different locations. Click the tabs below to see details.

Stages
Bucket
Remote

Let's assume you have a sample file named books.parquet and you have uploaded it to your user stage, an internal stage named my_internal_stage, and an external stage named my_external_stage. To upload files to a stage, use the File Upload API.

-- Query file in user stage
SELECT * FROM @~/books.parquet;

-- Query file in internal stage
SELECT * FROM @my_internal_stage/books.parquet;

-- Query file in external stage
SELECT * FROM @my_external_stage/books.parquet;

Let's assume you have a sample file named books.parquet stored in a bucket named databend-toronto on Amazon S3 in the region us-east-2. You can query the data by specifying the connection parameters:

SELECT *  FROM 's3://databend-toronto' 
(
 access_key_id => '<your-access-key-id>', 
 secret_access_key => '<your-secret-access-key>',
 endpoint_url => 'https://databend-toronto.s3.us-east-2.amazonaws.com',
 region => 'us-east-2',
 files => ('books.parquet')
);  

Let's assume you have a sample file named books.parquet stored in a remote server. You can query the data by specifying the file URI:

SELECT * FROM 'https://datafuse-1253727613.cos.ap-hongkong.myqcloud.com/data/books.parquet';

Example 2: Filtering Files with Pattern Matching

Let's assume you have a folder containing the following Parquet files with the same schema, as well as some files of other formats. You can use the PATTERN parameter to filter all the Parquet files in the folder.

data/
  ├── 2022-01-01.parquet
  ├── 2022-01-02.parquet
  ├── 2022-01-03.parquet
  ├── 2022-01-04.parquet
  └── 2022-01-05.parquet

For example, you can query the "name" and "age" columns from all Parquet files in the folder:

SELECT name, age FROM @internal_stage/data/
(file_format => 'parquet', pattern => '.*parquet');

Querying Data in Staged Files

Syntax and Parameters

FILE_FORMAT

PATTERN

FILES

Others

Examples

Example 1: Querying Data in a Parquet File

Example 2: Filtering Files with Pattern Matching

Join our growing community

GitHub

Slack

Twitter

YouTube

Syntax and Parameters​

FILE_FORMAT​

PATTERN​

FILES​

Others​

Examples​

Example 1: Querying Data in a Parquet File​

Example 2: Filtering Files with Pattern Matching​

Join our growing community

GitHub

Slack

Twitter

YouTube

Syntax and Parameters

FILE_FORMAT

PATTERN

FILES

Others

Examples

Example 1: Querying Data in a Parquet File

Example 2: Filtering Files with Pattern Matching