__all__ = ['BigqueryHook'] (module attribute)
BigqueryHook
Bases: BaseHook
Hook for interacting with Google BigQuery.
This hook provides methods for managing datasets, tables, and jobs in BigQuery.
It uses the google-cloud-bigquery library to communicate with the BigQuery API.
__init__()
Initializes the BigqueryHook.
Creates a BigQuery client instance.
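A minimal usage sketch; the import path is an assumption (adjust it to wherever BigqueryHook lives in your project). The `hook` instance created here is reused in the sketches below.

```python
# Assumed import path; replace with the actual module that defines BigqueryHook.
from hooks.bigquery import BigqueryHook

hook = BigqueryHook()  # instantiating the hook creates the underlying BigQuery client
```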
build_table_id(project, dataset, table)
Builds a BigQuery table ID string.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
project | str | The Google Cloud project ID. | required |
dataset | str | The BigQuery dataset ID. | required |
table | str | The BigQuery table ID. | required |

Returns:

Type | Description |
---|---|
str | The fully qualified table ID in the format 'project.dataset.table'. |
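For example, with placeholder project, dataset, and table names, a call would be expected to return the dotted identifier:

```python
table_id = hook.build_table_id("my-project", "analytics", "events")
# table_id == "my-project.analytics.events"
```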
execute_load_job(from_filepath, to_project, to_dataset, to_table, job_config, timeout=240)
Executes a BigQuery load job from a URI.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
from_filepath | str | The GCS URI of the source file. | required |
to_project | str | The Google Cloud project ID for the destination table. | required |
to_dataset | str | The BigQuery dataset ID for the destination table. | required |
to_table | str | The BigQuery table ID for the destination table. | required |
job_config | LoadJobConfig | The configured LoadJobConfig for the load job. | required |
timeout | int | The timeout for the job in seconds. Defaults to 240. | 240 |
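A sketch of a load from GCS; the bucket, project, dataset, and table names are placeholders, and the LoadJobConfig here is built with the standard google-cloud-bigquery API rather than with setup_job_config:

```python
from google.cloud import bigquery

# A hand-built job config; setup_job_config (documented below) can produce one from simpler arguments.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)

hook.execute_load_job(
    from_filepath="gs://my-bucket/exports/events.csv",  # placeholder GCS URI
    to_project="my-project",
    to_dataset="analytics",
    to_table="events",
    job_config=job_config,
    timeout=240,
)
```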
execute_query_job(query, to_project, to_dataset, to_table, to_write_disposition, to_time_partitioning, timeout=480)
Executes a BigQuery query job.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
query | str | The SQL query to execute. | required |
to_project | str | The Google Cloud project ID for the destination table (if any). | required |
to_dataset | str | The BigQuery dataset ID for the destination table (if any). | required |
to_table | str | The BigQuery table ID for the destination table (if any). | required |
to_write_disposition | str | The write disposition if writing to a table. | required |
to_time_partitioning | dict | Configuration for time-based partitioning if writing to a table. | required |
timeout | int | The timeout for the job in seconds. Defaults to 480. | 480 |

Raises:

Type | Description |
---|---|
TimeoutError | If the job times out. |
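A sketch of a query job that writes its result to a destination table; the identifiers are placeholders, and the write disposition and partitioning keys follow the conventions described for setup_job_config below:

```python
hook.execute_query_job(
    query=(
        "SELECT event_date, user_id, COUNT(*) AS n "
        "FROM `my-project.analytics.events` GROUP BY event_date, user_id"
    ),
    to_project="my-project",
    to_dataset="analytics",
    to_table="events_per_user",
    to_write_disposition="WRITE_TRUNCATE",                       # standard BigQuery write disposition
    to_time_partitioning={"type": "DAY", "field": "event_date"},  # 'type' and 'field' keys
    timeout=480,
)
```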
export_to_gcs(from_project, from_dataset, from_table, to_filepath)
Exports a BigQuery table to Google Cloud Storage (GCS).
Parameters:

Name | Type | Description | Default |
---|---|---|---|
from_project | str | The Google Cloud project ID of the source table. | required |
from_dataset | str | The BigQuery dataset ID of the source table. | required |
from_table | str | The BigQuery table ID of the source table. | required |
to_filepath | str | The GCS URI where the table will be exported. | required |
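A sketch with placeholder identifiers; the destination is a GCS URI:

```python
hook.export_to_gcs(
    from_project="my-project",
    from_dataset="analytics",
    from_table="events",
    to_filepath="gs://my-bucket/exports/events.csv",  # placeholder destination URI
)
```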
get_all_columns(rows)
Gets a unique set of all column names from a list of rows.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
rows | list | A list of dictionaries, where each dictionary represents a row. | required |

Returns:

Type | Description |
---|---|
set | A set of unique column names. |
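An illustration of the documented behaviour with made-up rows:

```python
rows = [{"id": 1, "name": "a"}, {"id": 2, "country": "BR"}]
hook.get_all_columns(rows)
# -> {"id", "name", "country"}
```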
get_dataset(dataset)
Gets a BigQuery dataset, creating it if it doesn't exist.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
dataset | str | The BigQuery dataset ID. | required |

Returns:

Type | Description |
---|---|
google.cloud.bigquery.dataset.Dataset | The BigQuery dataset. |
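For example (the dataset ID is a placeholder):

```python
dataset = hook.get_dataset("analytics")  # created first if it does not already exist
```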
get_query_results(query, timeout=480)
Executes a query and returns the results.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
query | str | The SQL query to execute. | required |
timeout | int | The timeout for the query in seconds. Defaults to 480. | 480 |

Returns:

Type | Description |
---|---|
google.cloud.bigquery.table.RowIterator | An iterator of rows resulting from the query. |

Raises:

Type | Description |
---|---|
TimeoutError | If the query times out. |
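A sketch that iterates the returned RowIterator; the query and column names are placeholders:

```python
rows = hook.get_query_results("SELECT name, total FROM `my-project.analytics.totals`")
for row in rows:
    print(row["name"], row["total"])  # Row objects support key access
```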
get_rows_from_table(project, dataset, table, timeout=480)
Retrieves all rows from a BigQuery table.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
project | str | The Google Cloud project ID. | required |
dataset | str | The BigQuery dataset ID. | required |
table | str | The BigQuery table ID. | required |
timeout | int | The timeout for the query in seconds. Defaults to 480. | 480 |

Returns:

Type | Description |
---|---|
google.cloud.bigquery.table.RowIterator | An iterator of rows from the table. |
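For example, with placeholder identifiers:

```python
for row in hook.get_rows_from_table("my-project", "analytics", "events", timeout=480):
    print(dict(row))  # bigquery Row objects convert cleanly to dicts
```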
get_table(project, dataset, table, schema, partition_column)
Gets a BigQuery table, creating it if it doesn't exist.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
project | str | The Google Cloud project ID. | required |
dataset | str | The BigQuery dataset ID. | required |
table | str | The BigQuery table ID. | required |
schema | list | A list of dictionaries representing the table schema. Each dictionary should have 'key', 'type', and 'mode' keys. | required |
partition_column | str | The name of the column to use for time-based partitioning. If None, the table will not be partitioned. | required |

Returns:

Type | Description |
---|---|
google.cloud.bigquery.table.Table | The BigQuery table. |
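A sketch using the documented schema format ('key', 'type', and 'mode' per column); the identifiers and column names are placeholders:

```python
schema = [
    {"key": "event_id", "type": "STRING", "mode": "REQUIRED"},
    {"key": "event_date", "type": "TIMESTAMP", "mode": "NULLABLE"},
]

table = hook.get_table(
    project="my-project",
    dataset="analytics",
    table="events",
    schema=schema,
    partition_column="event_date",  # pass None to create an unpartitioned table
)
```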
list_datasets()
Lists all datasets in the current project.
Returns:

Type | Description |
---|---|
google.cloud.bigquery.dataset.DatasetListItem | An iterator of dataset list items. |
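For example:

```python
for ds in hook.list_datasets():
    print(ds.dataset_id)  # DatasetListItem exposes the dataset ID
```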
load_file(from_filepath, from_file_format, from_separator, from_skip_leading_rows, from_quote_character, from_encoding, to_project, to_dataset, to_table, to_mode, to_schema, to_time_partitioning)
Loads data from a file in GCS to a BigQuery table.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
from_filepath | str | The GCS URI of the source file. | required |
from_file_format | str | The format of the source file. | required |
from_separator | str | The delimiter for CSV files. | required |
from_skip_leading_rows | int | Number of leading rows to skip for CSV. | required |
from_quote_character | str | Quote character for CSV files. | required |
from_encoding | str | File encoding. | required |
to_project | str | Destination Google Cloud project ID. | required |
to_dataset | str | Destination BigQuery dataset ID. | required |
to_table | str | Destination BigQuery table ID. | required |
to_mode | str | Write disposition (e.g., 'overwrite', 'WRITE_APPEND'). | required |
to_schema | list | Schema for the destination table. | required |
to_time_partitioning | dict | Configuration for time-based partitioning. | required |
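A sketch of a CSV load, reusing the placeholder identifiers and schema from the get_table example above; 'overwrite' and 'WRITE_APPEND' are the write-disposition values mentioned in this reference:

```python
hook.load_file(
    from_filepath="gs://my-bucket/exports/events.csv",
    from_file_format="csv",
    from_separator=",",
    from_skip_leading_rows=1,
    from_quote_character='"',
    from_encoding="UTF-8",
    to_project="my-project",
    to_dataset="analytics",
    to_table="events",
    to_mode="overwrite",
    to_schema=schema,  # same 'key'/'type'/'mode' dicts as above; None would trigger autodetect
    to_time_partitioning={"type": "DAY", "field": "event_date"},
)
```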
setup_job_config(from_file_format, from_separator, from_skip_leading_rows, from_quote_character, from_encoding, to_mode, to_schema, to_time_partitioning)
Configures a BigQuery load job.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
from_file_format | str | The format of the source file (e.g., 'csv', 'json'). | required |
from_separator | str | The delimiter used in CSV files. | required |
from_skip_leading_rows | int | The number of leading rows to skip in CSV files. | required |
from_quote_character | str | The character used to quote fields in CSV files. | required |
from_encoding | str | The encoding of the source file. | required |
to_mode | str | The write disposition for the load job (e.g., 'overwrite', 'WRITE_APPEND'). | required |
to_schema | list | The schema for the destination table. If None, autodetect is used. | required |
to_time_partitioning | dict | Configuration for time-based partitioning. Should include 'type' and 'field'. | required |

Returns:

Type | Description |
---|---|
google.cloud.bigquery.job.LoadJobConfig | The configured load job object. |

Raises:

Type | Description |
---|---|
Exception | If the file format is not supported. |
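A sketch producing a LoadJobConfig for a JSON load with schema autodetection; all arguments are required by the signature, so the CSV-specific ones are still passed with placeholder values:

```python
job_config = hook.setup_job_config(
    from_file_format="json",        # 'csv' and 'json' are the documented formats
    from_separator=",",             # CSV-only setting, still passed per the signature
    from_skip_leading_rows=0,
    from_quote_character='"',
    from_encoding="UTF-8",
    to_mode="WRITE_APPEND",
    to_schema=None,                 # None -> schema autodetect
    to_time_partitioning={"type": "DAY", "field": "event_date"},
)
```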
update_table_schema(bq_table, rows)
Updates the schema of a BigQuery table if new columns are present in the rows.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
bq_table | Table | The BigQuery table object. | required |
rows | list | A list of dictionaries representing the rows to be inserted. | required |

Returns:

Type | Description |
---|---|
google.cloud.bigquery.table.Table | The updated BigQuery table object. |
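A sketch continuing the placeholder table and schema from the examples above:

```python
rows = [{"event_id": "abc", "event_date": "2024-01-01T00:00:00Z", "country": "BR"}]
bq_table = hook.get_table("my-project", "analytics", "events", schema, "event_date")
bq_table = hook.update_table_schema(bq_table, rows)  # adds any columns present in rows but missing from the table schema
```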
write(project, dataset, table, schema, partition_column, rows)
Writes rows to a BigQuery table.
This method ensures the dataset and table exist, updates the table schema if necessary, and then inserts the rows.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
project | str | The Google Cloud project ID. | required |
dataset | str | The BigQuery dataset ID. | required |
table | str | The BigQuery table ID. | required |
schema | list | A list of dictionaries representing the table schema. | required |
partition_column | str | The name of the column for time-based partitioning. | required |
rows | list | A list of dictionaries representing the rows to insert. | required |

Raises:

Type | Description |
---|---|
Exception | If there are errors during the insertion process. |
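A sketch of an end-to-end write with the placeholder schema and identifiers used above:

```python
hook.write(
    project="my-project",
    dataset="analytics",
    table="events",
    schema=schema,
    partition_column="event_date",
    rows=[{"event_id": "abc", "event_date": "2024-01-01T00:00:00Z"}],
)
```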