hook

__all__ = ['BaseHook', 'DatalakeHook', 'EmailHook', 'FileHook', 'FtpHook', 'QueueHook', 'SecretManagerHook', 'LLMHook'] module-attribute

BaseHook

Bases: BaseClass

Base class for hooks in the system.

__init__() -> None

Initializes the BaseHook.

DatalakeHook

Bases: BaseHook

The DatalakeHook class is designed to write data to the datalake and must be implemented by a vendor-specific datalake class.

Inherits from

BaseHook: The base class for hooks in the airless framework.

__init__()

Initializes the DatalakeHook.

build_metadata(message_id: Optional[int], origin: Optional[str]) -> Dict[str, Any]

Builds metadata for the data being sent.

Parameters:

Name Type Description Default
message_id Optional[int]

The message ID.

required
origin Optional[str]

The origin of the data.

required

Returns:

Type Description
Dict[str, Any]

The metadata dictionary.

prepare_row(row: Any, metadata: Dict[str, Any], now: datetime) -> Dict[str, Any]

Prepares a row for insertion into the datalake.

Parameters:

Name Type Description Default
row Any

The row data.

required
metadata Dict[str, Any]

The metadata for the row.

required
now datetime

The current timestamp.

required

Returns:

Type Description
Dict[str, Any]

The prepared row.

prepare_rows(data: Any, metadata: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], datetime]

Prepares multiple rows for insertion into the datalake.

Parameters:

Name Type Description Default
data Any

The data to prepare.

required
metadata Dict[str, Any]

The metadata for the rows.

required

Returns:

Type Description
Tuple[List[Dict[str, Any]], datetime]

The prepared rows and the current timestamp.

send_to_landing_zone(data: Any, dataset: str, table: str, message_id: Optional[int], origin: Optional[str], time_partition: bool = False) -> Union[str, None]

Sends data to the landing zone. This method must be implemented by the vendor-specific class.

Parameters:

Name Type Description Default
data Any

The data to send.

required
dataset str

The dataset name.

required
table str

The table name.

required
message_id Optional[int]

The message ID.

required
origin Optional[str]

The origin of the data.

required
time_partition bool

Whether to use time partitioning. Defaults to False.

False

Returns:

Type Description
Union[str, None]

The path to the uploaded file, or None.
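
Because send_to_landing_zone is left to the vendor-specific class, a minimal sketch of such a subclass may help. The DatalakeHook below is a stand-in written only for this example, not the airless implementation: the metadata keys (event_id, resource), the row fields, the date= partition folder, and the LocalDatalakeHook class are all assumptions.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple, Union


class DatalakeHook:
    """Stand-in for the documented base class (assumed behaviour)."""

    def build_metadata(self, message_id: Optional[int], origin: Optional[str]) -> Dict[str, Any]:
        # Key names are hypothetical.
        return {"event_id": message_id, "resource": origin}

    def prepare_row(self, row: Any, metadata: Dict[str, Any], now: datetime) -> Dict[str, Any]:
        # Field names are hypothetical.
        return {"_created_at": now.isoformat(), "_metadata": metadata, "_json": json.dumps(row)}

    def prepare_rows(self, data: Any, metadata: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], datetime]:
        now = datetime.now(timezone.utc)
        rows = data if isinstance(data, list) else [data]
        return [self.prepare_row(r, metadata, now) for r in rows], now

    def send_to_landing_zone(self, data, dataset, table, message_id, origin, time_partition=False):
        raise NotImplementedError()


class LocalDatalakeHook(DatalakeHook):
    """Hypothetical vendor class that lands NDJSON files in a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def send_to_landing_zone(self, data, dataset, table, message_id, origin,
                             time_partition=False) -> Union[str, None]:
        rows, now = self.prepare_rows(data, self.build_metadata(message_id, origin))
        if not rows:
            return None  # nothing to upload
        parts = [self.root, dataset, table]
        if time_partition:
            parts.append(now.strftime("date=%Y-%m-%d"))
        folder = os.path.join(*parts)
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, now.strftime("%Y%m%d%H%M%S%f") + ".ndjson")
        with open(path, "w") as f:
            f.write("\n".join(json.dumps(r) for r in rows))
        return path


hook = LocalDatalakeHook(tempfile.mkdtemp())
landed = hook.send_to_landing_zone([{"id": 1}, {"id": 2}], "sales", "orders", 42, "api", time_partition=True)
```

The subclass reuses build_metadata and prepare_rows from the base and only decides where the prepared rows are stored, which is the split the class description suggests.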

EmailHook

Bases: BaseHook

EmailHook class to build and send email messages.

This class is responsible for constructing email messages that may include attachments and other related information. However, the sending functionality is not implemented.

Inherits from

BaseHook: The base class for hooks in the airless framework.

__init__()

Initializes the EmailHook class.

This constructor calls the superclass constructor.

build_message(subject: str, content: str, recipients: list, sender: str, attachments: list = [], mime_type: str = 'plain') -> Union[MIMEMultipart, MIMEText]

Builds an email message with optional attachments.

Parameters:

Name Type Description Default
subject str

The subject of the email.

required
content str

The body content of the email.

required
recipients list

A list of recipient email addresses.

required
sender str

The email address of the sender.

required
attachments list

A list of attachment dictionaries. Each dictionary should contain 'name', 'content', and optionally 'type'. Defaults to an empty list.

[]
mime_type str

The MIME type of the email body content. Defaults to 'plain'.

'plain'

Returns:

Type Description
Union[MIMEMultipart, MIMEText]

The constructed email message object.
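
To make the two possible return types concrete, here is a rough sketch of how a message like this could be assembled with the standard library email package. It is not the airless source: treating the attachment 'type' as a MIME subtype, falling back to octet-stream, and replacing the mutable [] default with None are assumptions made for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import Optional, Union


def build_message(subject: str, content: str, recipients: list, sender: str,
                  attachments: Optional[list] = None,
                  mime_type: str = "plain") -> Union[MIMEMultipart, MIMEText]:
    attachments = attachments or []
    body = MIMEText(content, mime_type)
    if not attachments:
        # No attachments: the text part itself is the message.
        msg = body
    else:
        # With attachments: wrap the body and each file in a multipart container.
        msg = MIMEMultipart()
        msg.attach(body)
        for att in attachments:
            part = MIMEApplication(att["content"], _subtype=att.get("type", "octet-stream"))
            part.add_header("Content-Disposition", "attachment", filename=att["name"])
            msg.attach(part)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    return msg


report = build_message(
    "Daily report", "See attachment.", ["a@example.com", "b@example.com"], "bot@example.com",
    attachments=[{"name": "report.csv", "content": b"x,y\n1,2\n"}],
)
```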

send(subject: str, content: str, recipients: list, sender: str, attachments: list, mime_type: str)

Sends the constructed email message.

This method is not implemented and will raise a NotImplementedError.

Parameters:

Name Type Description Default
subject str

The subject of the email.

required
content str

The body content of the email.

required
recipients list

A list of recipient email addresses.

required
sender str

The email address of the sender.

required
attachments list

A list of attachment dictionaries.

required
mime_type str

The MIME type of the email body content.

required

Raises:

Type Description
NotImplementedError

This method has not been implemented.

FileHook

Bases: BaseHook

FileHook class for handling file operations.

This class provides methods to write data to local files in various formats (JSON and NDJSON), download files, rename files, and list files in a directory.

Inherits from

BaseHook: The base class for hooks in the airless framework.

__init__()

Initializes a new instance of the FileHook class.

download(url: str, headers: dict, timeout: int = 500, proxies: dict = None) -> str

Downloads a file from a given URL and saves it to a temporary path.

Parameters:

Name Type Description Default
url str

The URL of the file to download.

required
headers dict

The headers to include in the request.

required
timeout int

The request timeout in seconds. Defaults to 500.

500
proxies dict

Proxy settings for the request. Defaults to None.

None

Returns:

Name Type Description
str str

The local filename where the downloaded file is saved.

extract_filename(filepath_or_url: str) -> str

Extracts the filename from a filepath or URL.

Parameters:

Name Type Description Default
filepath_or_url str

The original file path or URL.

required

Returns:

Name Type Description
str str

The extracted filename.
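
One way to get this behaviour for both local paths and URLs is sketched below with the standard library. Whether airless strips query strings this way is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def extract_filename(filepath_or_url: str) -> str:
    # urlparse separates the path from any query string or fragment;
    # basename then keeps only the last path segment.
    return posixpath.basename(urlparse(filepath_or_url).path)


from_url = extract_filename("https://example.com/files/report.csv?token=abc")
from_path = extract_filename("/tmp/data/events.ndjson")
```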

get_tmp_filepath(filepath_or_url: str, **kwargs) -> str

Generates a temporary file path based on the provided filepath or URL.

Parameters:

Name Type Description Default
filepath_or_url str

The original file path or URL from which the filename is extracted.

required
Kwargs

add_timestamp (bool, optional): If True, a timestamp and a UUID will be prefixed to the filename to ensure uniqueness. Defaults to True.

Returns:

Name Type Description
str str

The temporary file path.
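
The add_timestamp behaviour can be sketched as follows. The timestamp format, the underscore separators, and the use of the system temporary directory are assumptions; only the idea of prefixing a timestamp and a UUID comes from the description above.

```python
import os
import posixpath
import tempfile
import uuid
from datetime import datetime
from urllib.parse import urlparse


def get_tmp_filepath(filepath_or_url: str, **kwargs) -> str:
    filename = posixpath.basename(urlparse(filepath_or_url).path)
    if kwargs.get("add_timestamp", True):
        # Timestamp plus UUID keeps concurrent downloads of the same file apart.
        stamp = datetime.now().strftime("%Y%m%d%H%M%S")
        filename = f"{stamp}_{uuid.uuid4().hex}_{filename}"
    return os.path.join(tempfile.gettempdir(), filename)


first = get_tmp_filepath("https://example.com/files/data.json?x=1")
second = get_tmp_filepath("https://example.com/files/data.json?x=1")
```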

list_files(folder: str) -> list

Lists all files in a specified directory.

Parameters:

Name Type Description Default
folder str

The folder path to search for files.

required

Returns:

Name Type Description
list list

A list of file paths found in the directory.

rename(from_filename: str, to_filename: str) -> str

Renames a file from the original filename to the new filename.

Parameters:

Name Type Description Default
from_filename str

The original filename to rename.

required
to_filename str

The new filename.

required

Returns:

Name Type Description
str str

The new filename after renaming.

rename_files(dir, prefix)

Renames all files in a directory by prepending a prefix.

Parameters:

Name Type Description Default
dir str

The directory containing files to rename.

required
prefix str

The prefix to prepend to each file name.

required
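
A minimal standalone sketch of prefix renaming with the standard library, assuming that only regular files directly inside the directory are renamed (subdirectories are skipped):

```python
import os
import tempfile


def rename_files(dir: str, prefix: str) -> None:
    # sorted() takes a snapshot, so freshly renamed files are not visited again.
    for name in sorted(os.listdir(dir)):
        src = os.path.join(dir, name)
        if os.path.isfile(src):
            os.rename(src, os.path.join(dir, prefix + name))


workdir = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    open(os.path.join(workdir, name), "w").close()
rename_files(workdir, "2024_")
```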

write(local_filepath: str, data: Any, **kwargs) -> None

Writes data to a local file with support for JSON and NDJSON formats.

Parameters:

Name Type Description Default
local_filepath str

The path to the local file where the data will be written.

required
data Any

The data to write to the file. It can be a string, dictionary, list, or any other type that can be serialized to JSON or converted to a string.

required

Kwargs

use_ndjson (bool): If True and the data is a dictionary or list, the data will be written in NDJSON format. Defaults to False.

mode (str): The mode in which the file is opened. Defaults to 'w'. Common modes include:

- 'w': Write mode, which overwrites the file if it exists.
- 'wb': Write binary mode, which overwrites the file if it exists.
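
The difference between the JSON and NDJSON paths can be sketched as below. This is an illustration of the described behaviour, not the airless code; in particular, the sketch assumes text mode for dictionaries and lists and expects bytes only together with mode='wb'.

```python
import json
import os
import tempfile
from typing import Any


def write(local_filepath: str, data: Any, **kwargs) -> None:
    use_ndjson = kwargs.get("use_ndjson", False)
    mode = kwargs.get("mode", "w")
    with open(local_filepath, mode) as f:
        if isinstance(data, (dict, list)):
            if use_ndjson:
                # NDJSON: one JSON document per line.
                rows = data if isinstance(data, list) else [data]
                f.write("\n".join(json.dumps(row) for row in rows))
            else:
                json.dump(data, f)
        else:
            # Bytes pass through unchanged (requires mode='wb'); anything else is stringified.
            f.write(data if isinstance(data, bytes) else str(data))


out_dir = tempfile.mkdtemp()
ndjson_path = os.path.join(out_dir, "rows.ndjson")
json_path = os.path.join(out_dir, "rows.json")
write(ndjson_path, [{"id": 1}, {"id": 2}], use_ndjson=True)
write(json_path, [{"id": 1}, {"id": 2}])
```

With use_ndjson=True the two records end up on separate lines; without it the file holds a single JSON array.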

FtpHook

Bases: FileHook

FtpHook class for handling FTP file operations.

This class extends FileHook with methods specific to FTP file operations including connecting to an FTP server, navigating directories, and downloading files.

__init__()

Initializes a new instance of the FtpHook class.

cwd(dir)

Changes the current working directory on the FTP server.

Parameters:

Name Type Description Default
dir str

The directory to change to.

required

dir() -> list

Lists the files and directories in the current directory of the FTP server.

This method retrieves a list of files and directories from the FTP server's current working directory. It populates a list with the directory entries and returns it.

Returns:

Name Type Description
list list

A list of directory entries as strings, each representing a file or directory in the FTP server's current working directory.

download(dir: str, filename: str) -> str

Downloads a file from the FTP server to a temporary local file.

Parameters:

Name Type Description Default
dir str

The directory on the FTP server where the file is located.

required
filename str

The name of the file to download.

required

Returns:

Name Type Description
str str

The local filepath where the downloaded file is saved.

extract_filename(filepath_or_url: str) -> str

Extracts the filename from a filepath or URL.

Parameters:

Name Type Description Default
filepath_or_url str

The original file path or URL.

required

Returns:

Name Type Description
str str

The extracted filename.

get_tmp_filepath(filepath_or_url: str, **kwargs) -> str

Generates a temporary file path based on the provided filepath or URL.

Parameters:

Name Type Description Default
filepath_or_url str

The original file path or URL from which the filename is extracted.

required
Kwargs

add_timestamp (bool, optional): If True, a timestamp and a UUID will be prefixed to the filename to ensure uniqueness. Defaults to True.

Returns:

Name Type Description
str str

The temporary file path.

list(regex: str = None, updated_after=None, updated_before=None) -> tuple

Lists files in the current directory of the FTP server with optional filters.

Parameters:

Name Type Description Default
regex str

A regular expression to filter file names. Defaults to None.

None
updated_after datetime

Filter files updated after this date. Defaults to None.

None
updated_before datetime

Filter files updated before this date. Defaults to None.

None

Returns:

Name Type Description
tuple tuple

A tuple containing two lists: - A list of files (dictionaries with 'name' and 'updated_at'). - A list of directories (dictionaries with 'name' and 'updated_at').
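
The filtering step can be illustrated without an FTP connection. The helper below, filter_entries, is hypothetical and operates on dictionaries shaped like the documented return value; using re.search and treating both date bounds as exclusive are assumptions.

```python
import re
from datetime import datetime
from typing import List, Optional


def filter_entries(entries: List[dict], regex: Optional[str] = None,
                   updated_after: Optional[datetime] = None,
                   updated_before: Optional[datetime] = None) -> List[dict]:
    kept = []
    for entry in entries:
        if regex and not re.search(regex, entry["name"]):
            continue  # name does not match the pattern
        if updated_after and entry["updated_at"] <= updated_after:
            continue  # too old
        if updated_before and entry["updated_at"] >= updated_before:
            continue  # too recent
        kept.append(entry)
    return kept


entries = [
    {"name": "sales_2024.csv", "updated_at": datetime(2024, 3, 1)},
    {"name": "sales_2023.csv", "updated_at": datetime(2023, 3, 1)},
    {"name": "readme.txt", "updated_at": datetime(2024, 6, 1)},
]
recent_csv = filter_entries(entries, regex=r"\.csv$", updated_after=datetime(2024, 1, 1))
```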

list_files(folder: str) -> list

Lists all files in a specified directory.

Parameters:

Name Type Description Default
folder str

The folder path to search for files.

required

Returns:

Name Type Description
list list

A list of file paths found in the directory.

login(host, user, password)

Logs into the FTP server using the provided credentials.

Parameters:

Name Type Description Default
host str

The FTP server hostname or IP address.

required
user str

The username for the FTP server.

required
password str

The password for the FTP server.

required

rename(from_filename: str, to_filename: str) -> str

Renames a file from the original filename to the new filename.

Parameters:

Name Type Description Default
from_filename str

The original filename to rename.

required
to_filename str

The new filename.

required

Returns:

Name Type Description
str str

The new filename after renaming.

rename_files(dir, prefix)

Renames all files in a directory by prepending a prefix.

Parameters:

Name Type Description Default
dir str

The directory containing files to rename.

required
prefix str

The prefix to prepend to each file name.

required

write(local_filepath: str, data: Any, **kwargs) -> None

Writes data to a local file with support for JSON and NDJSON formats.

Parameters:

Name Type Description Default
local_filepath str

The path to the local file where the data will be written.

required
data Any

The data to write to the file. It can be a string, dictionary, list, or any other type that can be serialized to JSON or converted to a string.

required

Kwargs

use_ndjson (bool): If True and the data is a dictionary or list, the data will be written in NDJSON format. Defaults to False.

mode (str): The mode in which the file is opened. Defaults to 'w'. Common modes include:

- 'w': Write mode, which overwrites the file if it exists.
- 'wb': Write binary mode, which overwrites the file if it exists.

LLMHook

Bases: BaseHook

Base class for Large Language Model (LLM) hooks.

This class provides a basic structure for interacting with LLMs. It includes methods for managing conversation history and generating text completions.

Attributes:

Name Type Description
historic str

A string storing the conversation history.

__init__()

Initializes the LLMHook.

Sets up the conversation history attribute.

generate_completion(content, **kwargs)

Generates a text completion using the LLM.

This method should be implemented by subclasses to interact with a specific LLM API.

Parameters:

Name Type Description Default
content str

The prompt or content to generate a completion for.

required
**kwargs

Additional keyword arguments for the LLM API.

{}

Raises:

Type Description
NotImplementedError

If the method is not implemented by a subclass.

historic_append(text, actor)

Appends text to the conversation history.

Parameters:

Name Type Description Default
text str

The text to append.

required
actor str

The actor who produced the text (e.g., 'user', 'model').

required
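
A subclass is expected to supply generate_completion. The sketch below uses a stand-in LLMHook so that it runs on its own; the "actor: text" line format of the history and the EchoLLMHook class are assumptions, and a real subclass would call a vendor API where the sketch simply upper-cases the prompt.

```python
class LLMHook:
    """Stand-in for the documented base class (assumed behaviour)."""

    def __init__(self):
        self.historic = ""

    def historic_append(self, text, actor):
        # The line format is hypothetical.
        self.historic += f"{actor}: {text}\n"

    def generate_completion(self, content, **kwargs):
        raise NotImplementedError()


class EchoLLMHook(LLMHook):
    """Hypothetical subclass used only to show the call pattern."""

    def generate_completion(self, content, **kwargs):
        self.historic_append(content, "user")
        reply = content.upper()  # placeholder for a real LLM API call
        self.historic_append(reply, "model")
        return reply


llm = EchoLLMHook()
reply = llm.generate_completion("hello")
```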

QueueHook

Bases: BaseHook

Hook for interacting with a queue system.

__init__() -> None

Initializes the QueueHook.

publish(project: str, topic: str, data: dict) -> None

Publishes data to a specified topic.

Parameters:

Name Type Description Default
project str

The project name.

required
topic str

The topic to publish to.

required
data dict

The data to publish.

required

Raises:

Type Description
NotImplementedError

This method needs to be implemented in a subclass.
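
Since publish raises NotImplementedError in the base class, a subclass has to provide it. The in-memory variant below is a hypothetical example that can be handy in tests; both the stand-in QueueHook and the JSON serialization of data are assumptions.

```python
import json
from collections import defaultdict


class QueueHook:
    """Stand-in for the documented base class."""

    def publish(self, project: str, topic: str, data: dict) -> None:
        raise NotImplementedError()


class InMemoryQueueHook(QueueHook):
    """Hypothetical subclass that keeps published messages in a dictionary."""

    def __init__(self) -> None:
        self.messages = defaultdict(list)

    def publish(self, project: str, topic: str, data: dict) -> None:
        # Serialize eagerly so later changes to `data` do not alter the queued message.
        self.messages[(project, topic)].append(json.dumps(data))


queue = InMemoryQueueHook()
queue.publish("my-project", "orders", {"id": 1})
queue.publish("my-project", "orders", {"id": 2})
```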

SecretManagerHook

Bases: BaseHook

Hook for interacting with a secret management system.

__init__() -> None

Initializes the SecretManagerHook.

add_secret_version(project: str, id: str, value: str) -> None

Adds a new version of a secret.

Parameters:

Name Type Description Default
project str

The project name.

required
id str

The ID of the secret.

required
value str

The value of the secret.

required

Raises:

Type Description
NotImplementedError

This method needs to be implemented in a subclass.

destroy_secret_version(secret_name: str, version: str) -> None

Destroys a specific version of a secret.

Parameters:

Name Type Description Default
secret_name str

The name of the secret.

required
version str

The version of the secret to destroy.

required

Raises:

Type Description
NotImplementedError

This method needs to be implemented in a subclass.

get_secret(project: str, id: str, parse_json: bool = False) -> None

Retrieves a secret.

Parameters:

Name Type Description Default
project str

The project name.

required
id str

The ID of the secret.

required
parse_json bool

Whether to parse the secret as JSON. Defaults to False.

False

Raises:

Type Description
NotImplementedError

This method needs to be implemented in a subclass.

list_secret_versions(secret_name: str, filter: str) -> None

Lists all versions of a specific secret.

Parameters:

Name Type Description Default
secret_name str

The name of the secret.

required
filter str

The filter to apply.

required

Raises:

Type Description
NotImplementedError

This method needs to be implemented in a subclass.

list_secrets() -> None

Lists all secrets.

Raises:

Type Description
NotImplementedError

This method needs to be implemented in a subclass.
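
Every method on SecretManagerHook is left to a subclass. The in-memory sketch below implements only add_secret_version and get_secret to show how parse_json might be honoured. The stand-in base class and the InMemorySecretManagerHook name are hypothetical, and returning the secret value from get_secret is an assumption, since the documented signature is annotated -> None.

```python
import json
from typing import Any, Dict, List, Tuple


class SecretManagerHook:
    """Stand-in for the documented base class."""

    def add_secret_version(self, project: str, id: str, value: str) -> None:
        raise NotImplementedError()

    def get_secret(self, project: str, id: str, parse_json: bool = False) -> Any:
        raise NotImplementedError()


class InMemorySecretManagerHook(SecretManagerHook):
    """Hypothetical subclass that stores secret versions in a dictionary."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, str], List[str]] = {}

    def add_secret_version(self, project: str, id: str, value: str) -> None:
        self._versions.setdefault((project, id), []).append(value)

    def get_secret(self, project: str, id: str, parse_json: bool = False) -> Any:
        value = self._versions[(project, id)][-1]  # latest version wins
        return json.loads(value) if parse_json else value


secrets = InMemorySecretManagerHook()
secrets.add_secret_version("my-project", "db", '{"user": "old"}')
secrets.add_secret_version("my-project", "db", '{"user": "app"}')
raw = secrets.get_secret("my-project", "db")
parsed = secrets.get_secret("my-project", "db", parse_json=True)
```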