hook
__all__ = ['BaseHook', 'DatalakeHook', 'EmailHook', 'FileHook', 'FtpHook', 'QueueHook', 'SecretManagerHook', 'LLMHook']
module-attribute
BaseHook
Bases: BaseClass
Base class for hooks in the system.
__init__() -> None
Initializes the BaseHook.
DatalakeHook
Bases: BaseHook
The DatalakeHook class is designed to write data to the datalake and must be implemented by a vendor-specific datalake class.
Inherits from
BaseHook: The base class for hooks in the airless framework.
__init__()
Initializes the DatalakeHook.
build_metadata(message_id: Optional[int], origin: Optional[str]) -> Dict[str, Any]
Builds metadata for the data being sent.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message_id | Optional[int] | The message ID. | required |
origin | Optional[str] | The origin of the data. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The metadata dictionary. |
prepare_row(row: Any, metadata: Dict[str, Any], now: datetime) -> Dict[str, Any]
Prepares a row for insertion into the datalake.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row | Any | The row data. | required |
metadata | Dict[str, Any] | The metadata for the row. | required |
now | datetime | The current timestamp. | required |
Returns:
Type | Description |
---|---|
Dict[str, Any] | The prepared row. |
prepare_rows(data: Any, metadata: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], datetime]
Prepares multiple rows for insertion into the datalake.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Any | The data to prepare. | required |
metadata | Dict[str, Any] | The metadata for the rows. | required |
Returns:
Type | Description |
---|---|
Tuple[List[Dict[str, Any]], datetime] | The prepared rows and the current timestamp. |
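The three preparation methods above fit together roughly as sketched below. This is a minimal standalone illustration, not the airless implementation: the metadata keys (`event_id`, `resource`) and the row columns (`_event_id`, `_resource`, `_json`, `_created_at`) are assumed names chosen for the example, and the real hook may structure rows differently.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple


def build_metadata(message_id: Optional[int], origin: Optional[str]) -> Dict[str, Any]:
    # Hypothetical keys; the real hook may name these differently.
    return {"event_id": message_id, "resource": origin}


def prepare_row(row: Any, metadata: Dict[str, Any], now: datetime) -> Dict[str, Any]:
    # Assumed layout: the payload is serialized to JSON next to bookkeeping columns.
    return {
        "_event_id": metadata["event_id"],
        "_resource": metadata["resource"],
        "_json": json.dumps({"data": row, "metadata": metadata}),
        "_created_at": now.isoformat(),
    }


def prepare_rows(data: Any, metadata: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], datetime]:
    # One timestamp is shared by every row of the batch.
    now = datetime.now(timezone.utc)
    rows = data if isinstance(data, list) else [data]
    return [prepare_row(r, metadata, now) for r in rows], now


metadata = build_metadata(message_id=42, origin="orders-api")
rows, now = prepare_rows([{"id": 1}, {"id": 2}], metadata)
```

Sharing a single `now` across the batch keeps all rows from one message traceable to the same ingestion instant.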
send_to_landing_zone(data: Any, dataset: str, table: str, message_id: Optional[int], origin: Optional[str], time_partition: bool = False) -> Union[str, None]
Sends data to the landing zone. This method must be implemented by the vendor-specific class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Any | The data to send. | required |
dataset | str | The dataset name. | required |
table | str | The table name. | required |
message_id | Optional[int] | The message ID. | required |
origin | Optional[str] | The origin of the data. | required |
time_partition | bool | Whether to use time partitioning. Defaults to False. | False |
Returns:
Type | Description |
---|---|
Union[str, None] | The path to the uploaded file or None. |
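A vendor class is expected to override `send_to_landing_zone`. The sketch below shows the shape of such an override using a local folder as a pretend landing zone. `DatalakeHook` here is a stand-in defined inside the example, not the real airless class, and `LocalDatalakeHook`, its `root` argument, and the file naming scheme are all hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Optional, Union


class DatalakeHook:
    """Stand-in for the documented base class (not the real airless class)."""

    def send_to_landing_zone(self, data, dataset, table, message_id, origin,
                             time_partition=False):
        raise NotImplementedError()


class LocalDatalakeHook(DatalakeHook):
    """Hypothetical vendor class that treats a local folder as the landing zone."""

    def __init__(self, root: str) -> None:
        self.root = root

    def send_to_landing_zone(self, data: Any, dataset: str, table: str,
                             message_id: Optional[int], origin: Optional[str],
                             time_partition: bool = False) -> Union[str, None]:
        # time_partition and origin are accepted but ignored in this sketch.
        if not data:
            return None
        folder = os.path.join(self.root, dataset, table)
        os.makedirs(folder, exist_ok=True)
        path = os.path.join(folder, f"{message_id or 'local'}.ndjson")
        rows = data if isinstance(data, list) else [data]
        with open(path, "w") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        return path


hook = LocalDatalakeHook(tempfile.mkdtemp())
path = hook.send_to_landing_zone([{"id": 1}], "sales", "orders", 7, "orders-api")
```

Returning `None` for empty input mirrors the documented `Union[str, None]` return type; when the real vendor classes return `None` is not stated in this reference.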
EmailHook
Bases: BaseHook
EmailHook class to build and send email messages.
This class is responsible for constructing email messages that may include attachments and other related information. However, the sending functionality is not implemented.
Inherits from
BaseHook: The base class for hooks in the airless framework.
__init__()
Initializes the EmailHook class.
This constructor calls the superclass constructor.
build_message(subject: str, content: str, recipients: list, sender: str, attachments: list = [], mime_type: str = 'plain') -> Union[MIMEMultipart, MIMEText]
Builds an email message with optional attachments.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subject | str | The subject of the email. | required |
content | str | The body content of the email. | required |
recipients | list | A list of recipient email addresses. | required |
sender | str | The email address of the sender. | required |
attachments | list | A list of attachment dictionaries. Each dictionary should contain 'name', 'content', and optionally 'type'. Defaults to an empty list. | [] |
mime_type | str | The MIME type of the email body content. Defaults to 'plain'. | 'plain' |
Returns:
Type | Description |
---|---|
Union[MIMEMultipart, MIMEText] | The constructed email message object. |
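The return type suggests that a plain message is built when there are no attachments and a multipart message otherwise. The following standalone sketch shows one way to do that with the standard library `email.mime` classes. It is an assumption about the behavior, not the airless source; in particular, treating the optional 'type' key as a MIME subtype and expecting 'content' as bytes are choices made for this example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List, Optional, Union


def build_message(subject: str, content: str, recipients: List[str], sender: str,
                  attachments: Optional[List[dict]] = None,
                  mime_type: str = "plain") -> Union[MIMEMultipart, MIMEText]:
    # None instead of [] avoids sharing one mutable default between calls.
    attachments = attachments or []
    if not attachments:
        msg = MIMEText(content, mime_type)
    else:
        msg = MIMEMultipart()
        msg.attach(MIMEText(content, mime_type))
        for att in attachments:
            # 'content' is assumed to be bytes; 'type' is assumed to be a subtype.
            part = MIMEApplication(att["content"], att.get("type", "octet-stream"))
            part.add_header("Content-Disposition", "attachment", filename=att["name"])
            msg.attach(part)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    return msg


plain = build_message("Hi", "body", ["a@example.com", "b@example.com"], "me@example.com")
with_file = build_message(
    "Report", "see attached", ["a@example.com"], "me@example.com",
    attachments=[{"name": "r.csv", "content": b"id\n1\n"}],
)
```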
send(subject: str, content: str, recipients: list, sender: str, attachments: list, mime_type: str)
Sends the constructed email message.
This method is not implemented and will raise a NotImplementedError.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
subject | str | The subject of the email. | required |
content | str | The body content of the email. | required |
recipients | list | A list of recipient email addresses. | required |
sender | str | The email address of the sender. | required |
attachments | list | A list of attachment dictionaries. | required |
mime_type | str | The MIME type of the email body content. | required |
Raises:
Type | Description |
---|---|
NotImplementedError | This method has not been implemented. |
FileHook
Bases: BaseHook
FileHook class for handling file operations.
This class provides methods to write data to local files in various formats (JSON and NDJSON), download files, rename files, and list files in a directory.
Inherits from
BaseHook: The base class for hooks in the airless framework.
__init__()
Initializes a new instance of the FileHook class.
download(url: str, headers: dict, timeout: int = 500, proxies: dict = None) -> str
Downloads a file from a given URL and saves it to a temporary path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the file to download. | required |
headers | dict | The headers to include in the request. | required |
timeout | int | The request timeout in seconds. Defaults to 500. | 500 |
proxies | dict | Proxy settings for the request. Defaults to None. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | The local filename where the downloaded file is saved. |
extract_filename(filepath_or_url: str) -> str
Extracts the filename from a filepath or URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath_or_url | str | The original file path or URL. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The extracted filename. |
get_tmp_filepath(filepath_or_url: str, **kwargs) -> str
Generates a temporary file path based on the provided filepath or URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath_or_url | str | The original file path or URL from which the filename is extracted. | required |
Kwargs:
add_timestamp (bool, optional): If True, a timestamp and a UUID will be prefixed to the filename to ensure uniqueness. Defaults to True.
Returns:
Name | Type | Description |
---|---|---|
str | str | The temporary file path. |
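To make the `add_timestamp` behavior concrete, here is a small standalone sketch of filename extraction and temporary path generation. The timestamp format, the separator, and the use of the system temporary directory are assumptions for illustration; the actual airless implementation may differ.

```python
import os
import tempfile
import uuid
from datetime import datetime
from urllib.parse import urlparse


def extract_filename(filepath_or_url: str) -> str:
    # Drop any query string or fragment, then keep the last path segment.
    return os.path.basename(urlparse(filepath_or_url).path)


def get_tmp_filepath(filepath_or_url: str, **kwargs) -> str:
    filename = extract_filename(filepath_or_url)
    if kwargs.get("add_timestamp", True):
        # Assumed prefix format: <timestamp>_<uuid>_<original name>.
        stamp = datetime.now().strftime("%Y%m%d%H%M%S")
        filename = f"{stamp}_{uuid.uuid4().hex}_{filename}"
    return os.path.join(tempfile.gettempdir(), filename)


name = extract_filename("https://example.com/exports/report.csv?token=abc")
fixed = get_tmp_filepath("/data/in/report.csv", add_timestamp=False)
unique_a = get_tmp_filepath("/data/in/report.csv")
unique_b = get_tmp_filepath("/data/in/report.csv")
```

The UUID is what guarantees uniqueness here: two calls within the same second share a timestamp but still produce different paths.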
list_files(folder: str) -> list
Lists all files in a specified directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | The folder path to search for files. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list | A list of file paths found in the directory. |
rename(from_filename: str, to_filename: str) -> str
Renames a file from the original filename to the new filename.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
from_filename | str | The original filename to rename. | required |
to_filename | str | The new filename. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The new filename after renaming. |
rename_files(dir, prefix)
Renames all files in a directory by prepending a prefix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir | str | The directory containing files to rename. | required |
prefix | str | The prefix to prepend to each file name. | required |
write(local_filepath: str, data: Any, **kwargs) -> None
Writes data to a local file with support for JSON and NDJSON formats.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_filepath | str | The path to the local file where the data will be written. | required |
data | Any | The data to write to the file. It can be a string, dictionary, list, or any other type that can be serialized to JSON or converted to a string. | required |
Kwargs:
use_ndjson (bool): If True and the data is a dictionary or list, the data will be written in NDJSON format. Defaults to False.
mode (str): The mode in which the file is opened. Defaults to 'w'. Common modes include:
- 'w': Write mode, which overwrites the file if it exists.
- 'wb': Write binary mode, which overwrites the file if it exists.
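The difference between the JSON and NDJSON paths can be illustrated with a standalone sketch. It follows the parameter descriptions above, but the exact serialization rules (for example, wrapping a single dictionary as a one-line NDJSON file, or UTF-8 encoding in binary mode) are assumptions, not confirmed details of the airless implementation.

```python
import json
import os
import tempfile
from typing import Any


def serialize(data: Any, use_ndjson: bool) -> str:
    if isinstance(data, (dict, list)):
        if use_ndjson:
            # NDJSON: one JSON document per line.
            rows = data if isinstance(data, list) else [data]
            return "".join(json.dumps(row) + "\n" for row in rows)
        return json.dumps(data)
    return str(data)


def write(local_filepath: str, data: Any, **kwargs) -> None:
    use_ndjson = kwargs.get("use_ndjson", False)
    mode = kwargs.get("mode", "w")
    with open(local_filepath, mode) as f:
        if "b" in mode:
            # Binary mode: pass bytes through, encode everything else (assumed UTF-8).
            f.write(data if isinstance(data, bytes) else serialize(data, use_ndjson).encode("utf-8"))
        else:
            f.write(serialize(data, use_ndjson))


folder = tempfile.mkdtemp()
ndjson_path = os.path.join(folder, "rows.ndjson")
json_path = os.path.join(folder, "rows.json")
write(ndjson_path, [{"id": 1}, {"id": 2}], use_ndjson=True)
write(json_path, [{"id": 1}, {"id": 2}])
```

NDJSON output can be appended to and read line by line, which is why it is a common choice for landing-zone files; plain JSON keeps the whole payload as a single document.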
FtpHook
Bases: FileHook
FtpHook class for handling FTP file operations.
This class extends FileHook with methods specific to FTP file operations including connecting to an FTP server, navigating directories, and downloading files.
__init__()
Initializes a new instance of the FtpHook class.
cwd(dir)
Changes the current working directory on the FTP server.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir | str | The directory to change to. | required |
dir() -> list
Lists the files and directories in the current directory of the FTP server.
This method retrieves a list of files and directories from the FTP server's current working directory. It populates a list with the directory entries and returns it.
Returns:
Name | Type | Description |
---|---|---|
list | list | A list of directory entries as strings, each representing a file or directory in the FTP server's current working directory. |
download(dir: str, filename: str) -> str
Downloads a file from the FTP server to a temporary local file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir | str | The directory on the FTP server where the file is located. | required |
filename | str | The name of the file to download. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The local filepath where the downloaded file is saved. |
extract_filename(filepath_or_url: str) -> str
Extracts the filename from a filepath or URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath_or_url | str | The original file path or URL. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The extracted filename. |
get_tmp_filepath(filepath_or_url: str, **kwargs) -> str
Generates a temporary file path based on the provided filepath or URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath_or_url | str | The original file path or URL from which the filename is extracted. | required |
Kwargs:
add_timestamp (bool, optional): If True, a timestamp and a UUID will be prefixed to the filename to ensure uniqueness. Defaults to True.
Returns:
Name | Type | Description |
---|---|---|
str | str | The temporary file path. |
list(regex: str = None, updated_after=None, updated_before=None) -> tuple
Lists files in the current directory of the FTP server with optional filters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
regex | str | A regular expression to filter file names. Defaults to None. | None |
updated_after | datetime | Filter files updated after this date. Defaults to None. | None |
updated_before | datetime | Filter files updated before this date. Defaults to None. | None |
Returns:
Name | Type | Description |
---|---|---|
tuple | tuple | A tuple containing two lists: a list of files and a list of directories. Each entry is a dictionary with 'name' and 'updated_at'. |
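The filtering described above can be sketched without an FTP connection by working on already-parsed entries. Everything here is an assumption made for illustration: the `filter_entries` helper and the `is_dir` key are hypothetical, the date bounds are treated as exclusive, the regex is applied with `re.search`, and the filters are applied to directories as well as files. The real method may differ on each point.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def filter_entries(entries: List[Dict], regex: Optional[str] = None,
                   updated_after: Optional[datetime] = None,
                   updated_before: Optional[datetime] = None) -> Tuple[List[Dict], List[Dict]]:
    files, directories = [], []
    for entry in entries:
        if regex and not re.search(regex, entry["name"]):
            continue
        if updated_after and entry["updated_at"] <= updated_after:
            continue
        if updated_before and entry["updated_at"] >= updated_before:
            continue
        item = {"name": entry["name"], "updated_at": entry["updated_at"]}
        (directories if entry["is_dir"] else files).append(item)
    return files, directories


entries = [
    {"name": "sales_2024.csv", "updated_at": datetime(2024, 3, 1), "is_dir": False},
    {"name": "notes.txt", "updated_at": datetime(2024, 3, 5), "is_dir": False},
    {"name": "archive", "updated_at": datetime(2023, 12, 1), "is_dir": True},
]
files, directories = filter_entries(
    entries, regex=r"\.csv$", updated_after=datetime(2024, 1, 1)
)
```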
list_files(folder: str) -> list
Lists all files in a specified directory.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | The folder path to search for files. | required |
Returns:
Name | Type | Description |
---|---|---|
list | list | A list of file paths found in the directory. |
login(host, user, password)
Logs into the FTP server using the provided credentials.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
host | str | The FTP server hostname or IP address. | required |
user | str | The username for the FTP server. | required |
password | str | The password for the FTP server. | required |
rename(from_filename: str, to_filename: str) -> str
Renames a file from the original filename to the new filename.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
from_filename | str | The original filename to rename. | required |
to_filename | str | The new filename. | required |
Returns:
Name | Type | Description |
---|---|---|
str | str | The new filename after renaming. |
rename_files(dir, prefix)
Renames all files in a directory by prepending a prefix.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dir | str | The directory containing files to rename. | required |
prefix | str | The prefix to prepend to each file name. | required |
write(local_filepath: str, data: Any, **kwargs) -> None
Writes data to a local file with support for JSON and NDJSON formats.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
local_filepath | str | The path to the local file where the data will be written. | required |
data | Any | The data to write to the file. It can be a string, dictionary, list, or any other type that can be serialized to JSON or converted to a string. | required |
Kwargs:
use_ndjson (bool): If True and the data is a dictionary or list, the data will be written in NDJSON format. Defaults to False.
mode (str): The mode in which the file is opened. Defaults to 'w'. Common modes include:
- 'w': Write mode, which overwrites the file if it exists.
- 'wb': Write binary mode, which overwrites the file if it exists.
LLMHook
Bases: BaseHook
Base class for Large Language Model (LLM) hooks.
This class provides a basic structure for interacting with LLMs. It includes methods for managing conversation history and generating text completions.
Attributes:
Name | Type | Description |
---|---|---|
historic | str | A string storing the conversation history. |
__init__()
Initializes the LLMHook.
Sets up the conversation history attribute.
generate_completion(content, **kwargs)
Generates a text completion using the LLM.
This method should be implemented by subclasses to interact with a specific LLM API.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
content | str | The prompt or content to generate a completion for. | required |
**kwargs | | Additional keyword arguments for the LLM API. | {} |
Raises:
Type | Description |
---|---|
NotImplementedError | If the method is not implemented by a subclass. |
historic_append(text, actor)
Appends text to the conversation history.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to append. | required |
actor | str | The actor who produced the text (e.g., 'user', 'model'). | required |
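A subclass would typically override `generate_completion` and use `historic_append` to keep the conversation log up to date. The sketch below uses a stand-in `LLMHook` defined inside the example rather than the real airless class; the `"actor: text"` line format of `historic` and the `EchoLLMHook` subclass are assumptions made for illustration.

```python
class LLMHook:
    """Stand-in for the documented base class (not the real airless class)."""

    def __init__(self) -> None:
        self.historic = ""

    def historic_append(self, text: str, actor: str) -> None:
        # Assumed format: one "actor: text" line per turn.
        self.historic += f"{actor}: {text}\n"

    def generate_completion(self, content: str, **kwargs):
        raise NotImplementedError()


class EchoLLMHook(LLMHook):
    """Hypothetical subclass that fakes a model by upper-casing the prompt."""

    def generate_completion(self, content: str, **kwargs) -> str:
        self.historic_append(content, "user")
        reply = content.upper()  # a real subclass would call an LLM API here
        self.historic_append(reply, "model")
        return reply


hook = EchoLLMHook()
reply = hook.generate_completion("hello")
```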
QueueHook
Bases: BaseHook
Hook for interacting with a queue system.
__init__() -> None
Initializes the QueueHook.
publish(project: str, topic: str, data: dict) -> None
Publishes data to a specified topic.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project | str | The project name. | required |
topic | str | The topic to publish to. | required |
data | dict | The data to publish. | required |
Raises:
Type | Description |
---|---|
NotImplementedError | This method needs to be implemented in a subclass. |
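Because `publish` raises `NotImplementedError` in the base class, a concrete queue backend has to supply it. The following sketch uses a stand-in `QueueHook` and a hypothetical in-memory subclass, which can be handy in tests; serializing the payload to JSON is an assumption about what a real backend would do.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


class QueueHook:
    """Stand-in for the documented base class (not the real airless class)."""

    def publish(self, project: str, topic: str, data: dict) -> None:
        raise NotImplementedError()


class InMemoryQueueHook(QueueHook):
    """Hypothetical subclass that keeps published messages in a dictionary."""

    def __init__(self) -> None:
        self.messages: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def publish(self, project: str, topic: str, data: dict) -> None:
        # Messages are stored as JSON strings, keyed by (project, topic).
        self.messages[(project, topic)].append(json.dumps(data))


queue = InMemoryQueueHook()
queue.publish("my-project", "orders", {"id": 1})
queue.publish("my-project", "orders", {"id": 2})
```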
SecretManagerHook
Bases: BaseHook
Hook for interacting with a secret management system.
__init__() -> None
Initializes the SecretManagerHook.
add_secret_version(project: str, id: str, value: str) -> None
Adds a new version of a secret.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project | str | The project name. | required |
id | str | The ID of the secret. | required |
value | str | The value of the secret. | required |
Raises:
Type | Description |
---|---|
NotImplementedError | This method needs to be implemented in a subclass. |
destroy_secret_version(secret_name: str, version: str) -> None
Destroys a specific version of a secret.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
secret_name | str | The name of the secret. | required |
version | str | The version of the secret to destroy. | required |
Raises:
Type | Description |
---|---|
NotImplementedError | This method needs to be implemented in a subclass. |
get_secret(project: str, id: str, parse_json: bool = False) -> None
Retrieves a secret.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project | str | The project name. | required |
id | str | The ID of the secret. | required |
parse_json | bool | Whether to parse the secret as JSON. Defaults to False. | False |
Raises:
Type | Description |
---|---|
NotImplementedError | This method needs to be implemented in a subclass. |
list_secret_versions(secret_name: str, filter: str) -> None
Lists all versions of a specific secret.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
secret_name | str | The name of the secret. | required |
filter | str | The filter to apply. | required |
Raises:
Type | Description |
---|---|
NotImplementedError | This method needs to be implemented in a subclass. |
list_secrets() -> None
Lists all secrets.
Raises:
Type | Description |
---|---|
NotImplementedError | This method needs to be implemented in a subclass. |
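Every method on this hook raises `NotImplementedError`, so a vendor subclass must provide the behavior. As a rough illustration, the sketch below implements two of the methods against an in-memory store. `SecretManagerHook` here is a stand-in, `InMemorySecretManagerHook` is hypothetical, and returning the latest version from `get_secret` (optionally JSON-decoded) is an assumption about what concrete implementations do, since the base signature is annotated as returning None.

```python
import json
from typing import Any, Dict, List, Tuple


class SecretManagerHook:
    """Stand-in for the documented base class (not the real airless class)."""

    def get_secret(self, project: str, id: str, parse_json: bool = False):
        raise NotImplementedError()

    def add_secret_version(self, project: str, id: str, value: str):
        raise NotImplementedError()


class InMemorySecretManagerHook(SecretManagerHook):
    """Hypothetical subclass that stores secret versions in a dictionary."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, str], List[str]] = {}

    def add_secret_version(self, project: str, id: str, value: str) -> None:
        self._versions.setdefault((project, id), []).append(value)

    def get_secret(self, project: str, id: str, parse_json: bool = False) -> Any:
        # Assumed behavior: return the most recently added version.
        value = self._versions[(project, id)][-1]
        return json.loads(value) if parse_json else value


secrets = InMemorySecretManagerHook()
secrets.add_secret_version("my-project", "db", '{"user": "old"}')
secrets.add_secret_version("my-project", "db", '{"user": "app"}')
raw = secrets.get_secret("my-project", "db")
parsed = secrets.get_secret("my-project", "db", parse_json=True)
```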