hook

__all__ = ['PDFHook'] module-attribute

PDFHook

Bases: BaseHook

Hook for handling PDF operations.

__init__() -> None

Initializes the PDFHook.

chunk_in_pages(filepath: str, pages_per_chunk: int) -> List[str]

Splits the PDF into chunks with a specified number of pages per chunk.

Parameters:

Name             Type  Description                 Default
filepath         str   Path to the PDF file.       required
pages_per_chunk  int   Number of pages per chunk.  required

Returns:

Type       Description
List[str]  List of file paths to the chunked PDF files.
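
The reference does not say how pages are grouped, so the following is only a sketch of the page arithmetic such a split implies. It assumes consecutive pages are grouped in order and that the final chunk holds whatever pages remain. `chunk_page_ranges` is a hypothetical helper, not part of `PDFHook`.

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) page-index ranges, one per chunk.

    Hypothetical helper for illustration; PDFHook's actual grouping may differ.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


# A 10-page document split 4 pages at a time yields three chunks,
# the last one shorter than the others.
print(chunk_page_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Under that assumption, the returned list would contain one path per range, that is, the page count divided by `pages_per_chunk`, rounded up.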

generate_page_screenshot(filepath: str, dpi: int = 300, output_format: str = 'png') -> List[str]

Converts each page of a PDF into an image.

Parameters:

Name           Type  Description                                         Default
filepath       str   Path to the PDF file.                               required
dpi            int   Resolution of the output images, in dots per inch.  300
output_format  str   Format of the output images (e.g., 'png', 'jpg').   'png'
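
For a sense of what `dpi` implies for output size: PDF page dimensions are expressed in points, with 72 points to the inch, so a renderer that honors `dpi` typically scales each page by `dpi / 72`. The sketch below applies that arithmetic. `rendered_size` is a hypothetical helper, and whether `PDFHook` scales or rounds in exactly this way is an assumption.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # the default PDF user-space unit is 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int = 300) -> Tuple[int, int]:
    """Estimate the pixel dimensions of a page rendered at the given dpi.

    Hypothetical helper for illustration; not part of PDFHook.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


# US Letter is 612 x 792 points (8.5 x 11 inches).
print(rendered_size(612, 792))       # (2550, 3300)
print(rendered_size(612, 792, 150))  # (1275, 1650)
```

Halving `dpi` halves each dimension and so cuts the pixel count to a quarter, which is worth keeping in mind when rendering long documents.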

Returns:

Type       Description
List[str]  A list of file paths to the generated images.
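
The two methods can plausibly be combined, for example to render page images chunk by chunk. The sketch below is written against the documented signatures only. `PDFHookLike` is a hypothetical protocol standing in for `PDFHook`, whose import path is not shown here, and the sketch assumes the chunk files returned by `chunk_in_pages` are ordinary PDFs that `generate_page_screenshot` accepts.

```python
from typing import Dict, List, Protocol


class PDFHookLike(Protocol):
    """Hypothetical protocol mirroring the two documented PDFHook methods."""

    def chunk_in_pages(self, filepath: str, pages_per_chunk: int) -> List[str]: ...

    def generate_page_screenshot(
        self, filepath: str, dpi: int = 300, output_format: str = "png"
    ) -> List[str]: ...


def screenshots_by_chunk(
    hook: PDFHookLike, filepath: str, pages_per_chunk: int, dpi: int = 300
) -> Dict[str, List[str]]:
    """Map each chunk's file path to the image paths rendered from it."""
    return {
        chunk_path: hook.generate_page_screenshot(chunk_path, dpi=dpi)
        for chunk_path in hook.chunk_in_pages(filepath, pages_per_chunk)
    }
```

With a real instance, the call would look like `screenshots_by_chunk(PDFHook(), "report.pdf", pages_per_chunk=5)`, where `report.pdf` is a placeholder path.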