hook
__all__ = ['PDFHook']
PDFHook
Bases: BaseHook
Hook for handling PDF operations.
__init__() -> None
Initializes the PDFHook.
chunk_in_pages(filepath: str, pages_per_chunk: int) -> List[str]
Splits the PDF into chunks with a specified number of pages per chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | Path to the PDF file. | required |
pages_per_chunk | int | Number of pages per chunk. | required |
Returns:
Type | Description |
---|---|
List[str] | List of file paths to the chunked PDF files. |
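The chunking arithmetic can be illustrated with a minimal sketch. It does not use PDFHook or read any PDF; `chunk_page_ranges` is a hypothetical helper, and it assumes that a final chunk with fewer than `pages_per_chunk` pages is kept rather than dropped, which the reference above does not state.

```python
from typing import List, Tuple


def chunk_page_ranges(total_pages: int, pages_per_chunk: int) -> List[Tuple[int, int]]:
    """Return half-open (start, stop) zero-based page ranges, one per chunk.

    Hypothetical helper for illustration only; not part of PDFHook.
    """
    if pages_per_chunk < 1:
        raise ValueError("pages_per_chunk must be at least 1")
    # Step through the document pages_per_chunk pages at a time and
    # clamp the end of the last range to the total page count.
    return [
        (start, min(start + pages_per_chunk, total_pages))
        for start in range(0, total_pages, pages_per_chunk)
    ]


if __name__ == "__main__":
    # A 10-page document split 4 pages at a time: the last chunk is shorter.
    print(chunk_page_ranges(10, 4))  # [(0, 4), (4, 8), (8, 10)]
```

Under this assumption, a 10-page file with `pages_per_chunk=4` would yield three chunk files, so the returned list would hold three paths.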
generate_page_screenshot(filepath: str, dpi: int = 300, output_format: str = 'png') -> List[str]
Converts each page of a PDF into an image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filepath | str | The file path to the PDF. | required |
dpi | int | Dots per inch for the output image. Defaults to 300. | 300 |
output_format | str | The format of the output images (e.g., 'png', 'jpg'). Defaults to 'png'. | 'png' |
Returns:
Type | Description |
---|---|
List[str] | A list of file paths to the generated images. |
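To get a feel for what the `dpi` parameter implies for image size, the sketch below converts page dimensions from PDF points to pixels. It relies on the default PDF user-space unit of 1/72 inch; `rendered_size` is a hypothetical helper, not part of PDFHook, and the rendering backend the hook actually uses may round differently.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int = 300) -> Tuple[int, int]:
    """Approximate pixel dimensions of a page rasterized at the given DPI.

    Hypothetical helper for illustration only; not part of PDFHook.
    """
    if dpi < 1:
        raise ValueError("dpi must be positive")
    # pixels = points * (dots per inch) / (points per inch)
    width_px = round(width_pt * dpi / POINTS_PER_INCH)
    height_px = round(height_pt * dpi / POINTS_PER_INCH)
    return width_px, height_px


if __name__ == "__main__":
    # US Letter is 612 x 792 points (8.5 x 11 inches).
    print(rendered_size(612, 792))       # (2550, 3300) at the default 300 DPI
    print(rendered_size(612, 792, 150))  # (1275, 1650)
```

Halving `dpi` cuts the pixel count to roughly a quarter, which is worth considering when screenshots of many pages are generated.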