PDF Analyzer
PDF content analysis
Operating mode

At the top of the PDF Analyzer window, you can switch between two modes: VIEW and TRAIN. In VIEW mode, you can preview the document; in TRAIN mode, you can write scripts. Use the UP and DOWN buttons on the left to move through the pages of the document.
PDF
PDF files in the ISO-standard PDF/A format are supported. Click PICK or use the %FILENAME% variable to select a PDF file.
PASSWORD
Password for the selected PDF file (optional). You can also write the password in a text file and place it in the workspace. In that case, enter the file name of that text file in this field.
Input Object
The input object is the data object produced by analyzing a PDF file. It contains all text objects, line objects, and other elements in a unified coordinate system, along with the parsing functions needed to query them.
Coordinate System
The input uses the Page Normalized Coordinate (PNC) system. The origin is at the top-left corner, and each page is normalized to a unit range, with consecutive pages occupying adjacent integer ranges along y. For example, the first page spans [0,0] to [1,1], the second page spans [0,1] to [1,2], and so on. The entire document can therefore be treated as one continuous coordinate system with x in [0, 1] and y in [0, N], where N is the number of pages; the integer part of y is the zero-based page index, from 0 to N-1.
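To make the mapping concrete, here is a minimal Python sketch under stated assumptions. It converts a point from a PDF page's native user space (points, origin at the bottom-left, y pointing up, assuming an unrotated page whose box starts at (0, 0)) into PNC, and splits a PNC point back into a page index and within-page coordinates. The function names are hypothetical and are not part of the PDF Analyzer's API.

```python
import math
from typing import Tuple


def pdf_point_to_pnc(page_index: int, x_pt: float, y_pt: float,
                     page_width: float, page_height: float) -> Tuple[float, float]:
    """Convert a point in PDF user space on a zero-based page to PNC.

    Assumes an unrotated page whose box starts at (0, 0): PDF user space
    has its origin at the bottom-left with y pointing up, while PNC has its
    origin at the top-left with y pointing down.
    """
    x = x_pt / page_width
    # Flip y so it grows downward, normalize it, then offset by the page index.
    y = page_index + (page_height - y_pt) / page_height
    return x, y


def from_pnc(x: float, y: float, num_pages: int) -> Tuple[int, float, float]:
    """Split a PNC point into (zero-based page index, x, y within that page)."""
    if not 0.0 <= y <= num_pages:
        raise ValueError(f"y must lie in [0, {num_pages}]")
    # The integer part of y is the page index; y == N is the bottom edge of the last page.
    page_index = min(math.floor(y), num_pages - 1)
    return page_index, x, y - page_index
```

For a US Letter page (612 × 792 points), the point (306, 792) on the second page (index 1) maps to (0.5, 1.0), which is the top edge of that page in PNC.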
Text Object

Parsing Functions

The input object provides several parsing and utility functions to help you locate target objects in the document. The general approach is to narrow the set of text objects by spatial or textual conditions, then use their positions relative to one another to find the target objects.
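The narrow-then-relate approach can be sketched in plain Python. This is an illustration only: `TextObject`, `within`, `containing`, and `right_of` are hypothetical stand-ins, not the input object's actual parsing functions, and the 0.01 line tolerance is an assumed value. Boxes are given in PNC, so a page-1 region is simply y in [0, 1].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextObject:
    # Hypothetical text object: its string and bounding box in PNC.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def within(objs: List[TextObject], x0: float, y0: float,
           x1: float, y1: float) -> List[TextObject]:
    """Spatial filter: keep objects whose box lies entirely inside the region."""
    return [o for o in objs if o.x0 >= x0 and o.y0 >= y0 and o.x1 <= x1 and o.y1 <= y1]


def containing(objs: List[TextObject], keyword: str) -> List[TextObject]:
    """Textual filter: keep objects whose text contains the keyword."""
    return [o for o in objs if keyword in o.text]


def right_of(anchor: TextObject, objs: List[TextObject],
             tolerance: float = 0.01) -> Optional[TextObject]:
    """Relative lookup: nearest object to the right of the anchor on the same line."""
    mid = (anchor.y0 + anchor.y1) / 2
    candidates = [o for o in objs
                  if o.x0 >= anchor.x1 and o.y0 - tolerance <= mid <= o.y1 + tolerance]
    return min(candidates, key=lambda o: o.x0 - anchor.x1, default=None)


# Usage: find the value printed to the right of a label on the first page only.
objs = [
    TextObject("Invoice No.", 0.10, 0.10, 0.25, 0.12),
    TextObject("INV-0042", 0.30, 0.10, 0.40, 0.12),
    TextObject("Date", 0.10, 0.15, 0.18, 0.17),
    TextObject("Invoice No.", 0.10, 1.10, 0.25, 1.12),  # same label on page 2
    TextObject("INV-0043", 0.30, 1.10, 0.40, 1.12),
]
page1 = within(objs, 0, 0, 1, 1)
label = containing(page1, "Invoice No.")[0]
value = right_of(label, page1)
print(value.text)  # INV-0042
```

Filtering to page 1 first is what keeps the identical label on page 2 from being picked up; the relative lookup then only has to search a small set.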
Utility Functions
Viewer and CodeGen
In the PDF Analyzer's Viewer, you can not only mark text objects and coordinates but also generate function code automatically with mouse and keyboard operations. When you are done, copy the generated code into TRAIN mode and adjust it as needed, which reduces the time spent writing scripts by hand.
To capture an object:

To generate boundaries:

To perform the directional analysis:

To perform range analysis:

To perform relative range analysis:

Output Object
Each key added to the output object is written to the workspace as a .txt file. The file is named after the key, and its text content is the corresponding value.
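The key-to-file rule can be illustrated with a short Python sketch. It models the output object as a plain dictionary and the workspace as a directory path; the function name `write_output`, the UTF-8 encoding, and the overwrite-on-rerun behavior are assumptions for illustration, not documented behavior of the tool.

```python
from pathlib import Path
from typing import Dict


def write_output(output: Dict[str, str], workspace: Path) -> None:
    """Hypothetical sketch: write each key of the output object to <key>.txt."""
    workspace.mkdir(parents=True, exist_ok=True)
    for key, value in output.items():
        # File name is the key; file content is the value (existing files are overwritten).
        (workspace / f"{key}.txt").write_text(str(value), encoding="utf-8")
```

With this rule, an output object of `{"invoice_no": "INV-0042", "total": "99.50"}` would produce `invoice_no.txt` containing `INV-0042` and `total.txt` containing `99.50`. Keys therefore need to be valid file names.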
Examples