1. 09 Oct, 2024 2 commits
  2. 08 Oct, 2024 20 commits
  3. 06 Oct, 2024 2 commits
  4. 30 Sep, 2024 6 commits
  5. 29 Sep, 2024 2 commits
  6. 28 Sep, 2024 3 commits
    • refactor(magic_pdf): import model helpers directly for clarity · 42a7d792
      myhloli authored
      Update import statements in `pdf_parse_union_core_v2.py` to directly import
      `prepare_inputs`, `boxes2inputs`, and `parse_logits` from `magic_pdf.model.v3.helpers`
      instead of from `magic_pdf.model.v3`. This change streamlines the imports, making the
      code more readable and maintaining a cleaner approach to modular design.
      42a7d792
    • refactor(pdf_parse_union_core_v2): update import paths to use new package structure · 5522d0a3
      myhloli authored
      Adapt import statements in `pdf_parse_union_core_v2.py` to reflect the updated package
      structure, changing from the `magic_pdf.v3.helpers` module to the `magic_pdf.model.v3`
      module. This ensures compatibility with the revised directory layout.
      5522d0a3
    • fix(pdf_parse): handle blocks without lines and enable bf16 on compatible devices · 2145a8b6
      myhloli authored
      Blocks without lines are now correctly indexed even when they contain textual content rendered
      as images. The sorting logic has been updated to accommodate this scenario. Additionally, the
      LayoutLMv3 model initialization now uses bfloat16 precision on devices that support it,
      which may improve performance on that hardware.
      2145a8b6
  7. 27 Sep, 2024 5 commits
    • refactor(pdf_parse): remove redundant sorting and optimize block indexing · 177ab08e
      myhloli authored
      Removed redundant sorting of lines by model and optimized calculation of block
      indexes by using a single pass through the sorted lines. This change simplifies the
      code and potentially improves performance by reducing the number of sorting
      operations and unnecessary iterations over blocks without lines.
      177ab08e
    • refactor(draw_bbox): remove commented-out code and streamline bbox... · 83c07387
      myhloli authored
      refactor(draw_bbox): remove commented-out code and streamline bbox drawing
      Removed legacy commented-out code related to layout_bbox_list from draw_bbox.py, which
      was used for diagnostic purposes and was no longer necessary. This change streamlines
      the codebase and clarifies the drawing process of bounding boxes on PDF pages. The update
      also adjusts the order of operations slightly for improved readability without altering
      the functionality.
      83c07387
    • feat(requirements): add torch and transformers libraries · 65615455
      myhloli authored
      Introduce torch and transformers libraries to support new ML features.
      Ensure version compatibility by adding torch version within the range 2.2.2 to 2.3.1
      and include the necessary transformers library.
      65615455
    • refactor(pdf_parse_union_core_v2): implement model initialization within... · b9dfdea3
      myhloli authored
      refactor(pdf_parse_union_core_v2): implement model initialization within class
      Refactored model initialization to be handled by a singleton class to ensure that model
      instances are reused across calls, avoiding redundant initializations. Removed logger
      information that was commented out and ensured consistency in logging behavior.
      b9dfdea3
    • refactor(drawing): simplify draw bbox functions and adjust debug config · b2790f6f
      myhloli authored
      Refactor the draw bbox functions by removing unused imports and simplifying the
      code logic for drawing layout and line sorting bounding boxes. Adjust the debug
      configuration to enable content list dumping and disable markdown making mode.
      b2790f6f