- 01 Nov, 2024 3 commits
-
-
myhloli authored
- Update remove_outside_spans function to handle all content types - Add processing for text and equation spans - Improve overlap calculation for better accuracy
-
myhloli authored
- Update remove_outside_spans function to handle all content types - Add processing for text and equation spans - Improve overlap calculation for better accuracy
-
myhloli authored
- Update remove_outside_spans function to handle all content types - Add processing for text and equation spans - Improve overlap calculation for better accuracy
-
- 31 Oct, 2024 1 commit
-
-
myhloli authored
- Add new function `remove_outside_spans` to filter spans based on image and table blocks - Reorder span processing steps to improve efficiency - Update imports to include `calculate_overlap_area_in_bbox1_area_ratio`
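The filtering idea behind this commit can be sketched as follows. This is a minimal illustration, not the repository's code: `overlap_ratio_in_bbox1` and `filter_spans_by_blocks` are hypothetical stand-ins for `calculate_overlap_area_in_bbox1_area_ratio` and `remove_outside_spans`, and the `[x0, y0, x1, y1]` box layout, the `"bbox"` key, and the 0.5 threshold are all assumptions.

```python
def overlap_ratio_in_bbox1(bbox1, bbox2):
    """Return the fraction of bbox1's area that is covered by bbox2.

    Boxes are assumed to be [x0, y0, x1, y1] with x0 < x1 and y0 < y1.
    """
    # Corners of the intersection rectangle
    ix0 = max(bbox1[0], bbox2[0])
    iy0 = max(bbox1[1], bbox2[1])
    ix1 = min(bbox1[2], bbox2[2])
    iy1 = min(bbox1[3], bbox2[3])
    if ix1 <= ix0 or iy1 <= iy0:
        return 0.0  # no overlap
    area1 = (bbox1[2] - bbox1[0]) * (bbox1[3] - bbox1[1])
    if area1 <= 0:
        return 0.0  # degenerate box
    return (ix1 - ix0) * (iy1 - iy0) / area1


def filter_spans_by_blocks(spans, block_bboxes, threshold=0.5):
    """Keep only spans that lie mostly inside at least one block.

    The threshold is an assumed value chosen for illustration.
    """
    return [
        span
        for span in spans
        if any(
            overlap_ratio_in_bbox1(span["bbox"], block) >= threshold
            for block in block_bboxes
        )
    ]


spans = [{"bbox": [0, 0, 10, 10]}, {"bbox": [100, 100, 110, 110]}]
blocks = [[0, 0, 50, 50]]
kept = filter_spans_by_blocks(spans, blocks)
```

Normalising by the span's own area rather than by the union means a small span fully inside a large image or table block scores 1.0, which is the property a containment test needs.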
-
- 30 Oct, 2024 3 commits
-
-
myhloli authored
# Conflicts: # magic_pdf/dict2md/ocr_mkcontent.py
-
myhloli authored
- Add check for 'image_path' in spans to avoid errors when it's missing - Update image handling in both paragraph text and content dictionary - Improve error handling and make the code more robust
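The kind of guard this commit describes might look like the sketch below. It is an assumption-laden illustration: `image_markdown` is a hypothetical helper, and the span is assumed to be a plain dict whose `image_path` key may be absent or empty.

```python
def image_markdown(span):
    """Return a Markdown image reference, or an empty string when the
    span carries no usable 'image_path'."""
    # dict.get avoids a KeyError when the key is missing
    path = span.get("image_path")
    if not path:
        return ""
    return f"![]({path})"


with_path = image_markdown({"type": "image", "image_path": "images/a.jpg"})
without_path = image_markdown({"type": "image"})
```

Returning an empty string keeps the caller's string concatenation simple; raising or logging would be equally reasonable depending on how strict the pipeline should be.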
-
myhloli authored
- Update image content extraction to iterate through all spans in a block - Add support for extracting table content from spans within a block - Handle multiple content types within table spans (latex, html, image) - Refactor code to be more modular and easier to maintain
-
- 29 Oct, 2024 1 commit
-
-
myhloli authored
- Update PyPI mirror from Tsinghua to Aliyun in multiple Dockerfiles and installation scripts - This change may improve package download speed and reliability for users in China
-
- 28 Oct, 2024 23 commits
-
-
Xiaomeng Zhao authored
docs(README): update model download instructions for PDF-Extract-Kit 1.0
-
myhloli authored
- Update README.md and README_zh-CN.md to include new model download instructions - Provide detailed steps on how to download models after PDF-Extract-Kit 1.0 repository change - Emphasize the need to re-download models due to repository change
-
myhloli authored
- Update README.md and README_zh-CN.md to include new model download instructions - Provide detailed steps on how to download models after PDF-Extract-Kit 1.0 repository change - Emphasize the need to re-download models due to repository change
-
Xiaomeng Zhao authored
refactor(table): disable StructEqTable support and add TableMaster support
-
myhloli authored
- Remove import and usage of StructTableModel - Add support for TableMaster model - Update table model initialization logic to support TableMaster - Log error and exit if StructEqTable is selected, as it's under upgrade - Update README files to reflect changes in table parsing capabilities
-
Xiaomeng Zhao authored
fix: add priority match rule
-
icecraft authored
-
Xiaomeng Zhao authored
perf: table model update with PP OCRv4
-
liukaiwen authored
-
liukaiwen authored
-
liukaiwen authored
-
Kaiwen Liu authored
Dev
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
docs: update documentation path in README files
-
myhloli authored
- Update image path in README.md and README_zh-CN.md - Update chemical formula recognition link in README.md and README_zh-CN.md
-
Xiaomeng Zhao authored
docs: update logo path in README files
-
myhloli authored
- Change the logo path from 'docs/images/MinerU-logo.png' to 'old_docs/images/MinerU-logo.png' in both README.md and README_zh-CN.md - This update ensures that the correct logo is displayed in the project's README files
-
Xiaomeng Zhao authored
docs(README): update for v0.9.0 release
-
myhloli authored
- Delete unnecessary empty line in the table-config JSON example - Improve readability and formatting consistency in the configuration example
-
myhloli authored
- Add changelog for v0.9.0 release with major refactoring and improvements - Update key features list to include new functionalities - Modify system requirements and hardware support information - Add section for deploying derived projects - Update known issues and TODO list
-
Xiaomeng Zhao authored
Feat/new table caption match
-
icecraft authored
-
- 27 Oct, 2024 3 commits
-
-
Xiaomeng Zhao authored
docs: update model download instructions and simplify demo scripts
-
myhloli authored
- Modify the logic for splitting wide blocks exceeding 0.4 page width - Remove the specific case for blocks exceeding 0.25 page width - Add comments to explain the reasoning behind different splitting strategies
-
myhloli authored
- Update model download instructions for versions 0.9.x and later - Simplify demo scripts by removing unnecessary model configuration - Add visualization function to draw bounding boxes - Update CLI help message with new URL
-
- 26 Oct, 2024 4 commits
-
-
Xiaomeng Zhao authored
Add multi_gpu process project
-
Hui authored
-
Xiaomeng Zhao authored
feat(draw_bbox): update bounding box drawing for tables and images
-
myhloli authored
- Add support for drawing bounding boxes of table and image sub-blocks - Implement sorting of table blocks based on type order - Update bounding box drawing for text and title blocks - Refactor code to handle different block types and their sub-blocks
-
- 25 Oct, 2024 2 commits
-
-
Xiaomeng Zhao authored
fix: add init to magic_pdf.utils
-
myhloli authored
-