- 06 Nov, 2024 10 commits
-
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
docs(README): update changelog for v0.9.1 release
-
Xiaomeng Zhao authored
docs(README): update changelog for v0.9.1 release
-
myhloli authored
docs(README): update changelog for v0.9.1 release - Add entry for 0.9.1 release on 2024/11/06 - Update changelog in both English (README.md) and Chinese (README_zh-CN.md) - Include integration of StructTable-InternVL2-1B model for table recognition
-
Xiaomeng Zhao authored
docs: update arXiv paper link in README files
-
Xiaomeng Zhao authored
docs: update arXiv paper link in README files
-
myhloli authored
- Replace the incorrect arXiv paper link with the correct one in both README.md and README_zh-CN.md - Update the badge image link from 'pdf' to 'abs' for the correct paper URL
-
Xiaomeng Zhao authored
test(table): improve HTML validation for table extraction
-
myhloli authored
- Add lxml dependency for HTML parsing - Update test case to use XPath and HTML parser for structure and content validation - Check for presence of essential HTML elements like <table>, <thead>, <tbody>, <tr>, and <td> - Validate column headers and specific row contents
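The structure and content checks described in this commit can be sketched roughly as follows. This is not the repository's test: it swaps lxml for the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset) so that it runs without extra dependencies, and the function name `check_table_html` and the sample table are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_table_html(html, expected_headers, expected_row):
    """Return a list of problems found in a well-formed HTML table string.

    Hypothetical helper: uses ElementTree instead of lxml, so the input
    must also be well-formed XML.
    """
    problems = []
    root = ET.fromstring(html)
    table = root if root.tag == "table" else root.find(".//table")
    if table is None:
        return ["missing <table>"]

    # Essential elements named in the commit message.
    for tag in ("thead", "tbody", "tr", "td"):
        if table.find(f".//{tag}") is None:
            problems.append(f"missing <{tag}>")

    # Column headers: accept either <th> or <td> cells inside <thead>.
    headers = [(cell.text or "").strip()
               for cell in table.findall("./thead/tr/*")]
    if headers != expected_headers:
        problems.append(f"unexpected headers: {headers}")

    # Row contents: look for one specific row inside <tbody>.
    rows = [[(td.text or "").strip() for td in tr.findall("td")]
            for tr in table.findall("./tbody/tr")]
    if expected_row not in rows:
        problems.append(f"row not found: {expected_row}")

    return problems


sample = (
    "<table><thead><tr><th>Name</th><th>Qty</th></tr></thead>"
    "<tbody><tr><td>Apple</td><td>3</td></tr></tbody></table>"
)
print(check_table_html(sample, ["Name", "Qty"], ["Apple", "3"]))
```

Returning a list of problems instead of asserting on the first failure makes it easier to see everything that is wrong with an extracted table in one test run.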
-
Xiaomeng Zhao authored
feat: replace mineru_demo API documentation with a link
-
- 05 Nov, 2024 9 commits
-
-
houlinfeng authored
-
myhloli authored
- Remove outdated version options (0.6.x, 0.7.x, 0.8.x) - Add current version option (0.9.x)
-
Xiaomeng Zhao authored
docs(faq): add troubleshooting for illegal instruction error on Linux servers
-
myhloli authored
- Add information about AVX/AVX2 instruction set issues on Linux servers - Provide guidance for users encountering "Illegal instruction (core dumped)" error - Suggest contacting system administrator or changing servers as potential solutions - Include relevant issue references for context
-
Xiaomeng Zhao authored
fix(table): improve table image processing
-
myhloli authored
- Replace np.array with np.asarray for better performance - Add image color conversion from RGB to BGR using OpenCV
-
myhloli authored
-
Xiaomeng Zhao authored
docs(README): update Colab demo link
-
myhloli authored
- Update the Colab demo link in the README files to the new version - Add a note in the Japanese README indicating that the document is outdated
-
- 04 Nov, 2024 13 commits
-
-
Xiaomeng Zhao authored
chore: add CSS and SCSS files to linguist-vendored - Update .gitattributes to mark CSS and SCSS files as vendored
-
myhloli authored
chore: add CSS and SCSS files to linguist-vendored - Update .gitattributes to mark CSS and SCSS files as vendored - Ensure these files are not included in language statistics
-
Xiaomeng Zhao authored
fix(merge_text): add ligature replacement functionality #305 #241
-
myhloli authored
- Implement __replace_ligatures function to split ligature characters - Integrate ligature replacement into the merge_para_with_text function - Handle common ligatures such as fi, fl, ff, ffi, and ffl
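As an illustration of the ligature splitting described above, here is a minimal sketch. It assumes the ligatures arrive as the Unicode presentation-form code points U+FB00 to U+FB04; the mapping table and the public name `replace_ligatures` are hypothetical and are not taken from the repository's `__replace_ligatures`.

```python
# Hypothetical mapping for the five ligatures named in the commit message.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def replace_ligatures(text: str) -> str:
    """Split each ligature character into its component letters."""
    return text.translate(_LIGATURE_TABLE)


print(replace_ligatures("e\ufb03cient \ufb01le"))  # prints "efficient file"
```

A translation table handles all replacements in a single pass over the string, which avoids chaining several `str.replace` calls.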
-
Xiaomeng Zhao authored
chore: add .gitattributes to configure file linguist attributes
-
myhloli authored
- Set .js and .mjs files as vendored - Set .html files as documentation
-
Xiaomeng Zhao authored
feat(model): add HTML minification to StructTableModel
-
myhloli authored
- Import 're' module for regular expression operations - Implement HTML minification for 'output_format=html' - Add 'minify_html' method to remove unnecessary whitespace and format HTML
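A regex-based minifier of the kind this commit describes might look like the sketch below. The two substitution rules are assumptions chosen for illustration; the actual `minify_html` method in StructTableModel may use different patterns.

```python
import re


def minify_html(html: str) -> str:
    """Remove unnecessary whitespace from an HTML string (illustrative rules)."""
    # Collapse any run of whitespace (including newlines) into one space.
    html = re.sub(r"\s+", " ", html)
    # Drop the whitespace that sits between adjacent tags.
    html = re.sub(r">\s+<", "><", html)
    return html.strip()


pretty = "<table>\n  <tr>\n    <td> a  b </td>\n  </tr>\n</table>\n"
print(minify_html(pretty))  # prints "<table><tr><td> a b </td></tr></table>"
```

Note that this sketch leaves whitespace inside a cell's text in place (collapsed to single spaces), so cell contents are not altered beyond spacing.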
-
Xiaomeng Zhao authored
feat(table): upgrade StructEqTable model and integrate into PDF Extract Kit
-
myhloli authored
- Comment out an unused code block in the ppTableModel.py file - Improve code readability and maintainability by removing unnecessary code
-
myhloli authored
- Update StructTableModel to use the latest struct-eqtable library - Add support for HTML table extraction in PDF Extract Kit - Improve error handling and model initialization - Update dependencies in setup.py for struct-eqtable
-
Xiaomeng Zhao authored
Update pdf_extract_kit.py
-
ciaran authored
Modify line 397 to ensure compatibility with CPU execution. Previously, specifying 'cpu' in config.json still raised a ValueError during demo execution, because a cuda device was expected but 'cpu' was received.
-
- 03 Nov, 2024 5 commits
-
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
fix(dict2md): improve text concatenation logic
-
myhloli authored
- Optimize content stripping and checking logic - Add special case handling for single-character content - Adjust spacing rules for different content types
-
Xiaomeng Zhao authored
feat(para_split_v3): improve list identification with block aspect ratio
-
myhloli authored
- Add block_height calculation to determine block aspect ratio - Update list identification condition to include aspect ratio check - Improve code readability with better formatting and line breaks
-
- 02 Nov, 2024 3 commits
-
-
Xiaomeng Zhao authored
docs(tutorial): update magic-pdf command with output directory
-
myhloli authored
- Add '-o ./output' flag to magic-pdf command in multiple documentation files
-
Xiaomeng Zhao authored
feat(list): improve list detection algorithm & fix(list): improve list identification accuracy
-