- 15 Oct, 2024 1 commit
-
-
Kaiwen Liu authored
Dev
-
- 14 Oct, 2024 3 commits
-
-
Xiaomeng Zhao authored
feat(list&index block): detect and merge list and index blocks
-
myhloli authored
  - Add detection for list and index blocks in OCR processing
  - Implement merging of list and index blocks across pages
  - Update block types to include list and index categories
  - Adjust text merging logic to handle new block types
  - Modify layout drawing to distinguish list and index blocks
-
icecraft authored
  * feat: manage docs with sphinx
  * fix: readthedocs configure
  * feat: support multiple languages
  * fix: add .readthedocs.yaml
  * fix: requirements.txt path
  ---------
  Co-authored-by: icecraft <xurui1@pjlab.org.cn>
-
- 10 Oct, 2024 6 commits
-
-
Xiaomeng Zhao authored
fix: resolve the grouping anomaly with multiple consecutive non-text blocks
-
myhloli authored
-
Xiaomeng Zhao authored
Update how_to_download_models_zh_cn.md
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
feat(pdf_parse_union_core_v2): reintegrate para_split_v3 and add page range support
-
myhloli authored
  - Reintegrate para_split_v3 into the pdf_parse_union_core_v2 process
  - Add support for specifying page range in doc_analyze_by_custom_model
  - Implement garbage collection and memory cleaning after processing
  - Refine image loading from PDF, including handling out-of-range pages
-
- 09 Oct, 2024 3 commits
-
-
Xiaomeng Zhao authored
-
Xiaomeng Zhao authored
Update README_Windows_CUDA_Acceleration_en_US.md
-
Xiaomeng Zhao authored
-
- 08 Oct, 2024 20 commits
-
-
Xiaomeng Zhao authored
docs: update CUDA acceleration guides and README content
-
myhloli authored
  - Update GPU hardware support information in README.md and README_zh-CN.md
  - Enhance CUDA acceleration guides for Ubuntu and Windows
  - Modify README_zh-CN.md to reflect changes in GPU requirements and configurations
  - Update TODO list to mark semantic reading order as completed
-
myhloli authored
  - Update GPU hardware support information in README.md and README_zh-CN.md
  - Enhance CUDA acceleration guides for Ubuntu and Windows
  - Modify README_zh-CN.md to reflect changes in GPU requirements and configurations
  - Update TODO list to mark semantic reading order as completed
-
Xiaomeng Zhao authored
docs: add filename to wget command in model download scripts
-
myhloli authored
  - Update wget commands in both English and Chinese documentation to specify the filename
  - Improve clarity and prevent potential filename conflicts when downloading the scripts
-
Xiaomeng Zhao authored
feat(docs): automate model download and configuration
-
myhloli authored
  - Add scripts to download models and update configuration file
  - Remove manual steps for modifying model paths
  - Update documentation for both ModelScope and HuggingFace model downloads
  - Improve user experience by automating the entire process
-
myhloli authored
  - Add scripts to download models and update configuration file
  - Remove manual steps for modifying model paths
  - Update documentation for both ModelScope and HuggingFace model downloads
  - Improve user experience by automating the entire process
-
Xiaomeng Zhao authored
feat(layoutreader): support local model directory and improve model loading
-
myhloli authored
Added a link to the layoutreader repository in the Related Projects sections of both the README.md and README_zh-CN.md files. This addition helps to provide users with more resources and tools related to document layout analysis and processing.
-
myhloli authored
docs: update model download instructions for version 0.9.x and later
  - Add note about separate download for layoutreader model in version 0.9.x and later
  - Include example code for downloading layoutreader model using ModelScope
  - Clarify that previous download methods do not support updating to version 0.9.x and later
-
myhloli authored
  - Add function to get local LayoutReader model directory
  - Check and use local model directory if available
  - Fall back to online model if local directory not found
  - Update model initialization to support local path
  - Refactor model loading in singleton class
-
Xiaomeng Zhao authored
fix: caption|footnote match algorithm
-
icecraft authored
-
Xiaomeng Zhao authored
fix: caption or footnote match algorithm
-
icecraft authored
-
Xiaomeng Zhao authored
perf(pdf_extract_kit): conditional memory cleanup based on GPU capacity
-
myhloli authored
  - Introduce a conditional memory cleanup step in the PDF extraction process
  - Assess available GPU memory before deciding to perform memory cleanup
  - Log the time taken for garbage collection when it occurs
  - This optimization helps to balance performance and resource utilization
-
Xiaomeng Zhao authored
feat: add arXiv paper link to header and adjust PDF parsing logic
-
myhloli authored
feat: add arXiv paper link to header and adjust PDF parsing logic
  - Add arXiv paper link to the header template for easy access to the latest research paper.
  - Modify the PDF parsing logic to handle edge cases more accurately, particularly in determining the number of lines in a block based on its height.
-
- 06 Oct, 2024 2 commits
-
-
Xiaomeng Zhao authored
refactor(model): improve timing information and performance
-
myhloli authored
  - Enhance timing output precision to two decimal places for better readability
  - Calculate and log document analysis speed in pages per second
  - Optimize logging for YOLO and table recognition processes
  - Remove unnecessary comments and improve code efficiency
-
- 30 Sep, 2024 5 commits
-
-
sfk authored
add arxiv url
-
sfk authored
add arxiv url
-
sfk authored
-
wangbinDL authored
-
Xiaomeng Zhao authored
feat: add layoutreader to sort blocks
-