Commits · c1ad30e74b5cd02787e36a62529a446f553290d8 · Qin Kaijie / pdf-miner

09 Aug, 2024 19 commits
- Update README_zh-CN_v2.md · c1ad30e7
  sfk authored Aug 09, 2024
```
update content
```
- Update README_v2.md · 5a0cce0c
  sfk authored Aug 09, 2024
  
- Update FAQ_zh_cn.md · b03b5cdd
  Xiaomeng Zhao authored Aug 09, 2024
  
- Update README_v2.md · d9e72e92
  sfk authored Aug 09, 2024
  
- Update README_v2.md · 755e8a9b
  sfk authored Aug 09, 2024
  
- Update README_v2.md · b413a89d
  sfk authored Aug 09, 2024
  
- Update README_zh-CN_v2.md · 58e429b6
  sfk authored Aug 09, 2024
  
- Create README_v2.md · a9063f8c
  sfk authored Aug 09, 2024
  
- Update README_zh-CN_v2.md · f8261f35
  sfk authored Aug 09, 2024
  
- Update README_zh-CN_v2.md · 90f4e364
  sfk authored Aug 09, 2024
  
- Update README_zh-CN_v2.md · fc6a7c30
  sfk authored Aug 09, 2024
  
- Merge pull request #379 from myhloli/master · 4ec8466e
  Xiaomeng Zhao authored Aug 09, 2024
```
fix(doc-analyze): adjust image scaling limit to 9000 pixels
```
- fix(doc-analyze): adjust image scaling limit to 9000 pixels · 445a397f
  myhloli authored Aug 09, 2024
```
Previously, images were not enlarged if their width or height exceeded 3000 pixels.
This threshold has been increased to 9000 pixels to better handle high-resolution scans and improve the analysis of documents with larger dimensions.
```
- docs: how to use table recognition · f3ad9be3
  xuchao authored Aug 09, 2024
  
- docs: update known issue · edcced27
  xuchao authored Aug 09, 2024
  
- Merge pull request #374 from myhloli/master · 2502db13
  Xiaomeng Zhao authored Aug 09, 2024
```
fix&refactor(pdf-extract-kit): table recognition and ocr
```
- fix(pdf-extract-kit): ensure table extraction success with additional ending... · 334ccac2
  myhloli authored Aug 09, 2024
```
fix(pdf-extract-kit): ensure table extraction success with additional ending condition

Add an additional condition to determine the success of table extraction by checking
if the latex_code ends with 'end{table}'. This extends the validation to cover table
environments that may not strictly end with 'end{tabular}', thus improving the robustness
of table recognition processing.
```
- refactor(pdf_extract_kit): optimize image processing and table recognition... · 29e590a7
  myhloli authored Aug 09, 2024
```
refactor(pdf_extract_kit): optimize image processing and table recognition logic

Refactor the image processing logic for OCR and table recognition to ensure
consistency and improve performance. Remove redundant initialization of PIL images,
unify image cropping logic, and streamline the handling of formula detection results.
Also, adjust the table recognition process to improve integration with the updated image
processing logic and enhance overall efficiency.
```
- fix: #366 the broken chain after the refactor of AbsReaderWriter leads to wrong API invocation (#371) · ad5596fc
  icecraft authored Aug 09, 2024
```
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>
```
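Two of the fixes in this batch (445a397f and 334ccac2) describe small decision rules, and a short sketch may make them easier to picture. The code below is only an illustration based on the commit messages, not the project's actual implementation: the function and constant names are hypothetical, and stripping trailing whitespace before the suffix check is an assumption.

```python
# Illustrative sketch only; names are hypothetical and not taken from the repository.

# Commit 445a397f: the side-length limit for enlarging images was 3000 px, now 9000 px.
MAX_SIDE_FOR_ENLARGING = 9000


def should_enlarge(width: int, height: int, limit: int = MAX_SIDE_FOR_ENLARGING) -> bool:
    """Return True when neither side exceeds the limit, so the image may be enlarged."""
    return width <= limit and height <= limit


# Commit 334ccac2: a result ending with 'end{table}' now also counts as a success,
# in addition to the earlier 'end{tabular}' ending.
TABLE_END_MARKERS = ("end{tabular}", "end{table}")


def table_extraction_succeeded(latex_code: str) -> bool:
    """Return True when the recognized LaTeX ends with one of the accepted markers.

    Stripping trailing whitespace first is an assumption made for this sketch.
    """
    return latex_code.strip().endswith(TABLE_END_MARKERS)
```

Under this reading, a 4000 x 5000 scan that the old 3000-pixel limit would have skipped is now eligible for enlarging, and a table wrapped in a `table` environment is no longer reported as a failed extraction.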
08 Aug, 2024 2 commits

docs(cuda-acceleration): add tips to verify CUDA acceleration effectiveness · 048e0952

myhloli authored Aug 08, 2024

Add notes in the Ubuntu and Windows CUDA acceleration guides on how to
determine if CUDA acceleration is working. This includes checking for
significant reductions in `layout detection cost`, `mfr time`, and `ocr cost`
as indicators of successful acceleration.

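As a rough illustration of the check described in 048e0952, the sketch below compares the three timings from a CPU run and a CUDA run. It assumes the values have already been copied out of the logs by hand (the log format is not shown here, so nothing is parsed), and the function names and the `min_speedup` threshold are hypothetical choices, not something the guides prescribe.

```python
# Illustrative sketch only; function names and the threshold are assumptions.

# The three indicators named in the commit message.
METRICS = ("layout detection cost", "mfr time", "ocr cost")


def speedups(cpu: dict, cuda: dict) -> dict:
    """Return the CPU-to-CUDA time ratio for each indicator (values must be positive)."""
    return {name: cpu[name] / cuda[name] for name in METRICS}


def looks_accelerated(cpu: dict, cuda: dict, min_speedup: float = 2.0) -> bool:
    """Return True when every indicator dropped by at least `min_speedup` times.

    The default of 2.0 is an arbitrary stand-in for "significant reduction".
    """
    return all(ratio >= min_speedup for ratio in speedups(cpu, cuda).values())
```

Both arguments are plain dictionaries keyed by the indicator names, with times in the same unit. If the ratios stay close to 1.0, the run was most likely still on the CPU.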

Remove unnecessary commas. (#355) · c0ee70d5
ZuanZuan authored Aug 08, 2024


07 Aug, 2024 11 commits

@zuanzuanshao has signed the CLA in opendatalab/MinerU#355 · 14b6e26d
github-actions[bot] authored Aug 07, 2024

Merge pull request #354 from papayalove/master · d93ea5b9
Xiaomeng Zhao authored Aug 07, 2024
```
feat: add table recognition success detect
```
Merge branch 'master' of github.com:papayalove/Magic-PDF · fbf8f89b
liukaiwen authored Aug 07, 2024

add table recognition success detect · 377b49eb
liukaiwen authored Aug 07, 2024


docs(zh-cn): emphasize additional steps in model download guide · 8da5328f

myhloli authored Aug 07, 2024

Add an exclamation mark to the section title to stress the importance of completing the
additional steps after downloading a model. This change is made in the Chinese
documentation to ensure users are aware of the necessary post-download actions.


fix(models-download-path): correct the download path for PDF-Extract-Kit · 2ff63b7c

myhloli authored Aug 07, 2024

Adjust the print statement in the how_to_download_models_zh_cn.md guide to reflect
the correct model download location. The path has been updated to specify the 'models'
directory where the model is actually downloaded.


Merge pull request #350 from papayalove/master · 0a3a31dc
Xiaomeng Zhao authored Aug 07, 2024
```
feat: add table recognition success detect
```
Merge branch 'master' of github.com:papayalove/Magic-PDF · a38c2a88
liukaiwen authored Aug 07, 2024

add table recognition success detect · b18496b0
liukaiwen authored Aug 07, 2024


docs(models_zh_cn): add print statement to download models example · c7067c85

赵小蒙 authored Aug 07, 2024

Add a print statement to the example code in 'how_to_download_models_zh_cn.md' to
output the downloaded model directory path. This enhancement aids users in locating
the model files as it provides a clear indication of where they are saved on the
user's file system.


docs(readme): update acknowledgment section and project description · 361f5042

myhloli authored Aug 07, 2024

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.


06 Aug, 2024 8 commits

docs(readme): update acknowledgment section and project description · 6350f349

myhloli authored Aug 06, 2024

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.


docs(models-download): update steps and remove deprecated sections · d2a8cb42

myhloli authored Aug 06, 2024

Update the model download instructions to reflect the current process, removing
unnecessary sections and simplifying the steps. The updated guide now includes
clearer instructions on installing Git LFS, downloading models from Hugging Face,
and additional checks for model file completeness. This change ensures that the
documentation is up to date and provides a streamlined experience for users
downloading models.


docs: correct path format description in Windows CUDA docs · c723cc65

myhloli authored Aug 06, 2024

Update the instructions in the Windows CUDA Acceleration documentation to
reflect the correct path format. Specifically, clarify that Windows paths
should include the drive letter and replace backslashes with forward slashes.

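The path rule from c723cc65 (keep the drive letter, use forward slashes) can be sketched with the standard library. This is a minimal illustration, not code from the guide: the helper name `to_config_path` and the example path are hypothetical.

```python
from pathlib import PureWindowsPath


def to_config_path(raw: str) -> str:
    """Convert a Windows path to the forward-slash form, keeping the drive letter.

    Raises ValueError when the path has no drive, since the guide asks for one.
    """
    path = PureWindowsPath(raw)
    if not path.drive:
        raise ValueError(f"path must include a drive letter: {raw!r}")
    # as_posix() renders the same path with forward slashes instead of backslashes.
    return path.as_posix()


# Hypothetical example path, used only to show the conversion.
print(to_config_path(r"C:\Users\me\models"))  # C:/Users/me/models
```

`PureWindowsPath` is used rather than `Path` so the conversion behaves the same on any operating system.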

docs(cuda-acceleration): update PowerShell examples and formatting in README · 4f5689a4
myhloli authored Aug 06, 2024


docs: update URLs to gitee for Windows CUDA acceleration guides · d3e42e08

myhloli authored Aug 06, 2024

Update the URLs for downloading the `magic-pdf.template.json` and `small_ocr.pdf`
files in the Windows CUDA acceleration guides. The links now point to the gitee repository instead of GitHub, ensuring users have access to the necessary files
from the correct source.


docs(zh_CN): update Ubuntu CUDA Acceleration guide · 020602eb

myhloli authored Aug 06, 2024

- Streamline the installation process by removing the redundant apt update step.
- Adjust the numbering of installation steps throughout the document.
- Update download URLs to gitee for the configuration template and demo file.
- Ensure consistency in the model directory configuration advice.


docs: add Ubuntu 22.04 LTS CUDA acceleration setup guide · 4d7dc065

myhloli authored Aug 06, 2024

Add a new README_Ubuntu_CUDA_Acceleration_en_US.md document to provide users with a
setup guide for enabling and testing CUDA acceleration on Ubuntu 22.04 LTS. The guide
includes steps to check and install NVIDIA drivers, install Anaconda, create a conda
environment, install required applications, download and verify models, configure the
environment, and test CUDA acceleration.

This addition addresses the need for clear, concise instructions on achieving better
performance with CUDA-enabled graphics cards and


docs(FAQ): update troubleshooting sections for offline deployment and Mac issues · 2eaa9ca1

myhloli authored Aug 06, 2024

- Note the fix in version 0.6.2b1 for the network error during the first run of offline deployment and clarify the model download requirement.
- Update the dependency installation guide for users on macOS with Intel CPUs.
- Indicate the resolution in version 0.6.2b1 for compatibility issues with paddlepaddle
  version 2.6.1 on certain Linux systems.

This change aims to make the FAQ more informative and easier to navigate for users
experiencing similar issues, providing direct solutions and links where applicable.
