Commits · e7b0f8beed57429eeabd06e5c6601686d792f69d · Qin Kaijie / pdf-miner

09 Aug, 2024 26 commits
- docs: update to 0.7.0b1 · e7b0f8be
  xuchao authored Aug 09, 2024
  
  e7b0f8be
- Create FAQ_en_us.md · 85e36358
  sfk authored Aug 09, 2024
  
  85e36358
- Create output_file_en_us.md · cf704253
  sfk authored Aug 09, 2024
  
  cf704253
- Update README_zh-CN_v2.md · 54baabd8
  sfk authored Aug 09, 2024
```
edit FAQ
```
  54baabd8
- Update README_v2.md · 8cc8ab17
  sfk authored Aug 09, 2024
```
update doc url
```
  8cc8ab17
- Update README_zh-CN_v2.md · ba25b1db
  sfk authored Aug 09, 2024
```
update discord url
```
  ba25b1db
- Update README_zh-CN_v2.md · 004beb5c
  sfk authored Aug 09, 2024
```
update content
```
  004beb5c
- Update README_zh-CN_v2.md · c1ad30e7
  sfk authored Aug 09, 2024
```
update content
```
  c1ad30e7
- Update README_v2.md · 5a0cce0c
  sfk authored Aug 09, 2024
  
  5a0cce0c
- Update FAQ_zh_cn.md · b03b5cdd
  Xiaomeng Zhao authored Aug 09, 2024
  
  b03b5cdd
- Update README_v2.md · d9e72e92
  sfk authored Aug 09, 2024
  
  d9e72e92
- Update README_v2.md · 755e8a9b
  sfk authored Aug 09, 2024
  
  755e8a9b
- Update README_v2.md · b413a89d
  sfk authored Aug 09, 2024
  
  b413a89d
- Update README_zh-CN_v2.md · 58e429b6
  sfk authored Aug 09, 2024
  
  58e429b6
- Create README_v2.md · a9063f8c
  sfk authored Aug 09, 2024
  
  a9063f8c
- Update README_zh-CN_v2.md · f8261f35
  sfk authored Aug 09, 2024
  
  f8261f35
- Update README_zh-CN_v2.md · 90f4e364
  sfk authored Aug 09, 2024
  
  90f4e364
- Update README_zh-CN_v2.md · fc6a7c30
  sfk authored Aug 09, 2024
  
  fc6a7c30
- Merge pull request #379 from myhloli/master · 4ec8466e
  Xiaomeng Zhao authored Aug 09, 2024
```
fix(doc-analyze): adjust image scaling limit to 9000 pixels
```
  4ec8466e
- fix(doc-analyze): adjust image scaling limit to 9000 pixels · 445a397f
  myhloli authored Aug 09, 2024
```
Previously, images were not enlarged if their width or height exceeded 3000 pixels.
This threshold has been increased to 9000 pixels to better handle high-resolution scans and improve the analysis of documents with larger dimensions.
```
  445a397f
- docs: how to use table recognition · f3ad9be3
  xuchao authored Aug 09, 2024
  
  f3ad9be3
- docs: update known issue · edcced27
  xuchao authored Aug 09, 2024
  
  edcced27
- Merge pull request #374 from myhloli/master · 2502db13
  Xiaomeng Zhao authored Aug 09, 2024
```
fix&refactor(pdf-extract-kit): table recognition and ocr
```
  2502db13
- fix(pdf-extract-kit): ensure table extraction success with additional ending... · 334ccac2
  myhloli authored Aug 09, 2024
```
fix(pdf-extract-kit): ensure table extraction success with additional ending condition

Add an additional condition to determine the success of table extraction by checking
if the latex_code ends with 'end{table}'. This extends the validation to cover table
environments that may not strictly end with 'end{tabular}', thus improving the robustness
of table recognition processing.
```
  334ccac2
- refactor(pdf_extract_kit): optimize image processing and table recognition... · 29e590a7
  myhloli authored Aug 09, 2024
```
refactor(pdf_extract_kit): optimize image processing and table recognition logic

Refactor the image processing logic for OCR and table recognition to ensure
consistency and improve performance. Remove redundant initialization of PIL images,
unify image cropping logic, and streamline the handling of formula detection results.
Also, adjust the table recognition process to improve integration with the updated image
processing logic and enhance overall efficiency.
```
  29e590a7
- fix: #366 the broken chain after the refactor of AbsReaderWriter leads to wrong API invocation (#371) · ad5596fc
  icecraft authored Aug 09, 2024
```
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>
```
  ad5596fc
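
Two of the fixes listed for 09 Aug (445a397f and 334ccac2) describe small decision rules that may be easier to follow as code. The sketch below is illustrative only: the helper names `should_enlarge` and `table_extraction_succeeded` are hypothetical and are not taken from the project's source. Only the 3000 and 9000 pixel limits and the two accepted endings come from the commit messages; stripping trailing whitespace before the check is an added assumption.

```python
# Hypothetical helpers; names and structure are assumptions, not project code.

SCALING_LIMIT_PX = 9000  # raised from 3000 according to commit 445a397f

# Endings accepted as a successful table extraction, per commit 334ccac2.
ACCEPTED_ENDINGS = ("end{tabular}", "end{table}")


def should_enlarge(width: int, height: int, limit: int = SCALING_LIMIT_PX) -> bool:
    """Return True when neither side exceeds the limit.

    The commit message says images are not enlarged if their width or
    height exceeds the threshold, so both sides must be within it.
    """
    return width <= limit and height <= limit


def table_extraction_succeeded(latex_code: str) -> bool:
    """Return True when the LaTeX output ends with an accepted ending.

    Trailing whitespace is stripped first; that step is an assumption,
    since the commit message does not mention it.
    """
    return latex_code.rstrip().endswith(ACCEPTED_ENDINGS)
```

With these definitions, a 4000 x 5000 image would be enlarged under the 9000 limit but not under the earlier 3000 limit, and a result ending in `\end{table}` counts as a success alongside one ending in `\end{tabular}`.
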
08 Aug, 2024 2 commits

docs(cuda-acceleration): add tips to verify CUDA acceleration effectiveness · 048e0952

myhloli authored Aug 08, 2024

Add notes in the Ubuntu and Windows CUDA acceleration guides on how to
determine if CUDA acceleration is working. This includes checking for
significant reductions in `layout detection cost`, `mfr time`, and `ocr cost`
as indicators of successful acceleration.

048e0952
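
One way to apply this check is to compare the three timings between a CPU run and a CUDA run. The sketch below is an illustration under stated assumptions: the helper name `looks_accelerated`, the dictionary layout, and the 2x threshold are all hypothetical, and what counts as a "significant reduction" is left to the reader. Only the three metric names come from the commit message.

```python
# Hypothetical helper; the threshold and input format are assumptions.

METRICS = ("layout detection cost", "mfr time", "ocr cost")


def looks_accelerated(cpu: dict, gpu: dict, min_speedup: float = 2.0) -> bool:
    """Return True when every metric is at least `min_speedup` times faster.

    `cpu` and `gpu` map each metric name to a duration in seconds,
    copied by hand from the logs of a CPU run and a CUDA run.
    """
    return all(
        gpu[name] > 0 and cpu[name] / gpu[name] >= min_speedup
        for name in METRICS
    )
```

The durations passed in are whatever the two runs reported for the same document; identical timings for both runs would suggest that CUDA is not actually being used.
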

Remove unnecessary commas. (#355) · c0ee70d5
ZuanZuan authored Aug 08, 2024

c0ee70d5

07 Aug, 2024 11 commits

@zuanzuanshao has signed the CLA in opendatalab/MinerU#355 · 14b6e26d
github-actions[bot] authored Aug 07, 2024

14b6e26d
Merge pull request #354 from papayalove/master · d93ea5b9
Xiaomeng Zhao authored Aug 07, 2024
```
feat: add table recognition success detect
```
d93ea5b9
Merge branch 'master' of github.com:papayalove/Magic-PDF · fbf8f89b
liukaiwen authored Aug 07, 2024

fbf8f89b
add table recognition success detect · 377b49eb
liukaiwen authored Aug 07, 2024

377b49eb

docs(zh-cn): emphasize additional steps in model download guide · 8da5328f

myhloli authored Aug 07, 2024

Add an exclamation mark to the section title to stress the importance of completing the
additional steps after downloading a model. This change is made in the Chinese
documentation to ensure users are aware of the necessary post-download actions.

8da5328f

fix(models-download-path): correct the download path for PDF-Extract-Kit · 2ff63b7c

myhloli authored Aug 07, 2024

Adjust the print statement in the how_to_download_models_zh_cn.md guide to reflect
the correct model download location. The path has been updated to specify the 'models'
directory where the model is actually downloaded.

2ff63b7c

Merge pull request #350 from papayalove/master · 0a3a31dc
Xiaomeng Zhao authored Aug 07, 2024
```
feat: add table recognition success detect
```
0a3a31dc
Merge branch 'master' of github.com:papayalove/Magic-PDF · a38c2a88
liukaiwen authored Aug 07, 2024

a38c2a88
add table recognition success detect · b18496b0
liukaiwen authored Aug 07, 2024

b18496b0

docs(models_zh_cn): add print statement to download models example · c7067c85

赵小蒙 authored Aug 07, 2024

Add a print statement to the example code in 'how_to_download_models_zh_cn.md' to
output the downloaded model directory path. This enhancement aids users in locating
the model files as it provides a clear indication of where they are saved on the
user's file system.

c7067c85

docs(readme): update acknowledgment section and project description-... · 361f5042

myhloli authored Aug 07, 2024

docs(readme): update acknowledgment section and project description

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.

361f5042

06 Aug, 2024 1 commit

docs(readme): update acknowledgment section and project description-... · 6350f349

myhloli authored Aug 06, 2024

docs(readme): update acknowledgment section and project description

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.

6350f349