Commits · 377b49eb2426877e38b9edf722467920e90c733e · Qin Kaijie / pdf-miner

07 Aug, 2024 5 commits

add table recognition success detect · 377b49eb
liukaiwen authored Aug 07, 2024

377b49eb
Merge branch 'master' of github.com:papayalove/Magic-PDF · a38c2a88
liukaiwen authored Aug 07, 2024

a38c2a88
add table recognition success detect · b18496b0
liukaiwen authored Aug 07, 2024

b18496b0

docs(models_zh_cn): add print statement to download models example · c7067c85

赵小蒙 authored Aug 07, 2024

Add a print statement to the example code in 'how_to_download_models_zh_cn.md' to
output the downloaded model directory path. This enhancement aids users in locating
the model files as it provides a clear indication of where they are saved on the
user's file system.

c7067c85

docs(readme): update acknowledgment section and project description-... · 361f5042

myhloli authored Aug 07, 2024

docs(readme): update acknowledgment section and project description

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.

361f5042

06 Aug, 2024 10 commits

docs(readme): update acknowledgment section and project description-... · 6350f349

myhloli authored Aug 06, 2024

docs(readme): update acknowledgment section and project description

- Streamline the Acknowledgments section in the README by removing redundant entries.
- Clarify the project's current use of PyMuPDF and future plans for exploring a more permissively licensed PDF processing library in the project description.
- Ensure all modifications adhere to the project's documentation standards and improve reader understanding.

6350f349

docs(models-download): update steps and remove deprecated sectionsUpdate the... · d2a8cb42

myhloli authored Aug 06, 2024

docs(models-download): update steps and remove deprecated sections

Update the model download instructions to reflect the current process, removing
unnecessary sections and simplifying the steps. The updated guide now includes
clearer instructions on installing Git LFS, downloading models from Hugging Face,
and additional checks for model file completeness. This change ensures that the
documentation is up-to-date and provides a streamlined experience for users
downloading models.

d2a8cb42

docs: correct path format description in Windows CUDA docsUpdate the... · c723cc65

myhloli authored Aug 06, 2024

docs: correct path format description in Windows CUDA docs

Update the instructions in the Windows CUDA Acceleration documentation to
reflect the correct path format. Specifically, clarify that Windows paths
should include the drive letter and replace backslashes with forward slashes.
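
The path rule in this commit message can be illustrated with a minimal Python sketch. The helper name `to_config_path` and the example path are hypothetical and are not taken from the magic-pdf code or documentation.

```python
from pathlib import PureWindowsPath


def to_config_path(windows_path: str) -> str:
    """Keep the drive letter and replace backslashes with forward slashes.

    Hypothetical helper for illustration only; not part of magic-pdf.
    """
    path = PureWindowsPath(windows_path)
    if not path.drive:
        # The docs ask for a path that includes the drive letter.
        raise ValueError(f"path has no drive letter: {windows_path!r}")
    return path.as_posix()


print(to_config_path(r"C:\Users\me\models"))
```

`PureWindowsPath` parses Windows-style paths on any operating system, so the sketch behaves the same on Linux and macOS.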

c723cc65

docs(cuda-acceleration): update PowerShell examples and formatting in README · 4f5689a4
myhloli authored Aug 06, 2024

4f5689a4

docs: update URLs to gitee for Windows CUDA acceleration guides · d3e42e08

myhloli authored Aug 06, 2024

Update the URLs for downloading the `magic-pdf.template.json` and `small_ocr.pdf`
files in the Windows CUDA acceleration guides. The links now point to the gitee
repository instead of GitHub, ensuring users have access to the necessary files
from the correct source.

d3e42e08

docs(zh_CN): update Ubuntu CUDA Acceleration guide · 020602eb

myhloli authored Aug 06, 2024

- Streamline the installation process by removing the redundant apt update step.
- Adjust the numbering of installation steps throughout the document.
- Update download URLs to gitee for the configuration template and demo file.
- Ensure consistency in the model directory configuration advice.

020602eb

docs: add Ubuntu 22.04 LTS CUDA acceleration setup guide · 4d7dc065

myhloli authored Aug 06, 2024

Add a new README_Ubuntu_CUDA_Acceleration_en_US.md document to provide users with a
setup guide for enabling and testing CUDA acceleration on Ubuntu 22.04 LTS. The guide
includes steps to check and install NVIDIA drivers, install Anaconda, create a conda
environment, install required applications, download and verify models, configure the
environment, and test CUDA acceleration.

This addition addresses the need for clear, concise instructions on achieving better
performance with CUDA-enabled graphics cards and

4d7dc065

docs(FAQ): update troubleshooting sections for offline deployment and Mac issues · 2eaa9ca1

myhloli authored Aug 06, 2024

- Note the fix in version 0.6.2b1 for the network error during the first run of offline
  deployment and clarify the model download requirement.
- Update the dependency installation guide for users on macOS with Intel CPUs.
- Indicate the resolution in version 0.6.2b1 for compatibility issues with paddlepaddle
  version 2.6.1 on certain Linux systems.

This change aims to make the FAQ more informative and easier to navigate for users
experiencing similar issues, providing direct solutions and links where applicable.

2eaa9ca1

docs: add conda install steps for environment setupAdd detailed steps on how... · b0bd91dc

myhloli authored Aug 06, 2024

docs: add conda install steps for environment setup

Add detailed steps on how to create a conda environment and activate it before
proceeding with the pip installation of magic-pdf and required dependencies.
This provides users with a clearer guide on setting up their environment.

b0bd91dc

Update version.py with new version · 5fcfd8b4
myhloli authored Aug 06, 2024

5fcfd8b4

05 Aug, 2024 5 commits
- mirror(conda): use tuna mirror for Anaconda download · 29e48c73
  myhloli authored Aug 05, 2024
```
Update the download links for Anaconda in both Ubuntu and Windows CUDA
Acceleration documents to use the Tuna mirror. This change helps ensure that
users in China have faster access to the Anaconda distribution.
```
  29e48c73
- Merge pull request #329 from papayalove/master · 8c5ecdf1
  Xiaomeng Zhao authored Aug 05, 2024
```
[fix bug] table recognition bug fixed#321
```
  8c5ecdf1
- fix table recognition bug#321 · cae215bb
  liukaiwen authored Aug 05, 2024
  
  cae215bb
- fix table recognition bug#321 · ec7271fa
  liukaiwen authored Aug 05, 2024
  
  ec7271fa
- Merge branch 'master' of github.com:papayalove/Magic-PDF · b8adb630
  liukaiwen authored Aug 05, 2024
```
# Conflicts:
#	docs/how_to_download_models_zh_cn.md
```
  b8adb630
04 Aug, 2024 5 commits

refactor(common): Moving the output path logs from the start of the program to its end. · 52069612
myhloli authored Aug 04, 2024

52069612

fix(pdf-extract): ensure table recognition config defaults to disabled · 52156eae

myhloli authored Aug 04, 2024

If 'table-config' is not present in the configuration file, the table recognition
feature will default to being disabled to ensure consistent behavior. This change
adds a warning log and sets a default configuration for table recognition when the
expected config is missing.
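
A minimal sketch of the fallback this commit describes, assuming the configuration has already been loaded into a dictionary. The function name `get_table_config` and the `"enabled"` key are hypothetical; the actual key names and defaults in magic-pdf may differ.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed shape of the fallback; the real default keys may differ.
DEFAULT_TABLE_CONFIG = {"enabled": False}


def get_table_config(config: dict) -> dict:
    """Return the 'table-config' section, or a disabled default if it is missing."""
    table_config = config.get("table-config")
    if table_config is None:
        logger.warning(
            "'table-config' not found in the configuration file; "
            "table recognition is disabled by default"
        )
        # Return a copy so callers cannot mutate the shared default.
        return dict(DEFAULT_TABLE_CONFIG)
    return table_config
```

Returning an explicit default rather than raising keeps older configuration files usable, which matches the "consistent behavior" goal stated in the message.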

52156eae

fix(ocr_mkcontent): add spaces around inline equation in content · 0998d22a

myhloli authored Aug 04, 2024

Ensure proper formatting of inline equations by adding spaces outside the equation delimiters
to prevent markdown from interpreting the equation content as part of a link. This addresses
the issue where inline OCR equations appear without the correct markdown formatting.
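
The spacing fix can be sketched as follows. The function name `format_inline_equation` is hypothetical, and the use of `$` as the inline delimiter is an assumption, since the commit message does not name the delimiter.

```python
def format_inline_equation(latex: str) -> str:
    """Wrap LaTeX in `$` delimiters with one space outside each delimiter.

    Hypothetical helper; the `$` delimiter is an assumption.
    """
    return f" ${latex.strip()}$ "


# Surrounding text is concatenated without extra spaces; the helper supplies them.
text = "The energy is" + format_inline_equation("E = mc^2") + "for a body at rest."
print(text)
```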

0998d22a

fix(setup): allow latest matplotlib versions on non-Windows platforms · 25213909

myhloli authored Aug 04, 2024

The restriction on the matplotlib version has been updated to only apply on Windows
platforms, where precompiled packages are not available starting from version 3.9.1.
This change enables users on Linux and macOS to install newer versions of matplotlib,
addressing compatibility issues with recent bug fixes.
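
One way to express a platform-dependent restriction like this is sketched below. The helper name `matplotlib_requirement` and the exact bound `<=3.9.0` are assumptions inferred from the message above, not the contents of the project's `setup.py`.

```python
import platform
from typing import Optional


def matplotlib_requirement(system: Optional[str] = None) -> str:
    """Return a requirement string for matplotlib on the given platform.

    Hypothetical helper; the version bound is an assumption based on the
    commit message (no precompiled Windows packages from 3.9.1 onward).
    """
    system = system or platform.system()
    if system == "Windows":
        return "matplotlib<=3.9.0"
    # Linux and macOS are left unrestricted.
    return "matplotlib"


print(matplotlib_requirement())
```

The same idea can also be written declaratively with a PEP 508 environment marker such as `platform_system == "Windows"` attached to the requirement string.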

25213909

fix(dependencies): remove unnecessary pypandoc and struct-eqtable packages;fix... · 9ececf3a

myhloli authored Aug 04, 2024

fix(dependencies): remove unnecessary pypandoc and struct-eqtable packages; fix matplotlib>=3.9.1 not being installable on Windows systems without a compilation environment.

9ececf3a

02 Aug, 2024 15 commits

docs: specify absolute path for model weights configuration · 9778a461

myhloli authored Aug 02, 2024

Update the README documents to clarify that the "models-dir" in the
configuration should be an absolute path. Also, provide additional guidance
for Windows users on how to correctly format the path to avoid common issues
with path escaping in JSON files.
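
The escaping issue can be sketched in Python. The helper name `render_models_dir` and the example directory are hypothetical; only the `"models-dir"` key comes from the message above.

```python
import json
from pathlib import PureWindowsPath


def render_models_dir(path: str) -> str:
    """Serialize an absolute Windows path as the "models-dir" JSON entry.

    Hypothetical helper for illustration; not part of magic-pdf.
    """
    win_path = PureWindowsPath(path)
    if not win_path.is_absolute():
        raise ValueError(f"models-dir must be an absolute path: {path!r}")
    # json.dumps escapes each backslash, so the file content stays valid JSON.
    return json.dumps({"models-dir": str(win_path)})


print(render_models_dir(r"D:\models\magic-pdf"))
```

Writing the file through a JSON serializer avoids hand-escaping; a path typed manually into the JSON file needs each backslash doubled, or forward slashes used instead.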

9778a461

docs: add wget command for Ubuntu and powershell script for Windows · 44a2dc37

myhloli authored Aug 02, 2024

Add instructions to download the magic-pdf.template.json file using wget on
Ubuntu and a PowerShell script on Windows in the respective README files.
This is to facilitate the setup process by providing direct download options,
replacing manual file transfers.

44a2dc37

Update README_zh-CN_v2.md · f6d399f2
sfk authored Aug 02, 2024
```
add demo url
```
f6d399f2

docs(README_zh-CN_v2): add note on GPU acceleration for CUDA supported devices · 99d284a7

myhloli authored Aug 02, 2024

Add a note in the README_zh-CN_v2.md to clarify the availability of GPU
acceleration for devices supporting CUDA, directing users to specific
tutorials based on their operating system.

99d284a7

fix(docs): pin Magic-PDF version to 0.6.2b1 in install commands · a0c62b26

myhloli authored Aug 02, 2024

Update the install commands in both Ubuntu and Windows CUDA Acceleration
guides to specify Magic-PDF version 0.6.2b1, ensuring consistency and
avoiding potential version mismatches.

a0c62b26

docs(FAQ): update dependency installation troubleshooting · 961330f7

myhloli authored Aug 02, 2024

Update the FAQ to clarify the dependency installation issue when using magic-pdf. Ensure
users are directed to install the specific version of magic-pdf that resolves the dependency
error, rather than listing all individual dependencies. This simplifies the troubleshooting process
and provides a direct solution for users encountering the "Required dependency not installed"
error.

961330f7

docs(models_zh_cn): update download methods from ModelScope · a24890b1

myhloli authored Aug 02, 2024

Update the download methods for models in the Chinese documentation to reflect
the latest options available from ModelScope. Simplify the section titles and
revise download instructions for clarity and consistency.

a24890b1

Merge remote-tracking branch 'origin/master' · a53cb30f
myhloli authored Aug 02, 2024

a53cb30f

docs: update model download instructions and CUDA acceleration setup · 3ef4d054

myhloli authored Aug 02, 2024

Update the documentation to reflect the latest model download procedures, emphasize
model file integrity checks, and expand the instructions for setting up CUDA acceleration
on Ubuntu and Windows environments. The README files for various OS have been
enhanced with additional details to assist users in configuring and verifying their
environments for optimal performance.

3ef4d054

delete old magic-pdf cli · 8d88330d
xuchao authored Aug 02, 2024

8d88330d
Make the documentation on how to download the model more concise · 2a06e0c8
xuchao authored Aug 02, 2024

2a06e0c8
fix some misdescription in document · f052c75e
xuchao authored Aug 02, 2024

f052c75e

docs(zh-CN): update installation guide for magic-pdf · 1c5b42e0

myhloli authored Aug 02, 2024

Update the Chinese documentation to include detailed steps for installing
magic-pdf using CPU and GPU. These updates clarify the process for end
users, addressing common issues such as configuration file placement and
model weight file downloads. The documentation now provides users with
direct links and version validation steps to ensure a smoother installation
experience.

1c5b42e0

feat(model inference): add table recognition and conversion to LaTeX (#284) · 37925f36

Kaiwen Liu authored Aug 02, 2024

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into html.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200s to 500s to convert a single table image using cpu

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into LaTeX.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200s to 500s to convert a single table image using cpu

* # feat(model inference): add table recognition and conversion to LaTeX

# What's Changed

### New Features

- Add table content recognition; we use the weights of [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) to convert table images to LaTeX.

### Instruction

- pip install pypandoc struct-eqtable==0.1.0
- Download [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put it under models/ directory.
- Edit 'table-mode' value to turn on table recognition function which is turned off by default.
- If you did not download any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

---------
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>

37925f36

docs(output-file): correct poly coordinate format and update table descriptions · 41737adf

myhloli authored Aug 02, 2024

- Fix the description of the 'poly' coordinate format in the output file documentation to correctly reflect the order of coordinates: left-top, right-top, right-bottom,
  left-bottom.
- Update various table-related descriptions for clarity and consistency, including
  field names and their corresponding explanations.
- Add version name field description in 'middle.json' structure to document the
  version of the magic-pdf used in the parsing process.
- Refactor the block and line description tables to improve readability and alignment
  with the rest of the documentation.
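
The corner order named in the first bullet can be illustrated with a small sketch. It assumes an axis-aligned box given as `(x0, y0, x1, y1)` with the origin at the top-left of the page and a flat list of eight numbers for `poly`; the function name `bbox_to_poly` is hypothetical and the actual output layout should be checked against the output file documentation.

```python
from typing import List


def bbox_to_poly(x0: float, y0: float, x1: float, y1: float) -> List[float]:
    """Expand a box into corner coordinates in the order
    left-top, right-top, right-bottom, left-bottom.

    Hypothetical helper; assumes (x0, y0) is the left-top corner.
    """
    return [
        x0, y0,  # left-top
        x1, y0,  # right-top
        x1, y1,  # right-bottom
        x0, y1,  # left-bottom
    ]


print(bbox_to_poly(10, 20, 110, 70))
```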

41737adf