Commits · 3ef4d054cf019d6dce4e03e10f698726b7c14ac0 · Qin Kaijie / pdf-miner

02 Aug, 2024 4 commits

docs: update model download instructions and CUDA acceleration setup · 3ef4d054

myhloli authored Aug 02, 2024

Update the documentation to reflect the latest model download procedures, emphasize
model file integrity checks, and expand the instructions for setting up CUDA acceleration
on Ubuntu and Windows environments. The README files for the various operating systems have been
enhanced with additional details to help users configure and verify their
environments for optimal performance.

3ef4d054

docs(zh-CN): update installation guide for magic-pdf · 1c5b42e0

myhloli authored Aug 02, 2024

Update the Chinese documentation to include detailed steps for installing
magic-pdf using CPU and GPU. These updates clarify the process for end
users, addressing common issues such as configuration file placement and
model weight file downloads. The documentation now provides users with
direct links and version validation steps to ensure a smoother installation
experience.

1c5b42e0

feat(model inference): add table recognition and conversion to LaTeX (#284) · 37925f36

Kaiwen Liu authored Aug 02, 2024

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into HTML.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200s to 500s to convert a single table image using cpu

* # add table recognition using struct-eqtable
## Changelog
31/07/2024
- Support table recognition. Table images will be converted into LaTeX.

### how to use the new feature:
set the attribute 'table-mode' to 'true' in magic-pdf.json

### caution:
it takes 200s to 500s to convert a single table image using cpu

* # feat(model inference): add table recognition and conversion to LaTeX

# What's Changed

### New Features

- Add table content recognition, using the [StructEqTable](https://github.com/UniModal4Reasoning/StructEqTable-Deploy) weights to convert table images to LaTeX.

### Instruction

- pip install pypandoc struct-eqtable==0.1.0
- Download the [StructEqTable weights](https://huggingface.co/wanderkid/PDF-Extract-Kit/tree/main/models/TabRec) and put them under the models/ directory.
- Set the 'table-mode' value to turn on table recognition, which is off by default.
- If you have not downloaded any models before, refer to [how to download models](docs/how_to_download_models_zh_cn.md).

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

* add table recognition and conversion to LaTeX

---------
Co-authored-by: liukaiwen <liukaiwen@pjlab.org.cn>

37925f36
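
The configuration step described in the commit above can be sketched in Python. This is only an illustration: the helper name `enable_table_mode` and the file location are hypothetical, and the commit says to set 'table-mode' to 'true' without stating whether the value is a JSON boolean or a string, so a boolean is assumed here.

```python
import json
from pathlib import Path


def enable_table_mode(config_path: str, enabled: bool = True) -> dict:
    """Set 'table-mode' in a magic-pdf.json-style file and return the new config.

    Hypothetical helper: assumes the file is a flat JSON object and that
    'table-mode' takes a JSON boolean.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Only this one key is touched; every other setting is written back unchanged.
    config["table-mode"] = enabled

    path.write_text(
        json.dumps(config, indent=4, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

Calling `enable_table_mode("magic-pdf.json")` on a copy of the configuration file would switch the feature on; passing `enabled=False` switches it back off, which may be preferable on CPU given the conversion times noted in the commit.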

docs(output-file): correct poly coordinate format and update table descriptions · 41737adf

myhloli authored Aug 02, 2024

- Fix the description of the 'poly' coordinate format in the output file documentation to correctly reflect the order of coordinates: left-top, right-top, right-bottom,
  left-bottom.
- Update various table-related descriptions for clarity and consistency, including
  field names and their corresponding explanations.
- Add version name field description in 'middle.json' structure to document the
  version of the magic-pdf used in the parsing process.
- Refactor the block and line description tables to improve readability and alignment
  with the rest of the documentation.

41737adf
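
To make the corrected corner order concrete, here is a minimal Python sketch that reads a 'poly' value in the order left-top, right-top, right-bottom, left-bottom. It assumes the value is a flat list of eight numbers with x before y for each corner; that layout and the function names are assumptions for illustration, not taken from the output file documentation.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def poly_corners(poly: Sequence[float]) -> Dict[str, Point]:
    """Name the four corners of a poly given as [x0, y0, x1, y1, x2, y2, x3, y3]."""
    if len(poly) != 8:
        raise ValueError("expected eight numbers (four x, y pairs)")
    return {
        "left_top": (poly[0], poly[1]),
        "right_top": (poly[2], poly[3]),
        "right_bottom": (poly[4], poly[5]),
        "left_bottom": (poly[6], poly[7]),
    }


def poly_to_bbox(poly: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing the poly."""
    if len(poly) != 8:
        raise ValueError("expected eight numbers (four x, y pairs)")
    xs = poly[0::2]  # every even index is an x coordinate
    ys = poly[1::2]  # every odd index is a y coordinate
    return (min(xs), min(ys), max(xs), max(ys))
```

Taking the minimum and maximum rather than trusting individual corners keeps `poly_to_bbox` correct even if a quadrilateral is slightly rotated.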

01 Aug, 2024 21 commits

feat: remove dummy code, magic_pdf/cli, magic_pdf/train_utils (#291) · e155d322

icecraft authored Aug 01, 2024

* feat: remove dummy code, magic_pdf/cli, magic_pdf/train_utils

* feat: expose version in command line

---------
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>

e155d322

docs: update README for Ubuntu CUDA Acceleration · 15125623

myhloli authored Aug 01, 2024

- Adjust command installation format for PaddlePaddle GPU.
- Clarify instruction numbering for testing OCR acceleration.

15125623

docs(zh_CN): update Ubuntu CUDA setup guide for accuracy · a09291ad

myhloli authored Aug 01, 2024

Update the Ubuntu CUDA Acceleration setup guide to reflect the correct user directory
path and improve the clarity of instructions. Remove references to Windows and macOS
as they are out of scope for this document. Ensure the configuration file copying
command is correctly represented for Linux users.

a09291ad

fix(docs): correct link to magic-pdf.template.json in README · 51a0bf4a

myhloli authored Aug 01, 2024

Update the link to the magic-pdf.template.json configuration template file in the
README_Ubuntu_CUDA_Acceleration_zh_CN.md document. The file path was previously
incorrect and has been amended to point to the correct location.

51a0bf4a

docs(magic-pdf): update model directory reference in configuration · 866e47a0

myhloli authored Aug 01, 2024

Update the instruction in README_Ubuntu_CUDA_Acceleration_zh_CN.md to reference
the correct section number for downloading the model weights. This change ensures
that users are directed to the correct location in the document for setting up the
model directory in the magic-pdf.json configuration.

866e47a0

Merge remote-tracking branch 'origin/master' · 92b981bd
myhloli authored Aug 01, 2024

92b981bd

docs: update Ubuntu CUDA acceleration guide for version 0.6.2- Add steps for... · fc18a5cf

myhloli authored Aug 01, 2024

docs: update Ubuntu CUDA acceleration guide for version 0.6.2

- Add steps for Ubuntu 22.04 LTS installation.
- Detail the process of checking, installing, and configuring NVIDIA drivers.
- Include instructions for installing Anaconda and creating a specific environment.
- Provide guidance on installing magic-pdf and its dependencies.
- Add a note to verify magic-pdf version and report issues if necessary.
- Describe the process of downloading models and configuring the application.
- Include a sample command to run the application with CUDA acceleration.
- Add a note for enabling OCR CUDA acceleration with specific GPU requirements.

This update ensures users have the latest information for setting up CUDA acceleration
with magic-pdf on Ubuntu 22.04 LTS, specifically for version 0.6.2, and provides clearer
instructions on the installation and configuration process.

fc18a5cf

modify command usage document · a5e13b97
xuchao authored Aug 01, 2024

a5e13b97
modify readme_zh-cn_v2.md · 1d2f55a3
xuchao authored Aug 01, 2024

1d2f55a3
upload ocr_demo pdf · 7bca348d
myhloli authored Aug 01, 2024

7bca348d
Merge remote-tracking branch 'origin/master' · a697002c
myhloli authored Aug 01, 2024

a697002c

docs: restructure download guide and add ModelScope options · b4b2a099

myhloli authored Aug 01, 2024

Restructured the how-to download models document for better clarity and
added sections on downloading models from ModelScope, including SDK and
Git download methods. Provided detailed steps for installing Git LFS and
checking model integrity after download. Also included recommendations
for moving the models to an SSD for better performance.

b4b2a099

Feat/impl cli (#264) · 40e0827e

icecraft authored Aug 01, 2024

* feat: refactor cli command

* feat: add docs to describe the output files of cli

* feat: resolve review comments

* feat: update docs about middle.json

---------
Co-authored-by: shenguanlin <shenguanlin@pjlab.org.cn>

40e0827e

add minerU video · 5e38c4c8
xuchao authored Aug 01, 2024

5e38c4c8
Merge pull request #286 from nutshellfool/patch-1 · 37491304
drunkpig authored Aug 01, 2024
```
Update how_to_download_models_en.md
```
37491304
docs(readme): update wheel install URL · 3abf22cc
myhloli authored Aug 01, 2024

3abf22cc
Merge remote-tracking branch 'origin/master' · 78b8c9a9
myhloli authored Aug 01, 2024

78b8c9a9

docs(readme): update installation URLs for faster wheel downloads · 793e8d43

myhloli authored Aug 01, 2024

Change the URLs in the installation instructions to new mirrors that are expected to provide faster downloads for users. This update affects the installation guides for both detectron2 and magic-pdf in the Chinese documentation.

793e8d43

Update how_to_download_models_en.md · c30a1abd
Richard Li authored Aug 01, 2024

c30a1abd
Merge pull request #258 from nutshellfool/patch-1 · 4a1d82ed
drunkpig authored Aug 01, 2024
```
Update how_to_download_models_zh_cn.md
```
4a1d82ed

refactor(readme): optimize detectron2 installation guide · 69ce578c

myhloli authored Aug 01, 2024

Reorganize the installation instructions for Magic-PDF to clarify the dependency on
detectron2 and provide a more straightforward installation process. The update includes
separating the dependency installation from the package installation and adding a note
about precompiled wheels for Python 3.10.

BREAKING CHANGE: The installation guide now assumes basic familiarity with detectron2
installation requirements. Users who need to compile detectron2 from source should refer
to the official detectron2 documentation.

69ce578c

31 Jul, 2024 14 commits

Update README_zh-CN.md · 5e8d149f
Xiaomeng Zhao authored Jul 31, 2024

5e8d149f

docs(readme): update PyTorch installation guide for CUDA 11.8 · 13d30a4f

myhloli authored Jul 31, 2024

Update the PyTorch installation command in the README files for both English and Chinese
versions to reflect the required version compatibility with CUDA 11.8. Include explicit
instructions to specify the PyTorch version to avoid automatic installation of higher,
unsupported versions. Additionally, clarify the importance of modifying the "device-mode"
parameter in the magic-pdf.json configuration file for proper CUDA device selection.

13d30a4f

fix(readme): specify supported PyTorch versions in install guide · fd60393d

myhloli authored Jul 31, 2024

Update the PyTorch installation guide in both English and Chinese READMEs to explicitly
recommend using torch==2.3.1 and torchvision==0.18.1 for CUDA 11.8. Emphasize the
importance of specifying these versions to avoid compatibility issues with higher,
unsupported versions.

fd60393d
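
A quick way to confirm that an environment matches the versions recommended in the commit above is to compare the installed package metadata against the pins. The sketch below uses the standard library's `importlib.metadata`; the function name `find_pin_mismatches` and the injectable `lookup` parameter are hypothetical conveniences. It also assumes that a local version label such as `+cu118` should be ignored when comparing.

```python
from importlib import metadata
from typing import Callable, Dict, Mapping

# Versions recommended for CUDA 11.8 in the commit message above.
RECOMMENDED_PINS = {"torch": "2.3.1", "torchvision": "0.18.1"}


def find_pin_mismatches(
    pins: Mapping[str, str] = RECOMMENDED_PINS,
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, str]:
    """Return {package: installed version or 'not installed'} for every mismatch."""
    mismatches: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = "not installed"
            continue
        # Compare only the public version, dropping any "+local" label.
        if installed.split("+", 1)[0] != wanted:
            mismatches[name] = installed
    return mismatches
```

An empty result means both packages match the pins; anything else lists the packages to reinstall with an explicit version.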

docs(readme): add notice for pre-release version 0.6.2b1 · 6dbb4197

myhloli authored Jul 31, 2024

A pre-release version 0.6.2b1 of magic-pdf is now available. This version includes
many fixes addressed in our logs but has not undergone full QA testing. Users are
advised to report any issues encountered or revert to version 0.6.1. The installation
guides in both Japanese and Chinese READMEs have been updated to reflect the availability
of this pre-release version and the previous stable version.

BREAKING CHANGE: Installation commands now point to version 0.6.2b1 by default.
Users who wish to install the stable version 0.6.1 should follow the provided
command instead.

6dbb4197

docs(readme): update install command and add beta version notice · 891a9741

myhloli authored Jul 31, 2024

- Change the pip install command in README_zh-CN.md to reflect the new version 0.6.2b1.
- Include a notice about the pre-release of version 0.6.2beta, cautioning users that it has not undergone full QA testing, and provide guidance on reverting to version 0.6.1.
- Verify the installed version with `magic-pdf --version` after installation to ensure
 the correct version is installed, addressing feedback about incorrect versions due to
 mirror source and dependency conflicts.

891a9741

update discord link · cfcb1f47
xuchao authored Jul 31, 2024

cfcb1f47

feat(readme): add beta release note for 0.6.2b1 · aa768fcc

myhloli authored Jul 31, 2024

We have pre-released the 0.6.2 beta version, which addresses numerous issues
reported in our logs. This commit updates the installation guide in the README to
include information on how to install this beta version. Users are advised that
this build has not undergone full QA testing and may contain issues. A
revert instruction to version 0.6.1 is also provided for users who encounter
problems.

BREAKING CHANGE: Installation instructions now include beta version
information. Users should be aware of potential issues with the 0.6.2 beta
version and consider reverting to 0.6.1 if necessary.

aa768fcc

Merge remote-tracking branch 'origin/master' · c05d37dd
myhloli authored Jul 31, 2024

c05d37dd
fix(readme): pin magic-pdf version to 0.6.2b1 in Chinese README · badea43a
myhloli authored Jul 31, 2024

badea43a
Update version.py with new version · c88ba5df
myhloli authored Jul 31, 2024

c88ba5df
Merge remote-tracking branch 'origin/master' · 3aec9c61
myhloli authored Jul 31, 2024

3aec9c61

docs: add installation guide for git lfs on various platforms · 808563ce

myhloli authored Jul 31, 2024

Add detailed instructions for installing git lfs on Linux, macOS, and Windows
to facilitate users in downloading models from ModelScope repository. The guide
is included in the `how_to_download_models_zh_cn.md` document.

808563ce

@nutshellfool has signed the CLA in opendatalab/MinerU#258 · b495880e
github-actions[bot] authored Jul 31, 2024

b495880e
Update how_to_download_models_zh_cn.md · b7cd875f
Richard Li authored Jul 31, 2024
```
use git lfs clone to download model from ModelScope
```
b7cd875f

30 Jul, 2024 1 commit
modify readme, make expression more clear · b3850865
xuchao authored Jul 30, 2024

b3850865