Qin Kaijie / pdf-miner
Commit 6e8e81c9, authored Jun 25, 2024 by 赵小蒙

update readme

parent 63a4a062

Showing 2 changed files with 81 additions and 19 deletions:
README.md (+38 -7)
README_zh-CN.md (+43 -12)
README.md
@@ -41,21 +41,52 @@ Key features include:
### Usage Instructions
1. **Install Magic-PDF**
#### 1. Install Magic-PDF
```bash
pip install magic-pdf[cpu] # Install the CPU version
# or
pip install magic-pdf[gpu] # Install the GPU version
pip install magic-pdf
```
2. **Usage via Command Line**
#### 2. Usage via Command Line
###### simple
```bash
cp magic-pdf.template.json ~/magic-pdf.json
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
###### more
```bash
magic-pdf --help
```
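The two command-line steps above (copy the config template into the home directory, then run the converter) can also be scripted. The sketch below is only an illustration and is not part of Magic-PDF: the helper names `install_config`, `build_command`, and `run_conversion` are hypothetical, and it assumes the `magic-pdf` executable is on `PATH` and accepts the `pdf-command --pdf ... --model ...` arguments exactly as shown in this README.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def install_config(template: Path, home: Optional[Path] = None) -> Path:
    """Copy the config template to <home>/magic-pdf.json and return the target path."""
    target = (home or Path.home()) / "magic-pdf.json"
    shutil.copyfile(template, target)
    return target


def build_command(pdf_path: str, model_json_path: str) -> List[str]:
    """Build the argument list for the CLI call shown in the README."""
    return ["magic-pdf", "pdf-command", "--pdf", pdf_path, "--model", model_json_path]


def run_conversion(pdf_path: str, model_json_path: str) -> None:
    """Run the CLI; subprocess raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(pdf_path, model_json_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.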
### All Thanks To Our Contributors
#### 3. Usage via API
###### Local
```python
image_writer = DiskReaderWriter(local_image_dir)
image_dir = str(os.path.basename(local_image_dir))
jso_useful_key = {"_pdf_type": "", "model_list": model_json}
pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)
pipe.pipe_classify()
pipe.pipe_parse()
md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
```
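The local snippet uses `pdf_bytes`, `model_json`, and `local_image_dir` without showing where they come from. One way to prepare them with the standard library is sketched below. The helper names are hypothetical, the sketch assumes the model output is stored as a JSON file, and it deliberately stops short of calling Magic-PDF itself. Since the return type of `pipe_mk_markdown` is not stated here, `save_markdown` accepts either a string or a list of strings. Note also that `os.path.basename` returns an empty string for a path with a trailing slash, so the sketch normalizes the path first.

```python
import json
import os


def load_inputs(pdf_path: str, model_json_path: str):
    """Read the PDF as raw bytes and the model output as parsed JSON."""
    with open(pdf_path, "rb") as f:
        pdf_bytes = f.read()
    with open(model_json_path, "r", encoding="utf-8") as f:
        model_json = json.load(f)
    return pdf_bytes, model_json


def relative_image_dir(local_image_dir: str) -> str:
    """Last path component of the image directory, tolerant of trailing slashes."""
    return os.path.basename(os.path.normpath(local_image_dir))


def save_markdown(md_content, md_path: str) -> None:
    """Write markdown to disk; a list of strings is joined with newlines (assumption)."""
    if isinstance(md_content, list):
        md_content = "\n".join(md_content)
    with open(md_path, "w", encoding="utf-8") as f:
        f.write(md_content)
```

With these helpers, `relative_image_dir("out/images/")` yields `"images"`, whereas a plain `os.path.basename("out/images/")` yields `""`.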
###### Object Storage
```python
s3pdf_cli = S3ReaderWriter(pdf_ak, pdf_sk, pdf_endpoint)
image_dir = "s3://img_bucket/"
s3image_cli = S3ReaderWriter(img_ak, img_sk, img_endpoint, parent_path=image_dir)
pdf_bytes = s3pdf_cli.read(s3_pdf_path, mode=s3pdf_cli.MODE_BIN)
jso_useful_key = {"_pdf_type": "", "model_list": model_json}
pipe = UNIPipe(pdf_bytes, jso_useful_key, s3image_cli)
pipe.pipe_classify()
pipe.pipe_parse()
md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
```
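The object-storage example passes paths such as `s3://img_bucket/`. When working with such paths it is often useful to separate the bucket from the object key, for example to validate input before handing it to a reader. A minimal sketch using only the standard library follows; `split_s3_path` is a hypothetical helper, not a Magic-PDF function, and the bucket and key names in the usage note are made up.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(s3_path: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/parts' into (bucket, key).

    Keys containing '?' or '#' would need extra handling, because urlparse
    treats those characters as query and fragment separators.
    """
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {s3_path!r}")
    return parsed.netloc, parsed.path.lstrip("/")
```

For instance, `split_s3_path("s3://pdf_bucket/docs/a.pdf")` gives `("pdf_bucket", "docs/a.pdf")`, and a bare bucket path such as `s3://img_bucket/` gives an empty key.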
For a complete demo, see [demo.py](https://github.com/magicpdf/Magic-PDF/blob/master/demo/demo.py)
## All Thanks To Our Contributors
<a href="https://github.com/magicpdf/Magic-PDF/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=magicpdf/Magic-PDF" />
...
...
README_zh-CN.md
@@ -17,7 +17,7 @@
# Magic-PDF
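### Introduction
## Introduction
Magic-PDF is a tool that converts PDF files to Markdown. It supports converting local documents as well as files stored on S3-compatible object storage.
@@ -33,33 +33,64 @@ Magic-PDF is a tool that converts PDF files to Markdown. It supports converting …
- Supports CPU and GPU environments
- Supports Windows/Linux/macOS platforms
### Getting Started
## Getting Started
###### Requirements
### Requirements
Python 3.9+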
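The 3.9+ requirement can be checked up front, before any conversion work starts. The following is a small sketch using only the standard library; the names `MIN_PYTHON`, `python_is_supported`, and `require_supported_python` are hypothetical and do not come from Magic-PDF.

```python
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in this README


def python_is_supported(version_info=sys.version_info) -> bool:
    """True when the (major, minor) version meets the 3.9+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def require_supported_python() -> None:
    """Exit with a readable message when the interpreter is too old."""
    if not python_is_supported():
        raise SystemExit("Magic-PDF requires Python 3.9 or newer")
```

Calling `require_supported_python()` at the top of a script gives a clear error instead of a failure deep inside an import.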
###### Usage Instructions
1. Install Magic-PDF
### Usage Instructions
#### 1. Install Magic-PDF
```bash
pip install magic-pdf[cpu] # Install the CPU version
# or
pip install magic-pdf[gpu] # Install the GPU version
pip install magic-pdf
```
2. Usage via Command Line
#### 2. Usage via Command Line
###### Direct usage
```bash
cp magic-pdf.template.json ~/magic-pdf.json
magic-pdf pdf-command --pdf "pdf_path" --model "model_json_path"
```
###### More usage
```bash
magic-pdf --help
```
### License
#### 3. Usage via API
###### Local usage
```python
image_writer = DiskReaderWriter(local_image_dir)
image_dir = str(os.path.basename(local_image_dir))
jso_useful_key = {"_pdf_type": "", "model_list": model_json}
pipe = UNIPipe(pdf_bytes, jso_useful_key, image_writer)
pipe.pipe_classify()
pipe.pipe_parse()
md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
```
###### Usage on object storage
```python
s3pdf_cli = S3ReaderWriter(pdf_ak, pdf_sk, pdf_endpoint)
image_dir = "s3://img_bucket/"
s3image_cli = S3ReaderWriter(img_ak, img_sk, img_endpoint, parent_path=image_dir)
pdf_bytes = s3pdf_cli.read(s3_pdf_path, mode=s3pdf_cli.MODE_BIN)
jso_useful_key = {"_pdf_type": "", "model_list": model_json}
pipe = UNIPipe(pdf_bytes, jso_useful_key, s3image_cli)
pipe.pipe_classify()
pipe.pipe_parse()
md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
```
For a detailed implementation, see [demo.py](https://github.com/magicpdf/Magic-PDF/blob/master/demo/demo.py)
## License
[LICENSE.md](https://github.com/magicpdf/Magic-PDF/blob/master/LICENSE.md)
### Acknowledgments
## Acknowledgments
- [PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
- [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
...
...