Qin Kaijie / pdf-miner
Commit 2154e7b9, authored Jun 27, 2024 by 赵小蒙

update readme

parent 6f945f17
Showing 1 changed file with 31 additions and 1 deletion

README_zh-CN.md (+31, -1)
@@ -15,6 +15,13 @@
</div>
# MinerU
MinerU is a one-stop, open-source data extraction tool. Its main features are:
- PDF document extraction (Magic-PDF)
- Web page and e-book extraction (Magic-Doc)
# Magic-PDF
## Introduction
@@ -49,7 +56,9 @@ https://github.com/magicpdf/Magic-PDF/assets/11393164/618937cb-dc6a-4646-b433-e3
### Submodule repositories
- [pdf-extract-kit](https://github.com/wangbinDL/pdf-extract-kit)
- [Miner-PDF-Benchmark](https://github.com/opendatalab/Miner-PDF-Benchmark)
- [Miner-PDF-Benchmark](https://github.com/opendatalab/Miner-PDF-Benchmark)
  - An end-to-end PDF document understanding evaluation suite, designed for large-model data scenarios.
## Getting started
@@ -105,6 +114,27 @@ md_content = pipe.pipe_mk_markdown(image_dir, drop_mode="none")
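For the detailed implementation, see [demo.py](demo/demo.py).
# Magic-Doc
Magic-Doc is a tool that converts web pages and multi-format e-books to markdown.

Main features:
- Web page extraction
  - Cross-modal, accurate parsing of text, images, tables, and formulas
- E-book and document extraction
  - Supports epub, mobi, and other formats, handling both text and images
- Language identification
  - Accurate identification of 176 languages
## Project repository
- [Magic-Doc](https://github.com/magicpdf/Magic-Doc)
## License
[LICENSE.md](LICENSE.md)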