Qin Kaijie / pdf-miner
Commit f052c75e authored Aug 02, 2024 by xuchao
Fix some inaccurate descriptions in the documentation
Parent: 1c5b42e0
Showing 1 changed file with 5 additions and 1 deletion
README_zh-CN_v2.md
@@ -34,7 +34,8 @@
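# Changelog
- 2024/07/08 Initial open-source release
- 2024/08/01 Released 0.6.2b1, which improves the handling of dependency conflicts and the installation documentation
- 2024/07/05 Initial open-source release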
<!-- TABLE OF CONTENT -->
...
...
@@ -82,6 +83,7 @@
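## Project Introduction
MinerU is a tool that converts PDFs into machine-readable formats (such as markdown and json), from which the content can easily be extracted into any other format.
MinerU was born during the pre-training of [InternLM (书生-浦语)](https://github.com/InternLM/InternLM). We will focus on solving symbol conversion problems in scientific literature, and we hope to contribute to scientific and technological progress in the era of large models.
Compared with well-known commercial products in China and abroad, MinerU is still young. If you run into problems or the results fall short of expectations, please open an issue and attach the relevant PDF.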
https://github.com/user-attachments/assets/4bea02c9-6d54-4cd6-97ed-dff14340982c
...
...
@@ -302,6 +304,8 @@ TODO
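- Lists, code blocks, and tables of contents are not yet supported by the layout model
- Comic books, art albums, primary school textbooks, and exercise sheets cannot yet be parsed well
- On some formula-dense PDFs, forcing OCR on gives better results
- If you need to process PDFs that contain many formulas, enabling OCR is strongly recommended. When text is extracted with PyMuPDF, text lines can overlap one another, which makes the formula insertion positions inaccurate.
- The good news is that we are working hard on all of these!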
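
The overlapping-text-line problem mentioned in the list above can be illustrated with a small check. This is only a sketch and not MinerU's actual logic: the helper names `overlap_ratio` and `should_force_ocr`, the `(x0, y0, x1, y1)` box format, and the 0.3 threshold are all assumptions made for this example. It takes the bounding boxes of the text lines extracted from one page and suggests OCR when any two lines overlap substantially.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]


def overlap_ratio(a: BBox, b: BBox) -> float:
    """Return the intersection area divided by the area of the smaller box."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not intersect
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (inter_w * inter_h) / smaller if smaller > 0 else 0.0


def should_force_ocr(line_boxes: List[BBox], threshold: float = 0.3) -> bool:
    """Suggest OCR if any pair of text lines overlaps by more than `threshold`.

    The threshold is an illustrative value, not one taken from MinerU.
    """
    for i in range(len(line_boxes)):
        for j in range(i + 1, len(line_boxes)):
            if overlap_ratio(line_boxes[i], line_boxes[j]) > threshold:
                return True
    return False


# Two cleanly separated lines, then two lines that collide vertically.
clean_page = [(0, 0, 100, 10), (0, 12, 100, 22)]
formula_page = [(0, 0, 100, 10), (0, 5, 100, 15)]
print(should_force_ocr(clean_page))
print(should_force_ocr(formula_page))
```

The pairwise loop is quadratic in the number of lines, which is acceptable for a single page. A per-page decision like this would let OCR be enabled only where the extracted text layer looks unreliable.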
...
...