Qin Kaijie / pdf-miner
Commit 0a3afbf0, authored Apr 16, 2024 by kernel.h@qq.com
Add model parsing class
parent 7d08e78f
Showing 2 changed files with 51 additions and 2 deletions
magic_pdf/cli/magicpdf.py +2 -2
magic_pdf/model/magic_model.py +49 -0
magic_pdf/cli/magicpdf.py
...
@@ -119,7 +119,7 @@ def json_command(json, method):
    _do_parse(pdf_data, jso["doc_layout_result"], jso, method, local_image_rw, local_md_rw,
...
@@ -158,7 +158,7 @@ def pdf_command(pdf, model, method):
    )
    _do_parse(pdf_data, jso["doc_layout_result"], jso, method, local_image_rw, local_md_rw,
...
magic_pdf/model/magic_model.py  0 → 100644
class MagicModel():
    """
    Every function returns an empty list when it finds no elements.
    """

    def __fix_axis(self):
        # TODO: compute the corrected coordinates; `xx` is the commit's placeholder
        self.__model_list = xx

    def __init__(self, model_list: list, page: Page):
        # NOTE: Page is not imported anywhere in this commit
        self.__model_list = model_list
        self.__fix_axis()
        self.__page = page

    def get_imgs(self, page_no: int):  # @许瑞
        return_lst = []
        # x0, y0, x1, y1 are placeholder coordinates, not yet defined in this commit
        img = {
            "bbox": [x0, y0, x1, y1]
        }
        img_caption = {
            "bbox": [x0, y0, x1, y1],
            "text": "",
        }
        return [{"img": img, "caption": img_caption},]

    def get_tables(self, page_no: int) -> list:  # 3 bounding boxes: caption, table body, table note
        pass  # 许瑞

    def get_equations(self, page_no: int) -> list:  # has both coordinates and text
        # inline_equations and interline_equations are not yet defined in this commit
        return inline_equations, interline_equations  # @凯文

    def get_discarded(self, page_no: int) -> list:  # in-house model; coordinates only
        pass  # @凯文

    def get_text_blocks(self, page_no: int) -> list:  # produced by the in-house model; coordinates only, no text
        pass  # @凯文

    def get_title_blocks(self, page_no: int) -> list:  # in-house model; coordinates only, no text
        pass  # @凯文

    def get_ocr_text(self, page_no: int) -> list:  # produced by Paddle; has both text and coordinates
        pass  # @小蒙

    def get_ocr_spans(self, page_no: int) -> list:
        pass  # @小蒙
\ No newline at end of file
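
The `get_imgs` stub in this commit describes its return shape only through placeholder values. The sketch below illustrates that shape and the "return an empty list when nothing is found" contract from the class docstring. It is not the author's implementation: the helper names `make_img_entry` and `collect_imgs`, and the input format (dicts with `img_bbox`, `caption_bbox`, and an optional `caption_text` key), are assumptions made for illustration only.

```python
from typing import Dict, List

BBox = List[float]  # [x0, y0, x1, y1]


def make_img_entry(img_bbox: BBox, caption_bbox: BBox, caption_text: str = "") -> Dict[str, dict]:
    """Build one entry in the shape the get_imgs stub returns (hypothetical helper)."""
    return {
        "img": {"bbox": list(img_bbox)},
        "caption": {"bbox": list(caption_bbox), "text": caption_text},
    }


def collect_imgs(detections: List[dict]) -> List[Dict[str, dict]]:
    """Return one entry per detection, or an empty list when there are none.

    `detections` is an assumed input format, not something defined in the commit.
    """
    return [
        make_img_entry(d["img_bbox"], d["caption_bbox"], d.get("caption_text", ""))
        for d in detections
    ]
```

With no detections, `collect_imgs([])` returns an empty list, which matches the contract stated in the `MagicModel` docstring.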