Unverified commit 3d2fb836, authored by yyy, committed by GitHub

feat: add test case (#499)

Co-authored-by: quyuan <quyuan@pjlab.org>
parent f0a8886c
...@@ -6,20 +6,22 @@ on: ...@@ -6,20 +6,22 @@ on:
push: push:
branches: branches:
- "master" - "master"
- "dev"
paths-ignore: paths-ignore:
- "cmds/**" - "cmds/**"
- "**.md" - "**.md"
pull_request: pull_request:
branches: branches:
- "master" - "master"
- "dev"
paths-ignore: paths-ignore:
- "cmds/**" - "cmds/**"
- "**.md" - "**.md"
workflow_dispatch: workflow_dispatch:
jobs: jobs:
cli-test: cli-test:
runs-on: ubuntu-latest runs-on: pdf
timeout-minutes: 40 timeout-minutes: 120
strategy: strategy:
fail-fast: true fail-fast: true
...@@ -28,27 +30,22 @@ jobs: ...@@ -28,27 +30,22 @@ jobs:
uses: actions/checkout@v3 uses: actions/checkout@v3
with: with:
fetch-depth: 2 fetch-depth: 2
- name: check-requirements - name: install
run: |
pip install -r requirements.txt
pip install -r requirements-qa.txt
pip install magic-pdf
- name: test_cli
run: | run: |
cp magic-pdf.template.json ~/magic-pdf.json echo $GITHUB_WORKSPACE && sh tests/retry_env.sh
echo $GITHUB_WORKSPACE - name: unit test
cd $GITHUB_WORKSPACE && export PYTHONPATH=. && pytest -s -v tests/test_unit.py run: |
cd $GITHUB_WORKSPACE && pytest -s -v tests/test_cli/test_cli.py cd $GITHUB_WORKSPACE && export PYTHONPATH=. && coverage run -m pytest tests/test_unit.py --cov=magic_pdf/ --cov-report term-missing --cov-report html
cd $GITHUB_WORKSPACE && python tests/get_coverage.py
- name: benchmark - name: cli test
run: | run: |
cd $GITHUB_WORKSPACE && pytest -s -v tests/test_cli/test_bench.py cd $GITHUB_WORKSPACE && pytest -s -v tests/test_cli/test_cli_sdk.py
notify_to_feishu: notify_to_feishu:
if: ${{ always() && !cancelled() && contains(needs.*.result, 'failure') && (github.ref_name == 'master') }} if: ${{ always() && !cancelled() && contains(needs.*.result, 'failure') && (github.ref_name == 'master') }}
needs: [cli-test] needs: cli-test
runs-on: ubuntu-latest runs-on: pdf
steps: steps:
- name: get_actor - name: get_actor
run: | run: |
...@@ -67,9 +64,5 @@ jobs: ...@@ -67,9 +64,5 @@ jobs:
- name: notify - name: notify
run: | run: |
curl ${{ secrets.WEBHOOK_URL }} -H 'Content-Type: application/json' -d '{ echo ${{ secrets.USER_ID }}
"msgtype": "text", curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"post","content":{"post":{"zh_cn":{"title":"'${{ github.repository }}' GitHubAction Failed","content":[[{"tag":"text","text":""},{"tag":"a","text":"Please click here for details ","href":"https://github.com/'${{ github.repository }}'/actions/runs/'${GITHUB_RUN_ID}'"},{"tag":"at","user_id":"'${{ secrets.USER_ID }}'"}]]}}}}' ${{ secrets.WEBHOOK_URL }}
"text": { \ No newline at end of file
"mentioned_list": ["${{ env.METIONS }}"] , "content": "'${{ github.repository }}' GitHubAction Failed!\n 细节请查看:https://github.com/'${{ github.repository }}'/actions/runs/'${GITHUB_RUN_ID}'"
}
}'
\ No newline at end of file
...@@ -14,4 +14,6 @@ tqdm ...@@ -14,4 +14,6 @@ tqdm
htmltabletomd htmltabletomd
pypandoc pypandoc
pyopenssl==24.0.0 pyopenssl==24.0.0
struct-eqtable==0.1.0 struct-eqtable==0.1.0
\ No newline at end of file pytest-cov
beautifulsoup4
\ No newline at end of file
"""
get cov
"""
from bs4 import BeautifulSoup


def get_covrage():
    """Read the total coverage from the HTML report and check the threshold."""
    # Read the local HTML coverage report produced by the unit-test step
    with open("htmlcov/index.html", "r", encoding="utf-8") as report:
        html_content = report.read()
    soup = BeautifulSoup(html_content, 'html.parser')
    # Find the span tag with class "pc_cov"
    pc_cov_span = soup.find('span', class_='pc_cov')
    # Extract the percentage value
    percentage_value = pc_cov_span.text.strip()
    percentage_float = float(percentage_value.rstrip('%'))
    print("percentage_float:", percentage_float)
    # The value is already in percent, so this threshold means 0.2%, not 20%
    assert percentage_float >= 0.2


if __name__ == '__main__':
    get_covrage()
\ No newline at end of file
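The coverage check above scrapes the total from the HTML report. As a hedged alternative, the same total can be read from coverage.py's JSON report (written by `coverage json`, or by pytest-cov with `--cov-report json`), which avoids HTML parsing. The function names, the default `coverage.json` path, and the 20% threshold below are illustrative assumptions, not part of this commit.

```python
import json


def read_total_coverage(report_path="coverage.json"):
    """Return the total percent covered from a coverage.py JSON report."""
    with open(report_path, "r", encoding="utf-8") as report:
        data = json.load(report)
    # coverage.py stores the overall figure under totals.percent_covered
    return float(data["totals"]["percent_covered"])


def check_coverage(report_path="coverage.json", minimum_percent=20.0):
    """Exit with an error when coverage is below minimum_percent (hypothetical threshold)."""
    percent = read_total_coverage(report_path)
    if percent < minimum_percent:
        raise SystemExit(
            f"coverage {percent:.1f}% is below the required {minimum_percent:.1f}%"
        )
    return percent
```

Raising `SystemExit` rather than using `assert` keeps the check active even when Python runs with `-O`, which strips assertions.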
#!/bin/bash
# Maximum number of retries
max_retries=5
retry_count=0
while true; do
    # prepare env; the steps are chained with && so that exit_code
    # reflects a failure in any of them, not only in the last pip install
    source activate MinerU &&
    pip install -r requirements-qa.txt &&
    pip install "magic-pdf[full]==0.7.0b1" --extra-index-url https://wheels.myhloli.com -i https://pypi.tuna.tsinghua.edu.cn/simple &&
    pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/
    exit_code=$?
    if [ $exit_code -eq 0 ]; then
        echo "test.sh succeeded!"
        break
    else
        let retry_count+=1
        if [ $retry_count -ge $max_retries ]; then
            echo "Reached the maximum number of retries ($max_retries); giving up."
            exit 1
        fi
        echo "test.sh failed (exit code: $exit_code). Retry attempt $retry_count..."
        sleep 5  # wait 5 seconds before retrying
    fi
done
import subprocess """common definitions."""
import os import os
import shutil
def check_shell(cmd): def check_shell(cmd):
""" """shell successful."""
shell successful
"""
res = os.system(cmd) res = os.system(cmd)
assert res == 0 assert res == 0
def count_folders_and_check_contents(file_path):
"""" def cli_count_folders_and_check_contents(file_path):
获取文件夹大小 """" count cli files."""
""" if os.path.exists(file_path):
for files in os.listdir(file_path):
folder_count = os.path.getsize(os.path.join(file_path, files))
assert folder_count > 0
assert len(os.listdir(file_path)) > 5
def sdk_count_folders_and_check_contents(file_path):
"""count folders."""
if os.path.exists(file_path): if os.path.exists(file_path):
folder_count = os.path.getsize(file_path) file_count = os.path.getsize(file_path)
assert folder_count > 0 assert file_count > 0
else:
exit(1)
if __name__ == "__main__": def delete_file(path):
count_folders_and_check_contents("/home/quyuan/code/Magic-PDF/Magic-PDF/Magic-PDF/ci") """delete file."""
\ No newline at end of file if not os.path.exists(path):
if os.path.isfile(path):
try:
os.remove(path)
print(f"File '{path}' deleted.")
except TypeError as e:
print(f"Error deleting file '{path}': {e}")
elif os.path.isdir(path):
try:
shutil.rmtree(path)
print(f"Directory '{path}' and its contents deleted.")
except TypeError as e:
print(f"Error deleting directory '{path}': {e}")
\ No newline at end of file
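Review note on `delete_file` as added in this diff: the outer guard is `if not os.path.exists(path)`, so the `isfile` and `isdir` branches can never be true, and `os.remove` / `shutil.rmtree` report failures as `OSError`, not `TypeError`. A minimal sketch of what was probably intended follows; the boolean return value is an assumption added for illustration and is not in the commit.

```python
import os
import shutil


def delete_file(path):
    """Delete a file or a directory tree if it exists.

    Returns True when something was deleted, False otherwise
    (the return value is a hypothetical addition).
    """
    if not os.path.exists(path):
        return False
    try:
        if os.path.isfile(path):
            os.remove(path)
            print(f"File '{path}' deleted.")
        else:
            shutil.rmtree(path)
            print(f"Directory '{path}' and its contents deleted.")
    except OSError as e:
        # Filesystem failures (permissions, busy files) surface as OSError
        print(f"Error deleting '{path}': {e}")
        return False
    return True
```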
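# Math Rising Stars: Call for Problem Solutions
Issue 15 (2016.06)
Host: Mu Xiaosheng (牟晓生)
Problem 1. Let $z_{1}, z_{2}, z_{3}$ be complex numbers of unit modulus. Prove that there exists a complex number $z$ of unit modulus such that
$$
\frac{1}{\left|z-z_{1}\right|^{2}}+\frac{1}{\left|z-z_{2}\right|^{2}}+\frac{1}{\left|z-z_{3}\right|^{2}} \leq \frac{9}{4}
$$
(Proposed by Wang Yixuan (王逸轩), student at Wuhan Iron and Steel No. 3 High School, Hubei, and Leng Gangsong (冷岗松), Shanghai University)
Problem 2. As shown in the figure, $D$ is a point on side $B C$ of the equilateral triangle $A B C$, with $B D>C D$. Let $O_{1}, I_{1}$ be the circumcenter and incenter of $\triangle A B D$, and let $O_{2}, I_{2}$ be the circumcenter and incenter of $\triangle A C D$. The external common tangent of circle $I_{1}$ and circle $I_{2}$ other than $B C$ meets $A B, A C$ at $P, Q$. Let lines $P I_{1}$ and $Q I_{2}$ meet at $R$, and let lines $O_{1} I_{1}$ and $O_{2} I_{2}$ meet at $T$. Prove that $A T^{2}=A R^{2}+A D \cdot B C$.
(Proposed by Lu Sheng (卢圣), Qinzhou, Guangxi)
Problem 3. Given positive integers $m, n$, consider an $m \times n$ white board on which some cells are first colored black. At each later moment, if some white cell is adjacent to at least two black cells, it may also be colored black. What is the minimum number of cells that must be colored black initially so that the whole board can be black at some moment?
(Proposed by Mu Xiaosheng (牟晓生), Harvard University)
Problem 4. Let $A B C$ be a triangle, and let $P, Q, R$ be points on $B C, C A, A B$ respectively. Prove that the perimeter of $\triangle P Q R$ is not less than the minimum of the perimeters of $\triangle A Q R, \triangle B R P, \triangle C P Q$.
(Proposed by Mu Xiaosheng (牟晓生), Harvard University)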
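The spreading rule in the third problem (the board-coloring problem) can be explored by brute force on small boards. The sketch below assumes that "adjacent" means sharing an edge; the function names are illustrative. It only simulates the rule for a given starting set and does not attempt to solve the problem.

```python
def spread(rows, cols, initial_black):
    """Apply the rule until no white cell has at least two black neighbours.

    Assumption: two cells are adjacent when they share an edge.
    The rule is monotone, so the final set does not depend on the order
    in which cells are colored.
    """
    black = set(initial_black)
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                if (r, c) in black:
                    continue
                neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
                if sum(cell in black for cell in neighbours) >= 2:
                    black.add((r, c))
                    changed = True
    return black


def fills_board(rows, cols, initial_black):
    """Return True when the starting set eventually blackens every cell."""
    return len(spread(rows, cols, initial_black)) == rows * cols
```

For example, `fills_board(4, 4, [(i, i) for i in range(4)])` checks whether the main diagonal of a 4 × 4 board is enough to blacken it.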
...@@ -37,9 +37,9 @@ class TestBench(): ...@@ -37,9 +37,9 @@ class TestBench():
now_simscore = now_score["average_sim_score"] now_simscore = now_score["average_sim_score"]
now_editdistance = now_score["average_edit_distance"] now_editdistance = now_score["average_edit_distance"]
now_bleu = now_score["average_bleu_score"] now_bleu = now_score["average_bleu_score"]
#assert last_simscore <= now_simscore assert last_simscore <= now_simscore
#assert last_editdistance <= now_editdistance assert last_editdistance <= now_editdistance
#assert last_bleu <= now_bleu assert last_bleu <= now_bleu
def get_score(): def get_score():
......
dependent on the service headway and the reliability of the departure time of the service to which passengers are incident.
After briefly introducing the random incidence model, which is often assumed to hold at short headways, the balance of this section reviews six studies of passenger incidence behavior that are motivated by understanding the relationships between service headway, service reliability, passenger incidence behavior, and passenger waiting time in a more nuanced fashion than is embedded in the random incidence assumption (2). Three of these studies depend on manually collected data, two studies use data from AFC systems, and one study analyzes the issue purely theoretically. These studies reveal much about passenger incidence behavior, but all are found to be limited in their general applicability by the methods with which they collect information about passengers and the services those passengers intend to use.
# Random Passenger Incidence Behavior
One characterization of passenger incidence behavior is that of random incidence (3). The key assumption underlying the random incidence model is that the process of passenger arrivals to the public transport service is independent from the vehicle departure process of the service. This implies that passengers become incident to the service at a random time, and thus the instantaneous rate of passenger arrivals to the service is uniform over a given period of time. Let $W$ and $H$ be random variables representing passenger waiting times and service headways, respectively. Under the random incidence assumption and the assumption that vehicle capacity is not a binding constraint, a classic result of transportation science is that
$$
E(W) = \frac{E\left[H^{2}\right]}{2E\left[H\right]} = \frac{E\left[H\right]}{2}\left(1 + \operatorname{CV}(H)^{2}\right)
$$
where $E[X]$ is the probabilistic expectation of some random variable $X$ and $\operatorname{CV}(H)$ is the coefficient of variation of $H$, a unitless measure of the variability of $H$ defined as
$$
\operatorname{CV}(H) = \frac{\sigma_{H}}{E\left[H\right]}
$$
where $\sigma_{H}$ is the standard deviation of $H$ (4). The second expression in Equation 1 is particularly useful because it expresses the mean passenger waiting time as the sum of two components: the waiting time caused by the mean headway (i.e., the reciprocal of service frequency) and the waiting time caused by the variability of the headways (which is one measure of service reliability). When the service is perfectly reliable with constant headways, the mean waiting time will be simply half the headway.
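As a small worked illustration of the random incidence result, the sketch below evaluates both expressions for the mean waiting time on a list of observed headways. The function names and the sample headways are illustrative assumptions; the standard deviation is taken as the population value so that the two expressions agree exactly.

```python
from statistics import mean, pstdev


def mean_wait_from_moments(headways):
    """E(W) = E[H^2] / (2 E[H]) under random incidence."""
    mean_h = mean(headways)
    mean_h_squared = mean([h * h for h in headways])
    return mean_h_squared / (2 * mean_h)


def mean_wait_from_cv(headways):
    """E(W) = (E[H] / 2) * (1 + CV(H)^2), with CV(H) = sigma_H / E[H]."""
    mean_h = mean(headways)
    cv = pstdev(headways) / mean_h  # population standard deviation
    return (mean_h / 2) * (1 + cv ** 2)


# Hypothetical headways in minutes: same mean, different reliability
regular = [10, 10, 10, 10]
irregular = [5, 15, 5, 15]
print(mean_wait_from_moments(regular), mean_wait_from_cv(regular))
print(mean_wait_from_moments(irregular), mean_wait_from_cv(irregular))
```

Both lists have a mean headway of 10 min, so any difference in the computed mean wait comes from the headway variability term.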
# More Behaviorally Realistic Incidence Models
Jolliffe and Hutchinson studied bus passenger incidence in South London suburbs (5). They observed 10 bus stops for 1 h per day over 8 days, recording the times of passenger incidence and actual and scheduled bus departures. They limited their stop selection to those served by only a single bus route with a single service pattern so as to avoid ambiguity about which service a passenger was waiting for. The authors found that the actual average passenger waiting time was 30% less than predicted by the random incidence model. They also found that the empirical distributions of passenger incidence times (by time of day) had peaks just before the respective average bus departure times. They hypothesized the existence of three classes of passengers: with proportion $q$, passengers whose time of incidence is causally coincident with that of a bus departure (e.g., because they saw the approaching bus from their home or a shop window); with proportion $p(1-q)$, passengers who time their arrivals to minimize expected waiting time; and with proportion $(1-p)(1-q)$, passengers who are randomly incident. The authors found that $p$ was positively correlated with the potential reduction in waiting time (compared with arriving randomly) that resulted from knowledge of the timetable and of service reliability. They also found $p$ to be higher in the peak commuting periods than in the off-peak periods, indicating more awareness of the timetable or historical reliability, or both, by commuters.
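The three class proportions sum to one, so an overall mean waiting time follows from the law of total expectation once a mean wait is known for each class. The sketch below is only an illustration of that bookkeeping: the function names and the per-class mean waits passed in are hypothetical inputs, not values or a formula taken from the study.

```python
def class_shares(q, p):
    """Shares of the three passenger classes described in the text."""
    return {
        "coincident": q,                   # incidence coincides with a departure
        "timetable_aware": p * (1 - q),    # time arrivals to minimize waiting
        "random": (1 - p) * (1 - q),       # randomly incident
    }


def mixed_mean_wait(q, p, wait_coincident, wait_aware, wait_random):
    """Share-weighted mean wait; the per-class waits are assumed inputs."""
    shares = class_shares(q, p)
    return (
        shares["coincident"] * wait_coincident
        + shares["timetable_aware"] * wait_aware
        + shares["random"] * wait_random
    )


# Hypothetical example: q = 0.1, p = 0.5, class mean waits in minutes
print(class_shares(0.1, 0.5))
print(mixed_mean_wait(0.1, 0.5, 0.0, 2.0, 5.0))
```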
Bowman and Turnquist built on the concept of aware and unaware passengers of proportions $p$ and $(1-p)$, respectively. They proposed a utility-based model to estimate $p$ and the distribution of incidence times, and thus the mean waiting time, of aware passengers over a given headway as a function of the headway and reliability of bus departure times (1). They observed seven bus stops in Chicago, Illinois, each served by a single (different) bus route, between 6:00 and 8:00 a.m. for 5 to 10 days each. The bus routes had headways of 5 to 20 min and a range of reliabilities. The authors found that actual average waiting time was substantially less than predicted by the random incidence model. They estimated that $p$ was not statistically significantly different from 1.0, which they explain by the fact that all observations were taken during peak commuting times. Their model predicts that the longer the headway and the more reliable the departures, the more peaked the distribution of incidence times will be and the closer that peak will be to the next scheduled departure time. This prediction demonstrates what they refer to as a safety margin that passengers add to reduce the chance of missing their bus when the service is known to be somewhat unreliable. Such a safety margin can also result from unreliability in passengers’ journeys to the public transport stop or station. Bowman and Turnquist conclude from their model that the random incidence model underestimates the waiting time benefits of improving reliability and overestimates the waiting time benefits of increasing service frequency. This is because as reliability increases passengers can better predict departure times and so can time their incidence to decrease their waiting time.
Furth and Muller study the issue in a theoretical context and generally agree with the above findings (2). They are primarily concerned with the use of data from automatic vehicle-tracking systems to assess the impacts of reliability on passenger incidence behavior and waiting times. They propose that passengers will react to unreliability by departing earlier than they would with reliable services. Randomly incident unaware passengers will experience unreliability as a more dispersed distribution of headways and simply allocate additional time to their trip plan to improve the chance of arriving at their destination on time. Aware passengers, whose incidence is not entirely random, will react by timing their incidence somewhat earlier than the scheduled departure time to increase their chance of catching the desired service. The authors characterize these reactions as the costs of unreliability.
Luethi et al. continued with the analysis of manually collected data on actual passenger behavior (6). They use the language of probability to describe two classes of passengers. The first is timetable-dependent passengers (i.e., the aware passengers), whose incidence behavior is affected by awareness (possibly gained