kr_lp_pgnet/scripts/run_step1.sh

#!/usr/bin/env bash
# Step1 pretrain: train PGNet on synthetic data
#
# Run inside the container:
#   docker exec kr_lp_pgnet bash /workspace/kr_lp_pgnet/scripts/run_step1.sh
#
# Environment variables:
#   DRY_RUN=1      run only 2 epochs as a quick smoke test
#   EPOCHS=N       override the epoch count (default: epoch_num from the config)
#   NUM_SAMPLES=N  number of synthetic samples to generate (default: 50000)

set -euo pipefail

PADDLEOCR_DIR=/workspace/PaddleOCR
KR_LP_DIR=/workspace/kr_lp_pgnet
TRAIN_DATA=/workspace/train_data
SYNTH_DIR="$TRAIN_DATA/kr_lp_synth"
ASSET_DIR="$KR_LP_DIR/data_gen/Korean-license-plate-Generator"
NUM_SAMPLES="${NUM_SAMPLES:-50000}"

TS=$(date +%Y%m%d_%H%M)
RUN_NAME="step1-${TS}"
OUTPUT_DIR="$PADDLEOCR_DIR/output/kr_lp_pgnet_${TS}"
LOG="$OUTPUT_DIR/run.log"

echo "==========================="
echo "RUN: $RUN_NAME"
echo "OUTPUT: $OUTPUT_DIR"
echo "==========================="

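# ── 1. Generate synthetic data ──────────────────────────────────────────────
echo "[1/3] Generating synthetic data (${NUM_SAMPLES} images)"

# Reset the synthetic dataset so every run starts from freshly generated data.
rm -rf "$SYNTH_DIR"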
python3.10 "$KR_LP_DIR/data_gen/generate_synthetic.py" \
    --asset_dir "$ASSET_DIR" \
    --out_dir   "$SYNTH_DIR" \
    --num       "$NUM_SAMPLES" \
    --dict      "$KR_LP_DIR/dict/kr_lp_dict.txt"

# ── 2. Generate eval GT mat files ───────────────────────────────────────────
echo "[2/3] Generating eval GT mat files"

python3.10 "$KR_LP_DIR/tools/make_gt_mat.py" \
    --label   "$SYNTH_DIR/test/test.txt" \
    --out_dir "$SYNTH_DIR/gt"

# ── 3. Training ─────────────────────────────────────────────────────────────
echo "[3/3] Starting Step1 training"
cd "$PADDLEOCR_DIR"

# Link /workspace/train_data into the PaddleOCR tree unless it already exists.
if [ ! -e ./train_data ]; then
    ln -sf "$TRAIN_DATA" ./train_data
fi

mkdir -p "$OUTPUT_DIR"

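# Config overrides passed to tools/train.py via -o (start from the step1
# pretrain checkpoint; write models and predictions under this run's OUTPUT_DIR).
OVERRIDE=(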
    -o Global.pretrained_model=./pretrain_models/train_step1/best_accuracy
       Global.load_static_weights=False
       Global.save_model_dir="${OUTPUT_DIR}/"
       Global.save_res_path="${OUTPUT_DIR}/predicts.txt"
       wandb.name="${RUN_NAME}"
)
if [ -n "${EPOCHS:-}" ]; then
    OVERRIDE+=(Global.epoch_num="$EPOCHS")
fi
if [ "${DRY_RUN:-0}" = "1" ]; then
    OVERRIDE+=(Global.epoch_num=2 Global.eval_batch_step="[0,200]")
    echo "DRY_RUN=1 → running only 2 epochs"
fi

echo "  config:   configs/e2e/kr_lp_pgnet.yml"
echo "  data:     $SYNTH_DIR/"
echo "  output:   $OUTPUT_DIR/"
echo "  wandb:    $RUN_NAME"
echo "  log:      $LOG"

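# Stream training output to the console and keep a copy in $LOG.
# With pipefail set, a train.py failure still fails the script despite tee.
python3.10 tools/train.py -c configs/e2e/kr_lp_pgnet.yml "${OVERRIDE[@]}" 2>&1 | tee "$LOG"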