idlak相关记录
数据准备
idlak_make_lang mode 0
idlak-voice-build/utils/idlak_make_lang.py --mode 0 /home/sooda/speech/idlak/egs/tts_dnn_arctic/s1/data/full/text_norm.xml /home/sooda/speech/idlak/egs/tts_dnn_arctic/s1/data/full /home/sooda/speech/idlak/egs/tts_dnn_arctic/s1/data/local/dict1
这个步骤的结果,在dict1文件夹下生成以下文件:
characters.txt extra_questions.txt lexicon.txt nonsilence_phones.txt oov.txt optional_silence.txt silence_phones.txt
构造fst语言模型
utils/prepare_lang.sh --num-nonsil-states 5 --share-silence-phones true $dict "<OOV>" data/local/lang_tmp $lang
同时生成lexiconp.txt. lexiconp.txt是带概率的lexicon.txt
生成带对齐信息的xml
idlak_make_lang mode 1
python $KALDI_ROOT/idlak-voice-build/utils/idlak_make_lang.py --mode 1 "2:0.03,3:0.2" "4" $ali/phones.txt $ali/wrdalign.dat data/$step/text_align.xml
对应命令为:
python /home/sooda/speech/idlak/idlak-voice-build/utils/idlak_make_lang.py --mode 1 2:0.03,3:0.2 4 exp-align/quin_ali_full/phones.txt exp-align/quin_ali_full/wrdalign.dat data/full/text_align.xml
最后一个参数是输出。其他是输入。
对齐结果phone.txt的格式为:
slt_arctic_a0001 sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B ao1_B th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I th_I er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E ah1_B ah1_B ah1_B ah1_B ah1_B ah1_B ah1_B ah1_B ah1_B v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E v_E dh_B dh_B dh_B dh_B dh_B dh_B dh_B dh_B ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B d_B ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I n_I n_I n_I n_I n_I n_I n_I n_I n_I n_I n_I n_I n_I jh_I jh_I jh_I jh_I jh_I jh_I jh_I jh_I jh_I er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E er0_E t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B t_B r_I r_I r_I r_I r_I r_I r_I r_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I ey1_I l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E l_E f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B f_B ih1_I ih1_I ih1_I ih1_I ih1_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I ah0_I ah0_I ah0_I ah0_I ah0_I p_E p_E p_E p_E p_E p_E p_E p_E p_E p_E p_E p_E p_E s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B s_B t_I t_I t_I t_I t_I t_I t_I t_I t_I t_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I iy1_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I l_I z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E z_E sp sp sp sp sp sp sp sp sp sp eh2_B eh2_B eh2_B eh2_B eh2_B t_I t_I t_I t_I t_I t_I t_I t_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I s_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I eh1_I t_I t_I t_I t_I t_I t_I t_I t_I t_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I er0_I ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E ah0_E sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp
exp-align/quin_ali_full/wrdalign.dat给出每个单词的时间对齐信息
格式为:
slt_arctic_a0001 1 0.205 0.425 AUTHOR
slt_arctic_a0001 1 0.630 0.150 OF
slt_arctic_a0001 1 0.780 0.080 THE
slt_arctic_a0001 1 0.860 0.370 DANGER
slt_arctic_a0001 1 1.230 0.440 TRAIL
load_lablels之后,每个label的格式为:[0.205 0.34 ao_B]
生成context label
idlak_make_lang mode 2
idlaktxp --pretty --tpdb=$tpdb data/$step/text_align.xml data/$step/text_anorm.xml
idlakcex --pretty --cex-arch=default --tpdb=$tpdb data/$step/text_anorm.xml data/$step/text_afull.xml
python $KALDI_ROOT/idlak-voice-build/utils/idlak_make_lang.py --mode 2 data/$step/text_afull.xml data/$step/cex.ark > data/$step/cex_output_dump
对应为:
idlaktxp --pretty --tpdb=/home/sooda/speech/idlak/idlak-data/en/ga/ data/full/text_align.xml data/full/text_anorm.xml
idlakcex --pretty --cex-arch=default --tpdb=/home/sooda/speech/idlak/idlak-data/en/ga/ data/full/text_anorm.xml data/full/text_afull.xml
python /home/sooda/speech/idlak/idlak-voice-build/utils/idlak_make_lang.py --mode 2 data/full/text_afull.xml data/full/cex.ark > data/full/cex_output_dump
mlpg mlsa
在训练的时候没有对cmp进行加窗处理,在预测时候,获得cmp值,对齐加窗,使用mlpg获得实际值。这个操作就是传说中的smooth
窗口值是二进制的,可以使用~/speech/marytts/lib/external/bin/x2x +fa mcep_d2.win
来查看窗口具体值。(x2x +fa可以将二进制文件转化为非二进制)
delta:
-0.2
-0.1
0
0.1
0.2
delta-delta
0.285714
-0.142857
-0.285714
-0.142857
0.285714
对应hts的窗口值分别是:
delta:
-0.5 0.0 0.5
delta-delta:
1.0 -2.0 1.0
delta计算
kaldi中通过featbin/add-deltas进行加窗处理
-0.2
-0.1
0
0.1
0.2
0.04
0.04
0.01
-0.04
-0.1
-0.04
0.01
0.04
0.04