- computing variance floors
- initialization & reestimation
- making monophone mmf
- embedded reestimation (monophone)
- copying monophone mmf to fullcontext
- embedded reestimation (fullcontext)
- tree-based context clustering
- embedded reestimation (clustered)
- untying the parameter sharing structure
- embedded reestimation (untied)
- tree-based context clustering
- embedded reestimation (re-clustered)
- forced alignment
- making global variance
- making unseen models (GV)
- making unseen models (1mix)
- generating speech parameter sequences (1mix)
- synthesizing waveforms (1mix)
- converting mmfs to the hts_engine file format
- synthesizing waveforms using hts_engine
- ps
The following is a walkthrough of Training.pl. All relative paths are rooted at train_data/hts.
computing variance floors
HCompV -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -m -M models/qst001/ver3/cmp -o models/qst001/ver3/cmp/average.mmf proto/qst001/ver3/state-5_stream-5_mgc-105_lf0-3_str-15.prt
head -n 1 proto/qst001/ver3/state-5_stream-5_mgc-105_lf0-3_str-15.prt > models/qst001/ver3/cmp/init.mmf
cat models/qst001/ver3/cmp/vFloors >> models/qst001/ver3/cmp/init.mmf
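The last two commands of this step assemble init.mmf from the first line of the prototype followed by the vFloors file that HCompV wrote. A minimal sketch of that assembly as a shell function follows. The name `make_init_mmf` and the stand-in file contents are hypothetical; they exist only so the snippet runs without HTS installed.

```shell
# Build init.mmf: first line of the prototype, then the variance-floor macros.
# make_init_mmf is a hypothetical helper name, not part of HTS.
make_init_mmf() {
    proto=$1; vfloors=$2; out=$3
    head -n 1 "$proto" > "$out"      # '>' truncates the target and writes line 1
    cat "$vfloors" >> "$out"         # '>>' appends the vFloors content
}

# Demo with stand-in files (placeholder contents, not real HTS output).
demo_dir=$(mktemp -d)
printf 'HEADER LINE\nREST OF PROTOTYPE\n' > "$demo_dir/proto.prt"
printf 'FLOOR MACRO 1\nFLOOR MACRO 2\n' > "$demo_dir/vFloors"
make_init_mmf "$demo_dir/proto.prt" "$demo_dir/vFloors" "$demo_dir/init.mmf"
cat "$demo_dir/init.mmf"
```

The order matters: the first redirection must be `>` so that a stale init.mmf is overwritten, and the second must be `>>` so that the header line is kept.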
initialization & reestimation
HInit -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -m 1 -u tmvw -w 5000 -H models/qst001/ver3/cmp/init.mmf -M models/qst001/ver3/cmp/HInit -I data/labels/mono.mlf -l li -o li proto/qst001/ver3/state-5_stream-5_mgc-105_lf0-3_str-15.prt
HRest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -m 1 -u tmvw -w 5000 -H models/qst001/ver3/cmp/init.mmf -M models/qst001/ver3/cmp/HRest -I data/labels/mono.mlf -l li -g models/qst001/ver3/dur/HRest/li models/qst001/ver3/cmp/HInit/li
parallelization
~/speech/marytts/lib/external/bin/HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp.part05 -I data/labels/mono.mlf -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -p 6 -H models/qst001/ver3/cmp/monophone.mmf -N models/qst001/ver3/dur/monophone.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur data/lists/mono.list data/lists/mono.list
~/speech/marytts/lib/external/bin/HERest -p 0 -A -C configs/qst001/ver3/trn.cnf -D -T 1 -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -H models/qst001/ver3/cmp/monophone.mmf -N models/qst001/ver3/dur/monophone.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur data/lists/mono.list data/lists/mono.list models/qst001/ver3/cmp/HER1.hmm.acc models/qst001/ver3/cmp/HER1.dur.acc models/qst001/ver3/cmp/HER2.hmm.acc models/qst001/ver3/cmp/HER2.dur.acc models/qst001/ver3/cmp/HER3.hmm.acc models/qst001/ver3/cmp/HER3.dur.acc models/qst001/ver3/cmp/HER4.hmm.acc models/qst001/ver3/cmp/HER4.dur.acc models/qst001/ver3/cmp/HER5.hmm.acc models/qst001/ver3/cmp/HER5.dur.acc models/qst001/ver3/cmp/HER6.hmm.acc models/qst001/ver3/cmp/HER6.dur.acc
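The two commands above follow HERest's parallel mode: a job run with `-p N` (N > 0) processes its slice of the script file and dumps accumulators, and the final `-p 0` call reads the accumulator files and updates the models. The sketch below prepares the inputs for such a run. The helper names `split_scp` and `acc_files` are hypothetical. The round-robin split is one possible choice, and the zero-based part suffixes are an inference from `train.scp.part05` being paired with `-p 6`; the original splitting method is not shown.

```shell
# Split a script file into N round-robin parts: <scp>.part00 .. <scp>.part(N-1).
# Assumption: zero-based two-digit suffixes, inferred from "train.scp.part05".
split_scp() {
    scp=$1; n=$2
    awk -v n="$n" -v base="$scp" '{
        part = sprintf("%s.part%02d", base, (NR - 1) % n)
        print > part
    }' "$scp"
}

# Print the accumulator files that the merging "-p 0" call lists for jobs 1..N.
acc_files() {
    dir=$1; n=$2; i=1; list=""
    while [ "$i" -le "$n" ]; do
        list="$list $dir/HER$i.hmm.acc $dir/HER$i.dur.acc"
        i=$((i + 1))
    done
    echo "${list# }"                 # strip the leading space
}

# Demo on a five-line stand-in script file, split two ways.
demo_dir=$(mktemp -d)
printf 'a.cmp\nb.cmp\nc.cmp\nd.cmp\ne.cmp\n' > "$demo_dir/train.scp"
split_scp "$demo_dir/train.scp" 2
acc_files models/qst001/ver3/cmp 2
```

The output of `acc_files` can be appended to the `-p 0` command line in place of the hand-written HER1 to HER6 list.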
making monophone mmf
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -d models/qst001/ver3/cmp/HRest -w models/qst001/ver3/cmp/monophone.mmf edfiles/qst001/ver3/cmp/lvf.hed data/lists/mono.list
embedded reestimation (monophone)
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -I data/labels/mono.mlf -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -H models/qst001/ver3/cmp/monophone.mmf -N models/qst001/ver3/dur/monophone.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur data/lists/mono.list data/lists/mono.list
copying monophone mmf to fullcontext
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -H models/qst001/ver3/cmp/monophone.mmf -w models/qst001/ver3/cmp/fullcontext.mmf edfiles/qst001/ver3/cmp/m2f.hed data/lists/mono.list
embedded reestimation (fullcontext)
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -I data/labels/full.mlf -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -H models/qst001/ver3/cmp/fullcontext.mmf -N models/qst001/ver3/dur/fullcontext.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur -C configs/qst001/ver3/nvf.cnf -s stats/qst001/ver3/cmp.stats -w 0.0 data/lists/full.list data/lists/full.list
tree-based context clustering
cp models/qst001/ver3/cmp/fullcontext.mmf models/qst001/ver3/cmp/clustered.mmf
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -C configs/qst001/ver3/mgc.cnf -H models/qst001/ver3/cmp/clustered.mmf -m -a 1.0 -w models/qst001/ver3/cmp/clustered.mmf edfiles/qst001/ver3/cmp/cxc_mgc.hed data/lists/full.list
embedded reestimation (clustered)
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -I data/labels/full.mlf -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -H models/qst001/ver3/cmp/clustered.mmf -N models/qst001/ver3/dur/clustered.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur data/lists/full.list data/lists/full.list
untying the parameter sharing structure
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -H models/qst001/ver3/cmp/clustered.mmf -w models/qst001/ver3/cmp/untied.mmf edfiles/qst001/ver3/cmp/unt.hed data/lists/full.list
embedded reestimation (untied)
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -I data/labels/full.mlf -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -H models/qst001/ver3/cmp/untied.mmf -N models/qst001/ver3/dur/untied.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur -C configs/qst001/ver3/nvf.cnf -s stats/qst001/ver3/cmp.stats.untied -w 0.0 data/lists/full.list data/lists/full.list
tree-based context clustering
cp models/qst001/ver3/cmp/untied.mmf models/qst001/ver3/cmp/re_clustered.mmf
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -C configs/qst001/ver3/mgc.cnf -H models/qst001/ver3/cmp/re_clustered.mmf -m -a 1.0 -w models/qst001/ver3/cmp/re_clustered.mmf edfiles/qst001/ver3/cmp/cxc_mgc.hed.untied data/lists/full.list
embedded reestimation (re-clustered)
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -I data/labels/full.mlf -m 1 -u tmvwdmv -w 5000 -t 1500 100 5000 -H models/qst001/ver3/cmp/re_clustered.mmf -N models/qst001/ver3/dur/re_clustered.mmf -M models/qst001/ver3/cmp -R models/qst001/ver3/dur data/lists/full.list data/lists/full.list
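The monophone, clustered, and re-clustered HERest calls in this walkthrough use identical options and differ only in the model name and in whether the mono or full label and list files are used. A small dry-run sketch makes that pattern explicit. `herest_cmd` is a hypothetical helper that only prints the command; every option in it is copied from the commands above. The fullcontext and untied calls add `-C .../nvf.cnf -s ... -w 0.0` and are not covered by it.

```shell
# Print (do not run) the HERest command for one model stage.
# herest_cmd is a hypothetical helper, not part of HTS.
herest_cmd() {
    stage=$1    # monophone, clustered, or re_clustered
    ctx=$2      # mono or full: selects data/labels/<ctx>.mlf and data/lists/<ctx>.list
    base=models/qst001/ver3
    echo "HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1" \
         "-S data/scp/train.scp -I data/labels/$ctx.mlf" \
         "-m 1 -u tmvwdmv -w 5000 -t 1500 100 5000" \
         "-H $base/cmp/$stage.mmf -N $base/dur/$stage.mmf" \
         "-M $base/cmp -R $base/dur" \
         "data/lists/$ctx.list data/lists/$ctx.list"
}

herest_cmd monophone mono
herest_cmd clustered full
herest_cmd re_clustered full
```

Printing instead of executing keeps the sketch safe to run anywhere. Piping the output to `sh` from the train_data/hts root would run the command.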
forced alignment
HSMMAlign -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S data/scp/train.scp -I data/labels/mono.mlf -t 1500 100 5000 -w 1.0 -H models/qst001/ver3/cmp/monophone.mmf -N models/qst001/ver3/dur/monophone.mmf -m gv/qst001/ver3/fal data/lists/mono.list data/lists/mono.list
making global variance
- make_proto_gv();
- make_data_gv();
- making average model
HCompV -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S gv/qst001/ver3/gv.scp -m -o gv/qst001/ver3/average.mmf -M gv/qst001/ver3 gv/qst001/ver3/state-1_stream-3_mgc-35_lf0-1_str-5.prt
- making full-context-dependent model
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S gv/qst001/ver3/gv.scp -I gv/qst001/ver3/gv.mlf -m 1 -C configs/qst001/ver3/nvf.cnf -s gv/qst001/ver3/gv.stats -w 0.0 -H gv/qst001/ver3/fullcontext.mmf -M gv/qst001/ver3 gv/qst001/ver3/gv.list
- context clustering
cp gv/qst001/ver3/fullcontext.mmf gv/qst001/ver3/clustered.mmf
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -H gv/qst001/ver3/clustered.mmf -m -a 1.0 -w gv/qst001/ver3/clustered.mmf gv/qst001/ver3/cxc_mgc.hed gv/qst001/ver3/gv.list
- re-estimate
HERest -A -C configs/qst001/ver3/trn.cnf -D -T 1 -S gv/qst001/ver3/gv.scp -I gv/qst001/ver3/gv.mlf -m 1 -H gv/qst001/ver3/clustered.mmf -M gv/qst001/ver3 gv/qst001/ver3/gv.list
making unseen models (GV)
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -H gv/qst001/ver3/clustered.mmf -w gv/qst001/ver3/clustered_all.mmf gv/qst001/ver3/mku.hed gv/qst001/ver3/gv.list
making unseen models (1mix)
HHEd -A -C configs/qst001/ver3/trn.cnf -D -T 1 -p -i -H models/qst001/ver3/cmp/re_clustered.mmf -w models/qst001/ver3/cmp/re_clustered_all.mmf.1mix edfiles/qst001/ver3/cmp/mku.hed data/lists/full.list
generating speech parameter sequences (1mix)
HMGenS -A -C configs/qst001/ver3/syn.cnf -D -T 1 -t 1500 100 5000 -S data/scp/gen.scp -c 0 -H models/qst001/ver3/cmp/re_clustered_all.mmf.1mix -N models/qst001/ver3/dur/re_clustered_all.mmf.1mix -M gen/qst001/ver3/1mix/0 models/qst001/ver3/cmp/tiedlist models/qst001/ver3/dur/tiedlist
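The parameters for generation with HMM and GV include weight, inc, and others. Each feature dimension of each stream is optimized with Newton's method, as the log below shows.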
Generating Label gen_cslt_00001.lab
Total number of frames = 1529
PdfStream[1]: 1529 frames
PdfStream[2]: 948 frames
PdfStream[3]: 1529 frames
Parameter generation considering global variance (GV)
Optimization=NEWTON step size (init=1.000, inc=1.20, dec=0.50)
HMM weight=1.00, GV weight=1.00
Stream: 1
Feature: 1
Iteration 1: GV Obj = 4.069850e+01 (HMM:4.069850e+01 GV:-4.091874e-30)
Iteration 2: GV Obj = 4.069933e+01 (HMM:4.069939e+01 GV:-6.803006e-05) Change = 0.000825
...
Iteration 20: GV Obj = 4.070425e+01 (HMM:4.070450e+01 GV:-2.517048e-04) Change = 0.000209
Iteration 21: GV Obj = 4.070434e+01 (HMM:4.070459e+01 GV:-2.497188e-04) Change = 0.000092
Converged (norm=1.989407e-01, change=9.183336e-05).
Feature: 2
Iteration 1: GV Obj = 1.615898e+01 (HMM:1.615898e+01 GV:-1.044688e-29)
Iteration 2: GV Obj = 1.615910e+01 (HMM:1.615915e+01 GV:-4.773108e-05) Change = 0.000121
Iteration 3: GV Obj = 1.615915e+01 (HMM:1.615921e+01 GV:-5.514864e-05) Change = 0.000054
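In the log above, each printed "GV Obj" value matches the sum of the HMM and GV terms on the same line (both weights are 1.00 in this run), and "Change" matches the difference between successive objective values. This is an observation from the printed numbers, not a statement about HMGenS internals. A small sketch that checks the first relation on trace lines follows; `check_gv_log` is a hypothetical helper, and the 1e-4 tolerance is an assumption chosen to absorb the rounding of the printed values.

```shell
# For each "Iteration" line read from stdin, test whether GV Obj ~= HMM + GV.
# check_gv_log is a hypothetical helper; the 1e-4 tolerance is an assumption.
check_gv_log() {
    awk '/^Iteration/ {
        obj = $6 + 0                                        # value after "GV Obj ="
        hmm = $7; sub(/^\(HMM:/, "", hmm)                   # strip "(HMM:" prefix
        gv  = $8; sub(/^GV:/, "", gv); sub(/\)$/, "", gv)   # strip "GV:" and ")"
        diff = obj - (hmm + gv)
        if (diff < 0) diff = -diff
        printf "%s %s\n", $2, (diff < 1e-4 ? "ok" : "MISMATCH")
    }'
}

# Three lines copied from the log above.
check_gv_log <<'EOF'
Iteration 1: GV Obj = 4.069850e+01 (HMM:4.069850e+01 GV:-4.091874e-30)
Iteration 2: GV Obj = 4.069933e+01 (HMM:4.069939e+01 GV:-6.803006e-05) Change = 0.000825
Iteration 21: GV Obj = 4.070434e+01 (HMM:4.070459e+01 GV:-2.497188e-04) Change = 0.000092
EOF
```

Feeding a full HMGenS trace through the function gives a quick sanity check when the HMM or GV weight is changed, since the plain sum would then no longer be expected to match.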
synthesizing waveforms (1mix)
using SPTK to generate wav
converting mmfs to the hts_engine file format
HHEd -A -C configs/qst001/ver3/cnv.cnf -D -T 1 -p -i -H models/qst001/ver3/cmp/re_clustered.mmf edfiles/qst001/ver3/cmp/cnv_mgc.hed data/lists/full.list
mv trees/qst001/ver3/cmp/trees.1 voices/qst001/ver3/tree-mgc.inf
mv models/qst001/ver3/cmp/pdf.2 voices/qst001/ver3/lf0.pdf
GV:
HHEd -A -C configs/qst001/ver3/cnv.cnf -D -T 1 -p -i -H gv/qst001/ver3/clustered.mmf gv/qst001/ver3/cnv_mgc.hed gv/qst001/ver3/gv.list
mv gv/qst001/ver3/trees.1 voices/qst001/ver3/tree-gv-mgc.inf
mv gv/qst001/ver3/pdf.1 voices/qst001/ver3/gv-mgc.pdf
synthesizing waveforms using hts_engine
using hts_engine to generate wav
ps
- mlf: Master Label File
- Remove the -B here to obtain non-binary (text) files
- Context HMM names:
- Question set: