, , . , HAL 9000 . « 2001 ». 1976 , «»
, (.
"Hearing lips and seeing voices", Nature 264, 746-748, 23 December 1976, doi: 10.1038/264746a0).
— . , (, ), . . . , .
. , , , , .. , ,
HAL 9000.
, . , , , .
— . , . c
( ) . , . , .
17±12% 30 21±11% ( ).
— , . , , - . . , . , , .
.
LipNet , , .
"please" () "lay" () , , ()LipNet — LSTM (long short-term memory). . (Connectionist Temporal Classification, CTC), , , .
LipNet. T, - () (STCNN), . (), LSTM. LSTM SoftMax GRID 93.4%. ( ), .
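The collapsing rule behind Connectionist Temporal Classification (CTC), mentioned above, can be illustrated with a minimal sketch. It assumes a greedy decoder that already has one label per video frame. The blank symbol `_` and the function name `ctc_collapse` are hypothetical, chosen only for illustration, and are not taken from the LipNet code.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this illustration only


def ctc_collapse(frame_labels, blank=BLANK):
    """Turn a per-frame label sequence into text using the CTC collapsing rule.

    Step 1: merge runs of identical consecutive labels into a single label.
    Step 2: drop every blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # 14 frames collapse to a 6-letter word.
    print(ctc_collapse("pp_ll_eea_ss_e"))
```

A blank between two identical labels keeps them separate, so `ctc_collapse("hh_e_ll_l_oo")` yields a word with a double "l". This is what lets the network emit one label per frame without knowing in advance how frames align with letters.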
Method | Dataset | Size | Output | Accuracy |
---|---|---|---|---|
Fu et al. (2008) | AVICAR | 851 | | 37.9% |
Zhao et al. (2009) | AVLetter | 78 | | 43.5% |
Papandreou et al. (2009) | CUAVE | 1800 | | 83.0% |
Chung & Zisserman (2016a) | OuluVS1 | 200 | | 91.4% |
Chung & Zisserman (2016b) | OuluVS2 | 520 | | 94.1% |
Chung & Zisserman (2016a) | BBC TV | >400000 | | 65.4% |
Wand et al. (2016) | GRID | 9000 | | 79.6% |
LipNet | GRID | 28853 | | 93.4% |
GRID :
command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4).
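To see how constrained this grammar is, the number of distinct sentences it allows can be computed directly. The sketch below assumes the number in parentheses is the count of options per slot and that slots are chosen independently. The names `GRID_SLOTS` and `grid_sentence_count` are hypothetical.

```python
from math import prod

# Options per slot, copied from the sentence template above.
GRID_SLOTS = {
    "command": 4,
    "color": 4,
    "preposition": 4,
    "letter": 25,
    "digit": 10,
    "adverb": 4,
}


def grid_sentence_count(slots=GRID_SLOTS):
    """Number of distinct sentences if every slot is filled independently."""
    return prod(slots.values())


if __name__ == "__main__":
    print(grid_sentence_count())  # 4 * 4 * 4 * 25 * 10 * 4 = 64000
```

Under these assumptions the template produces 64,000 possible sentences, a very small space compared with unconstrained speech.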
, 93,4% — - , . , . , .
LipNet .
ICLR 2017
4 2016 .