LipNet neural network reads lips with 93.4% accuracy


In the film "2001: A Space Odyssey", the computer HAL 9000 reads the astronauts' lips.

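Experiments in 1976 showed that people "hear" a different phoneme when the sound does not match the lip movements they see (see "Hearing lips and seeing voices", Nature 264, 746-748, 23 December 1976, doi: 10.1038/264746a0).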

— . , (, ), . . . , .

. , , , , .. , , HAL 9000.

, . , , , .

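Lip reading is hard even for people. Hearing-impaired people achieve only 17±12% accuracy on a limited set of 30 monosyllabic words and 21±11% on compound words.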

— , . , , - . . , . , , .

. LipNet , , .


"please" () "lay" () , , ()

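LipNet is a recurrent neural network of the LSTM (long short-term memory) type. It was trained with Connectionist Temporal Classification (CTC), a method widely used in modern speech recognition systems because it removes the need for training data in which the input is aligned with the target output.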
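To make the CTC idea more concrete, here is a minimal sketch of the decoding side only: per-frame labels are collapsed by merging consecutive repeats and then dropping a special blank symbol, which is why no frame-by-frame alignment is needed. The function names, the `_` blank symbol, and the toy alphabet are assumptions made for illustration; this is not LipNet's actual decoder and it does not cover the CTC training loss.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen for this illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most probable label per frame, then collapse.

    frame_probs: list of per-frame probability lists, one value per
    symbol in `alphabet` (hypothetical toy input, not real model output).
    """
    best = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    return ctc_collapse(best, blank)


if __name__ == "__main__":
    # Many different frame-level paths collapse to the same word.
    print(ctc_collapse("pp_l_ee_aa_ss_e"))
    print(ctc_collapse("l_aay"))
```

A blank between two identical letters keeps them distinct, so a path such as `_l_l_` collapses to `ll` rather than `l`.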


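LipNet architecture. A sequence of T frames is fed as input and processed by three layers of a spatiotemporal convolutional neural network (STCNN), each followed by a spatial pooling layer. The extracted features are upsampled along the time axis and then processed by a bidirectional LSTM. Each time step of the LSTM output is processed by a two-layer feed-forward network and a final SoftMax layer.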

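On the GRID corpus, LipNet reaches 93.4% accuracy, higher than the previous best result on that corpus (see the table below).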

Method                       Dataset    Size       Accuracy
Fu et al. (2008)             AVICAR     851        37.9%
Zhao et al. (2009)           AVLetter   78         43.5%
Papandreou et al. (2009)     CUAVE      1800       83.0%
Chung & Zisserman (2016a)    OuluVS1    200        91.4%
Chung & Zisserman (2016b)    OuluVS2    520        94.1%
Chung & Zisserman (2016a)    BBC TV     >400000    65.4%
Wand et al. (2016)           GRID       9000       79.6%
LipNet                       GRID       28853      93.4%

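Sentences in the GRID corpus follow a simple grammar:

command(4) + color(4) + preposition(4) + letter(25) + digit(10) + adverb(4),

where the number in parentheses is the number of word choices in each of the six categories.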
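As a quick arithmetic check on the grammar above, the number of distinct sentences it can produce is the product of the per-category word counts. The sketch below takes the counts directly from the grammar line; the dictionary and function names are illustrative assumptions.

```python
from math import prod

# Word choices per category, copied from the GRID grammar line above.
GRID_CATEGORIES = {
    "command": 4,
    "color": 4,
    "preposition": 4,
    "letter": 25,
    "digit": 10,
    "adverb": 4,
}


def count_sentences(categories):
    """Number of distinct sentences: one word is picked per category."""
    return prod(categories.values())


if __name__ == "__main__":
    print(count_sentences(GRID_CATEGORIES))  # 4 * 4 * 4 * 25 * 10 * 4 = 64000
```

With only 64,000 possible sentences, the vocabulary and sentence structure are far more constrained than in everyday speech.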

, 93,4% — - , . , . , .

LipNet .


ICLR 2017 4 2016 .

Source: https://habr.com/ru/post/fr398901/

