"Black Budget" -- FY 2013 Congressional Budget Justification/National Intelligence Program, pp. 360-364

TOP SECRET//SI/TK//NOFORN

(U) RESEARCH & TECHNOLOGY

(U) HUMAN LANGUAGE TECHNOLOGY RESEARCH

This Exhibit is SECRET//NOFORN

                      FY 2011   FY 2012 Enacted       FY 2013 Request       FY 2012 to FY 2013
                      Actual¹   Base   OCO   Total    Base   OCO   Total    Change   % Change
Funding ($M)          26.4      31.0   3.0   34.0     26.0   3.4   29.4     -4.7     -14
Civilian FTE          8         8      —     8        8      —     8        —        —
Civilian Positions    8         8      —     8        8      —     8        —        —
Military Positions    —         —      —     —        —      —     —        —        —

Totals may not add due to rounding.
¹ Includes enacted OCO funding.

(U) Project Description

(U//FOUO) The Human Language Technology (HLT) Research Project provides a coherent, concentrated focus on language analytics to exploit the volume, variety, and velocity of communications that the SIGINT system collects. HLT Research conducts research that supports the goals of the NSA/CSS' Analytic Modernization effort. This Project complements NSA/CSS initiatives to strengthen the language analyst workforce by providing the technologies that serve as force multipliers for analysts.

(U//FOUO) The HLT Research Project has an HLT Center of Excellence (COE) at Johns Hopkins University to promote academic and industry interest in intelligence challenges and attract world-class talent to work on IC HLT problems. The HLT COE focuses on critical intelligence needs that are not adequately addressed by commercial technology or other government programs. The HLT Research Project also leverages programs at the Defense Advanced Research Projects Agency (DARPA) and the Intelligence Advanced Research Projects Activity (IARPA). DARPA and IARPA programs provide foundational HLT capabilities in automatic content extraction, speech-to-text, machine translation, summarization, and question answering. The HLT Research Project conducts research and advanced development necessary to bridge research results from DARPA’s and IARPA’s efforts to SIGINT applications. This Project includes the Human Language Technology Research Subproject.
(U) Base resources in this project are used to:

• (S//SI//REL TO USA, FVEY) Research and develop voice, text, video, and image analytics to enable fundamental language exploitation capabilities for all types of communication, regardless of medium.
• (S//SI//REL TO USA, FVEY) Increase the number of languages, accuracy, and speed of results for keyword search from machine-generated transformations of speech-to-text.
• (S//SI//REL TO USA, FVEY) Conduct research and advanced development on automatic document image analysis, particularly for handwritten documents, an extreme technical challenge. The primary emphasis is on core capabilities to enable triage and keyword search on the diverse kinds of documents found in intercept, including language and script identification and handwritten document detection, segmentation, and analysis.
• (U//FOUO) Research analytics that automatically analyze the linguistic content of communications. This area comprises several technologies, including content extraction and machine translation. Content analytics identifies and extracts information from language communications, turning a mass of unstructured text into usable metadata.

• (TS//SI//REL TO USA, FVEY) Research, design, and develop analytics that enable deployment of HLT capabilities nearer to the point of collection within the SIGINT system.
• (U//FOUO) Support collaborative research into human language exploitation and machine learning with commercial and academic partners.
• (U//FOUO) Develop test and training data to support scientific research and evaluation.
• (U//FOUO) Provide and maintain a computer lab to support in-house algorithm development, evaluation, and proof-of-concept demonstrations of promising solutions.
• (U//FOUO) Sustain support activities that foster cross-organizational and cross-discipline collaboration in solving hard technical problems critical to the success of NSA/CSS’ SIGINT and cyber missions, as well as to the technical health of the workforce.

(U) There are no new activities in this Project in FY 2013.

(U) OCO resources in this project are used to:

• (TS//SI//REL TO USA, FVEY) Enable machine translation research and new speech processing capabilities for Afghanistan and Pakistan dialects by applying state-of-the-art research findings in less-common languages and by developing new language and dialect models.

(U) The CCP expects this Project to accomplish the following in FY 2013:

• (S//REL TO USA, FVEY) Develop and deploy speech-to-text models for additional languages, where the languages will be selected according to corporate NSA/CSS priorities, language analyst preparation, and scientific assessment of technology readiness. [CCP_0106]
• (S//REL TO USA, FVEY) Extend name-finding solutions to support named-entity extraction for 12 additional languages, to include at least three languages that are less-commonly taught. Create and demonstrate solutions in three to five languages for the much harder problem of extracting relations between entities. These capabilities will yield automated solutions to uncover pertinent facts within both unstructured written communications and spoken communications that have been transformed into text. [CCP_0106]
• (U//FOUO) Design techniques to reduce by 25 percent the amount of hand-annotated data required to develop models in support of speech-to-text solutions. [CCP_0106]
• (S//REL TO USA, FVEY) Research, develop, and demonstrate solutions for cross-lingual entity disambiguation to enable analysts to perform language-independent retrieval of communications to, from, or about persons of interest from multilingual SIGINT data sets. [CCP_0106]

(U) Changes From FY 2012 to FY 2013:

(S//NF) Human Language Technology Research: -$4.7 million (-$5.1 Base, +$0.4 OCO). The aggregate decrease is the result of:

• (U) Increases:
  - (S//NF) $0.4 million in Overseas Contingency Operations (OCO) accelerates new speech processing capabilities and associated analyst applications for Afghanistan and Pakistan dialects.

• (U) Decreases:
  - (S//NF) $5.0 million due to a FY 2012 Congressional add not sustained in FY 2013.
  - (S//NF) $0.1 million due to a planned programmatic reduction in travel and training.

Human Language Technology Research Project Budget Chart
FY 2013 Budget Request by Appropriation Account
This Exhibit is SECRET//NOFORN
(Funds in dollars in millions)

Subproject / Description                      Resourcing   FY 2011   FY 2012   FY 2013

Operation and Maintenance, Defense-Wide
  Human Language Technology Research          Funds        —         —         1.12
                                              Positions    —         —         8
    Pay and Benefits                          Base         —         —         1.12
                                              Positions    —         —         8

Research, Development, Test, and Evaluation, Defense-Wide
  Human Language Technology Research          Funds        26.36     34.03     28.23
                                              Positions    8         8         —
    Communications and Utilities              Base         0.06      0.04      0.04
    Contract Services                         Base         24.35     28.07     23.36
                                              OCO          —         3.00      3.40
    Equipment                                 Base         0.57      1.76      1.36
    Pay and Benefits                          Base         1.20      1.09      —
                                              Positions    8         8         —
    Travel and Transportation                 Base         0.17      0.07      0.07

Totals may not add due to rounding.
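The change figures and the budget chart are tied together by simple addition, so they can be cross-checked. The Python sketch below is an editorial illustration, not part of the original exhibit. It assumes a particular assignment of each extracted figure to a line item (with dashes entered as 0.0) and a rounding tolerance of $0.005 million per line item; both are editorial assumptions.

```python
"""Cross-check the addition behind the HLT Research budget figures ($ millions)."""

# RDT&E, Defense-Wide line items as (FY 2011, FY 2012, FY 2013).
# Assumption: the figure-to-line-item mapping is an editorial reading of the
# extracted chart, and a dash in the chart is entered here as 0.0.
RDTE_ITEMS = {
    "Communications and Utilities (Base)": (0.06, 0.04, 0.04),
    "Contract Services (Base)": (24.35, 28.07, 23.36),
    "Contract Services (OCO)": (0.0, 3.00, 3.40),
    "Equipment (Base)": (0.57, 1.76, 1.36),
    "Pay and Benefits (Base)": (1.20, 1.09, 0.0),
    "Travel and Transportation (Base)": (0.17, 0.07, 0.07),
}
RDTE_REPORTED = (26.36, 34.03, 28.23)  # "Funds" row for RDT&E
OM_REPORTED = (0.0, 0.0, 1.12)         # "Funds" row for O&M, Defense-Wide
OCO = (0.0, 3.00, 3.40)                # OCO portion of each year's total


def column_sums(items):
    """Add up each fiscal-year column across all line items."""
    return tuple(sum(column) for column in zip(*items.values()))


def adds_up(computed, reported, n_items):
    """Each item is rounded to $0.01M, so allow up to $0.005M error per item."""
    return abs(computed - reported) <= 0.005 * n_items


sums = column_sums(RDTE_ITEMS)
rdte_ok = all(adds_up(c, r, len(RDTE_ITEMS)) for c, r in zip(sums, RDTE_REPORTED))

# The project total combines both appropriation accounts.
total_fy12 = RDTE_REPORTED[1] + OM_REPORTED[1]
total_fy13 = RDTE_REPORTED[2] + OM_REPORTED[2]
change = total_fy13 - total_fy12
pct_change = 100 * change / total_fy12
base_change = (total_fy13 - OCO[2]) - (total_fy12 - OCO[1])
oco_change = OCO[2] - OCO[1]
# Stated base decreases: $5.0M Congressional add plus $0.1M travel and training.
stated_base_decrease = 5.0 + 0.1

print(f"RDT&E line items add up: {rdte_ok}")
print(f"Total change: {change:+.1f} ({pct_change:+.0f}%)")
print(f"Base: {base_change:+.1f}, OCO: {oco_change:+.1f}")
print("Base change matches stated decreases:",
      round(-base_change, 1) == round(stated_base_decrease, 1))
```

Under these assumptions the line items add up, and the script reproduces the stated -$4.7 million (-14 percent) total change, split into -$5.1 million base and +$0.4 million OCO. The one-cent FY 2011 gap (26.35 computed against 26.36 reported) falls within the rounding the exhibit warns about.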
