
Linguistic Data Consortium (LDC) Corpora

Contact/Order: info@digento.de

DVD-ROM, Download

Content

Founded in 1992 and operated by the University of Pennsylvania (USA), the Linguistic Data Consortium (LDC) is one of the world's most important providers of language data. LDC offers a wide range of corpora covering hundreds of languages and dialects:

  • Text corpora: collections of newspaper articles, blogs, forum posts and other documents.
  • Speech and audio corpora: recordings of telephone calls, news broadcasts or everyday conversations, usually linked directly to text transcripts.
  • Parallel corpora: texts available in at least two languages, precisely aligned with one another (e.g. EU parliamentary proceedings).
  • Annotated corpora: data that linguists have manually enriched with additional information, for example by tagging each word with its part of speech (noun, verb, etc.) or by rating the emotions expressed in audio recordings.


The most frequently requested corpora include OntoNotes Release 5.0, TIMIT Acoustic-Phonetic Continuous Speech Corpus, Web 1T 5-gram Version 1, CELEX2, Treebank-3, TIDIGITS, Switchboard-1 Release 2, CSR-I (WSJ0) Complete, TAC Relation Extraction Dataset and ACE 2005 Multilingual Training Corpus.



Publisher

Linguistic Data Consortium (LDC)

Price

Prices on request

This offer is not intended for consumers as defined in § 13 BGB (German Civil Code) or for final consumers as defined in the PAngV (German Price Indication Ordinance).

digento order number

108889

Publisher's information

The Linguistic Data Consortium (LDC) is an open consortium of universities, libraries, corporations and government research laboratories. LDC was formed in 1992 to address the critical data shortage then facing language technology research and development. The Advanced Research Projects Agency provided seed funding for the Consortium and the National Science Foundation provided additional support via Grant IRI-9528587 from the Information and Intelligent Systems division.

Initially, LDC's primary role was as a repository and distribution point for language resources. Since that time, and with the help of its members, LDC has grown into an organization that creates and distributes a wide array of language resources. LDC also supports sponsored research programs and language-based technology evaluations by providing resources and contributing organizational expertise.

LDC is hosted by the University of Pennsylvania and is a center within the University’s School of Arts and Sciences. LDC’s connection with Penn provides a strong foundation for the Consortium’s research and outreach to an active and diverse member community.

LDC Overview

In the early 1990s, advances in human language technologies were accelerating thanks to better algorithms and the growing power of affordable computers. Researchers quickly realized, however, that they lacked the volume and variety of data necessary to build robust, portable and scalable systems. To meet this need, the Linguistic Data Consortium (LDC) was founded to make critical linguistic resources easier to acquire, preserve, and share.


The Consortium was established in 1992 following a call from ARPA (now DARPA) for an organization dedicated to linguistic data distribution. The successful proposal came from the University of Pennsylvania, an ideal host for the repository given its reputation in linguistics, computer science, and projects like the Penn Treebank.


The LDC Catalog was immediately populated with several cornerstone data sets donated by government and private sources, including TIMIT, ATIS and Switchboard, which remain widely used by researchers today. Since then, the Catalog has grown to more than 1,000 corpora developed by LDC and contributed by partners around the world.


Collaboration is central to LDC’s mission. The Consortium works with U.S. and international researchers, institutions, and data centers, and supports global initiatives like the Open Language Archives Community (OLAC) and the Language Grid. LDC’s work in sponsored programs includes consultation, needs analysis, task specification, data collection, development of annotation guidelines and software tools, management of multiple data providers and coordination with program sponsors, research performers and evaluation teams.

After more than three decades as a leader in language resource development and distribution, LDC continues its mission of providing large quantities of diverse data, research program support and high-quality member services. Human language technology and its related fields are changing rapidly and need effective digital resource delivery, greater language coverage, new data genres, faster and more cost-efficient annotation processes, and flexible tools. The Consortium meets those challenges and will continue to do so with the support of members, licensees, sponsors and collaborators.
