An Extensible Schema for Building Large Weakly-Labeled Semantic Corpora

English, S. Matthew

AL-Lisaniyyat
Volume 22, Number 2, Pages 18-22
2016-05-30

Abstract

In NLP, data drives research, as evidenced by the frequency with which seminal works of database engineering such as the Penn Treebank have been employed as a basis for experimentation. Traditionally, large-scale, expertly annotated corpora have been expensive and time-consuming to produce. This cost drove researchers to adopt automated methods for generating labelled data from available resources such as Freebase, DBpedia, and the "infoboxes" found on Wikipedia pages. These knowledge bases have been, or are in the process of being, subsumed by Wikidata, an initiative to concentrate such disparate data repositories in an organized, machine-readable format. This resource is an important research tool. In this paper, we review our experience using Wikidata to construct a large annotated corpus under distant supervision. We also make the materials and the code used to generate our annotations freely available to all interested parties.

Keywords

Wikidata - Semantic Corpora
