By: Francesco Sclano

Francesco Sclano — Mon, 20 Nov 2006 22:11:52 +0000

Hi everybody!
TermExtractor, my master thesis, is online at the
address http://lcl2.di.uniroma1.it.

TermExtractor is a FREE and high-performing software package for Terminology
Extraction. The software helps a web community to
extract and validate relevant domain terms in their
interest domain, by submitting an archive of
domain-related documents in any format
(txt, pdf, ps, dvi, tex, doc, rtf, ppt, xls, xml,
html/htm, chm, wpd and also zip archives.)

TermExtractor extracts terminology consensually
referred in a specific application domain. The
software takes as input a corpus of domain documents,
parses the documents, and extracts a list of
“syntactically plausible” terms (e.g. compounds,
adjective-nouns, etc.).
Documents parsing assigns a greater importance
to terms with text layouts (title, bold, italic,
underlined, etc.). Two entropy-based measures, called
Domain Relevance and Domain Consensus, are then used.
Domain Consensus is used to select only the terms
which are consensually referred throughout the corpus
documents. Domain Relevance to select only the terms
which are relevant to the domain of interest, Domain
Relevance is computed with reference to a set of
contrastive terminologies from different domains.
Finally, extracted terms are further filtered using
Lexical Cohesion, that measures the degree of
association of all the words in a terminological
string.

—
Francesco Sclano
home page: http://lcl2.di.uniroma1.it/~sclano
msn: francesco_sclano@yahoo.it
skype: francesco978

Comments on: Site and feed changes coming soon

By: Francesco Sclano