Automatic Information Extraction (IE) is a challenging task because it draws on expert knowledge and requires well-developed Natural Language Processing (NLP) algorithms. Moreover, IE is domain dependent and context sensitive. In this research, we present a general learning approach that may be applied to different types of events. We observed that even when a natural language text containing a target event appears unstructured, it may contain a segment that can be mapped automatically into a structured form. Segments representing the same kind of event have a similar structure, or pattern. Each pattern is composed of an ordered sequence of named entities, keywords, and articulation words. Generic named entities such as organizations, persons, locations, and dates, together with grammatical annotations, are generated by an automatic part-of-speech identification tool. During the learning step, each relevant segment is manually annotated with respect to the targeted entities (roles) that structure an event of the ontology. IE then proceeds by associating a role with a specific entity: by aligning generic entities with specific entities, some strings of a text are annotated automatically. The alignment between patterns and a new text is often not guaranteed because of the diversity of writing styles found in the news. For that reason, we propose soft matching between reduced formats, with the objective of making maximal use of pattern expressiveness. In several cases, this reduced format allows the same role to be assigned to similar entities cited on the same side of a given keyword or cue word. The experimental results are promising: we obtained an average recognition rate of 76.90%.
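To make the idea of soft matching between reduced formats concrete, the following is a minimal sketch under our own assumptions, not the implementation evaluated in this work. The `Item` class, the functions `reduce_format`, `split_on_keyword`, and `soft_match`, the acquisition event, and the role names (`buyer`, `target`, `date`) are all hypothetical and chosen only for illustration. The sketch assumes that a reduced format keeps named entities and keywords while dropping articulation words, and that a pattern contains a single cue keyword.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Item:
    kind: str                   # "ENTITY", "KEYWORD" or "WORD" (articulation word)
    label: str                  # generic entity type (e.g. "ORG") or the keyword itself
    text: str = ""              # surface string found in the news text
    role: Optional[str] = None  # event role; set only on annotated pattern items


def reduce_format(items):
    """Assumed reduced format: keep named entities and keywords only."""
    return [it for it in items if it.kind in ("ENTITY", "KEYWORD")]


def split_on_keyword(items, keyword):
    """Return (left, right) around the first occurrence of the cue keyword, or None."""
    for i, it in enumerate(items):
        if it.kind == "KEYWORD" and it.label == keyword:
            return items[:i], items[i + 1:]
    return None


def soft_match(pattern, text):
    """Assign pattern roles to text entities of the same type on the same side of the cue."""
    pat, txt = reduce_format(pattern), reduce_format(text)
    cue = next((it.label for it in pat if it.kind == "KEYWORD"), None)
    if cue is None:
        return {}
    pat_sides = split_on_keyword(pat, cue)
    txt_sides = split_on_keyword(txt, cue)
    if txt_sides is None:  # the cue keyword does not occur in the new text
        return {}
    roles = {}
    for pat_side, txt_side in zip(pat_sides, txt_sides):
        # Map each generic entity type on this side of the cue to its annotated role.
        type_to_role = {it.label: it.role for it in pat_side
                        if it.kind == "ENTITY" and it.role}
        for it in txt_side:
            if it.kind == "ENTITY" and it.label in type_to_role:
                roles.setdefault(type_to_role[it.label], []).append(it.text)
    return roles


# Hypothetical learned pattern: ORG(buyer) "acquire" ORG(target) DATE(date)
pattern = [
    Item("ENTITY", "ORG", role="buyer"),
    Item("KEYWORD", "acquire"),
    Item("ENTITY", "ORG", role="target"),
    Item("ENTITY", "DATE", role="date"),
]

# Hypothetical new text whose full sequence does not align exactly with the pattern
text = [
    Item("ENTITY", "ORG", "Acme Corp"),
    Item("WORD", "and", "and"),
    Item("ENTITY", "ORG", "Beta Ltd"),
    Item("KEYWORD", "acquire", "acquire"),
    Item("ENTITY", "ORG", "Gamma Inc"),
    Item("WORD", "on", "on"),
    Item("ENTITY", "DATE", "Monday"),
]

roles = soft_match(pattern, text)
print(roles)
```

In this toy example the text has two organizations before the cue word while the pattern has only one, so an exact alignment would fail. The reduced format still lets both organizations receive the role annotated on that side of the keyword.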

