Automatically generating a timeline (a graphical representation of a time period on which important events are marked) from a Wikipedia article is a fascinating idea, and it is very useful for quickly grasping the historical perspective. This post outlines an approach to creating a well-formatted timeline from any Wikipedia article using winkNLP's API and its Named Entity Recognition (NER) feature:
- Fetch the article's contents and convert them into a WinkNLP document.
- Iterate through the detected entities and keep only those of type DATE.
- Use the token shapes of each date to select the ones that can be converted into standard Unix time.
- Using the parentSentence() API, extract the sentence containing the date; also call markup() on the date to highlight it in that sentence.
- Collect each Unix time and sentence pair in an array and sort the array by Unix time.
- Convert this array into a well-formatted timeline using Observable capabilities along with some CSS.
The above approach is realized in about 30 lines of code:
    timeLine = {
      const response = await fetch(
        `https://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=${WikiArticleTitle || '2022 United Nations Climate Change Conference'}&explaintext=1&formatversion=2&format=json&origin=*`
      );
      const body = await response.json();
      const text = body.query.pages[ 0 ].extract;
      var doc = nlp.readDoc( text || '' );
      var timeline = [];
      doc
        .entities()
        .filter( ( e ) => {
          var shapes = e.tokens().out( its.shape );
          // We only want dates that can be converted to an actual
          // time using new Date()
          return (
            e.out( its.type ) === 'DATE' &&
            (
              shapes[ 0 ] === 'dddd' ||
              ( shapes[ 0 ] === 'Xxxxx' && shapes[ 1 ] === 'dddd' ) ||
              ( shapes[ 0 ] === 'Xxxx' && shapes[ 1 ] === 'dddd' ) ||
              ( shapes[ 0 ] === 'dd' && shapes[ 1 ] === 'Xxxxx' && shapes[ 2 ] === 'dddd' ) ||
              ( shapes[ 0 ] === 'dd' && shapes[ 1 ] === 'Xxxx' && shapes[ 2 ] === 'dddd' ) ||
              ( shapes[ 0 ] === 'd' && shapes[ 1 ] === 'Xxxxx' && shapes[ 2 ] === 'dddd' ) ||
              ( shapes[ 0 ] === 'd' && shapes[ 1 ] === 'Xxxx' && shapes[ 2 ] === 'dddd' )
            )
          );
        })
        .each( ( e ) => {
          e.markup();
          let eventDate = e.out();
          // Dates without a day (e.g. "November 2022") get a default day of 1
          if ( isNaN( eventDate[ 0 ] ) ) eventDate = '1 ' + eventDate;
          timeline.push({
            date: e.out(),
            unixTime: new Date( eventDate ).getTime() / 1000,
            sentence: e.parentSentence().out( its.markedUpText )
          });
        });

      return timeline.sort( ( a, b ) => a.unixTime - b.unixTime );
    }
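The date conversion above relies on `new Date()` parsing strings such as "1 November 2022", a format whose handling is left to each JavaScript engine. As a rough alternative, the sketch below parses the same three date patterns explicitly. The helper name `toUnixTime` and the `MONTHS` table are hypothetical and not part of winkNLP; the sketch also assumes full English month names and interprets dates in UTC, whereas the original code uses local time.

```javascript
// Hypothetical helper for illustration; it is not part of winkNLP.
const MONTHS = [
  'january', 'february', 'march', 'april', 'may', 'june',
  'july', 'august', 'september', 'october', 'november', 'december'
];

// Converts "2022", "November 2022" or "6 November 2022" into Unix time
// (seconds since the epoch, UTC). Returns null for any other pattern.
function toUnixTime(dateText) {
  const parts = dateText.trim().split(/\s+/);
  if (parts.length < 1 || parts.length > 3) return null;

  // The year is always the last part and must be four digits.
  const yearText = parts[parts.length - 1];
  if (!/^[1-9]\d{3}$/.test(yearText)) return null;

  // Missing month or day default to January and the 1st respectively.
  let month = 0;
  let day = 1;
  if (parts.length >= 2) {
    month = MONTHS.indexOf(parts[parts.length - 2].toLowerCase());
    if (month === -1) return null;
  }
  if (parts.length === 3) {
    if (!/^\d{1,2}$/.test(parts[0])) return null;
    day = Number(parts[0]);
  }
  return Date.UTC(Number(yearText), month, day) / 1000;
}
```

For example, `toUnixTime('6 November 2022')` returns `1667692800`, while text that does not match one of the three patterns, such as `'last week'`, returns `null` and can be skipped. The sketch does not check that the day is valid for the month.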
You can see it in action in the interactive Observable notebook "How to visualize timeline of a Wiki article?".
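The code cell shown earlier stops at the sorted array; the final formatting step happens in the notebook. Outside Observable, that last step might look roughly like the sketch below, which turns the array into an HTML list. The function name `renderTimeline` and the `timeline` CSS class are hypothetical, and the sketch assumes each `sentence` already carries the highlighting tags added by `markup()`, so it is inserted as-is.

```javascript
// Hypothetical renderer for illustration; the notebook uses Observable's
// own HTML facilities and CSS instead.
function renderTimeline(entries) {
  // Drop entries whose date could not be converted, then sort a copy
  // chronologically so the caller's array is left untouched.
  const sorted = entries
    .filter((entry) => Number.isFinite(entry.unixTime))
    .sort((a, b) => a.unixTime - b.unixTime);

  // One list item per event: the date text followed by its sentence.
  const items = sorted.map(
    (entry) => `  <li><strong>${entry.date}</strong>: ${entry.sentence}</li>`
  );

  return ['<ul class="timeline">', ...items, '</ul>'].join('\n');
}
```

Filtering out non-finite values first matters because `new Date()` yields `NaN` for strings it cannot parse, and a comparator that returns `NaN` makes the sort order unreliable.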
About winkNLP
WinkNLP is a developer-friendly JavaScript library for Natural Language Processing (NLP). It can easily process large amounts of raw text at speeds over 650,000 tokens/second on an M1 MacBook Pro in both browser and Node.js environments. It even runs smoothly in a low-end smartphone's browser.
It is built from the ground up with a lean code base that has no external dependencies. A test coverage of ~100% and compliance with the Open Source Security Foundation best practices make winkNLP an ideal tool for building production-grade systems with confidence.