Implementation

The tool has been implemented as an add-on for the most widely used browsers at present.

Once the add-on is installed, the user can browse the Internet as usual. To extract the reviews section of a webpage, the user only needs to press the "Extract Reviews" button, and the tool extracts that section automatically. The extracted reviews section is then displayed in the browser like any other webpage.

The source code of the plugin is public, but using it requires permission from the authors. If you want to use this code in any form, please contact the authors.

The plugin contains the following files and folders:

  • manifest.json: Identifies the plugin within the browser and describes its internal organization, its interface, and the permissions it requires.
  • background-scripts: The background script is loaded as soon as the extension is loaded and stays loaded until the user disables the extension. It contains the listeners that react to the user's actions.
  • content-scripts: These scripts give access to the internal structure of a given webpage. In other words, they can read and manipulate the DOM tree.
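
The division of work between the background script and the content scripts can be illustrated with a minimal sketch. It is not the plugin's actual code: the message name EXTRACT_MESSAGE, the functions registerBackground and registerContent, and the extract callback are hypothetical names chosen for illustration, and the sketch assumes that the "Extract Reviews" button is a toolbar button exposed through the WebExtensions browserAction API (Manifest V2; Manifest V3 calls it action). The API object is passed in as a parameter so the same code works with `browser` or `chrome`.

```javascript
// Hypothetical message name; the real plugin may use a different protocol.
const EXTRACT_MESSAGE = "extract-reviews";

// Background side: when the toolbar button is clicked, ask the content
// script running in that tab to start the extraction.
// `api` is the WebExtensions namespace (`browser` or `chrome`).
// Assumption: Manifest V2 naming (`browserAction`); Manifest V3 uses `action`.
function registerBackground(api) {
  api.browserAction.onClicked.addListener((tab) => {
    api.tabs.sendMessage(tab.id, { type: EXTRACT_MESSAGE });
  });
}

// Content side: react to the message by running an extraction callback
// on the page's document. `extract` is a placeholder for the core
// algorithm, which is not reproduced here.
function registerContent(api, doc, extract) {
  api.runtime.onMessage.addListener((message) => {
    if (message && message.type === EXTRACT_MESSAGE) {
      extract(doc);
    }
  });
}
```

In an extension laid out this way, the background script would call registerBackground(browser) once at load time, and each content script would call registerContent(browser, document, someExtractor), where someExtractor stands for whatever function reads and rewrites the DOM tree.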

The core algorithm of the technique is implemented in JavaScript. It is composed of the following files:

  • content-scripts/RevEx/TemplateExtractor.js
  • content-scripts/RevEx/site/Website.js
  • content-scripts/RevEx/site/Webpage.js
  • content-scripts/RevEx/algorithm/RevEx/RevEx.js
  • content-scripts/RevEx/algorithm/RevEx/Config.js
  • content-scripts/RevEx/algorithm/RevEx/Content.js

There is only one background script:

  • background-scripts/background.js