Releases · microlinkhq/metascraper

10 Jan 18:32

Kikobeats

v4.9.0

ca32573

Remove `sanitize-html`

The dependency introduces a bug related to malformed URLs: apostrophecms/sanitize-html#274

In fact, I found that it is no longer necessary, since htmlparser2 is already used by the cheerio load method.

Result: Smaller bundle, less parsing time.

Setup CSS Insensitive Rules

One of the things sanitize-html did was normalize some common patterns in the HTML markup.

Because this dependency has been removed, and after discovering that CSS rules can be case-insensitive, I enabled case-insensitive matching wherever possible.

Result: Better data detection, less initial parsing time.
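
As a rough illustration of why this matters, the sketch below mimics the semantics of a CSS attribute selector with the case-insensitive flag, such as `meta[property="og:title" i]`. It uses plain objects rather than a real DOM, and `metaTags` and `findContent` are hypothetical names made up for this example, not part of metascraper.

```javascript
// Hypothetical stand-in for parsed <meta> tags; not metascraper's internals.
const metaTags = [
  { attr: 'property', value: 'OG:Title', content: 'Hello World' },
  { attr: 'name', value: 'Description', content: 'A sample page' }
]

// Mimics the CSS selector `meta[property="og:title" i]`:
// the attribute name must match exactly, while the attribute value
// is compared ignoring case when `ignoreCase` is enabled.
function findContent (tags, attr, expected, { ignoreCase = false } = {}) {
  const normalize = s => (ignoreCase ? s.toLowerCase() : s)
  const match = tags.find(
    tag => tag.attr === attr && normalize(tag.value) === normalize(expected)
  )
  return match ? match.content : undefined
}

const strict = findContent(metaTags, 'property', 'og:title')
const relaxed = findContent(metaTags, 'property', 'og:title', { ignoreCase: true })

console.log(strict) // undefined
console.log(relaxed) // Hello World
```

With strict matching, markup that writes `OG:Title` is missed; with the case-insensitive comparison, the same rule picks it up without a separate normalization pass.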

Improve Date Rules

Building on the case-insensitive CSS rules improvement, I re-checked the rules bundle in metascraper-date.

I found some interesting opportunities for improvement: some rules can be merged into one, and others can be made more generic, which improves data accuracy.

Also, I tried to prioritize update over create, so the output reflects the last modification date rather than the creation date.

Result: Better date accuracy, more values detected.
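
The "update over create" idea can be sketched as an ordered list of candidate fields where modification dates come first. This is only a minimal illustration: `DATE_PRIORITY`, `pickDate`, and the field names are assumptions for the example and do not reflect the actual rules in metascraper-date.

```javascript
// Candidate fields ordered by priority: modification dates first,
// then creation dates. The field names are illustrative assumptions.
const DATE_PRIORITY = [
  'dateModified',
  'article:modified_time',
  'datePublished',
  'article:published_time'
]

// Returns the first candidate that parses into a valid date, as an ISO string.
function pickDate (candidates) {
  for (const key of DATE_PRIORITY) {
    const raw = candidates[key]
    if (!raw) continue
    const date = new Date(raw)
    // Skip values that do not parse into a valid date.
    if (!Number.isNaN(date.getTime())) return date.toISOString()
  }
  return undefined
}

const picked = pickDate({
  datePublished: '2018-01-01T10:00:00.000Z',
  dateModified: '2018-01-05T12:30:00.000Z'
})

console.log(picked) // 2018-01-05T12:30:00.000Z
```

When only a creation date is available, the same function falls through to it, so existing pages without a modification date still produce a value.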

Improve URL detection

URL detection has been improved so that more kinds of URLs can be detected.

A URL is a subtype of URI. What I want to ensure is that as much data as possible is detected.

Now the URL-related metascraper-helpers can also detect URIs, such as base64-encoded data image URIs or magnet URIs.

The challenge here is doing that while still supporting the original functionality. I added a lot of tests to ensure that.

Result: Better URL detection, supporting URIs.
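
To make the URL versus URI distinction concrete, here is a minimal sketch that accepts `data:` and `magnet:` URIs alongside regular web URLs. It relies on the WHATWG `URL` class that is global in Node.js 10 and later; `isSupportedUri` and `ALLOWED_PROTOCOLS` are hypothetical names for this example and are not the metascraper-helpers API.

```javascript
// Hypothetical helper, not the metascraper-helpers implementation.
// The set of accepted schemes is an assumption chosen for illustration.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'data:', 'magnet:'])

function isSupportedUri (value) {
  if (typeof value !== 'string') return false
  try {
    // The URL constructor throws on input it cannot parse,
    // for example a string without a scheme.
    const { protocol } = new URL(value)
    return ALLOWED_PROTOCOLS.has(protocol)
  } catch (err) {
    return false
  }
}

console.log(isSupportedUri('https://example.com/post'))
console.log(isSupportedUri('data:image/png;base64,iVBORw0KGgo='))
console.log(isSupportedUri('magnet:?xt=urn:btih:c12fe1c06bba254a9dc9f519b335aa7c1367a88a'))
console.log(isSupportedUri('not a url'))
```

Using an explicit allow-list keeps the original behavior for web URLs while rejecting schemes that are parseable but unwanted.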

26 Oct 18:02

Kikobeats

v4.6.0

0ef7ad5

Features

get language from twitter payload (#129) (80d5ddf)

24 Aug 08:25

Kikobeats

v4.0.0

7901fb6

Breaking Changes

The autoload feature has been removed.

Now rules bundles need to be loaded explicitly:

const metascraper = require('metascraper')([
  require('metascraper-author')(),
  require('metascraper-date')(),
  require('metascraper-description')(),
  require('metascraper-image')(),
  require('metascraper-logo')(),
  require('metascraper-clearbit-logo')(),
  require('metascraper-publisher')(),
  require('metascraper-title')(),
  require('metascraper-url')()
])

Migration guide

If you are using metascraper.load
Just rename it to metascraper. The .load method is now the main exported function.

If you are using metascraper autoload
Replace it with the code snippet above. It loads the default rules bundles present in v3.

19 Dec 14:13

Kikobeats

v3.2.0

7cead99

Add Amazon metascraper.
Simplify rules interface.
Improve documentation.

11 Dec 13:37

Kikobeats

2.0.0

3a81306

Breaking Changes

From now on, metascraper is the main method, and you need to pass html and url to extract metadata.

const metascraper = require('metascraper')
const got = require('got')

const targetUrl = 'http://www.bloomberg.com/news/articles/2016-05-24/as-zenefits-stumbles-gusto-goes-head-on-by-selling-insurance'

;(async () => {
  const {body: html, url} = await got(targetUrl)
  const metadata = await metascraper({html, url})
  console.log(metadata)
})()

We moved the HTTP layer out of the library to avoid connection-related problems.

Also, in this new interface, rules are not exposed directly.

Features

`logo` data field

We added a new field, logo, for identifying the publisher brand behind a link. As a fallback, it uses the highest-resolution favicon available.
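
One way to picture the favicon fallback is to pick the icon with the largest declared size. The sketch below assumes the candidates were already collected from `<link rel="icon" sizes="...">` tags; `parseArea`, `pickLargestIcon`, and the sample data are hypothetical and are not taken from the library.

```javascript
// Converts a sizes value such as "192x192" into a pixel area.
// Missing or unparseable values (for example "any") count as 0.
function parseArea (sizes) {
  const match = /^(\d+)x(\d+)$/i.exec(sizes || '')
  return match ? Number(match[1]) * Number(match[2]) : 0
}

// Returns the icon with the largest declared area, or undefined for an empty list.
function pickLargestIcon (icons) {
  return icons.reduce(
    (best, icon) =>
      best === undefined || parseArea(icon.sizes) > parseArea(best.sizes)
        ? icon
        : best,
    undefined
  )
}

// Hypothetical favicon candidates for illustration only.
const icons = [
  { href: '/favicon-16.png', sizes: '16x16' },
  { href: '/favicon-192.png', sizes: '192x192' },
  { href: '/favicon.ico' }
]

console.log(pickLargestIcon(icons).href) // /favicon-192.png
```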

Improvements

Codebase simplification

We rewrote the code to make it easy to support plugins in the future.

Testing environment

We updated the integration tests to cover at least the top 50 popular internet sites. They are also automated, so adding a new test is easy.
