Support Travis VM Infrastructure #337

Closed
wants to merge 3 commits into from

orangejulius
Member

Building on the changes to Travis config in #336, this PR modifies our Elasticsearch setup to be compatible with VM based infrastructure. It's based heavily on the Travis Elasticsearch docs.

However, there currently appears to be an issue where, for some reason, the Elasticsearch 5 builds think the Pelias index already exists. It almost looks as if the different builds are sharing data, which shouldn't be the case.

Since Travis has already started rolling out the switch to VM based builds, we have to fix this soon, and until then live with failing schema builds.

@missinglink any ideas?

Echelon9 and others added 2 commits December 2, 2018 12:55
Travis-CI has made, or will shortly make in early December 2018, a number of
beneficial changes to its Linux continuous integration testing infrastructure.

Changes that impact pelias/schema are:
* Linux infrastructure combined into one (virtualized), from two previously
  (virtualized and container-based). [0][1]
* Offering a more modern, supported Ubuntu Xenial (16.04 LTS). [2]
* Modest speed improvements from the fully virtualized-based infrastructure.

NOTE: Until openjdk/oraclejdk dependencies can be resolved on modern Ubuntu and
Travis-CI environment, keep the image at Ubuntu Trusty (14.04 LTS).

Projects using "sudo: false" (container-based infrastructure) have been
advised to remove that configuration soon. In any case, the transition
will happen regardless by December 7, 2018.

[0] https://blog.travis-ci.com/2018-10-04-combining-linux-infrastructures
[1] https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration
[2] https://docs.travis-ci.com/user/reference/xenial/
A hard switch-over is coming soon.
@missinglink
Member

Hmm... Yeah that's super weird, I agree it looks like the VMs are sharing data between them.

I wonder if there is a new way of specifying the matrix config which enforces separation.

@missinglink
Member

missinglink commented Dec 4, 2018

Agh.. so maybe this?

https://docs.travis-ci.com/user/caching/#caches-and-build-matrices

An easy test to see if it's caching related is to set cache: false at the top of the Travis config, then work from there.

@orangejulius
Member Author

I don't think it's actually caching. If you look at the build output for a test I did, there are no indices returned from the Elasticsearch cat API. And it only happens for ES5. Maybe there is weird behavior in ES5 we don't understand yet.
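
The check described above can be approximated in a few lines. This is a minimal sketch rather than the code from the test build: it assumes the plain-text body of `GET /_cat/indices?h=index` (one index name per line) has already been fetched, and `listIndices` and `indexExists` are hypothetical helper names.

```javascript
// Hypothetical helpers: parse the plain-text body returned by
// GET http://localhost:9200/_cat/indices?h=index (one index name per line).
function listIndices(catOutput) {
  return catOutput
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

function indexExists(catOutput, name) {
  return listIndices(catOutput).includes(name);
}

// An empty response body means Elasticsearch reports no indices at all.
console.log(indexExists('', 'pelias'));         // false
console.log(indexExists('pelias\n', 'pelias')); // true
```

An empty body together with an "index already exists" error is what made the caching theory unlikely here.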

@missinglink
Member

This SO issue might be related. It looks like "index already exists" might be returned in cases where it is not strictly true, and the additional error info might not be showing in our debug output.

https://stackoverflow.com/questions/46400502/elasticsearch-create-index-index-already-exists-exception

@orangejulius
Member Author

orangejulius commented Dec 4, 2018

Okay, I figured it out. The duplicate index issue is a red herring. It turns out the elasticsearch-js client retries on index creation failure (as it does for all other requests) unless you set maxRetries: 0 in the elasticsearch-js config.

Running a debug Travis build, I did so, and saw the real error:

 PUT http://localhost:9200/pelias => Parse Error
      at Log.error (/home/travis/build/geocodeearth/pelias-schema/node_modules/elasticsearch/src/lib/log.js:226:56)
      at checkRespForFailure (/home/travis/build/geocodeearth/pelias-schema/node_modules/elasticsearch/src/lib/transport.js:262:18)
      at HttpConnector.<anonymous> (/home/travis/build/geocodeearth/pelias-schema/node_modules/elasticsearch/src/lib/connectors/http.js:163:7)
      at ClientRequest.wrapper (/home/travis/build/geocodeearth/pelias-schema/node_modules/lodash/lodash.js:4935:19)
      at ClientRequest.emit (events.js:182:13)
      at Socket.socketOnData (_http_client.js:447:9)
      at Socket.emit (events.js:182:13)
      at addChunk (_stream_readable.js:283:12)
      at readableAddChunk (_stream_readable.js:264:11)
      at Socket.Readable.push (_stream_readable.js:219:10)

{ Error: Parse Error
    at Socket.socketOnData (_http_client.js:441:20)
    at Socket.emit (events.js:182:13)
    at addChunk (_stream_readable.js:283:12)
    at readableAddChunk (_stream_readable.js:264:11)
    at Socket.Readable.push (_stream_readable.js:219:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:94:17) bytesParsed: 8308, code: 'HPE_HEADER_OVERFLOW' } '\n'
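
To make the masking effect concrete, here is a small simulation. It is only a sketch under stated assumptions: `createFakeServer` and `createIndex` are hypothetical stand-ins, not elasticsearch-js code, and the retry logic is deliberately simplified. The first PUT creates the index but its response cannot be parsed, so a retrying client sends the request again and ends up reporting "index already exists" instead of the parse error.

```javascript
// Hypothetical in-memory stand-in for the server: the first PUT creates the
// index, any later PUT for the same name reports that it already exists.
function createFakeServer() {
  const indices = new Set();
  return {
    putIndex(name) {
      if (indices.has(name)) {
        return { ok: false, error: 'index already exists' };
      }
      indices.add(name);
      // The index is created, but the client cannot parse this response.
      return { ok: true, unparseable: true };
    },
  };
}

// Simplified client: retries only when the response could not be parsed,
// and reports the last error it saw.
function createIndex(server, name, maxRetries) {
  let lastError = null;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = server.putIndex(name);
    if (res.unparseable) {
      lastError = 'HPE_HEADER_OVERFLOW';
      continue; // connection-level failure: try again
    }
    if (!res.ok) {
      return { created: false, error: res.error };
    }
    return { created: true, error: null };
  }
  return { created: false, error: lastError };
}

console.log(createIndex(createFakeServer(), 'pelias', 3).error); // index already exists
console.log(createIndex(createFakeServer(), 'pelias', 0).error); // HPE_HEADER_OVERFLOW
```

With retries disabled, the first and only error is the one worth debugging.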

So what does that mean? Node.js has a (quite reasonable) limit set for maximum header length. Could Elasticsearch be sending back an invalid response that looks like lots of headers? I turned to curl to find out. I put our schema in a file and ran this script:

# Load the schema, then echo each command and stop on the first failure.
schema=$(cat schema.json)

set -ex
curl -v -XPUT http://localhost:9200/pelias -d "$schema" -H 'Content-Type: application/json'

The response is pretty...amazing:

* Hostname was NOT found in DNS cache
*   Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 9200 (#0)
> PUT /pelias HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:9200
> Accept: */*
> Content-Type: application/json
> Content-Length: 61064
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
< HTTP/1.1 200 OK
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [continent]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [country]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [macrocounty_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [locality_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [county]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [borough_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [borough_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [continent_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [borough]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [macroregion]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [region_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [localadmin]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [empire_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [dependency_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [macrocounty]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [marinearea_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [county_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [continent_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [postalcode_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [marinearea]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [neighbourhood]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [postalcode]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [empire]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [empire_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [marinearea_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [localadmin_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [macroregion_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [ocean_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [dependency]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [neighbourhood_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [country_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [macroregion_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [localadmin_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [region_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [locality]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [dependency_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [postalcode_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [locality_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [neighbourhood_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [ocean_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [ocean]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [region]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [macrocounty_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [country_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [county_a]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [bounding_box]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [zip]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [number]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [unit]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [street]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [name]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [source]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [layer]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [source_id]" "Tue, 04 Dec 2018 23:58:26 GMT"
< Warning: 299 Elasticsearch-5.6.12-cfe3d9f "The [string] field is deprecated, please use [text] or [keyword] instead on [category]" "Tue, 04 Dec 2018 23:58:26 GMT"
< content-type: application/json; charset=UTF-8
< content-length: 65
< 
* Connection #0 to host localhost left intact

It appears Elasticsearch decided it was a good idea to put deprecation warnings in the HTTP headers. We have a lot of them, which we can't remove until we drop ES2 support.
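
A rough back-of-the-envelope check shows why this trips the Node.js limit. The sketch below rebuilds a warning header in the shape seen in the curl output and sums the bytes. The field names are placeholders, 55 is approximately the number of warnings in that output, and the 8192-byte figure is the limit discussed in this thread.

```javascript
// Header limit discussed in this thread (bytes).
const NODE_HEADER_LIMIT = 8192;

// Rebuild one deprecation header in the shape shown in the curl output.
function warningHeader(field) {
  return 'Warning: 299 Elasticsearch-5.6.12-cfe3d9f ' +
    `"The [string] field is deprecated, please use [text] or [keyword] instead on [${field}]" ` +
    '"Tue, 04 Dec 2018 23:58:26 GMT"';
}

// Sum the size of all header lines, adding 2 bytes per line for CRLF.
function headerBlockBytes(fields) {
  return fields.reduce(
    (total, field) => total + Buffer.byteLength(warningHeader(field)) + 2,
    0
  );
}

// Placeholder field names; the real schema has its own names.
const fields = Array.from({ length: 55 }, (_, i) => `field${i}`);
console.log(headerBlockBytes(fields) > NODE_HEADER_LIMIT);  // true
console.log(headerBlockBytes(['zip']) > NODE_HEADER_LIMIT); // false
```

Each warning costs well over 150 bytes, so a few dozen deprecated fields are enough to cross the limit.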

As far as I can tell, there is currently no way to disable these headers, though lots of people are complaining. I'll investigate why this only happens on TravisCI, but surely we can fix this somehow.

@missinglink
Member

🤦

@orangejulius
Member Author

orangejulius commented Dec 5, 2018

Okay, I think I understand now. I don't believe this particular error is related to Travis VMs at all. I was able to reproduce the error locally on all versions of Node.js but the latest.

The 8192-byte header limit was added in the recent Node.js security releases, and it appears that Node.js is now getting follow-up PRs to do things like allow configuring the limit on the command line, which would help us here.
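
For anyone checking their own runtime: to my knowledge, later Node.js releases expose the limit as the read-only `http.maxHeaderSize` property and accept a `--max-http-header-size` command-line flag, but treat both as assumptions to verify against the Node.js version in use. The sketch below only reports what the current runtime exposes; `describeHeaderLimit` is a hypothetical helper name.

```javascript
const http = require('http');

// Report the header limit if this Node.js build exposes one.
// http.maxHeaderSize is undefined on releases that predate the property.
function describeHeaderLimit(limit = http.maxHeaderSize) {
  if (typeof limit !== 'number') {
    return 'this Node.js build does not expose a header limit';
  }
  return `HTTP header limit: ${limit} bytes`;
}

console.log(describeHeaderLimit());
console.log(describeHeaderLimit(8192)); // HTTP header limit: 8192 bytes
```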

There are actually new config parameters to control deprecation warning headers, but they were only introduced in Elasticsearch 6.3 and don't appear to have been backported to 5.

@orangejulius
Member Author

I've updated this PR to change the Node.js versions to specify 8.13.0 and 10.13.0, the latest version in each release line without the header size limit, and everything now works!
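
The pinning rule can be expressed as a small version check. This is a sketch that only knows the two release lines named above (8.13.0 and 10.13.0 as the last versions without the limit); `hasHeaderLimit` is a hypothetical helper, and it returns `undefined` for any release line this thread does not cover.

```javascript
// Last versions without the header limit, as stated in this thread.
const LAST_UNLIMITED = { 8: [8, 13, 0], 10: [10, 13, 0] };

// Turn "v10.13.0" or "10.13.0" into [10, 13, 0].
function parseVersion(version) {
  return version.replace(/^v/, '').split('.').map(Number);
}

// true: the version is newer than the last unlimited release in its line.
// false: it is that release or older. undefined: release line not covered.
function hasHeaderLimit(version) {
  const [major, minor, patch] = parseVersion(version);
  const last = LAST_UNLIMITED[major];
  if (!last) return undefined;
  if (minor !== last[1]) return minor > last[1];
  return patch > last[2];
}

console.log(hasHeaderLimit('v10.13.0')); // false
console.log(hasHeaderLimit('10.14.0'));  // true
console.log(hasHeaderLimit('8.13.0'));   // false
```

Calling it with `process.version` gives a quick way to flag an unsuitable runtime before the schema scripts run.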

I've opened elastic/elasticsearch#36243 to see if Elasticsearch would consider backporting the changes that can bring us to the latest Node.js version safely. If they do, we can use that version and at least set the defaults in our Docker images.

Due to a limit to header sizes in the latest security releases of
Node.js, combined with Elasticsearch's default of sending lots of
deprecation warning errors as headers, we need to use slightly older
versions of Node.js until either Elasticsearch offers more configuration
options, or Node.js releases a CLI option for the header limit.

See #337 for details
orangejulius added a commit to pelias/docker-baseimage that referenced this pull request Dec 16, 2018
Due to a conflict with large headers sent by Elasticsearch 5, Node.js
10.14.0, which introduces an 8k header limit, is not suitable for use
with pelias/schema.

Connects pelias/schema#337
orangejulius added a commit that referenced this pull request Dec 16, 2018
This is to prevent conflicts between ES5 deprecation headers (which can
be quite large) and the Node.js 10.14.0+ header limit of 8 KB.

See #337
@orangejulius
Member Author

I realized that changing how we run our Travis tests is not required; only the Node.js version pinning is relevant. I opened #339 with just that change, so this PR is no longer needed.

@orangejulius orangejulius deleted the travis-vm-support branch January 15, 2019 03:47
orangejulius added a commit that referenced this pull request Jul 5, 2019
This change makes our Elasticsearch schema compatible with Elasticsearch
5 and 6. It shouldn't have any effect on performance or operation, but it
will completely drop compatibility for Elasticsearch 2.

The primary change is that Elasticsearch 5 introduces two types of text
fields: `text` and `keyword`, whereas Elasticsearch 2 only had 1:
`string`.

Roughly, a `text` field is for true full text search and a `keyword`
field is for simple values that are primarily used for filtering or
aggregation (for example, our `source` and `layer` fields). The `string`
datatype previously filled both of those roles depending on how it was
configured.
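
As an illustration of that split, here is a minimal sketch of how a single field mapping might be converted. It is not the actual change in this commit: `migrateStringField` is a hypothetical helper, and it only handles the common case where a `string` field marked `index: 'not_analyzed'` becomes `keyword` and every other `string` field becomes `text`. Other legacy settings, such as `index: 'no'`, are not handled.

```javascript
// Hypothetical helper: convert one Elasticsearch 2 style field mapping.
// Only the analyzed / not_analyzed distinction is handled here.
function migrateStringField(field) {
  if (field.type !== 'string') {
    return { ...field }; // non-string fields are left as they are
  }
  // Drop the legacy string-valued `index` setting and pick the new type.
  const { index, ...rest } = field;
  const type = index === 'not_analyzed' ? 'keyword' : 'text';
  return { ...rest, type };
}

console.log(migrateStringField({ type: 'string', index: 'not_analyzed' }));
// { type: 'keyword' }
console.log(migrateStringField({ type: 'string', analyzer: 'standard' }));
// { analyzer: 'standard', type: 'text' }
```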

Fortunately, we had already roughly created a concept similar to the
`keyword` datatype in our schema, but called it `literal`. This has been
renamed to `keyword` to cut down on the number of terms needed

One nice effect of this change is that it removes all deprecation
warnings printed by Elasticsearch 5. Notably, as discovered in
#337 (comment), these
warnings were quite noisy and required special handling to work around
Node.js header size restrictions. This special handling can now be
removed.

Fixes pelias/whosonfirst#457
Connects pelias/pelias#719
Connects pelias/pelias#461