Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: investigate 2 seconds delay before response is returned #150

Open
tripodsan opened this issue Mar 8, 2021 · 16 comments
Open

AWS: investigate 2 seconds delay before response is returned #150

tripodsan opened this issue Mar 8, 2021 · 16 comments

Comments

@tripodsan
Copy link
Contributor

tripodsan commented Mar 8, 2021

image

could be:

  • epsagon tracing
  • coralogix flushing
@tripodsan
Copy link
Contributor Author

it's probably a http request that is not traced. non of the functions in AWS show http traces, when helix-fetch is the
client.

/cc @ranrib

@ranrib
Copy link
Contributor

ranrib commented Mar 9, 2021

Hi @tripodsan - is this using http2 or 1? Are you using webpack or any other method to bundle the functions?

@tripodsan
Copy link
Contributor Author

Hi @tripodsan - is this using http2 or 1?

I think in this case we try to use http/2. but if the server supports it, i don't know

Are you using webpack or any other method to bundle the functions?

yes. we are using webpack. but no obfuscation / minification.

@stefan-guggisberg
Copy link
Contributor

Hi @tripodsan - is this using http2 or 1?

I think in this case we try to use http/2. but if the server supports it, i don't know

The protocol is negotiated via ALPN, i.e. we don't 'try' http/2. The negotiated protocol is most certainly http/2.

@ranrib
Copy link
Contributor

ranrib commented Mar 21, 2021

I've tried to re-create the following scenario:

const epsagon = require('epsagon');
const { fetch } = require('@adobe/helix-fetch');
epsagon.init({
  token: 'test',
  appName: 'Helix Test',
  metadataOnly: false,
});
async function test() {
    const resp = await fetch('https://httpbin.org/get');
    console.log(resp.ok);
}
const testFunction = epsagon.nodeWrapper(test);
testFunction()

And it works for me (I've verified that it's indeed HTTP/2). Do you have any idea or insight what is this different than your code?

@stefan-guggisberg
Copy link
Contributor

Hi @ranrib,
I tried your example but I am struggling to find the generated trace in the Epsagon UI. I used helix-fetch Test for appName but don't see that app on Epsagon. Pointers welcome.

@tripodsan
Copy link
Contributor Author

@stefan-guggisberg when you set process.env.EPSAGON_DEBUG='true'; so you see the trace sent to the epsagon server?

@stefan-guggisberg
Copy link
Contributor

@tripodsan yes

here's the full output (I also enabled helix-fetch debug output):

[EPSAGON] patching module: aws-sdk
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: aws-sdk
[EPSAGON] patching module: aws-sdk
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: aws-sdk
[EPSAGON] patching module: aws-sdk
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: aws-sdk
[EPSAGON] patching module: fetch-h2/dist/lib/context-http1
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: fetch-h2/dist/lib/context-http1
[EPSAGON] patching module: simple-oauth2/lib/client.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: simple-oauth2/lib/client.js
[EPSAGON] patching module: simple-oauth2/lib/client/client.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: simple-oauth2/lib/client/client.js
[EPSAGON] Patching http2 module
[EPSAGON] patching module: pg
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: pg
[EPSAGON] patching module: pg-pool
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: pg-pool
[EPSAGON] patching module: mysql2
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mysql2
[EPSAGON] patching module: mysql2
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mysql2
[EPSAGON] patching module: mysql/lib/Connection.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mysql/lib/Connection.js
[EPSAGON] patching module: redis
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: redis
[EPSAGON] patching module: ioredis
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: ioredis
[EPSAGON] patching module: dy-ioredis/lib/redis.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: dy-ioredis/lib/redis.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mongodb/lib/core/wireprotocol/index.js
[EPSAGON] patching module: amazon-dax-client
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amazon-dax-client
[EPSAGON] patching module: amazon-dax-client
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amazon-dax-client
[EPSAGON] patching module: openwhisk/lib/actions.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: openwhisk/lib/actions.js
[EPSAGON] patching module: @google-cloud/common/
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @google-cloud/common/
[EPSAGON] patching module: @google-cloud/pubsub/
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @google-cloud/pubsub/
[EPSAGON] patching module: @google-cloud/pubsub/
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @google-cloud/pubsub/
[EPSAGON] patching module: nats
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: nats
[EPSAGON] patching module: mqtt
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: mqtt
[EPSAGON] patching module: kafkajs
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: kafkajs
[EPSAGON] patching module: kafkajs/src/cluster/index.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: kafkajs/src/cluster/index.js
[EPSAGON] patching module: kafka-node
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: kafka-node
[EPSAGON] patching module: bunyan
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: bunyan
[EPSAGON] patching module: pino/lib/tools
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: pino/lib/tools
[EPSAGON] patching module: @azure/storage-blob
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @azure/storage-blob
[EPSAGON] patching module: @azure/storage-blob
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @azure/storage-blob
[EPSAGON] patching module: @azure/cosmos
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: @azure/cosmos
[EPSAGON] hooking winston-cloudwatch
[EPSAGON] hooked winston-cloudwatch
[EPSAGON] patching module: winston/lib/winston/logger
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: winston/lib/winston/logger
[EPSAGON] patching module: amqplib/lib/channel.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amqplib/lib/channel.js
[EPSAGON] patching module: amqp/lib/exchange.js
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: amqp/lib/exchange.js
[EPSAGON] patching module: ldapjs
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: ldapjs
[EPSAGON] patching module: cassandra-driver/lib/client
[EPSAGON] found module copies: 0
[EPSAGON] done patching module: cassandra-driver/lib/client
[EPSAGON] loading azure metadata, url: (http://169.254.169.254/metadata/instance?api-version=2019-06-01)
[EPSAGON] Loading EC2 metadata
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
[EPSAGON] [http] captured call 169.254.169.254
[EPSAGON] [http] filtered ignored hostname 169.254.169.254
  helix-fetch:core established TLS connection: #1 (httpbin.org) +0ms
  helix-fetch:core httpbin.org -> h2 +1ms
  helix-fetch:h2 reusing socket #1 (httpbin.org) +0ms
  helix-fetch:h2 GET httpbin.org/get +2ms
  helix-fetch:h2 session https://httpbin.org established +2ms
  helix-fetch:h2 caching session https://httpbin.org +0ms
  helix-fetch:h2 session https://httpbin.org remoteSettings: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65536,"maxFrameSize":16777215,"maxConcurrentStreams":128,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false} +1ms
  helix-fetch:h2 session https://httpbin.org localSettings: {"headerTableSize":4096,"enablePush":true,"initialWindowSize":65535,"maxFrameSize":16384,"maxConcurrentStreams":4294967295,"maxHeaderListSize":4294967295,"maxHeaderSize":4294967295,"enableConnectProtocol":false} +107ms
true
[EPSAGON] Sending trace async...
  helix-fetch:core resetting context +0ms
  helix-fetch:h2 resetContext: destroying session (socket #1, httpbin.org) +10ms
[EPSAGON] [QUEUE] Trace size 1606 Bytes pushed to queue
[EPSAGON] [QUEUE] Queue size: 1 traces, total size of 1606 Bytes
  helix-fetch:h2 session https://httpbin.org closed +4ms
  helix-fetch:h2 discarding cached session https://httpbin.org +0ms
[EPSAGON] Could not load azure metadata
[EPSAGON] Could not load EC2 metadata for instance-id
[EPSAGON] Could not load EC2 metadata for instance-type
[EPSAGON] Could not load EC2 metadata for local-ipv4
[EPSAGON] Could not load EC2 metadata for public-hostname
[EPSAGON] Could not load EC2 metadata for public-ipv4

@stefan-guggisberg
Copy link
Contributor

stefan-guggisberg commented Apr 2, 2021

The above output was generated by running this code with DEBUG=helix-fetch*:

require('dotenv').config();

const epsagon = require('epsagon');
const { fetch, reset } = require('@adobe/helix-fetch');

const { EPSAGON_TOKEN } = process.env;

epsagon.init({
  token: EPSAGON_TOKEN,
  appName: 'helix-fetch Test',
  metadataOnly: false,
});
async function test() {
    const resp = await fetch('https://httpbin.org/get');
    console.log(resp.ok);
}
const testFunction = epsagon.nodeWrapper(test);

testFunction()
  .catch(console.error)
  .finally(reset);

@stefan-guggisberg
Copy link
Contributor

Hi @ranrib, care to comment?

@ranrib
Copy link
Contributor

ranrib commented Apr 9, 2021

Hi @stefan-guggisberg ! Sorry for the late reply. Please add as an init parameter the following:
sendBatch: false

You should be able to see the trace afterwards

@stefan-guggisberg
Copy link
Contributor

Thanks @ranrib, that worked. I can see the traces now in epsagon. It would be nice if the http protocol version would be visible in the trace.

Now that we've confirmed that traces of http/2 requests generally work we still have to figure out why not a single http/2 request (there are tons!) is captured in our production env (aws lambda)...

@tripodsan
Copy link
Contributor Author

sendBatch: false

should update our epsagon instrumentation to always disable batches?

@ranrib
Copy link
Contributor

ranrib commented Apr 11, 2021

@tripodsan yes - since you're running on ephemeral envs, batching is less relevant. so please set it for all.

@tripodsan
Copy link
Contributor Author

@tripodsan yes - since you're running on ephemeral envs, batching is less relevant. so please set it for all.

I just check and we already set it.

@tripodsan
Copy link
Contributor Author

tripodsan commented Apr 20, 2021

@ranrib I did some more testing with a simpler function: https://github.com/adobe/helix-version-picker

I did the same as yesterday, i.e. deployed the function manually, but this time I didn't see the request.
https://app.epsagon.com/spans/0eb9412a-5550-e4e5-4be1-c045f5e5dbc6?tab=graph&start_time=1618850102

then I deliberately disabled HTTP2, redeployed and then I see the http request to github in the trace:
https://app.epsagon.com/spans/0eb94143-c87d-8db6-fe52-bfecd3be9a20?tab=graph&start_time=1618850268

I also created a completely new function, but still no request:
https://app.epsagon.com/spans/0eb94167-a131-d5db-23b9-65730b74bda7?tab=graph&start_time=1618850502

btw, here's how to invoke the function

 curl -sD - "https://lqmig3v5eb.execute-api.us-east-1.amazonaws.com/helix-services/version-picker-test/1.4.3" -H "x-owner: tripodsan" -H "x-repo: helix-pages-test" -H "x-ref: picker-unit-test"

but then, for the next invocation, there is a request:
https://app.epsagon.com/spans/0eb941b1-efb8-bc9d-8825-37c8c4e9e3e9?tab=graph&start_time=1618850989

but for one later, no:
https://app.epsagon.com/spans/0eb941d3-a84d-bce9-ec4d-868244f801ca?tab=graph&start_time=1618851210

could it be that epsagon doesn't catch requests on http2 keep alive connections?


then I did a test by creating a new fetch context for every invocation.... but same result. no requests recorded.
and playing more with it, I get mixed results. sometimes after a deploy, the traces to github are there, sometimes they aren't. I think it's related to some timing / initialization order.


then I created a version with EPSAGON_DEBUG enabled. interrestingly, the traces show the request to github:
https://app.epsagon.com/spans/0eb94369-4b0e-ddb4-1be6-03fa53c769e3?tab=graph&start_time=1618853869

but the cloudwatch logs don't mention it.

so, I have no clue so far, what could be the reason.


I also updated to node 14, but with the same result.
freshly deployed, no traces:
https://app.epsagon.com/spans/0eb9445d-8c93-62ed-ec2d-e0fed9418c92?tab=graph&start_time=1618855469

and then with enabled debug, also no capture:
https://app.epsagon.com/spans/0eb94474-49dd-544d-6688-8c171f76f17d?tab=graph&start_time=1618855618

btw, I noticed those INFO in the debug output:

[EPSAGON] require.resolve.paths is not a function

might this be a problem?

@dominique-pfister dominique-pfister changed the title AWS: investiagate 2second delay before response is returned AWS: investigate 2 seconds delay before response is returned Apr 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants