Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Grid Node timeout value differs from value set on the hub #3064

Closed
testphreak opened this issue Nov 5, 2016 · 19 comments
Closed

Grid Node timeout value differs from value set on the hub #3064

testphreak opened this issue Nov 5, 2016 · 19 comments
Assignees

Comments

@testphreak
Copy link

Meta -

OS:
Latest Docker Image
Selenium Version:
3.0.1
Browser:
Chrome
Browser Version:
53.0.2785.101-1

Expected Behavior -

timeout value on node matches the timeout value set for hub. So if hub timeout is set to 300000, I expect the node timeout to also be 300000.

Actual Behavior -

timeout value for each node on the grid intermittently shows 1800 even though I set the timeout on the hub to 300000.

Steps to reproduce -

On Grid console page on each node I see the following configuration (note the timeout value)
browserTimeout: 900000
debug: false
help: false
jettyMaxThreads: -1
port: 5555
role: node
timeout: 1800 (should be 300000)
cleanUpCycle: 5000
host: 659103bcc9f9
maxSession: 1
capabilities: Capabilities [{seleniumProtocol=WebDriver, browserName=chrome, maxInstances=1, platform=LINUX}]
hub: http://hub:4444/grid/register
id: http://659103bcc9f9:5555
hubHost: hub
hubPort: 4444
nodeStatusCheckTimeout: 5000
proxy: org.openqa.grid.selenium.proxy.DefaultRemoteProxy
register: true
registerCycle: 5000
remoteHost: http://659103bcc9f9:5555

On the same grid console page, clicking on view config link, I can see the hub has the correct values including value for timeout.
Config for the hub :
browserTimeout : 900000
debug : false
help : false
jettyMaxThreads : -1
port : 4444
role : hub
timeout : 300000
cleanUpCycle : 5000
host : 10.255.0.5
maxSession : 1
hubConfig : /opt/selenium/config.json
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true
Config details :
hub launched with : -browserTimeout 900000 -debug false -help false -jettyMaxThreads -1 -port 4444 -role hub -timeout 300000 -cleanUpCycle 5000 -host 10.255.0.5 -maxSession 1 -hubConfig /opt/selenium/config.json -capabilityMatcher org.openqa.grid.internal.utils.DefaultCapabilityMatcher -newSessionWaitTimeout -1 -throwOnCapabilityNotPresent true
the final configuration comes from :
the default :
browserTimeout : 0
debug : false
help : false
jettyMaxThreads : -1
role : hub
timeout : 1800
cleanUpCycle : 5000
maxSession : 1
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true

updated with params :
browserTimeout : 900000
debug : false
help : false
jettyMaxThreads : -1
role : hub
timeout : 300000
cleanUpCycle : 5000
maxSession : 1
capabilityMatcher : org.openqa.grid.internal.utils.DefaultCapabilityMatcher
newSessionWaitTimeout : -1
throwOnCapabilityNotPresent : true

@testphreak
Copy link
Author

testphreak commented Nov 6, 2016

Here's an example of 2 nodes on the same grid having 2 different timeout values as shown in the screenshot below. The node on the right with 300000 ms timeout is what I set, but for some reason, some nodes in the grid show the default timeout which is 1800 ms.

screen shot 2016-11-05 at 7 45 48 pm

@mach6
Copy link
Member

mach6 commented Nov 6, 2016

Likely what's happening here is the node is failing to communicate with the hub, early in its registration process, to retrieve the timeout value from the hub. Do you see a WARN with Exception in the node logs and/or stdout for the node which ends up with the default value (1800)?

@mach6 mach6 added the C-server label Nov 6, 2016
@mach6
Copy link
Member

mach6 commented Nov 6, 2016

side note related to the code I linked above;

  • IMO, browserTimeout should be node specific and the hub has no business overriding a browserTimeout value specified by a node. That said, I've been resistant to make a change until I get consensus on this.

@testphreak
Copy link
Author

testphreak commented Nov 6, 2016

Here's the log for the node with the wrong timeout value during initialization. For some reason I see a connection refused warning as you suggested.

appending selenium options: -host 28327d3e373b
 18:28:59.112 INFO - Selenium build info: version: '3.0.1', revision: '1969d75'
 18:28:59.113 INFO - Launching a Selenium Grid node
 18:29:00.374 WARN - error getting the parameters from the hub. The node may end up with wrong timeouts.Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
 2016-11-05 18:29:00.388:INFO::main: Logging initialized @1583ms
 18:29:00.418 INFO - Driver provider org.openqa.selenium.ie.InternetExplorerDriver registration is skipped:
  registration capabilities Capabilities [{ensureCleanSession=true, browserName=internet explorer, version=, platform=WINDOWS}] does not match the current platform LINUX
 18:29:00.419 INFO - Driver provider org.openqa.selenium.edge.EdgeDriver registration is skipped:
  registration capabilities Capabilities [{browserName=MicrosoftEdge, version=, platform=WINDOWS}] does not match the current platform LINUX
 18:29:00.419 INFO - Driver class not found: com.opera.core.systems.OperaDriver
 18:29:00.419 INFO - Driver provider com.opera.core.systems.OperaDriver registration is skipped:
 Unable to create new instances on this machine.
 18:29:00.419 INFO - Driver class not found: com.opera.core.systems.OperaDriver
 18:29:00.419 INFO - Driver provider com.opera.core.systems.OperaDriver is not registered
 18:29:00.420 INFO - Driver provider org.openqa.selenium.safari.SafariDriver registration is skipped:
  registration capabilities Capabilities [{browserName=safari, version=, platform=MAC}] does not match the current platform LINUX
 2016-11-05 18:29:00.449:INFO:osjs.Server:main: jetty-9.2.15.v20160210
 2016-11-05 18:29:00.470:INFO:osjsh.ContextHandler:main: Started o.s.j.s.ServletContextHandler@3cc2931c{/,null,AVAILABLE}
 2016-11-05 18:29:00.482:INFO:osjs.ServerConnector:main: Started ServerConnector@ef9296d{HTTP/1.1}{0.0.0.0:5555}
 2016-11-05 18:29:00.482:INFO:osjs.Server:main: Started @1677ms
 18:29:00.482 INFO - Selenium Grid node is up and ready to register to the hub
 18:29:00.491 INFO - Starting auto registration thread. Will try to register every 5000 ms.
 18:29:00.491 INFO - Registering the node to the hub: http://hub:4444/grid/register
 18:29:00.497 INFO - Couldn't register this node: Error sending the registration request: Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused
 18:29:05.512 INFO - Registering the node to the hub: http://hub:4444/grid/register
 18:29:05.520 INFO - The node is registered to the hub and ready to use
 18:29:15.634 INFO - SessionCleaner initialized with insideBrowserTimeout 0 and clientGoneTimeout 1800000 polling every 180000

Messages of interest -
First time with a warning -

2016-11-06T01:29:00.374758143Z  18:29:00.374 WARN - error getting the parameters from the hub. The node may end up with wrong timeouts.Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused

Then again a few ms later, a connection refused, but eventually registers with the default timeout instead of the timeout I set.

2016-11-06T01:29:00.497440999Z  18:29:00.497 INFO - Couldn't register this node: Error sending the registration request: Connect to hub:4444 [hub/10.0.0.2] failed: Connection refused

@testphreak
Copy link
Author

I think I've been able to get around the issue. Now as soon as I start the hub, I check if /grid/console is reachable. Only when I get a 200 back from the request, I start adding nodes.

Perhaps it's a timing issue with the hub not able to take traffic right after it's started and needs a few seconds before nodes can start connecting to it.

@testphreak
Copy link
Author

Unrelated, but would be nice if there was an api available that returns the number of nodes registered in the grid and other useful grid stats besides just the console page.

@schmidtkp
Copy link

@testphreak You can write your own servlet and add it to the list of servlets in your GridHubConfiguration. In your servlet you can call the API's getRegistry().getAllProxies() and getRegistry().getDesiredCapabilities() to get additional information. I do this to generate a JSON response that is then consumed by a few web pages to display things like number of registered nodes to the hub, nodes currently in use (including browser instance), nodes available, requests in the hubs queue, etc.

@mach6
Copy link
Member

mach6 commented Nov 7, 2016

@testphreak @schmidtkp a custom servlet will definitely get the job done. That said, an api for this information is a request that comes up often. I'm hoping to get one into the main code in the near future.

@testphreak
Copy link
Author

Thank you @schmidtkp, I didn't know we could add our our own servlet.

@mach6, having apis we could call to get grid stats for hub and nodes would be awesome! There is a /grid/api/hub call, but it doesn't provide all the information.

For now, I have gone ahead with an HTML parser, albeit simple solution, that extracts the information I need (number of nodes) from the console page.

@mach6 mach6 self-assigned this Nov 9, 2016
@sherlock1982
Copy link

Still happens with 3.6.0

@wbh1
Copy link

wbh1 commented Jun 15, 2018

Experienced this on Grid 3.12, too. Had to set the "timeout" value in the node's JSON to make it take effect.

@mikimichaeli
Copy link

@wbh1 I am currently experiencing this in node 3.12 as well. Can you please elaborate on what you did exactly in order to make this stick?

@barancev
Copy link
Member

barancev commented Jul 1, 2018

@wbh1 @mikimichaeli Can you please upgrade to 3.13 and share a node log, it contains more information about configuration now that can help us to pinpoint the issue.

@mikimichaeli
Copy link

@barancev after some research I realized tgat my problem not related to selenium itself but to network errors.

@diemol diemol closed this as completed in 88340f4 Sep 2, 2018
@diemol
Copy link
Member

diemol commented Sep 2, 2018

After checking the code, I can see that the Node is actually fetching the Hub configuration and updating its own configuration, but only after registering.
The hub is only aware of the node configuration reported at the moment of registration, and since the configuration is updated after registering, the hub does not have the latest configuration. And whatever the hub has, is what is shown in the Grid Console.

This is only a problem when the configuration values are shown in the Grid Console (where is actually what people check). Nevertheless, the values used on the node side to check timeout and browserTimeoutare the correct ones. If you run the node in debug node, you'll see that the configuration is correct.

I've just changed the order to fetch the hub configuration first, update it locally and then send the registration request, therefore the hub will have always the latest configuration (which was the original behaviour), and it will be properly shown in the Grid Console. It should be available in the next release.

@pawelus
Copy link

pawelus commented Nov 14, 2018

@diemol are you sure node applies the config correctly?

I'm using docker-compose.yml to deploy to Docker Swarm:

version: '3.7'

services:
  hub:
    image: selenium/hub:3.14.0-krypton
    environment:
      GRID_BROWSER_TIMEOUT: 180
      GRID_TIMEOUT: 300
      GRID_DEBUG: "true"
      JAVA_OPTS: -Xmx512m
    ports:
      - "4444:4444"
    deploy:
      replicas: 1    
      placement:
        constraints:
          - node.role == manager

  chrome:
    image: selenium/node-chrome-debug:3.14.0-krypton
    environment:
      HUB_HOST: hub
      HUB_PORT: 4444
      GRID_DEBUG: "true"
      JAVA_OPTS: -Xmx512m
    ports:
      - "5900:5900"
    volumes:
      - /dev/shm:/dev/shm
    depends_on: 
      - hub
    deploy:
        replicas: 2
        placement:
          constraints:
            - node.role == worker
    entrypoint: bash -c 'SE_OPTS="-host $$HOSTNAME" /opt/bin/entry_point.sh'

Debug logs from the node claim that it fetched the config from the hub and applied it, but the node still behaves like browserTimeout is set to 0 (which is the default). It never times out on browser crashes.

It times out on browser crash only when I change my node entry point to:
entrypoint: bash -c 'SE_OPTS="-timeout 300 -browserTimeout 180 -host $$HOSTNAME" /opt/bin/entry_point.sh'
Which would mean default config is not substituted with the one from hub.

@diemol
Copy link
Member

diemol commented Nov 18, 2018

@pawelus yes, I am sure. But to be more specific, yes, the node fetches the configuration and sets it properly.

Nevertheless, browserTimeout was broken for a while, and it got fixed in the 3.141.0 release. So to be safe, please use the latest release 3.141.59

@diemol
Copy link
Member

diemol commented Nov 18, 2018

So even if the configuration was correct, the browserTimeout parameter was not being used properly.

@lock lock bot locked and limited conversation to collaborators Aug 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants