Installing via helm - backend unable to connect to postgres #848

Open
yixu34 opened this issue Jun 23, 2020 · 6 comments

@yixu34

yixu34 commented Jun 23, 2020

I've just pulled master and I'm on f8780ec (this was after the helm chart split, which I noticed in some of the 2.x tagged versions previously). I tried installing on our k8s cluster via helm, and the backend container of the backend service is giving an error:

{"thread":"main","level":"DEBUG","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil getSessionFactory() retrying for DB connection after 2560 millisecond ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1592876745,"nanoOfSecond":869000000},"threadId":1,"threadPriority":5,"hostName":"modeldb-staging-f8780e-backend-0","kubernetes.podIP":""}

{"thread":"main","level":"WARN","loggerName":"ai.verta.modeldb.utils.ModelDBHibernateUtil","message":"ModelDBHibernateUtil checkDBConnection() got error ","thrown":{"commonElementCount":0,"localizedMessage":"The connection attempt failed.","message":"The connection attempt failed.","name":"org.postgresql.util.PSQLException","cause":{"commonElementCount":19,"localizedMessage":"modeldb-postgresql","message":"modeldb-postgresql","name":"java.net.UnknownHostException","extendedStackTrace":"java.net.UnknownHostException: modeldb-postgresql
	at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:567) ~[?:?]
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333) ~[?:?]
	at java.net.Socket.connect(Socket.java:648) ~[?:?]
	at org.postgresql.core.PGStream.<init>(PGStream.java:75) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192) ~[postgresql-42.2.6.jar!/:42.2.6]
"},"extendedStackTrace":"org.postgresql.util.PSQLException: The connection attempt failed.
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:292) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:195) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.Driver.makeConnection(Driver.java:458) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.Driver.connect(Driver.java:260) ~[postgresql-42.2.6.jar!/:42.2.6]
	at java.sql.DriverManager.getConnection(DriverManager.java:677) ~[java.sql:?]
	at java.sql.DriverManager.getConnection(DriverManager.java:228) ~[java.sql:?]
	at ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnection(ModelDBHibernateUtil.java:483) [classes!/:1.0-SNAPSHOT]
	at ai.verta.modeldb.utils.ModelDBHibernateUtil.checkDBConnectionInLoop(ModelDBHibernateUtil.java:325) [classes!/:1.0-SNAPSHOT]
	at ai.verta.modeldb.utils.ModelDBHibernateUtil.createOrGetSessionFactory(ModelDBHibernateUtil.java:240) [classes!/:1.0-SNAPSHOT]
	at ai.verta.modeldb.App.initializeServicesBaseOnDataBase(App.java:363) [classes!/:1.0-SNAPSHOT]
	at ai.verta.modeldb.App.main(App.java:260) [classes!/:1.0-SNAPSHOT]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.lang.reflect.Method.invoke(Method.java:564) ~[?:?]
	at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) [modeldb-1.0-SNAPSHOT-client-build.jar:1.0-SNAPSHOT]
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) [modeldb-1.0-SNAPSHOT-client-build.jar:1.0-SNAPSHOT]
	at org.springframework.boot.loader.Launcher.launch(Launcher.java:50) [modeldb-1.0-SNAPSHOT-client-build.jar:1.0-SNAPSHOT]
	at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58) [modeldb-1.0-SNAPSHOT-client-build.jar:1.0-SNAPSHOT]
Caused by: java.net.UnknownHostException: modeldb-postgresql
	at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:567) ~[?:?]
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333) ~[?:?]
	at java.net.Socket.connect(Socket.java:648) ~[?:?]
	at org.postgresql.core.PGStream.<init>(PGStream.java:75) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91) ~[postgresql-42.2.6.jar!/:42.2.6]
	at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192) ~[postgresql-42.2.6.jar!/:42.2.6]
	... 19 more
"},"endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1592876745,"nanoOfSecond":869000000},"threadId":1,"threadPriority":5,"hostName":"modeldb-staging-f8780e-backend-0","kubernetes.podIP":""}

Is there something that's not working out of the box with the helm charts? Or did I not configure the secrets correctly? All I did was helm install modeldb-staging-f8780e . --namespace <our namespace>. Thanks!

@conradoverta
Contributor

Hi, @yixu34! Thanks for reaching out.

According to the logs, the backend cannot resolve modeldb-postgresql, so apparently no service with that name exists. Could you run kubectl get svc --namespace <your namespace> to check? For modeldb, you should see something like the following (this is from a fresh install I did this morning with the charts):

$ k get svc
NAME                          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
kubernetes                    ClusterIP   10.96.0.1       <none>        443/TCP                      10h
modeldb-backend               ClusterIP   10.96.36.191    <none>        8085/TCP,8086/TCP,3000/TCP   10h
modeldb-graphql               ClusterIP   10.96.212.125   <none>        3000/TCP                     10h
modeldb-postgresql            ClusterIP   10.96.122.16    <none>        5432/TCP                     10h
modeldb-postgresql-headless   ClusterIP   None            <none>        5432/TCP                     10h
modeldb-webapp                ClusterIP   10.96.65.187    <none>        3000/TCP                     10h

I imagine maybe something was off during the installation. If you could share the services you have, I can help you debug what happened.

conradoverta self-assigned this Jun 23, 2020

@yixu34
Author

yixu34 commented Jun 23, 2020

Here are my services:

$ kubectl get svc | grep modeldb
modeldb-staging-f8780e-backend               ClusterIP   100.71.14.248    <none>        8085/TCP,8086/TCP,3000/TCP   150m
modeldb-staging-f8780e-graphql               ClusterIP   100.67.143.195   <none>        3000/TCP                     150m
modeldb-staging-f8780e-postgresql            ClusterIP   100.68.84.251    <none>        5432/TCP                     150m
modeldb-staging-f8780e-postgresql-headless   ClusterIP   None             <none>        5432/TCP                     150m
modeldb-staging-f8780e-webapp                ClusterIP   100.68.70.21     <none>        3000/TCP                     150m

I think I see what the problem is, then: modeldb-postgresql appears to be a hardcoded value in the backend values.yaml, on lines 67 and 81. I suppose I can either install the helm chart with --name modeldb, or change those values. Ideally, one would have a dependency on the other, or both would read from some common place, right?
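
One way the hardcoded host could be derived from the release instead, as a minimal sketch. It assumes the PostgreSQL service is always named <release>-postgresql, which matches the service listings in this thread but would not hold if the subchart's name is overridden. DB_HOST is a hypothetical variable name used only for illustration; the actual chart sets the host in the backend values.yaml.

```yaml
# Hypothetical excerpt from a backend deployment template, not the actual chart.
# DB_HOST is an illustrative name; only the release-name prefix is the point.
env:
  - name: DB_HOST
    value: "{{ .Release.Name }}-postgresql"
```

With the release named modeldb-staging-f8780e, this would render to modeldb-staging-f8780e-postgresql, which is the service that actually exists in the listing above.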

@yixu34
Author

yixu34 commented Jun 23, 2020

Ok, changing the release name to modeldb (--name modeldb) did the trick. I've port forwarded the webapp to my localhost:3000, and I think the only remaining problem is that on the 'Repositories' page, requests to http://localhost:3000/api/v1/graphql/query return a 504 error. My guess is that there's a typo here with the double dash. It seems like it should be value: "modeldb-backend:8085" instead of value: "modeldb--backend:8085". I can contribute a PR if that's the case.

@conradoverta
Contributor

Nice catch. That does look like a typo and you are right that some of the names appear to be hardcoded (both in the DB reference and the graphql config). It should be based off the name of the release everywhere to avoid this situation. I'd definitely appreciate a PR with fixes!

@yixu34
Author

yixu34 commented Jun 23, 2020

Ok cool, but let me make sure I have everything working first 😅 In addition to removing the double dash, I had to move the {{- if .Values.env }} on line 36 to below line 41. I noticed that this was preventing the MDB_ADDRESS and QUERY_PATH environment variables from even being set. I then went back to the 'Repositories' page, which then fires off a request to http://localhost:3000/api/v1/graphql/query. I still see a 504, with the error being Error occured while trying to proxy to: modeldb-backend:3000/query. I'm not sure why this is happening, because the webapp redirects all api/v1/graphql/ routes to the graphQL service. The graphQL service then uses MDB_ADDRESS, which I've now (correctly?) set to modeldb-backend:8085. So I'm not sure why it's trying to forward the request to port 3000 instead. Here are the environment variables when I describe the graphQL pod, by the way:

Environment:
      MDB_ADDRESS:  modeldb-backend:8085
      QUERY_PATH:   /api/v1/graphql/query
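
For reference, the reordering of the {{- if .Values.env }} guard described above might look like the sketch below. It assumes .Values.env is a list of name/value entries and that the two fixed variables are meant to be set unconditionally; the real template's layout and indentation may differ.

```yaml
# Hypothetical excerpt from the graphql deployment template, not the actual file.
# The fixed variables sit outside the guard, so they are set even when
# .Values.env is empty; only the user-supplied extras are conditional.
env:
  - name: MDB_ADDRESS
    value: "modeldb-backend:8085"
  - name: QUERY_PATH
    value: "/api/v1/graphql/query"
  {{- if .Values.env }}
  {{- toYaml .Values.env | nindent 2 }}
  {{- end }}
```

With the guard placed above the fixed entries instead, an empty .Values.env would drop MDB_ADDRESS and QUERY_PATH entirely, which matches the symptom reported here.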

@conradoverta
Contributor

conradoverta commented Jun 23, 2020

Ok, I think I was able to narrow down what happened.

First, the webapp logs were a bit misleading because BACKEND_API_DOMAIN was misconfigured. #853 is fixing that. It doesn't affect correctness, but it does affect the logs in the OSS component.

After that, I noticed that the graphql service was serving on port 4000, but the whole setup assumed it was on port 3000. The reason for this mismatch is that internally our services default to port 3000 for the exposed layer, but we had to move to 4000 to avoid a port collision in docker compose, which simplifies things for users. So the deployment template for graphql should have

           - name: QUERY_PATH
             value: "/api/v1/graphql/query"
+          - name: SERVER_HTTP_PORT
+            value: "3000"

which will set the correct port. This should resolve the issue you're seeing. I'm not sure how I missed that earlier. Could you double check?

I appreciate the help to debug while we open more of our platform! Our SaaS runs with a very specific configuration, so we need to reconsolidate progressively as we keep moving new parts to the open world. Our end to end CI is not fully compatible with the open version, but it's coming!
