Thin-Arbiter-Volumes.md has inaccurate and incomplete information. #621

Open
alphabet5 opened this issue Dec 12, 2020 · 13 comments

Comments

@alphabet5

  • The doc uses the command 'glustercli', which is not referenced anywhere else in the docs except far back in the changelog for v4.

  • The command to create a thin-arbiter volume file does not work. No file is created, and the following error shows up in the logs:

[2020-12-12 22:52:47.880412] E [MSGID: 100009] [glusterfsd.c:633:get_volfp] 0-glusterfsd: loading volume file failed [{volume_file=/mnt/brick1/gvolume0/thin-arbiter.vol}, {errno=2}, {error=No such file or directory}]

  • The command to create a volume is incorrect:

glustercli volume create <volname> --replica 2 <host1>:<brick1> <host2>:<brick2> --thin-arbiter <quorum-host>:<path-to-store-replica-id-file>

Should be changed to:

gluster volume create <volname> replica 2 thin-arbiter 1 <host1>:<brick1> <host2>:<brick2> <quorum-host>:<path-to-store-replica-id-file>
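
As a sketch of the corrected syntax above, the following shell function only prints the create command so it can be reviewed before it is run. The function name `ta_create_cmd` is hypothetical, and the argument order is the one suggested in this issue; it has not been checked against every GlusterFS release.

```shell
# Print (do not run) a thin-arbiter volume create command.
# Usage: ta_create_cmd VOLNAME HOST1:BRICK1 HOST2:BRICK2 TA_HOST:TA_PATH
ta_create_cmd() {
    if [ "$#" -ne 4 ]; then
        echo "usage: ta_create_cmd VOLNAME BRICK1 BRICK2 TA_BRICK" >&2
        return 1
    fi
    # Argument order follows the form suggested in this issue:
    # replica 2 thin-arbiter 1 <data bricks> <thin-arbiter host:path>
    printf 'gluster volume create %s replica 2 thin-arbiter 1 %s %s %s\n' \
        "$1" "$2" "$3" "$4"
}
```

Calling it with a volume name, two data bricks, and the thin-arbiter host and path prints a single command line that can be pasted into a shell on a node in the trusted storage pool.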
@amarts
Member

amarts commented Dec 13, 2020

@Sheetalpamecha @aspandey @itisravi @karthik-us can you please have a look when you get a chance... thanks.

@alphabet5
Author

After some more digging, I found this: https://review.gluster.org/#/c/glusterfs/+/20056/

It has a script to assist with configuring a service for the thin-arbiter process, as well as a template .vol file at glusterfs/extras/thin-arbiter/thin-arbiter/thin-arbiter.vol

It appears that you can't run a thin-arbiter on a node that is also running glusterd by default. (I was trying to test out 1 node from cluster2 being a thin-arbiter for cluster1)

A couple of things that I can't seem to find:

  • How can you view the status of the thin-arbiter? (gluster volume status doesn't show the thin-arbiter.)
  • Does the thin-arbiter need to be a peer?

It looks like the arbiter for 8.3 is not the same op-version as glusterfs?

# gluster peer probe arbiter
peer probe: failed: Peer arbiter does not support required op-version

@itisravi
Member

@alphabet5

  • The thin-arbiter process is supposed to run on a node outside the gluster trusted storage pool (i.e. one where no glusterd is running), so it must not be a peer.
  • The script that you identified sets up the TA process as a systemd service, which restarts the process automatically if it is killed or dies, so ideally there is no need to check its status. You can still run ps aux | grep gluster on the thin-arbiter node to find its PID.
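
The `ps` check mentioned above could be narrowed to the thin-arbiter process, as in this sketch. It assumes the process was started as `glusterfsd` with `--volfile-id ta`, which is what the `systemctl status` output later in this thread shows; `ta_pids` is a hypothetical helper name.

```shell
# Read `ps aux`-style lines on stdin and print the PID (column 2) of every
# glusterfsd process started with "--volfile-id ta".
ta_pids() {
    awk '/glusterfsd/ && /--volfile-id ta( |$)/ { print $2 }'
}

# On the thin-arbiter node:
#   ps aux | ta_pids
```

An empty result only means no matching process was found; a PID in the output does not by itself show that clients can reach the thin-arbiter.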

@alphabet5
Author

Thanks @itisravi. Is there a way to verify the status of the arbiter? If the arbiter is online, but unreachable from the cluster, how would I know?

I don't really want to take a brick offline to see if the arbiter still allows writes to the other brick. Is there another way to verify the arbiter status?

@amarts
Member

amarts commented Dec 15, 2020

telnet <thin-arbiter-node> 24007

Ctrl-]
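
A non-interactive equivalent of the telnet check might look like the sketch below. It assumes `bash` and the coreutils `timeout` command are installed; `port_open` is a hypothetical helper. Like telnet, it only shows that something accepts TCP connections on the port, not that the thin-arbiter is serving the volume.

```shell
# Return 0 if HOST accepts a TCP connection on PORT within TIMEOUT seconds.
# Usage: port_open HOST PORT [TIMEOUT]
port_open() {
    # bash's /dev/tcp pseudo-device opens the socket; timeout bounds the wait.
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example, using the port from the telnet command above:
#   port_open arbiter 24007 && echo "port reachable"
```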

@alphabet5
Author

@amarts how does this verify that the arbiter is working?

# telnet arbiter 24007
Trying 192.168.1.254...
Connected to arbiter.
Escape character is '^]'.
^]

If I look at logs for the arbiter, it seems as though it might not be working, and I don't see how telnetting to the arbiter verifies its operational status.

[2020-12-15 15:22:31.615878] E [MSGID: 115001] [server-handshake.c:584:server_setvolume] 0-ta-server: Cannot authenticate client from CTX_ID:fe5e65be-0254-4e46-8a5c-fe7b8e453459-GRAPH_ID:0-PID:1495-HOST:server2-PC_NAME:gvolume0-ta-2-RECON_NO:-75028 8.3 because brick is not attached in graph [No such file or directory]

Even if you verify the service status:

root@arbiter:~# systemctl status thin-arbiter
● thin-arbiter.service - GlusterFS, Thin-arbiter process to maintain quorum f>
     Loaded: loaded (/etc/systemd/system/thin-arbiter.service; enabled; vendo>
     Active: active (running) since Mon 2020-12-14 18:26:05 UTC; 20h ago
   Main PID: 9872 (glusterfsd)
     Memory: 1.0G
     CGroup: /system.slice/thin-arbiter.service
             └─9872 /usr/sbin/glusterfsd -N --volfile-id ta -f /mnt/brick1/gv>

Dec 14 18:26:05 arbiter systemd[1]: Started GlusterFS, Thin-arbiter process t>
lines 1-9/9 (END)

It doesn't validate that the thin-arbiter is operational.
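
Since an active service does not prove that the thin-arbiter is accepting clients, one option is to scan its log for the authentication error shown above. This is a minimal sketch that assumes the error keeps the `server_setvolume` and `Cannot authenticate client` wording from the log line in this comment; `ta_auth_failures` is a hypothetical helper, and the log file location has to be supplied by the reader.

```shell
# Count error-level (" E ") log lines on stdin that report a rejected client.
# Prints 0 when there are none.
ta_auth_failures() {
    grep -c ' E .*server_setvolume.*Cannot authenticate client' || true
}

# Example (TA_LOG is a placeholder for the thin-arbiter log file):
#   ta_auth_failures < "$TA_LOG"
```

A non-zero count suggests that clients are reaching the process but being rejected, which a plain service status check would not reveal.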

@alphabet5
Author

To clarify: I think all of this information would be useful to have in Thin-Arbiter-Volumes.md.

If you want me to take a stab at a pull request, let me know.

I also haven't found an example for using setup-thin-arbiter.sh yet. I'm guessing something like cd /mnt/dir/thin-arbiter-dir && sudo /?/?/?/setup-thin-arbiter.sh

@itisravi
Member

> If the arbiter is online, but unreachable from the cluster, how would I know

It needs to be reachable only from the (fuse) clients and not from the cluster. So if a client is not connected to one of the bricks, including the TA brick, the fuse mount logs will have messages like "disconnected from distrep-client-0" etc. Conversely, once a connection is established, you will see "Connected to distrep-client-0" etc. in the logs.
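
The log check described above can be sketched as a small filter over the fuse mount log. It assumes the messages contain the exact phrases quoted in this comment ("Connected to NAME" and "disconnected from NAME"); real log lines may be worded differently, and `last_conn_state` is a hypothetical helper.

```shell
# Read a fuse mount log on stdin and print the last connection state seen
# for client NAME: "connected", "disconnected", or "unknown".
# Matching is by substring, so NAME should be the full client name.
# Usage: last_conn_state NAME < mount.log
last_conn_state() {
    awk -v name="$1" '
        index($0, "Connected to " name)      { state = "connected" }
        index($0, "disconnected from " name) { state = "disconnected" }
        END { print (state == "" ? "unknown" : state) }
    '
}
```

Running it once per client name, including the thin-arbiter client, gives a quick view of which connections the mount last reported as up or down.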

> If you want me to take a stab at a pull request, let me know.

Sure go ahead.

> I also haven't found an example for using setup-thin-arbiter.sh yet

Slide 23 of
https://archive.fosdem.org/2020/schedule/event/sds_gluster_thin_arbiter/attachments/slides/4110/export/events/attachments/sds_gluster_thin_arbiter/slides/4110/gluster_thin_arbiter_fosdem_2020.pdf has an embedded demo, check it out!

@alphabet5
Author

alphabet5 commented Dec 27, 2020

Is it possible to remove a thin-arbiter brick?

gluster volume remove-brick gvolume0 replica 2 thin-arbiter 1 arbiter:/mnt/brick1/gvolume0/thin-arbiter.vol force
wrong brick type: thin-arbiter, use <HOSTNAME>:<export-dir-abs-path>

Usage:
volume remove-brick <VOLNAME> [replica <COUNT>] <BRICK> ... <start|stop|status|commit|force>

@alphabet5
Author

Per that slide deck, support for add/replace-brick is still on the to-do list.

TODO

  • Support for add/replace-brick CLI:
    • convert existing replica 2/3/arbiter to TA volume.
    • replace brick for data-bricks and TA node.
  • Make reads aware of in-memory information about bad brick.
  • Fix reported bugs.

@itisravi
Member

> on the todo list yet.

Yes @Sheetalpamecha is working on this via gluster/glusterfs#1528

@polachz-nxp

I fought with Thin Arbiter last night too. The MD file doesn't give ANY information about VOLUME_FILE or how to get or create it. The command to create a volume with a thin arbiter is still inaccurate.

And even though I did my best to configure Thin-Arbiter correctly, I have no idea how to verify whether it works. And because there is no way to reconfigure the volume (gluster/glusterfs#1528 is dead for now), any fix in the future means getting all data out of the volume and re-creating it.

I think that this part of documentation needs significant improvements...

@polachz

polachz commented Aug 26, 2023

I finally found a way to get GlusterFS Thin Arbiter up and running. Here is my how-to:

https://polach.me/posts/howto-setup-glusterfs-thin-arbiter-at-homelab/

Maybe it can save others some time...
