
[BUG] FUSE mount forces DIRECT I/O mode with Samba #573

Open
anon314159 opened this issue Apr 5, 2024 · 0 comments

Comments

anon314159 commented Apr 5, 2024

Have you read through available documentation, open Github issues and Github Q&A Discussions?

Yes

System information

Various (3 independent air-gapped clusters exhibiting the same behavior/performance)

Your moosefs version and its origin (moosefs.com, packaged by distro, built from source, ...).

Moosefs Version: 3.0.117 (rpm distribution)

Operating system (distribution) and kernel version.

OS: Red Hat Enterprise Linux 8.9 (Ootpa)
Kernel: 4.18.0-513.9.1.el8_9.x86_64
Samba: 4.19.3

Hardware / network configuration, and underlying filesystems on master, chunkservers, and clients.

Various, but all chunk/master servers have 40GbE interconnects, and clients connect at 40GbE or 10GbE.

Benchmarking data transfer rates between all servers in the cluster using iperf3 reveals no anomalies (averaging 34 Gbit/s with minimal retransmissions).

Local Storage subsystem (XFS) for each chunkserver also performs within expectations, averaging 3GB/s sequential read and 2.8GB/s sequential write.

Alternative distributed file systems such as GlusterFS, Ceph, and BeeGFS behave normally and offer near line speed when re-exported as Samba shares on the exact same hardware (I have a non-production test environment).

hdparm and smartctl show all drives operating within normal performance parameters with no detected medium errors.
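
For reference, the kind of commands behind these baseline checks looks roughly like the following (a sketch; host names and device paths are placeholders):

# network throughput between servers (iperf3 -s running on the peer)
iperf3 -c chunkserver01 -P 4 -t 30
# raw sequential read from a chunkserver data drive
hdparm -tT /dev/sdX
# SMART health summary for the same drive
smartctl -a /dev/sdX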

How much data is tracked by Moosefs master (order of magnitude)?

  • All fs objects: Varies
  • Total space: Varies
  • Free space: Varies
  • RAM used: Varies
  • last metadata save duration: Varies (Less than 4 seconds across all clusters)

Describe the problem you observed.

Problem: Samba forces uncached direct I/O mode when communicating with the mfsmount process. This has the net effect of slowing down all read operations when exporting MooseFS FUSE mounts as Samba shares.
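
As a quick sanity check that this is not caused by an explicit direct_io mount option (as far as I can tell, the per-open FOPEN_DIRECT_IO flag set by the FUSE daemon would not show up here), the options in effect on the proxy server can be inspected like this:

# options mfsmount was started with
ps -o args= -C mfsmount
# options the kernel records for the mount
grep mfs /proc/mounts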

Can you reproduce it? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

Yes. Below are some examples of testing read speeds using the native FUSE client and an SMB client. Performance over SMB consistently hovers around 50 to 70 MB/s regardless of the client or server hardware, and reads behave as though direct I/O is in use whenever the data is not already cached by the Samba proxy server.

Even an all-flash cluster behaves the same when files are accessed through an SMB proxy server connected back to MooseFS via FUSE. For comparison, I have also exported GlusterFS and local XFS mounts on the exact same SMB server, backed by the exact same hardware and cluster configuration, and both yield near line-speed read and write performance.

While it is reasonable to expect a slight performance drop-off when traversing the several layers of abstraction involved in exporting FUSE mounts as CIFS/SMB shares, averaging less than 1/10th of the native FUSE client's performance regardless of the hardware configuration makes this solution completely unacceptable for any type of real-world production use.

Note: All testing was done from a client with a Ryzen 5950X processor, 128GB of DDR4 memory and an Intel X550-T2 10GbE NIC running RHEL 8.9 with the same kernel, MooseFS and Samba versions as the chunk/master servers within each cluster. Testing has also been completed on additional Windows and Linux clients, yielding similarly poor performance results.

Example of Sequential Read Tests:
Cluster-A (Average 10 Runs, 8GB Test File, MooseFS FUSE):
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/net_shares/mfs-fuse/test-file of=/dev/null bs=1M count=8192 status=progress
8589934592 bytes (8.6 GiB, 8.0 GiB) copied, 12.2585 s, 701 MB/s

Cluster-A (Average 10 Runs, 8GB Test File, MooseFS FUSE SMB reexport):
mount -t cifs //smb-server/mfs-test /mnt/net_shares/cifs/ -o domain=redacted,username=redacted
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/net_shares/cifs/test-file of=/dev/null bs=1M count=8192 status=progress
8589934592 bytes (8.6 GiB, 8.0 GiB) copied, 120.2585 s, 70.1 MB/s
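
One more comparison that points at caching being bypassed (a sketch; the second dd is run immediately, without dropping caches, so a cached read would normally come back at memory speed):

# first read on the FUSE mount (populates the page cache if caching is allowed)
dd if=/mnt/net_shares/mfs-fuse/test-file of=/dev/null bs=1M count=8192
# immediate re-read without dropping caches; if direct I/O is forced,
# this returns at roughly the same rate instead of memory speed
dd if=/mnt/net_shares/mfs-fuse/test-file of=/dev/null bs=1M count=8192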

Exporting various local and distributed file systems from the exact same Samba proxy servers yields the expected read/write performance; the only exception is FUSE mounts hosted by MooseFS.

Samba Configuration File (/etc/samba/smb.conf):
[global]
bind interfaces only = yes
interfaces = enp216s0f1
netbios name =
server string = Samba Server Version %v
server multi channel support = yes
server role = member server
log file = /var/log/samba/log.%m
max log size = 50
security = ads
passdb backend = tdbsam
load printers = no
cups options = raw
kerberos method = secrets and keytab
idmap config : range = redacted-redacted
idmap config : backend = rid
idmap config * : range = redacted-redacted
idmap config * : backend = autorid
winbind use default domain = yes
winbind refresh tickets = yes
winbind offline logon = yes
winbind enum groups = yes
winbind enum users = yes
nt acl support = yes
workgroup = redacted
realm = redacted
hosts allow = redacted
hosts deny = ALL
durable handles = yes
ea support = no
strict locking = no
max xmit = 65535
socket options = TCP_NODELAY IPTOS_LOWDELAY
getcwd cache = yes
log level = 1
vfs objects = acl_xattr

[mfs-test]
acl_xattr:ignore system acls = no
acl_xattr:default acl style = windows
nt acl support = yes
create mask = 6660
directory mask = 6750
map acls inherit = yes
path = /mnt/net_shares/moosefs/mfs-test
guest ok = yes
read only = no
available = yes
writable = yes
kernel share modes = no
kernel oplocks = no
map archive = no
map hidden = no
map readonly = no
map system = no
store dos attributes = no
hosts allow = redacted
hosts deny = ALL
posix locking = no
case sensitive = true
default case = lower
preserve case = true
short case preserve = true
oplocks = yes

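The share above sits on a plain mfsmount with default options. One experiment worth trying on the proxy server, assuming mfscachemode is still supported as documented in mfsmount(1), is forcing cached mode and re-running the SMB test (the master host name below is a placeholder):

# remount the exported path with data caching forced on
umount /mnt/net_shares/moosefs
mfsmount /mnt/net_shares/moosefs -H mfsmaster.example.local -o mfscachemode=YES,allow_other
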
Include any warning/errors/backtraces from the system logs.

strace -T -fff -p 'pid of test SMB client' reveals an excessive number of futex and epoll API calls.
strace -T -fff -p 'pid of mfsmount' reveals an excessive number of epoll API calls.
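
A follow-up along the same lines (a sketch) would be to log the sizes and latencies of the reads smbd actually issues against the FUSE-backed share; many small, slow pread64 calls would be consistent with uncached direct I/O:

strace -f -T -e trace=read,pread64 -p 'pid of smbd worker serving the test client'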

@anon314159 anon314159 changed the title [BUG] Slow Read Performance with MooseFS FUSE/SMB export [BUG] MooseFS FUSE/SMB export forces DIRECT I/O mode Apr 23, 2024
@anon314159 anon314159 changed the title [BUG] MooseFS FUSE/SMB export forces DIRECT I/O mode [BUG] FUSE mount SMB export forces DIRECT I/O mode Apr 23, 2024
@anon314159 anon314159 changed the title [BUG] FUSE mount SMB export forces DIRECT I/O mode [BUG] FUSE mount forces DIRECT I/O mode with Samba Apr 23, 2024