Performance #24

Closed
ryanstout opened this issue Aug 5, 2014 · 18 comments

Comments

@ryanstout

Sorry to be this guy, but I just wanted to get some thoughts from someone with more experience with FUSE. I've done a mirror file system (as a test), and I'm seeing write performance at about 1/10th of what I get writing directly to the disk (for a folder of 3 MB images). Would direct_io help here? I saw where it got disabled. I tried increasing the block size, but that didn't seem to make much difference. I'm on OS X using OSXFUSE.

Thanks,
Ryan

@hesamrabeti

What's your code look like?

@tv42
Member

tv42 commented Aug 5, 2014

Yup, pretty much a known problem, and work to be done. I won't be personally happy until I'm bottlenecked by actual IO, for large files.

I'm personally seeing ~75 MB/s writes and 15 MB/s reads on a fairly complex app, so anything below that is your own fault ;) I've been meaning to write a simpler benchmark, but profiling currently puts the blame pretty hard on the FUSE kernel communication.

Avoiding OpenDirectIO helps, as it lets the kernel manage a writeback cache.

I wrote quick notes on this in the Bazil group, see item 5 in https://groups.google.com/d/msg/bazil-dev/z-PtgA84f-o/BP9u2_Ko0VQJ

To be improved.

@ryanstout
Author

Humm,
I'm seeing about 17MB/s for writes, so it must be something on my end. I'm fairly new to golang, so that might be part of it. Is your ~75MB/s on OSX?

Here's the code I'm running:
https://gist.github.com/ryanstout/1e2ed8b19af8e84fe504

Again, the files I'm copying in are a directory of 3 MB images (1.4 GB worth).

Thanks for the help. If I could get it up to 75MB/s, that would be great.

@ryanstout
Author

I should also mention that even if I disable the actual writing to the disk (which is an SSD), it still only does about 20 MB/s.

@hesamrabeti

hmmm... things look pretty normal. Have you tried doing a CPU profile?
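
For readers who have not taken a CPU profile in Go before, here is a minimal sketch using the standard `runtime/pprof` package. It is not taken from the gist above; the helper name `profileToTemp`, the file name `fuse-cpu.prof`, and the busy loop standing in for the file system's serve loop are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime/pprof"
)

// profileToTemp runs work while a CPU profile is being collected and
// returns the path and size of the profile file it wrote.
// The file name "fuse-cpu.prof" is a placeholder chosen for this sketch.
func profileToTemp(work func()) (string, int64, error) {
	path := filepath.Join(os.TempDir(), "fuse-cpu.prof")
	f, err := os.Create(path)
	if err != nil {
		return "", 0, err
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		return "", 0, err
	}
	work()
	pprof.StopCPUProfile() // waits until the profile has been written to f

	info, err := f.Stat()
	if err != nil {
		return "", 0, err
	}
	return path, info.Size(), nil
}

func main() {
	// Stand-in for serving the file system while the copy runs.
	busy := func() {
		sum := 0
		for i := 0; i < 50000000; i++ {
			sum += i % 7
		}
		_ = sum
	}

	path, size, err := profileToTemp(busy)
	if err != nil {
		fmt.Fprintln(os.Stderr, "profile failed:", err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d bytes to %s\n", size, path)
	fmt.Printf("inspect with: go tool pprof %s\n", path)
}
```

In a real mirror file system the `work` function would wrap the call that serves requests, and the copy would be run against the mount while the profile is active.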

@ryanstout
Author

@hesamrabeti Humm. Was the 75 MB/s on OS X?

@tv42
Member

tv42 commented Aug 6, 2014

My measurements, and development, are primarily on Linux.

@ryanstout
Author

@tv42 So do you think the performance could be osx related? Also, would it make any difference that I'm not calling methods with pointers? Thanks for the help.

@ryanstout
Author

@tv42 @hesamrabeti One other question. I read in a few places that for larger files (what I'm dealing with), increasing the block size will improve performance quite a bit. I noticed, though, when setting -fuse.debug=true that everything's coming in as 4096-byte blocks. I see that for OS X the block size seems to be hard-coded as 4096:

https://github.com/bazillion/fuse/blob/9802bb510ca4cd1c18ffc840cf6fe9ef5d1546a8/mount_darwin.go#L63

Would someone mind making this an option passed to mount? Thanks!

@tv42
Member

tv42 commented Aug 7, 2014

That can't be changed unilaterally there; the receiving buffer also needs to be resized, see 9802bb5.

The incoming buffer management needs to change; once that's better, the size from there can update iosize= too. And for the incoming buffer management, the Linux side really wants to switch to vmsplice, so there's a bit more work to be done.

I'm writing simple benchmarks right now, just to get an idea of what the current state is, and to be able to measure any potential improvements.

@ryanstout
Author

@tv42 Cool, thanks. I would love to use bazillion/fuse for a project, but we're using it to store photos (on OS X), so I need it to be faster in order for it to make sense. Thanks a bunch for the help.

@ryanstout
Author

@tv42 Hopefully this helps incentivize some work on it. Thanks a bunch: Bountysource

@tv42
Member

tv42 commented Aug 12, 2014

Commit 0f430c9 adds simple benchmarks, to keep track of any improvements. See the commit message for current expected numbers. OS X is currently a mystery; I don't know if the problem is just my Mac Mini, or all of OS X / OSXFUSE.

@tv42
Member

tv42 commented Aug 12, 2014

Recording here for posterity: OS X performance work is hindered by kernel hangs:
macfuse/macfuse#153

@tv42
Member

tv42 commented Aug 14, 2014

@ryanstout Can you try eccde64 and see if that helps? It bumps up the maximum write size, and it dropped the syscall overhead for my workload significantly. This is nowhere near the end of the story, but it's a good start.

eccde64 (HEAD, github/master, master) Use the 128 KiB kernel receive buffer; set fuse.InitBigWrites
fdc933e Increase kernel receive buffer size to 128 KiB
f3d63ea Pass kernel receive buffer size as OS X mount option iosize
694d252 Enforce outgoing InitResponse.MaxWrite values to fit in buffer
df75b48 Saner identifiers for kernel receive buffer size limits
2ca794c Add known values for InitFlags
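
As a rough illustration of why a larger maximum write size cuts syscall overhead, the sketch below counts how many write requests one 3 MiB image needs at the 4096-byte size seen in the debug output versus 128 KiB. It assumes each request carries at most `maxWrite` bytes of payload and ignores request headers, which is a simplification; `requestsFor` is a hypothetical helper, not part of the library.

```go
package main

import "fmt"

// requestsFor returns how many write requests are needed to move
// fileSize bytes when each request carries at most maxWrite bytes.
func requestsFor(fileSize, maxWrite int64) int64 {
	return (fileSize + maxWrite - 1) / maxWrite // ceiling division
}

func main() {
	const fileSize = 3 << 20 // one 3 MiB image

	for _, maxWrite := range []int64{4096, 128 << 10} {
		fmt.Printf("max write %6d bytes -> %d requests per file\n",
			maxWrite, requestsFor(fileSize, maxWrite))
	}
}
```

Under these assumptions a 3 MiB file takes 768 requests at 4096 bytes and 24 at 128 KiB, so the kernel round trips per file drop by a factor of 32.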

@tv42
Member

tv42 commented Aug 14, 2014

Here's 3 concurrent writers writing into a full-blown filesystem where most of the CPU cost is in hashing and crypto:

  WRITE: io=3072.0MB, aggrb=649273KB/s, minb=216424KB/s, maxb=218362KB/s, mint=4802msec, maxt=4845msec

650 MB/s ain't too bad ;)

(Truth in advertising: that's probably cheating by not pushing all the data down to FUSE in time, before the benchmark ends. Slapping an fsync at the end gives a more modest 200 MB/s.)
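
The effect of the trailing fsync can be reproduced with a small stand-alone timer. This is only a sketch under stated assumptions: `timedWrite` and `mibPerSec` are hypothetical helpers, the 64 MiB total and 128 KiB chunk size are arbitrary, and the numbers it prints depend entirely on the machine and on which directory is passed in.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// timedWrite writes total bytes to a temp file in chunk-sized pieces and
// returns the elapsed time. When doSync is true, the clock stops only after
// f.Sync returns, so data still sitting in caches is included.
func timedWrite(dir string, total, chunk int, doSync bool) (time.Duration, error) {
	f, err := os.CreateTemp(dir, "bench-*.dat")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name()) // runs after Close
	defer f.Close()

	buf := make([]byte, chunk)
	start := time.Now()
	for written := 0; written < total; {
		n := chunk
		if total-written < n {
			n = total - written
		}
		if _, err := f.Write(buf[:n]); err != nil {
			return 0, err
		}
		written += n
	}
	if doSync {
		if err := f.Sync(); err != nil {
			return 0, err
		}
	}
	return time.Since(start), nil
}

// mibPerSec converts a byte count and duration into MiB/s.
func mibPerSec(bytes int, d time.Duration) float64 {
	return float64(bytes) / (1 << 20) / d.Seconds()
}

func main() {
	dir := "" // empty means the default temp dir
	if len(os.Args) > 1 {
		dir = os.Args[1] // e.g. a directory inside the mount under test
	}
	const total = 64 << 20

	for _, doSync := range []bool{false, true} {
		d, err := timedWrite(dir, total, 128<<10, doSync)
		if err != nil {
			fmt.Fprintln(os.Stderr, "write failed:", err)
			os.Exit(1)
		}
		fmt.Printf("sync=%-5v %.1f MiB/s\n", doSync, mibPerSec(total, d))
	}
}
```

Passing a directory inside the FUSE mount as the first argument times writes through the file system instead of the local temp directory.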

I don't have fio set up on OS X, but a really stupid cat of a 100 MiB file says 100 MiB/s write speed is within reach even for a more complex file system, without much effort.

@ryanstout
Author

@tv42 Yeah, it takes 1/3rd the time it did before. Nice work. That puts it where I need it performance-wise. Thanks a bunch for working on that. Feel free to claim the Bountysource bounty if you want.

@tv42
Member

tv42 commented Aug 17, 2014

Alright, since this ticket never really stated specific numbers or specific changes, and the original reporter seems happy enough, I'm gonna close this as good enough.

Ongoing work with a specific goal still left in #35 (Linux only). As part of that, I may introduce a sync.Pool for the buffers used, even on OS X; that'll help a little bit more.
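
For context, this is the general shape of buffer reuse with `sync.Pool` from the standard library. It is a generic sketch, not the library's code: `bufPool`, `bufSize`, and `handle` are hypothetical names, and the `copy` stands in for reading a request from the kernel.

```go
package main

import (
	"fmt"
	"sync"
)

// bufSize matches the 128 KiB receive buffer mentioned in this thread.
const bufSize = 128 << 10

// bufPool hands out reusable request buffers. Storing *[]byte rather than
// []byte avoids an extra allocation each time a buffer goes back in the pool.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, bufSize)
		return &b
	},
}

// handle simulates reading one request into a pooled buffer and returns
// the number of bytes copied into it.
func handle(payload []byte) int {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp) // return the buffer once the request is done
	return copy(*bp, payload)
}

func main() {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		total int
	)
	// Several concurrent "requests" share buffers instead of allocating
	// a fresh 128 KiB slice each time.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			n := handle(make([]byte, 4096))
			mu.Lock()
			total += n
			mu.Unlock()
		}()
	}
	wg.Wait()
	fmt.Println("bytes handled:", total)
}
```

The pool does not cap memory use; it only lets buffers be reused between garbage collections, which is why the gain is described above as "a little bit more" rather than a large win.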

FUSE still has plenty of overhead, but its role is definitely shrinking in my CPU profiles, to a point where I personally get more gains from optimizing the FS logic itself. Once again, that's Linux, and afaik OS X CPU profiling is busted.

@tv42 tv42 closed this as completed Aug 17, 2014