Performance #24

ryanstout · 2014-08-05T23:18:16Z

Sorry to be this guy, but I just wanted to get some thoughts from someone with more experience with FUSE. I've done a mirror file system (as a test), and I'm seeing write performance at about 1/10th of what I get writing directly to the disk (for a folder of 3 MB images). Would direct_io help here? I saw where it got disabled. I tried increasing the block size, but that didn't seem to make much difference. I'm on OS X using OSXFUSE.

Thanks,
Ryan

hesamrabeti · 2014-08-05T23:26:20Z

What's your code look like?

tv42 · 2014-08-05T23:32:55Z

Yup, pretty much a known problem, and work to be done. I won't be personally happy until I'm bottlenecked by actual IO, for large files.

I'm personally seeing ~75 MB/s writes and 15 MB/s reads on a fairly complex app, so anything below that is your own fault ;) I've been meaning to write a simpler benchmark, but profiling currently puts the blame pretty hard on the FUSE kernel communication.

Avoiding OpenDirectIO helps, as it lets the kernel manage a writeback cache.

I wrote quick notes on this in the Bazil group, see item 5 in https://groups.google.com/d/msg/bazil-dev/z-PtgA84f-o/BP9u2_Ko0VQJ

To be improved.

ryanstout · 2014-08-05T23:54:33Z

Humm,
I'm seeing about 17MB/s for writes, so it must be something on my end. I'm fairly new to golang, so that might be part of it. Is your ~75MB/s on OSX?

Here's the code I'm running:
https://gist.github.com/ryanstout/1e2ed8b19af8e84fe504

Again, the files I'm copying in are a directory of 3 MB images (1.4 GB worth).

Thanks for the help. If I could get it up to 75MB/s, that would be great.

ryanstout · 2014-08-05T23:58:40Z

I should also mention that even if I disable the actual writing to the disk (which is an SSD), it still only does about 20 MB/s.

hesamrabeti · 2014-08-06T14:05:58Z

hmmm... things look pretty normal. Have you tried doing a CPU profile?
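For anyone following along, a CPU profile can be captured with the standard `runtime/pprof` package. This is only a minimal sketch: `profileToBytes` and `busyWork` are hypothetical names, and `busyWork` stands in for whatever drives the file system (for example, the copy of the image folder).

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"log"
	"os"
	"runtime/pprof"
)

// profileToBytes runs work with CPU profiling enabled and returns the
// encoded profile.
func profileToBytes(work func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	work()
	// StopCPUProfile flushes the profile; it must run before buf is read.
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

// busyWork is a hypothetical stand-in for the code under test, for
// example copying files into the mounted file system.
func busyWork() {
	block := make([]byte, 1<<20)
	for i := 0; i < 200; i++ {
		sum := sha256.Sum256(block)
		block[0] = sum[0]
	}
}

func main() {
	prof, err := profileToBytes(busyWork)
	if err != nil {
		log.Fatal(err)
	}
	if err := os.WriteFile("cpu.prof", prof, 0o644); err != nil {
		log.Fatal(err)
	}
}
```

The resulting file can be inspected with `go tool pprof cpu.prof`.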

ryanstout · 2014-08-06T15:31:13Z

@hesamrabeti Hmm. Was the 75 MB/s on OS X?

tv42 · 2014-08-06T16:39:27Z

My measurements, and development, are primarily on Linux.

ryanstout · 2014-08-07T17:21:10Z

@tv42 So do you think the performance could be osx related? Also, would it make any difference that I'm not calling methods with pointers? Thanks for the help.

ryanstout · 2014-08-07T17:58:24Z

@tv42 @hesamrabeti One other question. I read in a few places that for larger files (what I'm dealing with), increasing the block size will improve performance quite a bit. I noticed, though, when setting -fuse.debug=true that everything's coming in as 4096-byte blocks. I see that for OS X the block size seems to be hard-coded as 4096:

https://github.com/bazillion/fuse/blob/9802bb510ca4cd1c18ffc840cf6fe9ef5d1546a8/mount_darwin.go#L63

Would someone mind making this an option passed to mount? Thanks!

tv42 · 2014-08-07T18:28:23Z

That can't be changed unilaterally there, the receiving buffer also needs to be resized, see 9802bb5.

The incoming buffer management needs to change, once that's better the size from there can update iosize= too. And for the incoming buffer management, the linux side really wants to switch to vmsplice, so there's a bit more work to be done.

I'm writing simple benchmarks right now, just to get an idea of what the current state is, and to be able to measure any potential improvements.
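The coupling described above (the advertised write size has to fit in the receive buffer) can be sketched as a clamp. This is not the library's code: `headerRoom`, `clampMaxWrite`, and the 4096-byte header allowance are assumptions made purely for illustration.

```go
package main

import "fmt"

// headerRoom is an assumed allowance for the request headers that
// precede the write payload in the receive buffer (hypothetical value).
const headerRoom = 4096

// clampMaxWrite returns the largest write payload that may be advertised
// to the kernel for a receive buffer of bufSize bytes.
func clampMaxWrite(requested, bufSize uint32) uint32 {
	if bufSize <= headerRoom {
		return 0
	}
	limit := bufSize - headerRoom
	if requested > limit {
		return limit
	}
	return requested
}

func main() {
	bufSize := uint32(128*1024 + headerRoom)
	fmt.Println(clampMaxWrite(1<<20, bufSize)) // 131072
	fmt.Println(clampMaxWrite(4096, bufSize))  // 4096
}
```

Raising the mount-side block size without growing the buffer would leave the kernel sending requests that do not fit, which is why the two have to change together.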

ryanstout · 2014-08-07T19:38:35Z

@tv42 Cool, thanks. I would love to use bazillion/fuse for a project, but we're using it to store photos (on OS X), so I need it to be faster in order for it to make sense. Thanks a bunch for the help.

ryanstout · 2014-08-07T20:09:33Z

@tv42 Hopefully this helps incentivize some work on it. Thanks a bunch:

tv42 · 2014-08-12T01:32:17Z

Commit 0f430c9 adds simple benchmarks, to keep track of any improvements. See the commit message for current expected numbers. OS X is currently a mystery; I don't know if the problem is just my Mac Mini, or all of OS X / OSXFUSE.

tv42 · 2014-08-12T23:01:20Z

Recording here for posterity: OS X performance work is hindered by kernel hangs:
macfuse/macfuse#153

tv42 · 2014-08-14T17:30:16Z

@ryanstout Can you try eccde64 and see if that helps? It bumps up the maximum write size, and it dropped the syscall overhead for my workload significantly. This is nowhere near the end of the story, but it's a good start.

eccde64 (HEAD, github/master, master) Use the 128 KiB kernel receive buffer; set fuse.InitBigWrites
fdc933e Increase kernel receive buffer size to 128 KiB
f3d63ea Pass kernel receive buffer size as OS X mount option iosize
694d252 Enforce outgoing InitResponse.MaxWrite values to fit in buffer
df75b48 Saner identifiers for kernel receive buffer size limits
2ca794c Add known values for InitFlags
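As a rough illustration of why the larger write size cuts per-request overhead, the sketch below counts how many write requests one 3 MiB file needs at 4096 bytes versus 128 KiB per request. `requestsFor` is a hypothetical helper, and the assumption that every request carries a full payload is a simplification.

```go
package main

import "fmt"

// requestsFor returns how many write requests are needed to transfer
// fileSize bytes when each request carries at most maxWrite bytes.
func requestsFor(fileSize, maxWrite int64) int64 {
	return (fileSize + maxWrite - 1) / maxWrite
}

func main() {
	const fileSize = 3 << 20 // one 3 MiB image

	fmt.Println(requestsFor(fileSize, 4096))     // 768
	fmt.Println(requestsFor(fileSize, 128*1024)) // 24
}
```

That is 32 times fewer round trips between the kernel and the user-space server per file, under these assumptions.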

tv42 · 2014-08-14T19:45:08Z

Here's 3 concurrent writers writing into a full-blown filesystem where most of the CPU cost is in hashing and crypto:

  WRITE: io=3072.0MB, aggrb=649273KB/s, minb=216424KB/s, maxb=218362KB/s, mint=4802msec, maxt=4845msec

650 MB/s ain't too bad ;)

(Truth in advertising: that's probably cheating by not pushing all the data down to FUSE in time, before the benchmark ends. Slapping an fsync at the end gives a more modest 200 MB/s.)

I don't have fio set up on OS X, but a really stupid cat of a 100 MiB file says 100 MiB/s write speed is within reach even for a more complex file system, without much effort.
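For readers decoding the fio summary line above: the figures are consistent with aggregate bandwidth being total I/O divided by the longest job runtime. The sketch below redoes that arithmetic; `kbPerSec` is a hypothetical helper, and the even 1024 MB per writer split is an assumption (it matches the reported per-job numbers).

```go
package main

import "fmt"

// kbPerSec returns throughput in KB/s (1 KB = 1024 bytes), truncated to
// an integer, for n bytes transferred in millis milliseconds.
func kbPerSec(n, millis uint64) uint64 {
	return n * 1000 / 1024 / millis
}

func main() {
	const total = 3072 << 20  // io=3072.0MB across all writers
	const perJob = 1024 << 20 // assumed even split across 3 writers

	fmt.Println(kbPerSec(total, 4845))  // 649273, matches aggrb
	fmt.Println(kbPerSec(perJob, 4845)) // 216424, matches minb
	fmt.Println(kbPerSec(perJob, 4802)) // 218362, matches maxb
}
```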

ryanstout · 2014-08-14T19:52:55Z

@tv42 Yeah, it takes about 1/3rd of the time it did before. Nice work. That puts it where I need it performance-wise. Thanks a bunch for working on that. Feel free to claim the bountysource if you want.

tv42 · 2014-08-17T19:00:52Z

Alright, since this ticket never really stated specific numbers or specific changes, and the original reporter seems happy enough, I'm gonna close this as good enough.

Ongoing work with a specific goal still left in #35 (Linux only). As part of that, I may introduce a sync.Pool for the buffers used, even on OS X; that'll help a little bit more.

FUSE still has plenty of overhead, but its role is definitely shrinking in my CPU profiles, to a point where I personally get more gains from optimizing the FS logic itself. Once again, that's Linux, and afaik OS X CPU profiling is busted.
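The sync.Pool idea mentioned above could look roughly like the following. This is a sketch, not the library's implementation: `bufPool`, `handleRequest`, and the 128 KiB buffer size are illustrative names and values, and the fill loop stands in for reading a request from the kernel.

```go
package main

import (
	"fmt"
	"sync"
)

// bufSize mirrors the 128 KiB receive buffer discussed in this thread.
const bufSize = 128 * 1024

// bufPool recycles receive buffers so each request does not allocate a
// fresh 128 KiB slice. Pointers are stored to avoid an extra allocation
// when the slice header is boxed into an interface.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, bufSize)
		return &b
	},
}

// handleRequest borrows a buffer, uses it, and returns it to the pool.
// It reports how many bytes of buffer were available to the request.
func handleRequest(fill byte) int {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)

	buf := *bp
	for i := range buf {
		buf[i] = fill // stand-in for reading a request into the buffer
	}
	return len(buf)
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(handleRequest(byte(i)))
	}
}
```

Buffers returned to the pool may be dropped by the garbage collector at any time, so the pool reduces allocation pressure without pinning memory.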

tv42 closed this as completed Aug 17, 2014
