Deprecate legacy CNID backends #508

Closed
rdmark opened this issue Oct 2, 2023 · 17 comments · Fixed by #528

Comments

@rdmark
Member

rdmark commented Oct 2, 2023

Presently, netatalk3 has code for the following CNID backends:

  • cdb
  • dbd
  • last
  • mysql
  • tdb

The tdb backend is called out as deprecated in the docs.
The last backend isn't recommended for general use. Sharing a read-only file system like a CD-ROM seems to be the narrow use case.
The mysql backend is poorly documented, and I haven't really tested it yet. Is it fully functional and reliable?
The cdb backend seems to be the historical default, before v2.1.

Are there specific use cases for these four backends that warrant keeping any of them (considering the maintenance overhead, attack vectors, etc.)?

@rdmark rdmark added this to 3.2.0 Oct 2, 2023
@rdmark rdmark moved this to Todo in 3.2.0 Oct 2, 2023
@slowfranklin
Member

slowfranklin commented Oct 2, 2023 via email

@ghost

ghost commented Oct 2, 2023

Now that would simplify things considerably. If dbd works with sharing RO filesystems, I would go for one backend only. (@slowfranklin, @rdmark, do you know if this is the case?) If not, I'd be happy to axe cdb and tdb and continue with just 2 (dbd and last) backends based on Berkeley DB, especially as it is still being maintained by Oracle. Do you know why Debian and other Linux distros seem to stick to version 5 in their package managers?** Surely we can simplify the BDB macro to remove support for pre-5 versions? Happy to work on this once consensus is reached...

**EDIT: It's due to a licensing change after Version 5

@rdmark
Member Author

rdmark commented Oct 2, 2023

> Now that would simplify things considerably. If dbd works with sharing RO filesystems I would go for one backend only.

I ran a simple test: on Linux, mount an ISO image as a loop block device.

dmark@macuntu:~$ sudo mount -o loop ./PrinceCDColl.iso /mnt/cdrom/
mount: /mnt/cdrom: WARNING: source write-protected, mounted read-only.

Configure a shared volume with /mnt/cdrom ... using the dbd backend, connect over AFP, and interact with the shared volume. It seems to work as expected without any explicit configuration in afp.conf: the file system can be read but not written to.
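For anyone wanting to repeat this test, a minimal shell sketch follows. It assumes a netatalk 3 style afp.conf; the volume name `PrinceCD` and the output file `afp.conf.example` are hypothetical placeholders, and `cnid scheme = dbd` only restates the default, since the test above needed no explicit CNID configuration.

```shell
#!/bin/sh
# Minimal sketch of the read-only volume test (assumptions noted inline).
set -eu

ISO=./PrinceCDColl.iso                 # image name from the test above
MNT=/mnt/cdrom
# Hypothetical output file; point AFP_CONF at your real afp.conf to use it.
CONF=${AFP_CONF:-./afp.conf.example}

# Loop-mount the image; as the mount warning above shows, a write-protected
# source ends up mounted read-only. Skipped if the image is not present.
if [ -f "$ISO" ]; then
    sudo mount -o loop "$ISO" "$MNT"
fi

# Append a volume section for the mount point. "PrinceCD" is a placeholder
# volume name; "cnid scheme = dbd" restates the default backend explicitly.
cat >> "$CONF" <<EOF
[PrinceCD]
path = $MNT
cnid scheme = dbd
EOF
```

After restarting netatalk with this section in place, connecting over AFP should show the volume as read-only, matching the observation above.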

So... at a glance we don't really need the last backend...?

@cdevers-es

Hello again.

As noted in the discussion for #493, my employer until recently supported using Netatalk/AFP for access to our shared storage product.

In our case, we offer a distributed solution where clients could connect to any of several servers to access the storage pool. We found that this wasn't always reliable for AFP clients, because users “Alice” & “Bob” might be working on the same volume, but getting to it via separate storage nodes, which led to AFP/CNID problems, because neither of them was able to see locks being created by the other user.

Switching to MySQL mitigated this problem, because then there was a single source of truth for all of the nodes in the group to synchronize with, and therefore the AFP clients had far fewer problems with invalid CNID information.

Eventually, as noted in #493, we solved our AFP problems by dropping AFP support and removing Netatalk (v3), so at this point this is of historical interest, at least for us. But if you have anyone else providing AFP access to distributed storage, they too might be using a dedicated MySQL host to manage this.

(That said, if the MySQL support was an experimental proof-of-concept, then that’s an argument against keeping it. We were certainly using it in production, and it seemed to be fine at the time for our needs, but ¯\_(ツ)_/¯.)

@rdmark
Member Author

rdmark commented Oct 3, 2023

@cdevers-es Thanks again for sharing your insights and (historical) use cases! It's very valuable to learn that the MySQL backend has been used in a production setting. I assume that you didn't run into any critical bugs that you can think of, since you haven't mentioned catastrophic data loss yet? ;)

MySQL being a much more mainstream piece of technology is another reason for potentially keeping that backend. In fact, I found out the hard way that Alpine Linux deprecated their Berkeley DB (v5) package with v3.13 in 2021, leaving you to build BDB from scratch on that OS. The more forward-looking OSes may follow suit in the future. Going one step further and making the MySQL backend the default one would arguably future-proof Netatalk.

In this alternative scenario I propose we make MySQL the default, and keep dbd as the legacy fallback, disabled by default in order to remove the hard dependency on Berkeley DB. Deprecate the other 3 backends.

Thoughts?

@rdmark
Member Author

rdmark commented Oct 3, 2023 via email

@slowfranklin
Member

slowfranklin commented Oct 3, 2023 via email

@cdevers-es

> @cdevers-es Thanks again for sharing your insights and (historical) use cases! It's very valuable to learn that the MySQL backend has been used in a production setting. I assume that you didn't run into any critical bugs that you can think of, since you haven't mentioned catastrophic data loss yet? ;)

There are always bugs with something somewhere, but we muddle through. Such is life. :-)

Over the years, we’ve supported a number of protocols for accessing our storage: AFP, SMB, NFS, FTP, etc. They all have pros & cons.

For a while there, AFP was a promising option for Mac users, mainly because it seemed to support better throughput than SMB, its primary mainstream alternative. But AFP always seemed to be a little …glitchy.

  • For example, if a server filled up, or if a share hit its group quota limit and the CNID database file belonged to that group, then AFP users would “mysteriously” be locked to read-only access until a Samba user connected to free up some space, or an admin ran chgrp on the CNID database to cope with the quota problem. Changing our system so that the CNID database was owned by some other group by default helped avoid the quota threshold problems, but didn't change the fact that if the storage itself filled up, AFP users had no way to resolve the problem themselves.
  • Sometimes the CNID database would just “get corrupt”, and the easiest remedy tended to be to just remove the previous database file and reindex the share from scratch. As far as we could tell, this didn't seem to lead to data loss that anyone ever noticed, to the point that I was sometimes tempted to just do this preventatively in a cron script.
  • When distributed storage became a thing, the previously-noted problems with the Berkeley DB files came up (users on server A wouldn't see locks held by users connected to server B, etc), which pushed us to move the CNID database to MySQL instead. This made the overall system quite a bit more complex, but it seemed to be more robust, most of the time.
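The two manual remedies in the list above (re-grouping the CNID database and rebuilding it from scratch) could be scripted roughly as below. This is a sketch under assumptions: the volume path, the CNID database directory and the `afpcnid` group are hypothetical, and the rebuild relies on netatalk 3's `dbd` utility, whose man page describes `-f` as deleting and recreating the CNID database; check dbd(1) for your version before running it for real. Commands are only printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the manual CNID workarounds for a dbd-backed volume.
# All paths and the group name are hypothetical placeholders.
set -eu

VOLUME=${VOLUME:-/srv/share}                         # hypothetical share path
CNID_DIR=${CNID_DIR:-/var/lib/netatalk/CNID/share}   # hypothetical dbpath
CNID_GROUP=${CNID_GROUP:-afpcnid}                    # group outside the quota
DRY_RUN=${DRY_RUN:-1}

run() {
    # Always print the command; only execute it when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Workaround 1: move the CNID files out of the share's quota group so a
# full group quota no longer blocks CNID writes.
run chgrp -R "$CNID_GROUP" "$CNID_DIR"

# Workaround 2: discard the (possibly corrupt) database and rebuild it by
# rescanning the volume (assumed dbd -f behaviour, see dbd(1)).
run dbd -f "$VOLUME"
```

Run from cron, the same script would implement the "preventative" reindex mentioned above, at the cost of rescanning the whole share each time.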

But then Apple started removing support for AFP, and their SMB support got much better. And after a couple of years, we found that the majority of our Mac customers weren’t using AFP anymore, so by the time we decided to remove support for it, we had very little pushback about the change.

> MySQL being a much more mainstream piece of technology is another reason for potentially keeping that backend. […]

> In this alternative scenario I propose we make MySQL the default, and keep dbd as the legacy fallback, disabled by default in order to remove the hard dependency on Berkeley DB. Deprecate the other 3 backends.

> Thoughts?

Another thing to consider is that MySQL is considerably more complex than Berkeley DB.

With the latter, it's just a monolithic file, and the software handles all interaction with it. For someone running a simple turnkey home or office file server, this is pretty painless to install, set up, and operate. You can go years without even realizing that Netatalk uses such database files. (I certainly did just that.)

Moving to MySQL means setting up a proper Relational Database Management Server (tm), possibly on a separate host, which brings in the complexity of networking, user access, security, and general database administration. It's not necessarily rocket science to do all this, but it's a much steeper learning curve than something like a BDB or SQLite file: it won't be possible to stand up a new Netatalk instance from scratch without forcing new admins to contend with at least some of this complexity.

Alternatively, since SQLite is widely used and public domain, it might provide a compelling alternative to BDB, one that could not be undermined in the future by being acquired by a disinterested parent company: as public domain software, nobody can “own” it like that. And, unlike MySQL, SQLite is about as architecturally simple as Berkeley DB: both are just a “flat file”. Since SQLite supports a SQL dialect, it might also make Netatalk’s CNID database management code easier to maintain, being naturally closer (or even identical) to the SQL used for MySQL/MariaDB, rather than the non-SQL interface needed for interaction with BDB.

@Michael-Wohlstadter

The complexity of the MySQL setup and configuration is an important point. What is the primary demographic of our user base? Is it mostly businesses that have the in house expertise or access to such expertise? Or is it hobbyists and home users?

As to a file based backend, I second the consideration of SQLite. My primary database server is PostgreSQL for the spatial data work that I do. But I have found SQLite to be a remarkably functional substitute for use case environments that don't support a database server.

@rdmark
Member Author

rdmark commented Oct 4, 2023

I withdraw my suggestion to make MySQL the default backend. It's critical to keep the default configuration seamless and self-contained. In my mind the primary userbase for netatalk is the latter category: hobbyists and home users. We know from community feedback that some people still run enterprise deployments of netatalk but I expect that share to decrease as older Macs get decommissioned.

FWIW the now-defunct netatalk-classic fork had a partially working sqlite backend last year. The fork has been purged from the internet unfortunately so we can't study the code.

@rdmark
Member Author

rdmark commented Oct 11, 2023

FWIW the integration tests use the last backend for testing. Cf. /test/afpd/test.sh

@ghost

ghost commented Oct 11, 2023

How about we make a start on the backends by removing the already deprecated CDB code?

@ghost

ghost commented Oct 11, 2023

Where I'm at with the backends at the moment: remove CDB, TDB and last, and have DBD as the default with MySQL as the alternative. I searched the Internet Archive for a copy of the last netatalk-classic release but was only able to get a copy of the 2020 pre-sqlite code. I'll contact Mr Kobayashi to see if he's happy to share his old code (if he still has it!)

@Michael-Wohlstadter

@rdmark
Member Author

rdmark commented Oct 12, 2023

@dgsga I think we should keep last for now since it's low maintenance, and used by the integration test suite.

Otherwise I agree with your plan!

@ghost ghost self-assigned this Oct 14, 2023
@ghost

ghost commented Oct 14, 2023

OK, leave it with me. We'll have dbd (default), last and MySQL.
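With the plan settled on dbd (default), last and MySQL, an afp.conf exercising all three might look like the sketch below, written out through a shell heredoc. The `cnid scheme` and `cnid mysql host/user/pw/db` option names are assumed to match the netatalk 3.1 afp.conf(5) man page; the host, credentials, database, volume names and paths are all placeholders.

```shell
#!/bin/sh
# Sketch: one afp.conf volume per remaining CNID backend.
# Option names assumed from netatalk 3.1 afp.conf(5); values are placeholders.
set -eu
CONF=${AFP_CONF:-./afp.conf.backends}   # hypothetical output file

cat > "$CONF" <<'EOF'
[Global]
; Connection settings, only used by volumes with "cnid scheme = mysql".
cnid mysql host = db.example.lan
cnid mysql user = netatalk
cnid mysql pw = change-me
cnid mysql db = netatalk_cnid

[Home Share]
path = /srv/afp/home
; dbd is the default; stated here only for contrast.
cnid scheme = dbd

[CD Archive]
; Non-persistent IDs, aimed at read-only media such as a mounted CD-ROM.
path = /mnt/cdrom
cnid scheme = last

[Cluster Share]
; One shared database so several AFP servers see the same CNIDs.
path = /srv/afp/cluster
cnid scheme = mysql
EOF
```

The `last` volume mirrors the read-only CD-ROM use case raised at the start of the thread, and the MySQL volume mirrors the distributed-storage setup described by @cdevers-es.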

@ghost

ghost commented Oct 14, 2023

> @dgsga Is the code here of use? https://codeberg.org/cryu/netatalk-classic/src/branch/netatalk-classic/libatalk/cnid/sqlite

Great find, we will import the sqlite code to a branch here so we can work on it. Any code contributions are very welcome!!

@ghost ghost linked a pull request Oct 14, 2023 that will close this issue
@ghost ghost closed this as completed in #528 Oct 15, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in 3.2.0 Oct 15, 2023

4 participants