Upgrade to CLD2 #9

Open
wants to merge 7 commits into
base: master

@cbandy cbandy commented May 30, 2014

See #8.

Before this is merged, we should update our licensing. The library has changed to the Apache license.

The size of the bundled library has grown significantly. The source itself is over 90 MiB. The gem is now 35 MiB (up from 6 MiB) and, once installed, uses 93 MiB (up from 17 MiB). If CLD2 ever releases a tarball, we can stop bundling it here and shrink the installed size to 2 MiB.

There are two possible CLD2 libraries to link against: libcld2.so and libcld2_full.so. The latter can detect twice as many languages and is 4 MiB larger. I arbitrarily chose the former, smaller library in this PR. Which would you prefer to be used by default? In either case, we can also make this configurable during gem install.
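
As a rough sketch of how the choice could be made configurable during gem install: the `--enable-cld2-full` flag and the `cld2_library_name` helper below are hypothetical names, not something this PR defines. A real extconf.rb would more likely read the flag through mkmf's `enable_config` than inspect the arguments by hand.

```ruby
# Hypothetical helper: choose which CLD2 library to link against based on
# the arguments passed to extconf.rb. The flag name is an assumption.
def cld2_library_name(args)
  args.include?("--enable-cld2-full") ? "cld2_full" : "cld2"
end

# Defaults to the smaller library, matching the choice made in this PR.
puts cld2_library_name(ARGV)
```

With something like this in place, the larger library could be requested by passing the flag after `--` on the `gem install` command line, since RubyGems forwards those arguments to the extension build.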

cbandy added 6 commits May 29, 2014 02:04
Repository Root: http://cld2.googlecode.com/svn
Repository UUID: b252ecd4-b096-bf77-eb8e-91563289f87e
Revision: 161
- Use ExtDetectLanguageSummary
- Build shared object
- Update tests:
  - Chinese (Traditional) is now "zh-Hant"
  - "Unknown" is no longer "reliable", see
    https://code.google.com/p/cld2/issues/detail?id=1
- Makes the installed gem about 4% smaller.
- The shared object is installed to lib.
- Installs the shared object in lib.
- A test cycle now takes one Rake command:
  `rake clean compile spec`
- Defaults to the smaller version of CLD2.
- Allow cleanup of temporary files to be disabled.
@jtoy
Owner

jtoy commented Jun 3, 2014

wow, that is a very large gem! is there any way we can reduce this? 6mb was already too much.

@cbandy
Author

cbandy commented Jun 4, 2014

I found that some of the CLD2 source files are not necessary to build the libraries. The gem is now 17 MiB and uses 46 MiB once installed. If we commit to just one of libcld2.so or libcld2_full.so, we can reduce this further.

The unavoidable fact is that the source contains large tables of pre-computed n-grams. cld2_generated_quad0122.cc is required to build libcld2_full.so and is 27 MiB. Gems are already compressed, so minimizing the number of these source files in the shipped gem is the only way to save bits.

If CLD2 were to release an archive/tarball, we could ship zero source files and download it before compiling the extension using something like mini_portile.

I looked into downloading bare files from the project repository, but we either need to

  1. depend on more tools (e.g. svn or wget) or
  2. maintain something approaching their complexity or
  3. maintain a list of source files/URLs to download.
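
Option 3 could be sketched roughly as follows. The base URL, the directory layout, and the `source_urls` helper are placeholders for illustration; the list is not the real set of files needed to build CLD2, and nothing here performs a download.

```ruby
require "uri"

# Assumptions: the base URL and the "src/" directory are placeholders, and
# the file list is illustrative rather than complete.
BASE_URL = "https://example.invalid/cld2/"
SOURCE_FILES = %w[
  src/cld2_generated_quad0122.cc
].freeze

# Build one download URL per source file that the gem would need to fetch.
def source_urls(base_url, files)
  files.map { |path| URI.join(base_url, path).to_s }
end

puts source_urls(BASE_URL, SOURCE_FILES)
```

The cost of this approach is that the list has to be kept in step with upstream by hand whenever the set of required source files changes.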

@cbandy
Author

cbandy commented Jun 4, 2014

Another option is to ship binary/pre-compiled gems. At first pass, it looks like the smaller gem would be less than 2 MiB and the larger would be less than 5 MiB.

I don't have any experience releasing a binary gem.
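
For reference, a platform-specific gem specification might look something like the sketch below. The gem name, version, and file paths are hypothetical, and this only shows the shape of the specification, not a full release process.

```ruby
require "rubygems"

# Sketch of a precompiled ("binary") gem specification.
# The name, version, and file list are hypothetical.
def binary_gemspec
  Gem::Specification.new do |s|
    s.name     = "cld-binary-example"
    s.version  = "0.0.1"
    s.summary  = "Example of a precompiled gem"
    # Tag the gem with the platform it was built on instead of "ruby".
    s.platform = Gem::Platform.local
    s.files    = ["lib/example.rb", "lib/libcld2.so"]
    # No s.extensions entry: nothing is compiled at install time.
  end
end

puts binary_gemspec.full_name
```

A gem built this way only installs on matching platforms, so a source gem would presumably still be published alongside it as a fallback.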

@mattdoller

Any chance there has been any progress or updates with this? I'd love to help out with this if possible.

@adityapatadia

I would also like to contribute. Let's solve this issue ASAP. It has been pending for more than a year just because of the size of CLD.

@adityapatadia

Here is a similar implementation in JavaScript. We can take cues from it: https://github.com/dachev/node-cld

@craig-day
Collaborator

@jtoy can we reconsider this? The gem did get larger, but so did the source library. I don't think there is a clean way to avoid this and still allow anyone to use the gem.

@mmahalwy

any update on this?

@grosser
Collaborator

grosser commented Oct 11, 2015

@craig-day can you merge and release this ?

@craig-day
Collaborator

I'll take a look hopefully tomorrow or Monday morning at the latest.

On Oct 10, 2015, at 8:38 PM, Michael Grosser wrote:

> @craig-day can you merge and release this ?

@cbandy
Author

cbandy commented Oct 11, 2015

The CLD2 project has moved to https://github.com/CLD2Owners/cld2/

@craig-day
Collaborator

@cbandy is this still ready to go? I'd like to merge and release a new major version.

@cbandy
Author

cbandy commented Jun 27, 2016

It has been a long time since I looked at this.

  • Something still needs to be done about the licensing.
  • The project moved, so any links should be updated. I see one in the README.
  • Should we pull in any changes to CLD2 since May 2014, if any?

> If CLD2 were to release an archive/tarball...

I still don't see a tarball; at least not one provided by GitHub tags/releases.

> I looked into downloading bare files from the project repository...

Maybe this is more reasonable now that it is hosted in Git? I forget how common it is for gem installers to have git available.

@cbandy
Author

cbandy commented Jun 27, 2016

> Should we pull in any changes to CLD2 since May 2014, if any?

This appears to be the revision/commit that I imported in this PR: CLD2Owners/cld2@d076f5e

@craig-day
Collaborator

@cbandy can you update the README link and pull in any new changes? I'm not sure the tarball is a concern right now. I'd rather avoid a git dependency because not every place gems get installed has git (production servers, for example).

@craig-day
Collaborator

As for licensing, I think you can copy the Apache license from the CLD2 owners. It looks like our original license was just copied from them anyway.

@guilleiguaran

guilleiguaran commented Jun 11, 2018

@cbandy I don't think this project will be updated. I suggest you release your code as a new cld2 gem.

8 participants