
Problems with names starting with number and including weird chars #92

Open
berndbischl opened this issue Apr 21, 2014 · 8 comments

Comments

@berndbischl
Owner

Well, I said this multiple times in the beginning, but we now have problems with names in the data objects.

One example is the algorithms whose names start with numbers.

R complains in multiple places about this and produces errors.

So? What now?

Basically the only solution is to rename these directly after parsing the files.
But then we might as well rename them in the original files, because:
a) it potentially spares users of other languages the same problem
b) our official site would otherwise display names that do not match the ones in the data files

@mlindauer
Collaborator

Sorry, but this is indeed a stupid feature of R.
Is there no way to handle strings as strings regardless of the characters?

If we have to rename the algorithms (and instances),
we should do it only internally,
i.e., map the names after parsing the files and map them back for the results.
Maybe you could use some kind of hashing?
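The internal mapping suggested here could be sketched roughly as follows. This is only an illustration in Python, not code from the project: the function name `build_name_maps`, the `n<index>_` prefix scheme, and the example names are all assumptions made for the sketch. An index-based prefix is used instead of a hash so that the safe names stay readable and cannot collide.

```python
import re


def build_name_maps(original_names):
    """Return (to_safe, to_original) dictionaries for a list of unique names.

    Hypothetical helper: the safe name is "n<index>_" followed by the original
    name with every character outside [A-Za-z0-9_] replaced by "_".
    """
    to_safe = {}
    to_original = {}
    for index, name in enumerate(original_names):
        cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
        # The prefix starts with a letter and embeds the index, so the result
        # never starts with a digit and two inputs never share a safe name.
        safe = f"n{index}_{cleaned}"
        to_safe[name] = safe
        to_original[safe] = name
    return to_safe, to_original


# Made-up example names, including one that starts with a digit.
to_safe, to_original = build_name_maps(["2algo", "min-sat 1.0", "2_algo"])

# Map to safe names before modelling, then map back before reporting results.
safe_names = [to_safe[name] for name in ["2algo", "min-sat 1.0"]]
reported = [to_original[safe] for safe in safe_names]
```

The catch raised later in the thread still applies: any function that displays names directly would show the safe names unless the labels are mapped back first.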

We cannot justify renaming, algorithms in particular
(regardless of whether the results are on the homepage or in the paper).
The authors of the algorithms could be upset about a renaming.

Cheers,
Marius

@berndbischl
Owner Author

> Sorry, but this is indeed a stupid feature of R.
> Is there no way to handle strings as strings regardless of the characters?

So what? llama is in R, mlr too, and I mentioned this. I am pretty sure stuff like that happens in other modelling languages as well.

> If we have to rename the algorithms (and instances),
> we should do it only internally,

I am not programming this, as this is extremely annoying and error-prone.

But the repo is open, so be my guest and do this?

@berndbischl
Owner Author

PS:
It also does not work, because sometimes we would have to change the labels before going into a specific function that is used for display.

There is then no "place" / "time" to map the names back.

An example is the scatter plot matrix code we produce. That does not work if the algo names start with e.g. "2".

@berndbischl
Owner Author

I will for now call this function on all names that go into plotting or modelling:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html

Note that this results in the following:

> make.names("2algo")
[1] "X2algo"

@larskotthoff
Collaborator

I agree with Bernd that we should have restrictions on the names. While the particular restrictions in R are maybe a bit too restrictive, we can't allow arbitrary strings. Apart from avoiding possible memory issues, we're also making life much easier for everybody who works with the format (think of embedding JavaScript in the name of an algorithm and then putting it on a website...)

I would propose to restrict names to [A-Za-z0-9-_]{1,64} and internally prefix features with "feature-" and algorithms with "algorithm-".
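A minimal sketch of what checking this restriction and applying the internal prefixes might look like, written in Python purely for illustration. The names `NAME_PATTERN`, `is_valid_name` and `internal_name` are hypothetical. The character class is written as `[A-Za-z0-9_-]`, which is intended to match the same characters as the pattern above while keeping the hyphen unambiguous.

```python
import re

# Same character set and length limit as proposed above (1 to 64 characters).
NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_name(name):
    """Return True if the whole name matches the proposed restriction."""
    return NAME_PATTERN.fullmatch(name) is not None


def internal_name(name, kind):
    """Prefix a valid name with "feature-" or "algorithm-" (hypothetical helper)."""
    if kind not in ("feature", "algorithm"):
        raise ValueError(f"unknown kind: {kind!r}")
    if not is_valid_name(name):
        raise ValueError(f"invalid name: {name!r}")
    return f"{kind}-{name}"
```

Under this sketch a name such as `2algo` passes the check, and only the prefix keeps the internal name from starting with a digit.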

@berndbischl
Owner Author

We cannot prefix internally, because the prefix would then show up in plots and tables...
See my note above.

What you propose is totally OK for R (or probably any other language or tool); we just cannot have names that start with a number.

@larskotthoff
Collaborator

I see your point, but also Marius' that we can't just rename the algorithms/solvers. Prefixing is something you can see easily enough when looking at the plots/tables and has the additional benefit that we always know what something is by looking at its name.

@berndbischl
Owner Author

But then we

a) need to make the prefix shorter, e.g. a_ and f_

b) need to disable it if it is not needed, either automatically or via a user option.

This might be a good combo.
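One way to read this combination, sketched in Python under stated assumptions: the function names `needs_prefix` and `apply_prefix` and the `mode` option are invented for the example, and "not needed" is interpreted narrowly as "no name starts with a digit", which is the only problem case named in this thread.

```python
DIGITS = "0123456789"


def needs_prefix(names):
    """Return True if any name starts with a digit (the case discussed here)."""
    return any(bool(name) and name[0] in DIGITS for name in names)


def apply_prefix(names, prefix, mode="auto"):
    """Prefix all names or none, so one set of names stays consistent.

    mode is a hypothetical user option:
      "auto"   - prefix only if at least one name starts with a digit
      "always" - always prefix
      "never"  - never prefix
    """
    if mode not in ("auto", "always", "never"):
        raise ValueError(f"unknown mode: {mode!r}")
    use_prefix = mode == "always" or (mode == "auto" and needs_prefix(names))
    return [prefix + name for name in names] if use_prefix else list(names)


# Made-up algorithm names: the first set needs the prefix, the second does not.
print(apply_prefix(["2algo", "minisat"], "a_"))
print(apply_prefix(["minisat", "glucose"], "a_"))
```

Prefixing every name in a set, rather than only the offending ones, keeps plots and tables uniform, at the cost of changing names that were already fine.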

3 participants