Allow to merge sections with the same name from different sources #11
Comments
I think I understand the use case, and I've had a similar case myself (also concerning paths -- maybe this shows that path configuration often has that case) that I never got around to implementing in LayeredConfig, primarily because I couldn't come up with a simple API. As I understand your suggestion, you have a new "merge" argument to the LayeredConfig constructor that specifies which config keys should be merged instead of replaced. As you write, merging could be done in different ways (either create a flattened list out of several lists, or just tie them together in a nested list). And there could be other methods for consolidating multiple values, such as adding integer values or concatenating string values. I think that specifying such a "consolidation strategy" for a particular config key is similar to specifying its type, and I wonder if the following API would make sense. Assuming the following: ~/.config/tool/config.yml
./config.yml
code:
I.e. the specified consolidators would be functions that take two versions of a config value (the first being the version with lower precedence) and return a consolidated version. The default consolidator would be
I think that the above API is very flexible with regards to how a merge/consolidation should be performed. I worry that it might be overly complex though (An API requiring functions as arguments, and giving examples using lambdas is not exactly simple). What do you think? |
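To make the proposal above concrete, here is a minimal sketch of a consolidator as a two-argument function that is folded over one key's values, ordered from lowest to highest precedence. The names `replace` and `consolidate` and the sample values are assumptions for illustration only, and the "last source wins" default is assumed from the shadowing behaviour described later in this thread; none of this is LayeredConfig's actual API.

```python
from functools import reduce


def replace(lower, higher):
    """Assumed default: the higher-precedence value shadows the lower one."""
    return higher


def consolidate(values, consolidator=replace):
    """Fold one key's values, ordered from lowest to highest precedence."""
    return reduce(consolidator, values)


# Hypothetical values for the same keys from three sources, lowest precedence first.
paths = [["/usr/share/tool"], ["~/.config/tool/plugins"], ["./plugins"]]
retries = [1, 2, 3]

shadowed = consolidate(paths)                        # default: last source wins
flattened = consolidate(paths, lambda x, y: x + y)   # lists joined into one flat list
summed = consolidate(retries, lambda x, y: x + y)    # integers are added
```

Because every consolidator has the same two-argument shape, a lambda and a named function are interchangeable here, which is the flexibility (and the complexity) the comment above is weighing.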
I also thought about other values but didn't come up with anything. Your suggestion is definitely a pretty neat solution and, in combination with the default consolidator, a simple one to use in my opinion. Way better than dictating a merge strategy. Honestly, I wouldn't simplify anything here. If that is still needed, though, I could think of providing some default strategies similar to what you did with
def find_latest_timestamp(previous_source, next_source):
    # datetime comparison
    ...
_CONSOLIDATORS = {
    'int_values': 'add',
    'paths': 'merge',
    'simplelist': 'flatten',
    'pathstring': lambda x, y: x + ";" + y,  # simple special cases
    'timestamps': find_latest_timestamp,  # more complex special case
}
# ...
cfg = LayeredConfig(*sources, consolidators=_CONSOLIDATORS)
Internally you differentiate between strings and callables and map the strings to default functions. That way a user does not need to fully understand how to work with consecutive sources, at least in simple cases, which might be enough most of the time. I think a more difficult issue is handling changes to these values. In my local implementation I haven't implemented anything like that yet. Some ideas that come to my mind are
# setup other sources
sources.append(YAMLFile(.., name='somename'))
cfg = LayeredConfig(*sources)
# make your changes
cfg.paths = cfg.paths + ['/even/more/paths']
# access individual sources
target = cfg.sources.somename
# and save them
LayeredConfig.write(cfg, target_source=target)
|
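The idea of accepting either a string name or a callable could be resolved with a small lookup step. The sketch below is only an illustration: `_DEFAULTS`, `resolve`, and the meaning given to `"add"` and `"flatten"` are assumptions, not existing layeredconfig names.

```python
import operator

# Hypothetical built-in strategies, keyed by the names used in the comment above.
_DEFAULTS = {
    "add": operator.add,
    "flatten": lambda lower, higher: list(lower) + list(higher),
}


def resolve(spec):
    """Return a two-argument callable for a strategy name or a callable."""
    if callable(spec):
        return spec
    try:
        return _DEFAULTS[spec]
    except KeyError:
        raise ValueError("unknown consolidator name: %r" % (spec,)) from None


consolidators = {
    "int_values": "add",
    "simplelist": "flatten",
    "pathstring": lambda x, y: x + ";" + y,  # custom callables pass through
}
resolved = {key: resolve(spec) for key, spec in consolidators.items()}
```

Failing early on an unknown name keeps a misspelled strategy from silently falling back to the default behaviour.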
Your suggestion to have a set of default consolidators makes a lot of sense. Maybe they could be provided as functions by the main module, so that you could handle them like constants, something like
This makes it clear to the intermediate user that these "constants" are just callables that accept two arguments and return one. Regarding what to do when a consolidated value is changed, I think it would be least surprising if the current approach (writing to the highest-precedence writable source) is used. The issue of changing a value to a different type than its original already exists (i.e. if you specify that a value should be a str, then change it to a list in an initialized config object), and it's not clear to me that it should be handled any differently than now. Is "consolidated" the best term for what we're discussing here? Maybe "merged", which you suggested in the title of this issue, is just as descriptive and easier to understand? |
Before, I also thought about importing the callables from a submodule. However, if we use predefined strategies anyway, we could simplify their usage even more by just using strings. The advantage of importing callables, though, is that one can simply inspect them with something like ipdb. In the end I am absolutely fine with both ways. In terms of handling changes you are right. There is no need to change anything if that functionality is already in place. I wasn't sure about the current state, so let's forget about my concerns. Now for the naming part. I would not call it "merged" as this is an adjective. Maybe "mergers", although I wouldn't go for merge in general as it implies that the values are combined somehow. This contradicts the case of simply shadowing them, which is the default behavior. Other than that, I thought about what context we are in. It is about configuration settings, files, applications and development in general, and as such I could think of words like "strategies" or "processors". That sounds a bit generic, but in the end we are doing exactly that: processing values with different strategies. So personally I would go for "strategies", which clearly communicates that we are doing the same thing (handling values) in different ways, although it requires users to know what they are for, which should be the case if they want to use them in the first place. In the end I really have no strong feelings one way or the other. I am absolutely happy if this feature makes it into layeredconfig. :-)
from layeredconfig import LayeredConfig, strategy
_STRATEGIES = {
    'int_values': strategy.add,
    'paths': strategy.merge,
    'simplelist': strategy.flatten, ...
}
config = LayeredConfig(*sources, strategies=_STRATEGIES) |
This is starting to look like a solid extension of the API -- thank you very much for the productive discussion! I'll try to write some tests and see if it can be implemented, and if so, release it as version 0.4.0. Due to time constraints, I won't be able to do it within the next few weeks, but hopefully before year's end. PRs are welcome in the meantime, of course... :-) |
You are most welcome. I enjoyed the discussion and am eager to hop into the code. Let's see if I can get something up and running that is more robust than my current modifications. |
I had a look into it and tried to prepare some tests. While doing that I found pytest to be easier to read and use for debugging purposes. However, it does not work without small adjustments to the test suite. Out of curiosity, what do you think about pytest? Personally I am a huge fan of pytest as it reduces much of the boilerplate needed with unittest. Is it an option for you? |
I would prefer to use only what's included in the stdlib, as every dependency creates some risk (in particular, that it won't support all platforms and versions that layeredconfig itself supports). But since pytest would be a developer-only dependency and very well supported on all platforms (I guess?), I could consider using it. What kind of adjustments were needed? |
I never had any issues with pytest together with any Python version or OS platform. According to its tox file <https://github.com/pytest-dev/pytest/blob/master/tox.ini> it also looks very compatible. If you are not familiar with pytest: it basically searches for standalone functions or methods starting with "test". Additionally, for methods the containing class itself has to start with "Test". Subclassing from unittest.TestCase is not required, though. That way pytest supports, but is not restricted to, unittest-style test classes. Now in the test suite there are two classes (TestLayeredConfigHelper <https://github.com/staffanm/layeredconfig/blob/3f67c1/tests/test_layeredconfig.py#L39> and TestConfigSourceHelper <https://github.com/staffanm/layeredconfig/blob/3f67c1/tests/test_layeredconfig.py#L144>) which are not meant to be executed directly but work as mixins. However, due to their names pytest collects and executes them, too. Specifically with TestConfigSourceHelper this is an issue because its methods start with "test" and fail because the self.simple and self.complex variables are missing. So I renamed both helper classes to LayeredConfigHelperTests and ConfigSourceHelperTests. With unittest this does not make a difference, as none of the mixins subclass unittest.TestCase anyway. I pushed a branch <https://github.com/staffanm/layeredconfig/compare/master...hakkeroid:feature/rename-tests-to-enable-pytest?expand=1> to make communication a bit easier. :-) |
Looks good, please feel free to open as a pull request!
Best regards,
Staffan
(sent by email in reply to hakkeroid's comment of 2017-03-10 11:27 GMT+01:00)
|
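The renaming described above can be illustrated with a small, self-contained sketch. The class `ConfigSourceHelperTests` takes its name from the thread; `TestDictSource`, the test method, and the sample data are hypothetical. Because the mixin's name does not start with "Test", pytest's default class pattern skips it, while unittest only ever runs the concrete `TestCase` subclass.

```python
import unittest


class ConfigSourceHelperTests:
    """Mixin with shared tests; relies on attributes set by the concrete class."""

    def test_simple_value(self):
        self.assertEqual(self.simple["home"], "mydata")


class TestDictSource(ConfigSourceHelperTests, unittest.TestCase):
    """Concrete test case: provides the data the mixin's tests expect."""

    def setUp(self):
        self.simple = {"home": "mydata"}


# Run the concrete class with the stdlib runner to show the mixin's test executes.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDictSource)
result = unittest.TestResult()
suite.run(result)
```

The shared test runs exactly once, through `TestDictSource`, so the mixin is never executed on its own without `self.simple`.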
I stumbled across a couple of things which are related to internal behavior. Do you want to discuss them here, or do you prefer another channel to keep the comment section clean and readable for others? Besides implementation details, I wondered how best to approach strategies on nested keys. We could either define that a key in the strategy map is applied globally (so on any nested level) or use something like dots to denote the nesting level at which to apply the rule, i.e.:
_STRATEGIES = {
    'int_values': strategy.add,
    'paths.homdir': strategy.merge,
    'mymodule.nested.simplelist': strategy.flatten, ...
}
|
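Both options for nested keys can be compared in a few lines. This is a sketch under assumptions: `find_strategy`, its `match_globally` flag, and the stand-in strategy functions are hypothetical names, not part of layeredconfig.

```python
def add(lower, higher):
    return lower + higher


def flatten(lower, higher):
    return list(lower) + list(higher)


def find_strategy(strategies, key_path, match_globally=False):
    """Return the strategy for a nested key, or None if nothing applies.

    key_path is a tuple such as ("mymodule", "nested", "simplelist").
    An exact dotted match always wins; with match_globally=True a bare
    key name also applies on every nesting level.
    """
    dotted = ".".join(key_path)
    if dotted in strategies:
        return strategies[dotted]
    if match_globally:
        return strategies.get(key_path[-1])
    return None


strategies = {
    "int_values": add,
    "mymodule.nested.simplelist": flatten,
}

exact = find_strategy(strategies, ("mymodule", "nested", "simplelist"))
scoped = find_strategy(strategies, ("other", "int_values"))
global_match = find_strategy(strategies, ("other", "int_values"), match_globally=True)
```

With dotted keys only, `int_values` applies at the top level alone; the global variant trades that precision for less typing.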
Hi staffanm, sorry for the lengthy comment. However, I want to be as clear as possible about my reasoning.
tl;dr
[edit] I created a repository with the conceptual code.
Long version
from layeredconfig import YamlSource
cfg = YamlSource('/path/to/cfg.yml')
assert cfg.mymodule.extra == True
After the foundation was laid out, the next step was to reintroduce layering. Because both concepts (handling a source and layering multiple sources) were separated from each other, adding the final solution for folding over the values (merge strategies) was easy to do.
from layeredconfig import LayeredConfig, INISource, YamlSource, strategies
cfg = LayeredConfig(
    INISource('/path/to/cfg.ini'),
    YamlSource('/path/to/cfg.yml'),
    strategies={'paths': strategies.collect}
)
assert cfg.paths == [['/path/from/yaml1', '/path/from/yaml2'], ['/path/from/ini']]
Also it was fairly easy to introduce another feature, for example a custom type-conversion map, which I definitely would have opened another issue for, too.
from pathlib import Path
from datetime import datetime
from layeredconfig import YamlSource
cfg = YamlSource('/path/to/cfg.yml',
    type_map={
        'homedir': lambda p: Path(p),
        'date': lambda d: datetime.strptime(d, '%Y-%m-%d')
    }
)
assert cfg.homedir == Path('/path/from/yaml1')
assert cfg.extra.date == datetime(2017, 4, 4)
Meanwhile I could also fix (at least in my opinion) a couple of smaller inconveniences, for example the dump feature, which requires the config object as a parameter although it can be called from the config itself.
from layeredconfig import YamlSource
cfg = YamlSource('/path/to/cfg.yml')
# currently
assert cfg.dump(cfg.mymodule) == {'extra': True}
# also, the following statement does not do what one might expect
assert cfg.mymodule.dump(cfg) == {'extra': True}  # fails, as it returns {'mymodule': {'extra': True}}
# in this version
assert cfg.mymodule.dump() == {'extra': True}
assert cfg.dump() == {'mymodule': {'extra': True}}
I also added dictionary-like access without the need for dumping the config beforehand. That should simplify accessing values programmatically.
from layeredconfig import YamlSource
cfg = YamlSource('/path/to/cfg.yml')
assert cfg.mymodule['extra'] == True
assert cfg['mymodule'].extra == True
Additionally, the keys are resolved lazily, so changes to source files are visible the moment one asks for a key. This could easily be extended with caching. Sources now only need to implement at least one method
# source implementation for yaml file
import yaml

class YamlFile(Source):
    def __init__(self, source, **kwargs):
        super(YamlFile, self).__init__(**kwargs)
        self._source = source

    def _read(self):
        with open(self._source) as fh:
            # safe_load avoids constructing arbitrary Python objects
            return yaml.safe_load(fh)

    def _write(self, data):
        with open(self._source, 'w') as fh:
            yaml.dump(data, fh)
For the test suite I created a mocked etcd storage, so there is no need to install etcd on localhost (at least for quick checkouts). The yaml and etcd (requests) dependencies are optional, too: the code checks at runtime whether they are missing. For a personal use case I find that convenient, as I am deploying to my Raspbian, where it is a hurdle to install pyyaml.
from layeredconfig import YamlSource
def test_etcd():
    cfg = EtcdStore('/bogus/url')
    cfg._connector = FakeConnector()
    assert cfg.mymodule['extra'] == True
    assert cfg['mymodule'].extra == True
Test coverage is almost 100%. That's it for now. I hope I did not overwhelm you. Also, I did not provide implementations for all sources as I wanted to get your feedback first. As everything was written from scratch I can't provide a simple "branch" to look at. I might create a repository to push stuff into. However, maybe you already have some strong feelings one way or the other. If you think that a merge is not applicable, or if you simply do not want to go that route, this is fine, too. Just let me know what you think. |
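The combination of lazy resolution, attribute access, and dictionary-like access described in the comment above could look roughly like the following. This is a sketch under assumptions, not the author's implementation: the class name `Node` and its `read` callable are hypothetical, and a plain dict stands in for a source file.

```python
class Node:
    """Hypothetical view onto nested config data, resolved on every access."""

    def __init__(self, read, path=()):
        self._read = read    # callable returning the full source data
        self._path = path    # keys leading from the root to this node

    def _data(self):
        data = self._read()  # re-read each time, so changes become visible
        for key in self._path:
            data = data[key]
        return data

    def __getitem__(self, key):
        value = self._data()[key]
        if isinstance(value, dict):
            return Node(self._read, self._path + (key,))
        return value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def dump(self):
        return self._data()


backing = {"mymodule": {"extra": True}}  # stands in for a parsed source file
cfg = Node(lambda: backing)

via_item = cfg.mymodule["extra"]
via_attr = cfg["mymodule"].extra
dumped = cfg.mymodule.dump()

backing["mymodule"]["extra"] = False     # change the "source" afterwards
after_change = cfg.mymodule.extra        # lazily resolved, sees the new value
```

Item access remains available for keys that collide with method names such as `dump`, since `cfg["dump"]` never goes through attribute lookup.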
Hi hakkeroid, Apologies for the late answer. I'm very impressed with the amount of effort you've poured into this, and would be happy to merge your changes. I do have some questions though:
Mocking etcd-storage in the unit tests is fine -- we'll still need to keep tests that run against a live etcd instance, but that's really more of an integration test, not a unit test. Again, thank you so much for all the work you've put into this! I really like how it's shaping up! |
Hi staffanm, welcome back and I am very happy to hear that you like the approach. So let me clarify your questions.
assert cfg.mydata['dump'].other_key.dump() == {'subkey': 'value'}
For private methods I also considered using double underscores to automatically prepend the class name and further reduce any naming conflicts so that for example
# consider we want to turn a list into a set while we work with it
source_data = {'simple_list': [1, 2, 3]}
# on read: customize it to set
# on write: reset it to list
type_map = {'simple_list': CustomType(customize=set, reset=list)}
cfg = DefaultDict(source_data, type_map=type_map)
assert isinstance(cfg.simple_list, set)
cfg.simple_list.add(3) # does not change it
cfg.simple_list.add(4)
assert cfg.simple_list == set([1, 2, 3, 4])
# source_data was written while adding "4"
assert source_data == {'simple_list': [1, 2, 3, 4]}
Regarding the naming of customize and reset: maybe more descriptive names might be
setup(
    ...
    extras_require={
        'yaml': ['pyyaml'],
        'etcd': ['requests'],
    }
)
Now you can install the whole suite with the following command.
pip install 'layeredconfig[yaml,etcd]'
So from the looks of it there does not seem to be a way in
cfg.nested.get('my_key', False)
cfg.nested['my_key']
cfg.nested.setdefault('my_key', []).append(True)
Now regarding the "compatibility layer". The current implementation is not fully compatible with the way layeredconfig works right now. So I think it would definitely be a major version bump if we follow semver.org. However, we should of course create a small facade that provides the current behavior and possibly deprecates some features, like calling the static methods, and instead promotes the non-static counterparts. This facade would be released as a minor version so that people can convert their code at their own speed. I really don't want to bloat this comment section, but I think it is important to clarify some things. So because you were asking about the save method, let me explain how that works. Basically there is no save method at all. By default, changes are immediately written to the underlying source through
cfg = EtcdStore('/some/host')
# this calls the etcd store three times (two reads, one write)
cfg.sum = cfg.num_a + cfg.num_b
cfg = EtcdStore('/some/host', cached=True)
# this calls it only once
cfg.sum = cfg.num_a + cfg.num_b
# and now a second time
cfg.write_cache()
In reality the etcdstore uses caching by default. Also, the naming of the method could be changed, but I found it helpful when the name precisely reflects what it actually does. When providing a facade we could still add a save method, though, and activate caching by default for all sources. There is more to it, but it is already way too much text. Anyway, you are welcome, and I really enjoyed wrapping my head around this stuff. :-) [edit] I think I will use the README in my repository to add a couple of examples to show how it might work right now. Or maybe you can have a look into the test cases for the sources. [edit2] One last thing: generally the dump method became really unimportant now that the configs behave almost exactly like dictionaries. [edit3] I updated the type_map question (2) as I made a mistake. |
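The difference between write-through and cached access described above can be made visible by counting round trips to a fake backend. Everything in this sketch is an assumption for illustration: `FakeBackend`, `Source`, and their methods are hypothetical, and explicit `get`/`set` calls stand in for the attribute access used in the thread. Only the name `write_cache` is taken from the comment.

```python
class FakeBackend:
    """Counts round trips, standing in for a remote store such as etcd."""

    def __init__(self, data):
        self.data = dict(data)
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return self.data[key]

    def get_all(self):
        self.calls += 1
        return dict(self.data)

    def update(self, changes):
        self.calls += 1
        self.data.update(changes)


class Source:
    """Write-through by default; with cached=True writes wait for write_cache()."""

    def __init__(self, backend, cached=False):
        self._backend = backend
        self._cached = cached
        self._cache = None
        self._dirty = {}

    def get(self, key):
        if not self._cached:
            return self._backend.get(key)
        if self._cache is None:
            self._cache = self._backend.get_all()  # one read fills the cache
        return self._cache[key]

    def set(self, key, value):
        if not self._cached:
            self._backend.update({key: value})     # immediate write
            return
        if self._cache is None:
            self._cache = self._backend.get_all()
        self._cache[key] = value
        self._dirty[key] = value                   # remembered until flushed

    def write_cache(self):
        if self._dirty:
            self._backend.update(self._dirty)
            self._dirty = {}


direct_backend = FakeBackend({"num_a": 1, "num_b": 2})
direct = Source(direct_backend)
direct.set("sum", direct.get("num_a") + direct.get("num_b"))
direct_calls = direct_backend.calls            # two reads, one write

cached_backend = FakeBackend({"num_a": 1, "num_b": 2})
cached = Source(cached_backend, cached=True)
cached.set("sum", cached.get("num_a") + cached.get("num_b"))
calls_before_flush = cached_backend.calls      # a single read so far
cached.write_cache()
calls_after_flush = cached_backend.calls       # plus one write
```

The cached variant trades freshness for fewer round trips: other writers' changes are not seen until the cache is rebuilt, which this sketch does not attempt.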
Hey staffanm, I kept thinking about how to best approach merging both code bases. On the one hand I would like to use the strategy part in another project as soon as possible and on the other hand both code bases are very different from each other and it requires some effort to merge them. For that we should take small steps to prevent any breakage.
|
Hi hakkeroid, I would be totally ok with you publishing your code base as a separate project. As you've noticed, I am unable to devote much time to Layeredconfig right now. As I understand it, the core of your code base is a clean-room (no code shared) implementation of the API? |
Now I was busy.. darn! Yes, you are right about it. What is shared is mostly the API, so basically how everything works from the user's perspective (although I have to rename the tool for now, of course). Significant differences code-wise are:
To make it clearer, I am talking about this part. Specifically creating, storing and returning stripped-down subsources. In comparison, I separated the vertical source tree traversal from the horizontal sources list traversal, which simplified applying the strategy and custom type parts. I am preparing some documentation with extended examples right now and will add a link to the released version to my comment afterwards. Then you can have a look at it if you like. :-) |
I started a similar project to yours a while ago but ditched it in favor of yours as I really like it. However, there is one use case I can't really apply.
I want to have a configuration setting that can be extended, as opposed to being overridden, by the subsequent configuration sources. The following example isn't meant to work with list types only but could be applied to other data types, too.
Consider I have a tool that loads plugins from a list of paths. I want to allow a user of the tool to extend this list with additional paths coming from a configuration file within the current directory, the user's home directory and so on. When any of the sources provides the same name for the paths key, the previous settings will be overridden (or ignored from the perspective of the code).
~/.config/tool/config.yml
./config.yml
Now in python:
Finally it would be great to do a simple
config.paths
and get a list of multiple paths, either as a list of lists or flattened to one list, although I prefer the nested list as this enables the user to deal with equal values. I patched layeredconfig locally and can open a PR if you like. However, I wasn't really able to get the tests running due to interpreter and invocation errors, so the changes don't include additional tests.
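The two result shapes mentioned above can be sketched as follows. The function names `collect` and `flatten` and the sample paths are assumptions for illustration; the actual contents of the two config files are not shown in this thread.

```python
def collect(values):
    """Keep one entry per source: a list of lists."""
    return [list(v) for v in values]


def flatten(values):
    """Join all sources into a single flat list, preserving order."""
    return [item for v in values for item in v]


# Hypothetical 'paths' values, one list per source.
per_source = [
    ["/usr/lib/tool/plugins", "~/.tool/plugins"],  # ~/.config/tool/config.yml
    ["./plugins", "~/.tool/plugins"],              # ./config.yml
]

nested = collect(per_source)
flat = flatten(per_source)
```

The nested form keeps track of which source contributed each path, so a path listed in both files can still be told apart, whereas the flat form simply contains it twice.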