doc/catalog.txt


pkg
CATALOG AND INVENTORY OPERATIONS

[Suspended] Premise 1:  From the client's perspective, the catalog is just
another package.  (It's only special in that its contents drive
automatic package operations.)

   Because we pull catalogs from multiple repositories, managing N
   packages for N repositories seems overcomplicated.

Premise 2:  From the server perspective, the catalog is a file whose
contents are updated as a result of the set of packaging operations that
have occurred within the server's repository.  (No operation ever
affects the catalog directly.)

1.  Server catalog

The primary content of the catalog is the relationships between
packages.  We've already said that there are packages that depend on
other packages and packages that incorporate other packages.  These
latter packages, the base- and stack-style packages, are really packages
that, in addition to listing a set of packages, also freeze those
packages, such that it takes a relaxation of the base/stack package for
the client to modify the incorporated packages.

The detailed manifest for each package contains the dependencies,
contents, and actions.  But, as it seems that the set of incorporation
relationships are of particular interest for reporting and determining
package retrieval ordering, we place incorporation dependencies in the
catalog as well.

It's reasonably clear that the package repository needs to have
observations of package popularity, so that the most aged packages can
be purged, once unneeded.

2.  Client catalog and client inventory

The client catalog is a compilation of package sourcing information from
a collection of repositories, in the form of one server catalog per
repository.  Its purpose is to advertise what package versions are
"available", which means that it is the union of the repositories it
sources.

In contrast, the client inventory is the set of packages currently in an
installation state on the system, stored in some representation.  One
form of this representation will be in the form of a server catalog.

The inventory is stored under 

/var/pkg/db

If some instance of which@2.16 is installed, we would have a directory

/var/pkg/db/pkg/[publisher?]/which/2.16
				publisher
				manifest
				state

XXX Can we have two packages with the same name, but from different
publishers?

An open question is whether we store one or more reverse indices in the
inventory.  We could have indices into basenames, pathnames, and
contents, which would look something like:

/var/pkg/db/index/
		{basename,path}/
			[escaped basename or full path]/
				named links to content files
		
		content/
			[SHA1 hash of contents] -> package directory

Unfortunately, since FILE_MAX < PATH_MAX, the escaped representation for
full paths is insufficient, and we might need to provide a hash-based
directory name, and a metadata file with the full path.

2.1.  Multiple catalogs

If a package is available from two or more repositories, we have a
choice:  we can prefer a repository (as we would for a request for a
first installation of a package), or we can store both catalogue
entries.

In a directory-based approach, we might have

/var/pkg/catalog/repository_publisher/category/package/versions

with example

/var/pkg/catalog/pkg.opensolaris.org/application/which/2.16
/var/pkg/catalog/blueslugs.com/application/which/2.16

In a file-based approach, we would have

/var/pkg/catalog/[escaped_catalog_URL]
...

XXX Investigate efficiency of encoding versions as entries in
a file versus encoding as separate files

XXX Are FMRI-level categories actually a help, or is a detailed,
hierarchical namespace going to get in the way?