# Quest for a good Photo Manager



## sgbotsford (Nov 18, 2018)

### Introduction

Currently using Apple Aperture.  Need a replacement.

I've been thinking a lot about photo management.

I'm starting to avoid the word 'DAM' as it increasingly refers to industrial sized software costing tens or hundreds of thousands of dollars.  So let's look at what I mean by a photo manager:

* Browser -- look at a bunch of pictures
* Tagger -- add metadata either singly or in batches.
* Searcher -- use complex searches to narrow down what I look at.
* Version tracker -- ability to keep track of derived images.

That's the TL;DR version.

Next level of detail:

**Browser**  Pix initially come in bunches, and as such they go somewhere in some folder structure on your computer.  Many people will use some combination of Year/Month/ and a string to describe the event.  Often remembering that the shot took place on your trip to Italy, or that it was the Smith & Brown wedding, is sufficient.

**Tagger** But what do you do when you are looking for the closeup of a butterfly?  It was an incidental pic on some holiday, but which one?  Now metadata comes into play.  If all your holiday shots are tagged 'Holiday' and your program can search existing metadata, your problem is solved:  search for Holiday and focal distance less than 5 feet.  You still may have a bunch to wade through.  If, in addition to some general keywords for the batch, you add a few per image, you have a big win.  E.g. "Butterfly".

Tagging is hard.  You want to tag it with multiple things.  E.g:

 * Describe the scene.
 * Identify the location.  GPS is fine, but "Loch Ness, Scotland" or "Kensington Market, Toronto, Ontario, Canada" is easier to visualize.
 * ID the people in the scene. Classify them more generally. (Woman with child; Young boy...)
 * Describe the technical aspects -- close up, high/low key, lighting
 * One or more classes of description about the scene -- weather, mood.
 * Usage:  Have you sold exclusive rights for this image?  Exclusive for 18 months for a calendar?

These are *facets*. I prefer to go through a set of images several times, concentrating on one of these at a time. Sometimes a facet is irrelevant.  Weather makes no sense for an interior shot.  If you do facets, you need a way to search for images that don't have an entry for facet X.  You also need a way to mark a facet as irrelevant.

Crudely you can implement these with constructs like WEA:Cloudy, but then you still have to be able to search for images that don't have WEA:* as a keyword.
And you have to decide what to do where it's not relevant, e.g. WEA:N/A.

Having some kind of support for actual facets would be a big win. 
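Here is a minimal Python sketch of the crude prefix approach, to show what the "missing facet" search has to do. The `WEA:` prefix and the `N/A` marker come from the examples above; the function names and the in-memory `images` dictionary are made up for illustration and are not any real program's API.

```python
def facet_values(keywords, facet):
    """Return the values recorded for one facet, e.g. 'WEA:Cloudy' -> 'Cloudy'."""
    prefix = facet + ":"
    return [k[len(prefix):] for k in keywords if k.startswith(prefix)]


def missing_facet(images, facet):
    """Names of images with no entry at all for the facet (not even N/A)."""
    return [name for name, kws in images.items() if not facet_values(kws, facet)]


# Hypothetical catalog: file name -> list of keywords.
images = {
    "img_001.jpg": ["Holiday", "WEA:Cloudy"],
    "img_002.jpg": ["Holiday", "WEA:N/A"],    # interior shot, weather marked irrelevant
    "img_003.jpg": ["Holiday", "Butterfly"],  # weather pass not done yet
}

todo = missing_facet(images, "WEA")
print(todo)
```

An image marked `WEA:N/A` counts as done, so only the untouched image is reported for the next pass.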

Hierarchical keywords are important, and so is their partner, controlled vocabulary.  You really want to avoid having a single entry for Smith, John and being unable to distinguish between the one from Hoboken, NJ and the one from East Horsebiscuit, SD.  You also want to avoid having both Rachmaninov, Sergei and Rachmaninoff, Sergei.  So controlled vocabulary is your friend.  At the very least it should require you to take an extra step to add a new word.

The database needs to be bulk editable.  E.g. when you started out you had a category, "People", and everyone was under People.  Well, after a while that was getting cumbersome.  So many friends.  So you want to introduce some subcategories:  People -> High School Friends; People -> Industry Acquaintances...  You want to be able to move someone from one category to another, and have those changes propagate to the images involved.

**Searcher**  No point in tagging if you can't search the data.  Two programs I trialed, Mylio and Photo Supreme, had no provision to search EXIF data -- where the stuff like time of day, focal length, and camera model is kept.  One program allows you to search for only one tag at a time.  I could search for Holiday.  Or I could search for butterfly.  But I couldn't search for shots that have both "Holiday" and "butterfly".  One program allows 'and' but not 'or'.  Ideally you want full boolean search support with 'and', 'or', and 'not', parentheses for grouping, and wild cards for partial matches.


**Version tracker** A photograph for a professional may have a long history.  You often have a shot, then export it in some altered form (cropped, resized, sharpened, colour adjusted, watermarked).  It's nice to be able to find the original 5 years from now.  One recommended practice I ran into had the following:

* Master image was Raw.
* Archive version was digital negative.
* Processing version was 16 bit tiff or PSD
* Delivered version was tiff or jpeg.

This requires a minimum of 4 versions.  Add to that:

* Watermarked versions.
* Reduced resolution versions for web pages.
* Colour matched versions for specific printing environments.
* Cropped versions for mobile web pages.

So that's the base case.  Implementations may differ, and they refine this somewhat.

Be nice if you could attach rights to a version.  E.g. if you sell exclusive rights to a photograph for 18 months for a calendar, you better not sell a slightly different version to someone else.



## Requirements:

The four functions above describe what it should do.  Here are some more details about how it should do it.

### Server requirements
I can see implementing this in one of two ways:  Either as a stand alone program or as a local web server.  The latter has the advantage that it would scale for family or small photo business.

Cloud services are a possibility, but are slow when you are talking about 10-12 MByte files. My network connection to the world takes several seconds per MByte.  Cloud services for metadata have to be well optimized -- you really don't want to issue 3000 keyword change requests individually when you change the spelling of a keyword.  So:

 * Not cloud based.
 * Runs on Mac or on a local Apache web server.

### Keyword handling
 * Fast keywording.  Aperture allows drag and drop from a list, multiple sets of hotkeys for words used frequently, copy paste of keywords from one photo to another, and keywords organized in folders. It also allows search for a keyword, and a list matching what you typed so far appears.  Other programs that have good keywording include IMatch and Photo Mechanic. One of the key aspects of this is to have multiple ways to do things.
 I like Aperture's multiple preset buttons -- combine with facets.

 A *history* of keywords might help:  A pane with the last N keywords in it.  About 80% of the time, the next word I use will be one of the last 20 I used.

 * Full access to standard metadata: EXIF, IPTC, subject to limits of the file format.
 * Controlled vocabulary.  I want an extra step to add a new keyword to my list of keywords. This helps with the Sommer Vacashun problem.
 * Hierarchical vocabulary.  E.g. Separate entries for Birds -> raptors -> falcon and Planes -> fighters -> falcon.  Parents are stored with keywords. Moving a keyword in the master list, or changing spelling, corrects all usage in photos.  This can be done as a background task.
 * Parent items are automatically entered as keywords. (With the correct database linkage, this comes free as a side effect of the point above.)
 * Synonyms -- I can define "Picea glauca" as a synonym for "White Spruce"; entering one enters the other.
 * Facets: For a set of pictures I want to be able to define a set of facets or categories for collections or folders.  Facets would be things like: Weather; Who; Where; Ecosystem; Season; Lighting.  Not all collections would have all facets, but a collection having a facet would nag me to put it in.  A facet would have a negation for not applicable (Weather isn't applicable inside a house; Who isn't applicable in a landscape shot.)  Facets allow me to go through a collection in multiple passes and get the missing keywords.
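The controlled, hierarchical and synonym points above fit together if photos store keyword IDs rather than text. A minimal Python sketch under that assumption follows; the `Vocabulary` class and its method names are hypothetical, invented for illustration.

```python
class Vocabulary:
    """Controlled, hierarchical keyword list. Photos would store keyword ids, not text."""

    def __init__(self):
        self.label = {}     # id -> display text
        self.parent = {}    # id -> parent id, or None for a top-level keyword
        self.synonym = {}   # id -> other names entered along with this keyword
        self._next_id = 1

    def add(self, label, parent=None, synonyms=()):
        """The deliberate extra step: new words only enter the list through here."""
        kid = self._next_id
        self._next_id += 1
        self.label[kid] = label
        self.parent[kid] = parent
        self.synonym[kid] = list(synonyms)
        return kid

    def rename(self, kid, label):
        """Fix a spelling once; every photo referring to the id sees the change."""
        self.label[kid] = label

    def expand(self, kid):
        """Parents (root first), the keyword itself, then its synonyms."""
        chain = []
        node = kid
        while node is not None:
            chain.append(self.label[node])
            node = self.parent[node]
        return chain[::-1] + self.synonym[kid]


vocab = Vocabulary()
birds = vocab.add("Birds")
raptors = vocab.add("raptors", parent=birds)
bird_falcon = vocab.add("falcon", parent=raptors)
planes = vocab.add("Planes")
fighters = vocab.add("fighters", parent=planes)
plane_falcon = vocab.add("falcon", parent=fighters)
spruce = vocab.add("White Spruce", synonyms=["Picea glauca"])

print(vocab.expand(bird_falcon))
print(vocab.expand(plane_falcon))
print(vocab.expand(spruce))
```

Because the two falcons have different IDs they never collide, and moving or renaming a keyword is one change in the vocabulary rather than an edit to thousands of photos.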

### Searching
 * Complex searches: Find all shots between 2012 and 2015, shot in December or January, shot with my Nikon D70, with keyword "snow", rating of 3 or better, shot after 3pm in the day. (Yes, I do use searches like that.)
 * Saved Searches.  These are the equivalent of smart albums in Aperture.  As new pix meet the standards they would be shown.
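The complex search above maps directly onto SQL once the metadata is indexed. This is a sketch using Python's built-in `sqlite3`; the table layout, column names and sample rows are invented for illustration, and "after 3pm" is approximated as hour 15 or later.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE image (id INTEGER PRIMARY KEY, path TEXT, taken TEXT,
                    camera TEXT, rating INTEGER);
CREATE TABLE keyword (image_id INTEGER, word TEXT);
""")

# Hypothetical sample data; 'taken' is stored as 'YYYY-MM-DD HH:MM:SS'.
rows = [
    (1, "a.nef", "2013-12-24 15:30:00", "NIKON D70", 4),  # should match
    (2, "b.nef", "2013-07-01 16:00:00", "NIKON D70", 5),  # wrong month
    (3, "c.nef", "2014-01-05 10:00:00", "NIKON D70", 3),  # too early in the day
    (4, "d.nef", "2015-01-10 16:45:00", "NIKON D70", 2),  # rating too low
]
db.executemany("INSERT INTO image VALUES (?, ?, ?, ?, ?)", rows)
db.executemany("INSERT INTO keyword VALUES (?, 'snow')", [(r[0],) for r in rows])

# A saved search is just this text kept around and re-run as new pix arrive.
SNOW_SEARCH = """
SELECT i.path
FROM image AS i
JOIN keyword AS k ON k.image_id = i.id
WHERE k.word = 'snow'
  AND i.camera = 'NIKON D70'
  AND i.rating >= 3
  AND strftime('%Y', i.taken) BETWEEN '2012' AND '2015'
  AND strftime('%m', i.taken) IN ('12', '01')
  AND strftime('%H', i.taken) >= '15'
ORDER BY i.taken
"""

matches = [row[0] for row in db.execute(SNOW_SEARCH)]
print(matches)
```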

### Version Tracking
 *  If I make a lower resolution, cropped, photoshopped, composited or a black and white image from a master, the system should show that it's a derived image, and allow access to the master.  A master should be able to  list derived images.  Derived images are not linear but form a multi-branched tree.
 * If my camera produces JPEG and Raw versions, I want the JPEG to be shown as being derived from the Raw version.
 * Metadata applied to a master should propagate down to derived images.
 * Some form of exception handling for this: e.g. -keyword to prevent a
   people identifier being applied to an image where that person was
   cropped out.
 * Ability to track through external editing programs.  E.g. if I edit an image in Photoshop, the system should mark the PSD file as being derived and restore as much of the metadata as the PSD format allows.  If Photoshop is used to create a JPEG image, that too is tracked.
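The version-tracking points above amount to a tree in which each node inherits its parent's keywords unless it blocks them. A minimal Python sketch follows; the `Version` class is hypothetical, and the leading `-` for a blocked keyword follows the "-keyword" idea mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    name: str
    parent: "Version | None" = None
    keywords: set = field(default_factory=set)   # own keywords; "-word" blocks an inherited one
    children: list = field(default_factory=list)

    def derive(self, name, keywords=()):
        """Register a derived image (crop, JPEG export, PSD edit...) under this one."""
        child = Version(name, parent=self, keywords=set(keywords))
        self.children.append(child)
        return child

    def master(self):
        """Walk up the tree to the original."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def effective_keywords(self):
        """Inherited keywords, minus anything blocked here, plus this version's own."""
        inherited = self.parent.effective_keywords() if self.parent else set()
        blocked = {k[1:] for k in self.keywords if k.startswith("-")}
        own = {k for k in self.keywords if not k.startswith("-")}
        return (inherited - blocked) | own


raw = Version("IMG_0042.NEF", keywords={"Holiday", "John Smith", "Butterfly"})
jpeg = raw.derive("IMG_0042.JPG")                 # camera JPEG derived from the Raw
crop = jpeg.derive("IMG_0042_crop.jpg", keywords={"-John Smith", "Close up"})

print(sorted(crop.effective_keywords()))
print(crop.master().name)
```

Since keywords are computed on demand, a keyword added to the master later shows up on every derived image without any copying.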

### Data robustness
 * All metadata is indexed.
 * Metadata is also written to sidecar files.
 * Where possible metadata is written to the image file itself. (optional -- can stress automated backup systems)
 * Through file system watching, name changes and directory reorganization are caught.  Relevant sidecars are also renamed, and the database updated with new file location/name.  Sidecar contents include the name of their master file.
 * Should be possible to rebuild entire database from images + sidecars.  Should be able to restore all file metadata from database.  This requires a lot of under-the-hood time stamps to determine which has priority.

 * All database actions should be logged and journaled, so they are reversible.
 * Reasonable speed with catalogs of more than 100,000 images.
 * Support for previews of all common image formats and most raw formats.
 * Previews and thumbnails are treated as versions of the master.  They inherit metadata.



### Nice to have:

 * Simple non-destructive editing -- crop, brightness, contrast.
 * Rating system
 * Smart albums
 * Drag and drop functionality with other mac apps.



### Notes on current state of the art:

 * Nothing I've found supports version tracking, especially through an external program.  Lightroom and Aperture both support a type of versions -- different edits on same master.  Aperture supports Stacks -- a group of related pictures.
 * Lightroom:  Doesn't support PNG, very clunky interface, can be slow on large catalogs.
 * Mylio:  Home version doesn't support hierarchical keywords, doesn't index EXIF information, and does not allow 'or' syntax for searches.
 * Photo Mechanic is fast for keywording and culling, but has very limited search capability.
 * IMatch.  Possible contender.  Requires an MS Windows box.
 * Photo Supreme:  Erratic quirks.  Crashes. One man shop. Can't search Exif in useful way.
 * Fotostation:  AFAIK no underlying database.  Has to read metadata from images/sidecar files on startup.  Slow after 10K images. (They have server based software too that is big bucks.)  On my "look at" list.
 * Luminar:  A DAM has been promised Real Soon Now, but no demos, storyboards or feature lists have been published.  There is a claim that it is in beta, but no one on their fairly active forum will admit to being part of the beta group.
 * Affinity:  Similar to Luminar.
 * ACDSee:  Mac version is always substantially behind.  May be worth a look.

#### Commandline tools

Many of the special features for version tracking could be implemented with scripts using calls to these programs.

 * ImageMagick -- good for whole-image conversions, also can read/write internal metadata and sidecars.
 * ExifTool -- reads/writes EXIF data; reads most makernotes.
 * fswatch -- not really an image processor, but hooks into the operating system and can alert when files have changed -- modified, renamed, moved.



#### Enterprise level

There are a raft of these with vaguely defined abilities and very high price tags.  Most are SaaS and cost hundreds to thousands of dollars per month.

 * WebDAM.  No real information about capabilities on web site.
 * Extensis.  Expensive.
 * Bynder.  Joke program. Cloud based set of shoeboxes.
 * WIDEN.  Cloud only.
 * Asset Bank.  Starts at $500/month for up to 50 users.

### Metadata Storage

There are three places metadata can be stored:

* In the image.
* In a database.
* In a separate file for each image (sidecar file).  Typically these files have the same name as the primary file, but a different suffix.

If at least some cataloging information is written to the image, then you can reconnect a file to your database. In principle this can be a single unique ID.

This saves you when:

* You moved or renamed an image file.

If you can write more info into the file -- keywords, captions -- then you are also saved when:

* Your database is corrupted.
* You upgraded your computer and your database program doesn't work there.

Sidecar files allow you to recover all your metadata if your database crashes.

***Downsides of storing data in the image***

Writing to the original files can corrupt the file. Most RAW formats are now understood well enough to at least identify and replace strings of metadata with the same length string. If you tell your camera to put in the copyright string

    Copyright 2018 J. Random Shutterbug Image XXXX-XXXX-XXXX-XXXX-XXXX-XXXX

then as long as the DAM keeps that string the same length you are golden.
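A minimal Python sketch of the same-length replacement idea. It treats the file as plain bytes and knows nothing about any raw format, so it is an illustration of the principle only; the function name and the demo file are made up, and a real tool would want to test this against each format before trusting it.

```python
import os
import tempfile


def replace_same_length(path, old, new, pad=b" "):
    """Overwrite `old` with `new` in place, padding `new` so the file size never changes."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the placeholder")
    new = new.ljust(len(old), pad)
    with open(path, "r+b") as f:
        data = f.read()                   # fine for a demo; a real tool would scan in chunks
        offset = data.find(old)
        if offset < 0:
            return False                  # placeholder not present, file left untouched
        if data.find(old, offset + 1) >= 0:
            raise ValueError("placeholder appears more than once")
        f.seek(offset)
        f.write(new)                      # same number of bytes, nothing else moves
    return True


# Demo on a throwaway file standing in for a raw file.
placeholder = b"Image XXXX-XXXX-XXXX-XXXX-XXXX-XXXX"
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"HEADER" + placeholder + b"PIXELS")

size_before = os.path.getsize(demo_path)
changed = replace_same_length(demo_path, placeholder, b"Image 0001-2018-1118-0042")
size_after = os.path.getsize(demo_path)
print(changed, size_before == size_after)
```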

Keeping all metadata (or as much as you can) in the original images makes for very slow access. Your program has to read at least the first few blocks of every image.  Depending on the file structure, adding too much data may require rewriting the entire file.  Any program that moves the boundaries of data sub blocks better be well tested.

Writing data back is time consuming.

Some file formats don't have any metadata capability.

Some file formats (Photoshop PSD) are noted for mangling metadata.

A glitch during the write process can corrupt the image file. The alternative, writing a new file, then replacing the old file requires that the entire file be both read and written, rather than just a chunk of it. This has serious performance issues.
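The "write a new file, then replace the old one" alternative looks roughly like this in Python. `os.replace` swaps the finished copy in as a single step, so a crash leaves either the old file or the new one. The function name and the `transform` callback are hypothetical; the cost described above is visible in the code, since the whole file is read and written.

```python
import os
import tempfile


def rewrite_safely(path, transform):
    """Write transform(old bytes) to a temp file beside `path`, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src:
        data = src.read()                      # the entire file is read...
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst:
            dst.write(transform(data))         # ...and the entire file is written
            dst.flush()
            os.fsync(dst.fileno())             # make sure it is on disk before the swap
        os.replace(tmp, path)                  # old file is replaced in one step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)                     # original is still intact
        raise


# Demo on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"caption=old")
rewrite_safely(demo_path, lambda data: data.replace(b"old", b"new"))
with open(demo_path, "rb") as f:
    result = f.read()
print(result)
```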

***Downsides of Databases***

Databases are fast, but they are blobby, and you are writing into the middle of blobs of data. If the implementation of the database is solid, there isn't much to worry about. But hard disks have errors, and a single error can make a database partially or fully unusable. Good database design has redundancy built in so that you can repair/rebuild.

Databases are frequently proprietary. Data may be compressed for speed. Getting your data out may be tricky. (Problem for people using Apple Aperture)

Databases frequently are optimized in different ways. In general robustness is gained at the cost of decreased performance and greater complexity. One compromise is to write all changes first to a transaction file (fast...) and have a background process apply them to the database.  This slows down access some, since reads have to check both the main database and the transaction file, but unless the transaction file gets to be bigger than memory, this shouldn't be noticeable.
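A toy Python sketch of the transaction-file compromise. A dictionary stands in for the main database and a JSON-lines file for the transaction log; the `Catalog` class and its methods are invented for illustration, and the background process is reduced to a `merge()` call.

```python
import json
import os
import tempfile


class Catalog:
    def __init__(self, log_path):
        self.main = {}         # image id -> metadata dict (stand-in for the real database)
        self.pending = {}      # changes logged but not yet merged
        self.log_path = log_path

    def set(self, image_id, field, value):
        """Fast path: append one line to the transaction file, remember it in memory."""
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps([image_id, field, value]) + "\n")
        self.pending.setdefault(image_id, {})[field] = value

    def get(self, image_id):
        """Reads check the main database first, then overlay pending changes."""
        record = dict(self.main.get(image_id, {}))
        record.update(self.pending.get(image_id, {}))
        return record

    def merge(self):
        """What the background process would do: apply pending changes, empty the log."""
        for image_id, changes in self.pending.items():
            self.main.setdefault(image_id, {}).update(changes)
        self.pending.clear()
        open(self.log_path, "w").close()


fd, log_path = tempfile.mkstemp(suffix=".log")
os.close(fd)
catalog = Catalog(log_path)
catalog.set("img1", "keyword", "Italy")
before_merge = catalog.get("img1")
catalog.merge()
after_merge = catalog.get("img1")
print(before_merge == after_merge)
```

After a crash, replaying the lines left in the log would rebuild `pending`; that step is omitted here.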

***Downsides of Sidecars***

You have to read a zillion files at startup.

If you do a batch change (add the keyword "Italy" to all 3000 of your summer holiday trip shots) the catalog program has to open, modify and write back 3000 files.

If you rename a file, and don't rename the sidecar file too, your metadata is no longer connected to your image.

### Recommended practice


* You want a unique asset tag ID that resides in the image. This can be an actual tag like the copyright one mentioned above, or it can be a derived tag from information in the image. This could be the EXIF time stamp (Not unique -- multiple shots per second, multiple cameras.) If your program reads makernotes, the best one is Camera model + Camera serial number + timestamp + hundredths of a second.  Adding shuttercount to this gives some redundancy for sorting, and can also reveal missing images for critical applications like forensic photography.

* You want a database for speed. It, of course, has the unique ID.

* You want sidecars for rebuilding your database, and for data portability. They have the unique ID.
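The derived ID from the first point can be sketched in a few lines of Python. The field values below are made up, reading them out of the file (for example with ExifTool) is left out, and hashing the combined string is my own addition to get a fixed-length ID.

```python
import hashlib


def asset_id(model, serial, timestamp, hundredths):
    """Build a repeatable ID from values the camera wrote into the file."""
    readable = "|".join([
        model.strip(),
        serial.strip(),
        timestamp.strip(),
        f"{int(hundredths):02d}",
    ])
    # Same inputs always give the same ID, so it can be recreated from the image alone.
    return hashlib.sha1(readable.encode("utf-8")).hexdigest()[:20]


# Hypothetical values; EXIF timestamps use the 'YYYY:MM:DD HH:MM:SS' form.
first = asset_id("NIKON D70", "3001234", "2018:11:18 15:04:05", 7)
burst = asset_id("NIKON D70", "3001234", "2018:11:18 15:04:05", 31)   # same second
other = asset_id("NIKON D70", "3009999", "2018:11:18 15:04:05", 7)    # second camera body

print(first, burst, other)
```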

If the database crashes, it can be rebuilt from the sidecars.

If a sidecar is corrupted, it can be rebuilt from the database.

If an image is renamed the ID can be used to reconnect it to the sidecar, and to fix the database.

To make this work, you have to use a lot of timestamps. If the sidecar is more recent than the latest time stamp in the database record, then the sidecar is the authoritative record.

You also have to have internal checks on data integrity.  The record for an image (sidecar or database) needs a checksum to verify that that data isn't corrupt.
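A small Python sketch of how the timestamp rule and the checksum could work together. The record layout (a dictionary with `modified` and `checksum` fields) is an assumption, and timestamps are taken to be ISO 8601 strings in UTC so that plain string comparison orders them correctly.

```python
import hashlib
import json


def seal(record):
    """Return a copy of the record with a checksum over everything except the checksum."""
    body = {k: v for k, v in record.items() if k != "checksum"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return {**body, "checksum": hashlib.sha256(payload).hexdigest()}


def is_intact(record):
    return "checksum" in record and seal(record)["checksum"] == record["checksum"]


def authoritative(db_record, sidecar_record):
    """Pick the newest copy that passes its checksum; None if both are damaged or missing."""
    candidates = [r for r in (db_record, sidecar_record) if r and is_intact(r)]
    if not candidates:
        return None
    # On a tie the database copy wins because it is listed first.
    return max(candidates, key=lambda r: r["modified"])


db_copy = seal({"id": "A1", "keywords": ["Holiday"],
                "modified": "2018-11-18T10:00:00Z"})
sidecar_copy = seal({"id": "A1", "keywords": ["Holiday", "Butterfly"],
                     "modified": "2018-11-18T12:00:00Z"})
damaged_sidecar = dict(sidecar_copy, keywords=["garbage"])   # checksum no longer matches

print(authoritative(db_copy, sidecar_copy)["keywords"])
print(authoritative(db_copy, damaged_sidecar)["keywords"])
```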

Given the relatively fragile nature of raw files, best practice is a system that writes to the Raw file at most once. This is why the EXIF time stamp + hundredths and the copyright string work well: both are already in the file when it leaves the camera.  You can include the camera model and serial number in the copyright so that now the copyright message is unique to the camera.  At this point you have the ability to create, and recreate, a unique ID for each image.  If the DAM has the ability to modify the file, you can write this ID into it once. This saves some time if you ever have to rebuild the database.

Having as much of the metadata in the file as possible means that it travels with the file.  This is a win, but comes with the risk of potential corruption.  Possibly the best strategy is to leave the original intact and, for clients who need raw data, add metadata either to a copy or to a derived full data equivalent (e.g. DNG).

Sidecars don't need to be updated in real time. The slick way to do this would be that whenever the database makes a change to a record:

* Make a new record that duplicates the old record in the database.
* Make the change in the new record.
* New record is flagged, "not written to sidecar"
* Old record is marked "obsolete"
* Another thread writes the sidecar files out, writing out the new one, then deleting the old one (or renaming the new one to the old one's name).
* Periodically you run a cleanup on the database removing obsolete records older than X days. This gives you the ability to rollback changes.
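The steps above can be sketched in Python with a list standing in for the database table. The `Records` class, its field names and the `.json` sidecar naming are all invented for illustration; the "other thread" is reduced to a `flush_sidecars()` call.

```python
import json
import os
import tempfile
import time


class Records:
    def __init__(self):
        self.rows = []   # each row: id, data, state, sidecar_written, changed_at

    def current(self, asset_id):
        for row in reversed(self.rows):
            if row["id"] == asset_id and row["state"] == "current":
                return row
        return None

    def change(self, asset_id, **updates):
        """Duplicate the old record, apply the change to the copy, retire the old one."""
        now = time.time()
        old = self.current(asset_id)
        data = dict(old["data"]) if old else {}
        data.update(updates)
        if old:
            old["state"] = "obsolete"
            old["changed_at"] = now
        self.rows.append({"id": asset_id, "data": data, "state": "current",
                          "sidecar_written": False, "changed_at": now})

    def flush_sidecars(self, directory):
        """Write unwritten records: new file first, then rename it over the old sidecar."""
        for row in self.rows:
            if row["state"] == "current" and not row["sidecar_written"]:
                final = os.path.join(directory, row["id"] + ".json")
                with open(final + ".new", "w", encoding="utf-8") as f:
                    json.dump(row["data"], f)
                os.replace(final + ".new", final)
                row["sidecar_written"] = True

    def cleanup(self, max_age_seconds, now=None):
        """Drop obsolete records older than the rollback window."""
        now = time.time() if now is None else now
        self.rows = [r for r in self.rows
                     if r["state"] == "current" or now - r["changed_at"] <= max_age_seconds]


sidecar_dir = tempfile.mkdtemp()
records = Records()
records.change("A1", keywords=["Holiday"])
records.change("A1", caption="Butterfly on thistle")
records.flush_sidecars(sidecar_dir)
with open(os.path.join(sidecar_dir, "A1.json"), encoding="utf-8") as f:
    sidecar = json.load(f)
print(sidecar)
```

Until `cleanup()` runs, the obsolete rows are still there, which is what makes rollback possible.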

This is not complete: It doesn't address the issue of non-destructive edits. Many programs now allow the creation of multiple images from the same master file, and do not create a new bitmap, but rather a file with a series of instructions for how to make the image from the master. AFAIK all such methods are proprietary. This results in a quandary, as the apps that do a good job of tracking metadata may not be able to deal with the non-destructive edits. This can be critical if you crop a person out of an image, or crop to emphasize a different aspect, so that the result needs a different caption, etc.

The workaround is that you always write out a new bitmap image from a serious edit. Ideally you have a script that looks for new NDEs and writes out an image based on this, copying the metadata from the master and at some point bringing it up for review for mods to the metadata.



### Robustness against external programs

I like having an underlying file structure organization.  I like the idea that if I produce a bunch of cropped, watermarked, lower resolution, etcetera versions of an image that my catalog will track that too.  I like being able to use any editor on my image files.

But if the underlying file structure is exposed to Explorer or Finder, then you have the risk of a file being renamed or moved, and the database is no longer in sync with your file system.

To nip in the bud answers of the form "This is impossible", here's how to "Finder-proof" your image database.


 * When an image is edited, a file system watcher notes that the file was opened.  The file goes onto the 'watch' list.  (The program fswatch does this on Mac.  I use it to update my web page when my local copy has been edited.)

 * When a new file appears in a monitored directory tree, it's noted.

 * When a file is closed, this is also noted.  If there has been a new file created it is checked for metadata.  If the new file's metadata has a match for an existing file, then existing file metadata is used to repopulate missing data in the file.  (Photoshop is notorious for not respecting all metadata.) If the new file does not have a matching ID, then an alert needs to be brought up. 

 * Database is updated with the new file being marked as derivative of the original file.

 * Optionally a suffix may be added to the new file's ID, showing whether it derives directly from the original or from another derivative.

To make this work, the two components are a unique id that can be calculated from the master, and a file system monitor program that catches create, move, change, and rename events.
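A Python sketch of the reconciliation step that would run after the watcher reports changes. It assumes the unique ID has already been read out of each file; the function name, the two dictionaries and the sample paths are hypothetical, and hooking this up to fswatch events is left out.

```python
def reconcile(db_paths, scanned):
    """Compare the catalog with what is on disk.

    db_paths: asset id -> path the database believes the file is at.
    scanned:  path on disk -> asset id read from the file (None if it has none).
    """
    moved, derived, alerts = {}, [], []
    known_paths = set(db_paths.values())
    for path in sorted(scanned):
        if path in known_paths:
            continue                              # already catalogued at this location
        original = db_paths.get(scanned[path])
        if original is None:
            alerts.append(path)                   # no matching ID: ask the user
        elif original in scanned:
            derived.append((path, original))      # original still there: a new derivative
        else:
            moved[original] = path                # original gone: it was renamed or moved
    return moved, derived, alerts


db_paths = {
    "A1": "2018/italy/img1.nef",
    "B2": "2018/italy/img2.nef",
}
scanned = {
    "2018/italy/img1.nef": "A1",
    "2018/italy/img1_crop.psd": "A1",   # saved from an editor, ID carried along
    "2018/rome/img2.nef": "B2",         # dragged to another folder in Finder
    "2018/italy/mystery.jpg": None,     # metadata stripped or never catalogued
}

moved, derived, alerts = reconcile(db_paths, scanned)
print(moved)
print(derived)
print(alerts)
```

The database update then follows from the three results: fix the paths in `moved`, add the `derived` pairs as children of their originals, and bring up an alert for the rest.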


----------

