Saturday, April 23, 2011

Simple Image Search Database

A few ponies on ponychan have posted how they organize their pony pics, because there is just so much pony out there now. But with so much pony comes a long search time to find the image you want. When posting to ponychan or other forums, seconds count if you want to be as funny or on topic as possible.

As I am a programmer at heart, I decided to create my own image database program with Microsoft SQL Server as the backend and .NET as the frontend. The current working name is SISD, for Simple Image Search Database. A screenshot of the program in action is to the left. It shows SISD displaying 2 really simple searches (rainbow_dash and spike) while importing a directory of images (one of rainbow_dash). It is multi-threaded, so a pony can import a bunch of stuff at once, or search and import at the same time, or whatever.



Major goals of the program (which are accomplished):
1. Do not store an image more than once.
2. Easily search images.
3. Tag images with infinite tags per image.
4. Easily add and update images and metadata.
5. Be insanely quick, even with large amounts of images.

The program uses the MD5 hash of an image to make sure it is only stored once. If a byte-for-byte identical file is encountered again, it is not duplicated because the MD5 will match. By the way, many image sites use MD5 for dupe detection, so it is a pretty good method.

SISD allows searching by the tags that you attach to each image. Tags can be something like 'rarity' or 'pony' or any combination of letters. Tags are separated by spaces and are not duplicated (in case one is typed twice). If a tag needs a space in it, use an underscore instead, such as 'rainbow_dash' instead of 'rainbow dash'. Up to 2.1 billion characters worth of tags can be tracked for each image.
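To make the dupe check and the tag rules concrete, here is a minimal sketch in Python. It is not SISD's actual code (SISD is .NET with a SQL Server backend); the `ImageStore` class and `parse_tags` function are hypothetical names made up for illustration, and an in-memory dict stands in for the database.

```python
import hashlib


def parse_tags(text):
    """Split a space-separated tag string into a set of unique tags.

    A tag that needs a space is expected to use an underscore instead,
    e.g. 'rainbow_dash', so splitting on whitespace is enough.
    """
    return set(text.split())


class ImageStore:
    """Hypothetical in-memory stand-in for the image database."""

    def __init__(self):
        self.images = {}  # MD5 hex digest -> image bytes
        self.tags = {}    # MD5 hex digest -> set of tags

    def add(self, data, tag_text=""):
        """Store image bytes once, keyed by MD5; return (md5, is_new)."""
        md5 = hashlib.md5(data).hexdigest()
        is_new = md5 not in self.images
        if is_new:
            self.images[md5] = data
            self.tags[md5] = set()
        # Set union drops any tag that is already on the image.
        self.tags[md5] |= parse_tags(tag_text)
        return md5, is_new
```

Adding the same bytes twice keeps a single copy and merges the tags, so `store.add(data, "rarity")` followed by `store.add(data, "rarity pony")` leaves one image tagged with both 'rarity' and 'pony'.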

SISD is made for me, and I am one lazy pony, so updating needs to be easy. For example, when somepony posts a 'rarity' thread, I download all the images to a directory. Then I go through and delete anything that is not 'rarity'. Finally, SISD imports them all with a 'rarity' tag. If an image already exists in SISD, the 'rarity' tag is added to it unless it already has that tag.

Last, the program is insanely quick. It runs as fast as your hardware allows. I have a separate database and file server, and I can import about 600 images per minute. If everything runs on one hard drive, it is closer to about 2 images per second (roughly 120 per minute), which is still pretty fast. I imported a few million pics, and the 600/min rate held even after the database passed a million pics, so the database design scales well.
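The import workflow described above (tag a whole directory, skip files that are already stored, add the tag only where it is missing) could be sketched roughly like this. This is an illustration under loud assumptions, not SISD's real schema or code: SQLite stands in for SQL Server, and the `images` table, its space-separated `tags` column, and the `open_db` and `import_directory` names are all made up for the example.

```python
import hashlib
import os
import sqlite3


def open_db(path=":memory:"):
    """Open a database with one hypothetical table keyed by MD5."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " md5 TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " tags TEXT NOT NULL)"
    )
    return db


def import_directory(db, directory, tag):
    """Import every file in directory with one tag.

    Returns (added, updated): new images stored, and existing images
    that gained the tag.
    """
    added = updated = 0
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue
        with open(full_path, "rb") as handle:
            md5 = hashlib.md5(handle.read()).hexdigest()
        row = db.execute(
            "SELECT tags FROM images WHERE md5 = ?", (md5,)
        ).fetchone()
        if row is None:
            # First time this exact file is seen: store it with the tag.
            db.execute(
                "INSERT INTO images (md5, path, tags) VALUES (?, ?, ?)",
                (md5, full_path, tag),
            )
            added += 1
        else:
            # Already stored: append the tag only if it is not there yet.
            tags = row[0].split()
            if tag not in tags:
                tags.append(tag)
                db.execute(
                    "UPDATE images SET tags = ? WHERE md5 = ?",
                    (" ".join(tags), md5),
                )
                updated += 1
    db.commit()
    return added, updated
```

With this sketch, calling `import_directory(db, "rarity_thread", "rarity")` on a hypothetical download folder a second time adds and updates nothing, because every MD5 is already present and already carries the tag.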

So why am I posting this here? I have been asked to make it available for everypony. I have spent the past 2-3 weeks cleaning up the errors in it and making it more user-friendly, and I am finally getting pretty close to something I can share. The bad thing is that you need MS SQL Server 2008/2008 R2 Standard or Enterprise to run it. It should also work with SQL Server 2008 Express with Advanced Services, since that is the Express version that includes full-text indexing. If you have no idea what I am talking about, SISD probably isn't for you.

This was also picked up by equestriadaily. I am getting lots of suggestions, and I am working on them before I release the first version into the wild.