[postgis-users] Re: rasters in PostGIS...

Thu Sep 29 16:45:16 PDT 2005

Patrick,

Please understand that my comments were only meant to offer help.  As you
pointed out, I'm not a scientist, I'm only a software developer and I
don't claim to be an image analyst.  All of my points are simply
solutions that have worked well for the specific applications I've
worked on, and by no means should be considered as the best solution for
every case.  I will try to further explain why the comments I made below
worked well for us.

>??? I'd like to see some arguments on these two issues.

1.  A smaller data file loads into memory faster and requires fewer CPU
cycles to process.  It's also easier to back up a 5 GB database than a
5 TB database: for one thing, there are significantly fewer bytes to
copy and transfer over the network.
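
To put rough numbers on the size argument in point 1, here is a minimal
sketch of the copy-time arithmetic.  The 50 MB/s sustained throughput is
a hypothetical figure chosen only for illustration, not a measurement
from any system mentioned in this thread.

```python
def transfer_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


GB = 10**9
TB = 10**12
ASSUMED_THROUGHPUT = 50 * 10**6  # hypothetical: 50 MB/s sustained

small = transfer_hours(5 * GB, ASSUMED_THROUGHPUT)
large = transfer_hours(5 * TB, ASSUMED_THROUGHPUT)
print(f"5 GB: {small:.2f} h, 5 TB: {large:.1f} h")
```

Whatever throughput is assumed, the ratio between the two cases stays at
1000, which is the point being made: the copy time grows in proportion
to the number of bytes.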

2.  A large database may have to be scaled horizontally across multiple
servers, where a small database may run entirely on one machine.  Scaling
a small database may be as simple as buying a more powerful machine,
whereas scaling a multi-terabyte system may require several machines,
which only a few databases can handle.  Oracle can do this, but I'm not
sure that PostgreSQL can; therefore, scaling a large database may
require moving to a new database altogether.

>Tape???? HDD sell for less than a $/GB nowadays and tape *systems*
>(meaning jukeboxes to match HDD capacity) are many times more expensive
>than any HDD system. TB RAID arrays can be bought for under $10K from
>multiple vendors.

3.  First, a RAID array is not a storage system.  Most of our customers
don't consider disk to be cheap.  Our customers, to name a few, are the
USGS, USDA, and the US military.  I should also mention that we are the
largest hardware reseller in the US, so this is firsthand knowledge.
When you have archives that are in the tens and thousands of terabytes
in size, it quickly becomes very expensive (millions of dollars) to keep
all of your data on fast disk, not to mention the manpower required to
manage these systems.  Our customers rarely talk in terms of GB; usually
it's TB.  For instance, the USGS regularly receives boxes of tapes from
the EROS data center.  To put all of this data on fast disk would cost
millions if not tens of millions of dollars.  The military is even
worse.

>The same was true when PostGIS first appeared with support for vector
>data only. With the ascent of a user community came the applications.
>If raster data are to be included in PostGIS you can safely assume that
>applications will support it before the blink of an eye, given the
>large number of PostGIS users right now.

4.  That may be true a year or two from now, but I was speaking about
the here and now.  To date, I don't know of too many "commercial"
applications that come with native support for PostGIS.  ArcGIS and
Erdas, for instance, control something like 80 percent of the commercial
desktop market for GIS and image processing, and since PostGIS is in
essence a competitor, I doubt they will add native support anytime soon.
Just a guess.  Plus, they haven't even added vector support for PostGIS
yet...how did they miss that one??

>This is a complete non-argument. Storing data in an RDBMS simply adds
>another format to the already long list of formats. There is some
>overhead, no doubt, but there are also many, many advantages.

5.  We won the USGS, Digital Globe and USDA contracts away from ESRI for
exactly this reason.  You need to do your homework better on this one.
Each client cited the same problem.  "It takes way too long to ingest
our rasters into the database" was their comment.  We're talking months.
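
As a rough illustration of the ingest trade-off in point 5, the sketch
below contrasts registering a raster by path with copying it into the
database as a BLOB.  SQLite is used purely as a stand-in so the example
runs anywhere; it is not PostgreSQL or PostGIS.  The table names, the
file name and the 1 MiB dummy file are all hypothetical.

```python
import os
import sqlite3
import tempfile

# Create a stand-in "raster" file; its name and size are hypothetical.
workdir = tempfile.mkdtemp()
raster_path = os.path.join(workdir, "scene_001.tif")
with open(raster_path, "wb") as f:
    f.write(os.urandom(1024 * 1024))  # 1 MiB of dummy pixel data

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raster_by_path "
    "(id INTEGER PRIMARY KEY, path TEXT, size_bytes INTEGER)"
)
conn.execute("CREATE TABLE raster_as_blob (id INTEGER PRIMARY KEY, data BLOB)")

# Option A: register the file. Only a short row is written.
conn.execute(
    "INSERT INTO raster_by_path (path, size_bytes) VALUES (?, ?)",
    (raster_path, os.path.getsize(raster_path)),
)

# Option B: ingest the file. Every byte is read and copied into the database.
with open(raster_path, "rb") as f:
    conn.execute("INSERT INTO raster_as_blob (data) VALUES (?)", (f.read(),))

path_row_bytes = len(raster_path.encode("utf-8"))
blob_bytes = conn.execute(
    "SELECT length(data) FROM raster_as_blob"
).fetchone()[0]
print(f"path payload: {path_row_bytes} bytes, blob payload: {blob_bytes} bytes")
```

The sketch only shows how many bytes each approach has to write at load
time.  It says nothing about query speed, integrity or backup, which are
argued separately in this thread.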

>As a scientist, I can tell that you are not.

>Clipping or mosaicking has nothing to do with data loss; compression
>may be lossy, but even that is not necessarily so. Many spatial
>analysis functions actually work on a local kernel of data (e.g. slope
>function, hydrological functions, filters in image processing, local
>statistics, etc), which goes along very nicely with tiled raster data
>and with no problem to cross tile boundaries.

6.  Very true, I'm not a scientist, I'm a software developer.  Answer me
this please... What happens when you clip an image using the pixel
coordinate as area versus point?  Doesn't the pixel get cut in half when
the coordinate is area?  Also, what happens when you resample that
clipped image with a bilinear interpolation or cubic convolution?
Doesn't the pixel value change?  My question being...how do you reverse
that to get the original image back?  The scientists at Digital Globe
and the USGS tell me that you can't.  Perhaps they're wrong?
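
The two operations asked about in point 6 can be compared in a small
sketch.  It assumes a clip that follows whole-pixel boundaries and a
bilinear resample onto a grid shifted by half a pixel; the 4x4 image and
the function names are made up for illustration.

```python
def clip(image, row0, row1, col0, col1):
    """Cut out whole pixels [row0:row1, col0:col1]; no value is recomputed."""
    return [row[col0:col1] for row in image[row0:row1]]


def bilinear_sample(image, y, x):
    """Bilinear interpolation at fractional pixel position (y, x)."""
    y0, x0 = int(y), int(x)
    y1 = min(y0 + 1, len(image) - 1)      # clamp at the bottom edge
    x1 = min(x0 + 1, len(image[0]) - 1)   # clamp at the right edge
    fy, fx = y - y0, x - x0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def shift_half_pixel(image):
    """Resample onto a grid offset by half a pixel in both directions."""
    return [[bilinear_sample(image, r + 0.5, c + 0.5)
             for c in range(len(image[0]))]
            for r in range(len(image))]


image = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120],
         [130, 140, 150, 160]]

clipped = clip(image, 1, 3, 1, 3)    # values copied unchanged
resampled = shift_half_pixel(image)  # values are weighted averages
print(clipped)
print(resampled[0])
```

In this sketch the pixel-aligned clip returns the stored values
untouched, while the resampled grid holds averages of neighbouring
pixels that never appeared in the input.  Whether the originals can be
reconstructed afterwards depends on the resampling used and is not
something this example settles.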

>Sure, backing up a TB-sized database involves a little more work than a
>GB database, but the raster data needs to be backed up anyways, whether
>it is in or outside of the database. If it is all under one umbrella,
>the backup system would need to be less complicated if you ask me...

7.  Ask any production DBA on this one.  A TB database will take days
if not weeks to copy.  Even a GB-scale database of 60 or 80 GB may take
all night to back up.  All the DBAs I've talked to think even a multi-GB
database takes too long to back up.  Also, you're forgetting about
real life.  In real life, backups fail and the network goes down,
requiring backups to be restarted.  Also, most companies don't have lots
of extra storage capacity to store multiple copies of GB databases, let
alone TB-size databases.  And a final note: most companies/agencies
don't keep backups of imagery on disk.  They back it up on tape.  If
your DB file gets corrupted, then all is lost, and it's much easier to
restore just text data than imagery.

Martin

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Patrick
Sent: Thursday, September 29, 2005 1:38 PM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Re: rasters in PostGIS...

I don't easily get excited by news group posts, but this one really got 
me...

"Chapman, Martin" <MChapman at sanz.com> wrote in message 
news:ED3A48B9840E594890A2BC172D11946502F9C0AC at mailman.san.com...
> On another note, it's a bad idea to store rasters in an RDBMS in 
> general.

You mean to say (I hope) that "in your humble opinion there might be
some arguments that indicate that storing raster data in an RDBMS may
not be an altogether good thing". Just to get the nuance in there.

> Databases do not handle BLOB data efficiently.  A better solution is 
> to store your rasters on a file system and keep path information in 
> the database in order to access your image data.  There are numerous 
> reasons this is better.  Listed below are just a few:
>
> 1.  Your database will be faster and easier to manage.
> 2.  Your database will scale much easier.

??? I'd like to see some arguments on these two issues.

> 3.  You will require less fast disk for your db.  Rasters can be 
> stored on tape, while your db can run on expensive fast disk.

Tape???? HDD sell for less than a $/GB nowadays and tape *systems*
(meaning jukeboxes to match HDD capacity) are many times more expensive
than any HDD system. TB RAID arrays can be bought for under $10K from
multiple vendors.

> 4.  You can open images with other applications like ArcMap, Erdas, 
> that have no knowledge of how to access raster data in a database 
> without a specific driver.

The same was true when PostGIS first appeared with support for vector
data only. With the ascent of a user community came the applications.
If raster data are to be included in PostGIS you can safely assume that
applications will support it before the blink of an eye, given the
large number of PostGIS users right now.

> 5.  Easier and faster to ingest rasters into your application.  If you
> store rasters in a database you will have to copy all image data when 
> importing into the database.  This can take a long time and require a 
> lot of cpu power/ram/disk space.

This is a complete non-argument. Storing data in an RDBMS simply adds
another format to the already long list of formats. There is some
overhead, no doubt, but there are also many, many advantages.

> 6.  Some databases, like SDE, actually split the image into tiles, 
> which means your data is rendered useless from a scientific 
> perspective, because once an image is clipped, you can never get the 
> real image back, because pixels are changed/lost in the 
> clipping/mosaicking process.

As a scientist, I can tell that you are not.

Clipping or mosaicking has nothing to do with data loss; compression
may be lossy, but even that is not necessarily so. Many spatial
analysis functions actually work on a local kernel of data (e.g. slope
function, hydrological functions, filters in image processing, local
statistics, etc), which goes along very nicely with tiled raster data
and with no problem to cross tile boundaries.
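
A minimal sketch of the local-kernel claim above, assuming a 3x3 mean
filter and tiles that are read with a one-pixel halo from their
neighbours.  The filter, the tile size and the function names are
illustrative choices, not taken from any product discussed here.

```python
def mean3x3(image, r, c):
    """Mean of the 3x3 neighborhood around (r, c), clamped at the edges."""
    rows, cols = len(image), len(image[0])
    vals = [image[rr][cc]
            for rr in range(max(r - 1, 0), min(r + 2, rows))
            for cc in range(max(c - 1, 0), min(c + 2, cols))]
    return sum(vals) / len(vals)


def filter_whole(image):
    """Apply the filter to the untiled image."""
    return [[mean3x3(image, r, c) for c in range(len(image[0]))]
            for r in range(len(image))]


def filter_tiled(image, tile=2):
    """Apply the filter tile by tile, reading a one-pixel halo per tile."""
    rows, cols = len(image), len(image[0])
    out = [[None] * cols for _ in range(rows)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Halo window: the tile grown by one pixel, clipped to the image.
            hr0, hr1 = max(r0 - 1, 0), min(r0 + tile + 1, rows)
            hc0, hc1 = max(c0 - 1, 0), min(c0 + tile + 1, cols)
            window = [row[hc0:hc1] for row in image[hr0:hr1]]
            for r in range(r0, min(r0 + tile, rows)):
                for c in range(c0, min(c0 + tile, cols)):
                    out[r][c] = mean3x3(window, r - hr0, c - hc0)
    return out


image = [[r * 5 + c for c in range(5)] for r in range(5)]
same = filter_tiled(image) == filter_whole(image)
print(same)
```

Under these assumptions the tiled and untiled results agree, because
each tile can see the neighbouring pixels its kernel needs.  Tiles
stored without access to their neighbours would behave differently at
the seams.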

> 7.  If your database size gets too big...it may be a lengthy, if not 
> impossible task to back up the database.

Sure, backing up a TB-sized database involves a little more work than a
GB database, but the raster data needs to be backed up anyways, whether
it is in or outside of the database. If it is all under one umbrella,
the backup system would need to be less complicated if you ask me...

So what exactly were your arguments?

Patrick 
