[postgis-users] Re: Proposed SQL interface for PGRaster

Tue Dec 5 23:08:33 PST 2006

Hi all,

Before I add my 2c, let me thank Stephen for taking the lead on this,
and express my gratitude and excitement at the prospect of a good raster
DB.  And pardon me if this reads a little like I am thinking aloud,
but I kind of am thinking aloud....

Stephen wrote:
> Is there anything about the concept of a tile that makes it different
> from a general raster?

I am the last person to pose as an expert, but here I go.  I guess my
overall theme is design for the user, at least at the end point, not
for the programmer. I think the reason why tiling and the "master
raster" (hehe maybe "logical raster"?) come up, at least for me, is
that it _feels_ like appropriate design in several respects:  (1) from
an entity-relation modeling standpoint, (2) from the vague design
feeling that transformations should happen in the output direction and
be as flexible as possible, and (3) in order to support huge rasters.
More detail:

(1) Entity-relation: Last year I worked with DOQs for California
generating maps of rural properties, and it drove me crazy that I had
to handle so many raster datasets that were basically identical to
each other, which partition the space quite nicely and which "feel"
like they are really "one big thing" (and they might be on a USGS
computer somewhere). I needed lots of contiguous images in order to
overlay property lines that crossed the image boundaries. I was using
a crappy Arc-whatever application, so I dealt with the inelegance, but
in PostGIS it would be silly to have to construct a query that
references, say, 100 separate tiles, which would be quite easy to
imagine if one were trying to generate an image that was the 100-yard
buffer of, for example, the Mississippi river within a single county.
If there were gaps in the logical raster, you could either throw an
error or return a region of null values.
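
To make the gap handling concrete, here is a rough Python sketch (not
PostGIS code). It assumes a hypothetical in-memory layout where the
logical raster is a dict of small square tiles keyed by tile column and
row; read_window, TILE and the strict flag are made-up names for
illustration only.

```python
from typing import Dict, List, Optional, Tuple

TILE = 2  # hypothetical tile edge length in pixels, kept tiny for illustration

# Hypothetical storage: tiles keyed by (tile_col, tile_row), each TILE x TILE.
Tiles = Dict[Tuple[int, int], List[List[int]]]


def read_window(tiles: Tiles, x0: int, y0: int, x1: int, y1: int,
                strict: bool = False) -> List[List[Optional[int]]]:
    """Return the pixels in [x0, x1) x [y0, y1) of the logical raster.

    Pixels that fall in a missing tile come back as None (null),
    or raise LookupError when strict is True.
    """
    out = []
    for y in range(y0, y1):
        row = []
        for x in range(x0, x1):
            key = (x // TILE, y // TILE)   # which tile holds this pixel
            tile = tiles.get(key)
            if tile is None:
                if strict:
                    raise LookupError(f"no data for tile {key}")
                row.append(None)
            else:
                row.append(tile[y % TILE][x % TILE])
        out.append(row)
    return out


tiles = {
    (0, 0): [[1, 2], [3, 4]],
    (1, 0): [[5, 6], [7, 8]],
    # tile (0, 1) is deliberately missing: a gap in the logical raster
    (1, 1): [[9, 10], [11, 12]],
}

# One request that crosses tile boundaries and touches the gap.
window = read_window(tiles, 1, 0, 3, 3)
print(window)
```

The caller asks for one region and never names individual tiles; whether
a gap is an error or a null region is a single switch.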

I am not sure when one WOULD use separate logical rasters -- I think
if they were measuring different things (like one an RGB TIFF, one a
hyperspectral image of something) I would definitely want separate
rows for them, but if they are just different regions and otherwise
identical in band content and resolution, I definitely think they
should be represented by a single row in a database.  (The resolution
thing is weird...  I will think...)

(2) Slicing and dicing belong on the query side, not the data side:  I
think with regard to generating output tiles, the most elegant thing
(from the DBA viewpoint) would be to overlay a single polygon on a
single master-raster. We should keep interfaces as simple as possible.
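
As an illustration of "one polygon over one master raster", here is a
small Python sketch. It assumes the raster is a plain list of rows and
that a pixel counts as covered when its center lies inside the polygon;
overlay and point_in_polygon are hypothetical names, and the test used
is ordinary ray casting, not anything PostGIS-specific.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(px: float, py: float, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray going in +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can cross it.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def overlay(raster: List[List[int]],
            polygon: Sequence[Point]) -> List[List[Optional[int]]]:
    """Keep pixels whose centers fall inside the polygon; null the rest."""
    return [
        [v if point_in_polygon(x + 0.5, y + 0.5, polygon) else None
         for x, v in enumerate(row)]
        for y, row in enumerate(raster)
    ]


raster = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
triangle = [(0.0, 0.0), (3.5, 0.0), (0.0, 3.5)]
clipped = overlay(raster, triangle)
print(clipped)
```

The output shape is decided entirely by the polygon in the query; the
stored raster is untouched.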

(3) Storage:  There are many important raster datasets that are
terabytes in size -- satellite remote sensing, etc, etc. These can't be
stored in a single blob or what have you, especially on 32-bit file
systems, plus
we probably want the rasters to be dumpable, and we would need to do
this in multiple pieces.

(3+) Storage: I think that most people would appreciate it if we found
the best single internal representation and used it without any
options, converting into it when necessary.  For output, the sky is
the limit :)

(4) Miscellaneous:  my tiling approach would obviate the merge()
functions, but would necessitate an add_data() function (as we add
DOQs to the master, for example).  There might be a need to come up
with a PostGIS file or directory format outside of the database for
dumping and transmitting our big, complex logical rasters. I don't
think there should be any parameters for endian-ness unless that is
part of the output format; no DBA wants to fiddle with that, I think,
but please correct me if I am wrong.
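
A minimal Python sketch of what add_data() might look like, under the
assumption that a logical raster only accepts tiles matching its band
count and resolution. The Tile and LogicalRaster classes and their
fields are hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Tile:
    origin: Tuple[float, float]  # hypothetical: corner of the tile in map units
    bands: int
    resolution: float            # map units per pixel


@dataclass
class LogicalRaster:
    bands: int
    resolution: float
    tiles: Dict[Tuple[float, float], Tile] = field(default_factory=dict)

    def add_data(self, tile: Tile) -> None:
        """Attach a new tile, refusing anything that does not match."""
        # Exact float comparison is good enough for a sketch; real code
        # would want a tolerance.
        if tile.bands != self.bands or tile.resolution != self.resolution:
            raise ValueError("tile does not match band count or resolution")
        if tile.origin in self.tiles:
            raise ValueError("this region already has data")
        self.tiles[tile.origin] = tile


master = LogicalRaster(bands=3, resolution=1.0)
master.add_data(Tile(origin=(0.0, 0.0), bands=3, resolution=1.0))
master.add_data(Tile(origin=(1000.0, 0.0), bands=3, resolution=1.0))
print(len(master.tiles))
```

Note there is no merge step: tiles are simply attached to the master,
and any stitching happens when someone queries it.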

One more comment:

> The first has consequences for optimization.  However, I'm not convinced
> it is absolutely necessary to save tile-raster relationships to be able
> to quickly merge raster together to form a larger raster.  The second
> may save a small amount of storage, but is probably not very important,
> particularly since coordinate system information is already stored by
> reference (e.g. using an SRID).
>
> Am I missing something more fundamental that distinguishes a group of
> tiles from a group of rasters that have the same type and resolution and
> boundaries that line up with one another?

I think you are probably dead on as far as the low level, speed and
storage details go, but I think it is crucial to think in terms of how
these datatypes will be used, at least if we want PostGIS to conquer
the world.

Another note:

> Patrick wrote:
> >  In other words, there
> >should be a get_tile(raster, tile_index_x, tile_index_y)

This could be done by a polygon overlay, maybe optimized for a bounding box.
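
Roughly, in Python (a sketch only: the raster is assumed to be a list of
rows, tile_size is a hypothetical extra parameter that is not part of
Patrick's signature, and clip_to_bbox stands in for the general overlay
restricted to a rectangle):

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in pixel coordinates, upper bounds exclusive
BBox = Tuple[int, int, int, int]


def clip_to_bbox(raster: List[List[int]], bbox: BBox) -> List[List[int]]:
    """The general operation: cut an axis-aligned window out of a raster."""
    x0, y0, x1, y1 = bbox
    return [row[x0:x1] for row in raster[y0:y1]]


def tile_bbox(ix: int, iy: int, tile_size: int) -> BBox:
    """Translate a tile index into the bounding box it covers."""
    return (ix * tile_size, iy * tile_size,
            (ix + 1) * tile_size, (iy + 1) * tile_size)


def get_tile(raster: List[List[int]], tile_index_x: int, tile_index_y: int,
             tile_size: int = 2) -> List[List[int]]:
    """get_tile is just clip_to_bbox with a computed box."""
    return clip_to_bbox(raster, tile_bbox(tile_index_x, tile_index_y, tile_size))


raster = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
print(get_tile(raster, 1, 0))
print(get_tile(raster, 0, 1))
```

So get_tile() can exist as a convenience, but it need not be a separate
primitive.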

Cheers (and very excited),
W