
Author Topic: Extensible file format spec ...  (Read 2886 times)


Offline Jose (Topic starter)

  • Hero Member
  • *****
  • Join Date: Feb 2002
  • Posts: 2871
Extensible file format spec ...
« on: September 04, 2006, 08:21:17 PM »
Hi. I was messing around with a simple file format I made for variable-size data, which basically consists of a table at the beginning of the file holding the offset and size of each piece of data. The problem is that if the file gets VERY big, it's just not practical to rewrite the whole thing each time an update is needed. I understand that rewriting is common practice because today's storage systems are very fast, so there's no harm in just writing the file out again.
Heck, for the use I'll give it, it's probably not even worth it, but just for fun here's what I've come up with. If anyone sees something that could be better, just post. :-) Data can be anything. I guess the only things missing to make it really cool are data identification and nesting? IFF comes to mind, but from what I vaguely remember it doesn't support extending data that is already in the file, does it?

(Note: I know 1st is not an acceptable way to start a C variable name, it's just a scratch.)

LONG FileSize; /* Could be a quadword :)*/
/* Header 1st Part */
LONG 1stDataElement1stPartOffst;
LONG 1stDataElement1stPartSize;
LONG 1stDataElement2ndPartOffstOffst; /* Offset to 1stDataElement2ndPartOffst. Could be null to terminate 1st Data element. IMPORTANT: 2nd Offset is right after 2nd data */
LONG 2ndDataElement1stPartOffst;
LONG 2ndDataElement1stPartSize;
LONG PadNULL; /* NULL to terminate 2nd Data Element */
LONG 1; /* 1 indicates the next LONG is the offset to the header's 2nd part */
LONG Header2ndPartOffst;
DATA
1stData Element1st Part:
...
...
2nd Data Element 1st (and only) Part:
...
...
...
..
LONG 1stDataElement2ndPartOffst;
LONG 1stDataElement2ndPartSize;
LONG PadNULL; /* NULL to terminate 1st Data Element. IMPORTANT: could instead be an offset to a 3rd part */
1stData Element 2nd Part:
...
...
Header2ndPart
...
LONG 2 /* 2 Indicates Header termination */
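For anyone who wants to see the chaining idea in compilable form, here is a minimal sketch of how a reader might gather all parts of one data element. It assumes every part is described by three consecutive LONGs (part offset, part size, offset of the next descriptor or NULL), that LONGs are 32-bit big-endian, and that the whole file is already in memory. The names get_long and gather_element are made up for this example and are not part of the scratch above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 32-bit big-endian LONG (big-endian is an assumption). */
static uint32_t get_long(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Copy every part of one data element into out by following the chain
 * of {PartOffst, PartSize, NextDescriptorOffst} triples that starts at
 * file offset desc. A next value of 0 (the PadNULL) ends the element.
 * Returns the number of bytes gathered, or -1 if the chain points
 * outside the file, loops, or does not fit in out.
 */
static long gather_element(const uint8_t *file, size_t file_len,
                           uint32_t desc, uint8_t *out, size_t out_cap)
{
    size_t total = 0;
    size_t hops = 0;

    while (desc != 0) {
        if (desc > file_len || file_len - desc < 12)
            return -1;                      /* descriptor outside file */

        uint32_t off  = get_long(file + desc);
        uint32_t size = get_long(file + desc + 4);
        uint32_t next = get_long(file + desc + 8);

        if (off > file_len || size > file_len - off)
            return -1;                      /* part outside file */
        if (size > out_cap - total)
            return -1;                      /* output buffer too small */

        memcpy(out + total, file + off, size);
        total += size;
        desc = next;

        if (++hops > file_len / 12)
            return -1;                      /* more hops than can exist: a loop */
    }
    return (long)total;
}
```

With this layout, growing an element only means appending the new part plus a new descriptor at the end of the file and patching one LONG in the previous descriptor, so nothing that is already stored has to move.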
"We made Amiga, they {bleep}ed it up"
 

Offline Jose (Topic starter)

Re: Extensible file format spec ...
« Reply #1 on: September 04, 2006, 09:27:10 PM »
What?! No one? At least tell me what you think, even if you think it's crap :-D.
 

Offline McVenco

  • Hero Member
  • *****
  • Join Date: Feb 2006
  • Posts: 1428
    • http://www.amigascene.nl
Re: Extensible file format spec ...
« Reply #2 on: September 05, 2006, 08:28:07 AM »
Well, I'd love to comment on this, but I'm as fluent in programming languages as I am in Chinese, so my comment would be: "looks nice - what does it do?"  :lol:
| A4000 | CS-MK3 060@50 | Picasso IV |
| Member of Team Amiga (tm) | FidoNet 2:286/414.18 (long ago) |
| SysOp The Missing Channel BBS | Member of AGA BBS Intl. |
 

Offline weirdami

  • Hero Member
  • *****
  • Join Date: Jan 2003
  • Posts: 3776
    • Http://Bindingpolymer.com
Re: Extensible file format spec ...
« Reply #3 on: September 05, 2006, 09:16:42 AM »
@McVenco

How do you say "looks nice - what does it do?" in Chinese?
----
Binding Polymer: Keeping you together since 1892.
 

Offline McVenco

Re: Extensible file format spec ...
« Reply #4 on: September 05, 2006, 09:20:52 AM »
@weirdami

I'll ask the waiter the next time I go to a Chinese restaurant :lol:
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
Re: Extensible file format spec ...
« Reply #5 on: September 05, 2006, 10:37:33 AM »
@Jose

I have something you could use. Some time ago I wrote a solution for machine-independent, extensible binary storage that I called 'xsf', for extensible storage format. However, it is C++ based.

At the very root is the idea of endian-aware XSFStreamIn and StreamOut classes that provide protected methods for endian-safe block reading and writing of variable-length data items. A file header composed only of byte values contains some basic information about the endian nature of the file (by default the signature of the machine creating the file, but you can override this), the expected data alignment and some other properties.
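To illustrate the byte-only header idea in plain C, here is a rough sketch. It is not the actual xsf code or layout: the ByteHeader fields, the FILE_* constants and the function names are all invented for this example. The only point it makes is that a header built purely from bytes reads the same on any machine, and that one flag in it tells the reader whether multi-byte values need swapping.

```c
#include <stdint.h>
#include <string.h>

enum { FILE_LITTLE_ENDIAN = 0, FILE_BIG_ENDIAN = 1 };

/* Hypothetical header made only of single bytes, so no swapping is
 * ever needed to interpret the header itself. */
typedef struct {
    uint8_t magic[3];
    uint8_t byte_order;   /* FILE_LITTLE_ENDIAN or FILE_BIG_ENDIAN */
    uint8_t alignment;    /* expected data alignment in bytes */
} ByteHeader;

/* Detect the host byte order at run time. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 0;
}

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Read a 32-bit value stored in the file's byte order and return it
 * in host order, swapping only when the two orders differ. */
static uint32_t read_u32(const ByteHeader *hdr, const uint8_t *src)
{
    uint32_t v;
    memcpy(&v, src, sizeof v);
    if ((hdr->byte_order == FILE_BIG_ENDIAN) != host_is_big_endian())
        v = swap32(v);
    return v;
}
```

Writing in the creating machine's native order means the common case (file read back on the same kind of machine) costs no swapping at all.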

On top of this, there is an XSFStorable class which is able to access the IO methods of the above streams. It defines a basic wrapper for an object you want to be able to store and you get the properties by inheriting XSFStorable in your target class. What this class is, is entirely up to you, it can be all kinds of data. You just use the XSFStorable methods to define how it is serialized and unserialized, which in turn use the block IO routines. The serialized XSFStorable object becomes a well defined chunk within the file, complete with the XSFStorable header information that the system uses when parsing files.

The basic file structure and XSFStorable itself provide some basic type information that allows the system to parse serialized XSFStorable chunks it doesn't recognise (at the very least you can skip past them).
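Skipping chunks you don't recognise only needs a size field in a fixed place. A small C sketch of that idea follows, assuming each chunk starts with a 4-byte type ID followed by a 4-byte big-endian payload size. That 8-byte header and the names be32 and find_chunk are assumptions for illustration, not the real XSFStorable header.

```c
#include <stddef.h>
#include <stdint.h>

#define CHUNK_HEADER_SIZE 8u   /* hypothetical: 4-byte type + 4-byte size */

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Walk the chunks in buf and return the payload offset of the first
 * chunk whose type equals wanted, storing its payload size in
 * *raw_size_out. Chunks of any other type are skipped using only
 * their size field. Returns -1 if no such chunk exists or if a size
 * field runs past the end of the buffer.
 */
static long find_chunk(const uint8_t *buf, size_t len,
                       uint32_t wanted, uint32_t *raw_size_out)
{
    size_t pos = 0;

    while (len - pos >= CHUNK_HEADER_SIZE) {
        uint32_t type     = be32(buf + pos);
        uint32_t raw_size = be32(buf + pos + 4);
        size_t payload    = pos + CHUNK_HEADER_SIZE;

        if (raw_size > len - payload)
            return -1;                  /* damaged size field */
        if (type == wanted) {
            *raw_size_out = raw_size;
            return (long)payload;
        }
        pos = payload + raw_size;       /* unknown type: skip it whole */
    }
    return -1;
}
```

A reader written this way keeps working when newer writers add chunk types it has never heard of, which is what makes the format extensible.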

Lastly, the AmigaOS version's IO routines are realised using asynchronous double-buffered routines similar to asyncio.library (in fact based on some old example code that I think became the basis of said library too), but they also provide on-the-fly endian conversion for block reads and writes (a byteswapping copy, if you prefer) where needed. Naturally those bits use asm ;-)

If you are interested I can send you the code, but as it is part of a larger system, you'll need to pull out the bits you need.
int p; // A
 

Offline Jose (Topic starter)

Re: Extensible file format spec ...
« Reply #6 on: September 06, 2006, 08:02:20 PM »
@Karlos

Hey! Thanks for the offer, but I've never programmed in C++ and I don't feel like learning it now, even if it's easy :) I think I'll make my own just for fun :)

One interesting thing I noticed is that with extensible data the file ends up having the same problem filesystems face when some data in the middle is deleted: fragmentation. But IIRC there are some functions to tell the filesystem that a part of a file is to be scrapped, and it will take care of the fragmentation issues itself (if it does at all). How did you handle this issue?
 

Offline Karlos

Re: Extensible file format spec ...
« Reply #7 on: September 06, 2006, 08:11:25 PM »
One of the standard XSFStorable classes was a catalogue class. If a file contained multiple serialised objects, the catalogue tracked them. The catalogue chunk was always the very last chunk in any file that had one. If you deleted a chunk, its space was up for recycling. If a new chunk was too big to fit in any gap, it went to the end of the file (that is, the old catalogue gets overwritten and the updated one is then added at the new end of file).
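In C terms, the recycling rule could look roughly like the sketch below. It is an in-memory model only, with made-up names (Entry, Catalogue, place_chunk, delete_chunk), a fixed-size table and a first-fit search; none of that is taken from the real catalogue class. The end field stands for the offset where the catalogue chunk itself would start, so appending a chunk there corresponds to overwriting the old catalogue and writing the updated one after the new chunk.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

typedef struct {
    uint32_t offset;   /* where the chunk (or gap) starts */
    uint32_t size;     /* its length in bytes */
    int      in_use;   /* 0 means the space is free for recycling */
} Entry;

typedef struct {
    Entry    e[MAX_ENTRIES];
    size_t   count;
    uint32_t end;      /* end of chunk data, i.e. where the catalogue goes */
} Catalogue;

/* Deleting a chunk only marks its space as reusable. */
static void delete_chunk(Catalogue *c, size_t index)
{
    c->e[index].in_use = 0;
}

/*
 * Decide where a new chunk of the given size should be written.
 * First fit: reuse the first gap that is big enough, splitting off any
 * spare space as a new gap; otherwise append at the end of the data.
 * Returns the chosen offset, or UINT32_MAX if the table is full.
 */
static uint32_t place_chunk(Catalogue *c, uint32_t size)
{
    for (size_t i = 0; i < c->count; i++) {
        Entry *g = &c->e[i];
        if (!g->in_use && g->size >= size) {
            uint32_t spare = g->size - size;
            if (spare > 0 && c->count < MAX_ENTRIES) {
                g->size = size;
                c->e[c->count++] = (Entry){ g->offset + size, spare, 0 };
            }
            /* If the table is full, the spare bytes stay with the chunk. */
            g->in_use = 1;
            return g->offset;
        }
    }
    if (c->count == MAX_ENTRIES)
        return UINT32_MAX;
    c->e[c->count++] = (Entry){ c->end, size, 1 };
    c->end += size;
    return c->e[c->count - 1].offset;
}
```

First fit is the simplest policy to show here; a best-fit search would waste less space per reuse at the cost of scanning the whole table every time.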

I wrote a simple CLI tool that would optimise any XSF file by collapsing out the free space and updating the catalogue.
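The core of such an optimiser is small. Here is a hedged C sketch of the collapsing step, working on a buffer in memory and a table of entries sorted by ascending offset. The Entry layout and the compact function are hypothetical, and the sketch ignores the file header and treats offset 0 as the start of the chunk data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t offset;   /* where the chunk (or gap) starts */
    uint32_t size;     /* its length in bytes */
    int      in_use;   /* 0 means free space */
} Entry;

/*
 * Slide every live chunk down over the free space before it and drop
 * the free entries from the table. Entries must be sorted by ascending
 * offset and must not overlap. Returns the number of entries kept and
 * stores the new length of the chunk data in *end_out.
 */
static size_t compact(uint8_t *data, Entry *e, size_t count,
                      uint32_t *end_out)
{
    uint32_t write_pos = 0;
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (!e[i].in_use)
            continue;                       /* gap: nothing to keep */

        uint32_t size = e[i].size;
        if (e[i].offset != write_pos)       /* memmove copes with overlap */
            memmove(data + write_pos, data + e[i].offset, size);

        e[kept] = e[i];
        e[kept].offset = write_pos;         /* catalogue follows the data */
        write_pos += size;
        kept++;
    }
    *end_out = write_pos;
    return kept;
}
```

After this pass the file can be truncated to the returned length plus whatever the rewritten catalogue needs.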

I extended it with an experimental free space tracking one to make a more effective system but I never intended to make some sort of replacement 'in file' filesystem. 99% of the time a file only ever had one serialized object in it anyway :-)

-edit-

It's also noteworthy that the catalogue was totally optional. Nothing prevented you having many chunks in a file. If you attempted to read an object, the system would skip to the next chunk from wherever it was (because you never read directly, only unserialize, you can only ever be 'on' a chunk; of course it would scan if this wasn't the case, for example when a serialized chunk has an invalid rawSize).

I'll see if I can find my design docs.