Author Topic: GeneralSaver (was: Access to compiler's variable types possible ?) (Read 7508 times)

Jose · « **Reply #14 from previous page:** December 21, 2006, 03:27:44 PM »

@Karlos

Hmm, I hadn't thought about portability between platforms; that would screw things up. But if the code only has to run on 68k or PPC I guess there would be no problem. Even on x86 it should work, provided the user only loads files saved on the same platform (not ideal).
Regarding the alignment thing, I didn't think about that either :-) But I thought structures were aligned by adding padding bytes at the end, or not? If so, my method above would still work, though if there is a better one, bring it on.

As to the (un)serialization functions, that's cool, but I think it sort of defeats the whole purpose, which is to minimize the work needed to save data to disk (e.g. if I have to write a serialization function for each struct, it will be more or less the same trouble as writing a custom loader/saver for each).

BTW I liked Sigler's idea of having a separate description outside of each struct. That makes it unnecessary to repeat it inside each one, though a pointer to it is still needed of course.

Jose · « **Reply #15 on:** December 21, 2006, 03:38:16 PM »

@Cymric
" Not a bad idea, but it needs extension: you must store that type string in the dump too, and that requires some indication as to its length..."
Well, if one establishes a format the loader already knows what data it's dealing with, unless we're to make a general loader too :-)

"... Second, you need to be able to distinguish between various structs (and unions)"
Each struct begins with a pointer to its description.

Cymric · « **Reply #16 on:** December 21, 2006, 03:54:29 PM »

No, the loader doesn't know, especially if you use variable-length strings. As for the structs: that's nice, but if you write an 's' in the description to indicate a struct, which one are we referring to...?

Karlos · « **Reply #17 on:** December 21, 2006, 07:19:17 PM »

Quote

Cymric wrote:
The main problem Jose is facing is that there is no keyword (not in C, not in C++) which gives you back some structure outlining the variables and their type in a given scope.

Precisely why I suggest the OO approach. You can still have a set of functions that serialize primitive types based on their byte size that your structure serializers use to perform the physical IO.

This approach keeps the implementations well out of sight of the application code and allows you to vary them independently.

I should add that it was the use of mechanisms like this that finally persuaded me to go C++ in the end anyway ;-)

Karlos · « **Reply #18 on:** December 21, 2006, 07:21:43 PM »

Quote

Jose wrote:
@Karlos
...
Regarding the alignment thing, I didn't think about that either :-) But I thought structures were aligned by adding padding bytes at the end, or not? If so, my method above would still work, though if there is a better one, bring it on.
...

Well, alignment issues can cause structures to have holes in them. If your implementation prefers 32-bit integers to be on a 32-bit boundary, the following structure would likely have a hole between the members:

struct Foo {
    short s;
    long l;
};
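The hole can be made visible with `offsetof`. The sketch below is only an illustration: the helper names `foo_hole_size` and `foo_tail_padding` are made up for this example, and the actual numbers depend on the compiler and target, so no particular result is claimed.

```c
#include <stddef.h>  /* offsetof, size_t */

struct Foo {
    short s;
    long  l;
};

/* Number of padding bytes the compiler placed between s and l. */
size_t foo_hole_size(void)
{
    return offsetof(struct Foo, l) - sizeof(short);
}

/* Number of padding bytes after l, at the end of the struct. */
size_t foo_tail_padding(void)
{
    return sizeof(struct Foo) - (offsetof(struct Foo, l) + sizeof(long));
}
```

If `foo_hole_size()` returns anything other than 0, dumping the struct byte for byte also dumps the hole, and a build with different alignment rules will not read it back correctly.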

sigler · « **Reply #19 on:** December 21, 2006, 08:40:21 PM »

You have a choice: either have each struct contain a pointer to a serialize function, like others have suggested, or have each struct contain a pointer to a type info descriptor. Both choices are okay. The 'pointer to serialize function' is probably the easiest to implement, though it requires that you write the serialize function for each struct. The pointer to type info is a fancier approach that could take some more time to implement. Ideally, the type info string would be generated automatically when you compile your program, but it isn't, so you must do some extra work for each struct here as well.
If you DO go for the pointer to typeinfo, there are ways to solve most of the problems others have raised with it. Endianness: not a problem, just be consistent and choose either little endian or big endian for the file format. Alignment issues, when writing: you know the alignment of the architecture you're currently running on, so just have a #define __alignment__ 2 or 4 in your program or something like that. Parse the typeinfo string character by character and fetch the fields from the struct (which is just an array of bytes at this point), obeying the __alignment__ define, but write them to the file with no padding between the fields. When reading, just do the reverse: read the fields from the file with no padding, and write them into the struct memory, obeying the __alignment__ define.
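The writing half of that scheme could look roughly like the sketch below. Everything in it is an assumption made for illustration: the type codes `'b'`, `'w'` and `'l'` (1, 2 and 4 byte integers), the `ALIGNMENT` value, the example `struct Sample`, and the choice of big endian for the file. It is not the format anyone in the thread actually settled on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed largest alignment the compiler applies to a field:
   typically 2 on 68k compilers, 4 on many 32-bit targets. */
#define ALIGNMENT 4

/* Hypothetical example type, described by the string "wlb". */
struct Sample {
    uint16_t a;
    uint32_t b;
    uint8_t  c;
};

/* Round off up to the next multiple of align (a power of two). */
static size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Store the low n bytes of v in out, most significant byte first. */
static size_t put_be(unsigned char *out, uint32_t v, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = (unsigned char)(v >> (8 * (n - 1 - i)));
    return n;
}

/* Walk typeinfo, fetch each field from obj at its aligned offset and
   append it to out with no padding. Returns the bytes written. */
size_t pack_struct(const void *obj, const char *typeinfo, unsigned char *out)
{
    const unsigned char *mem = (const unsigned char *)obj;
    size_t off = 0, len = 0;

    for (; *typeinfo != '\0'; typeinfo++) {
        size_t size, align;
        uint32_t v = 0;
        uint16_t w;

        switch (*typeinfo) {
        case 'b': size = 1; break;
        case 'w': size = 2; break;
        default:  assert(*typeinfo == 'l'); size = 4; break;
        }
        align = size < ALIGNMENT ? size : ALIGNMENT;
        off = align_up(off, align);          /* skip the hole, if any */

        if (size == 1)      v = mem[off];
        else if (size == 2) { memcpy(&w, mem + off, 2); v = w; }
        else                memcpy(&v, mem + off, 4);

        len += put_be(out + len, v, size);   /* fixed byte order on disk */
        off += size;
    }
    return len;
}
```

Reading would mirror this: take bytes from the file in the same order and store each value at the aligned offset. Setting `ALIGNMENT` wrongly for the compiler in use makes the offsets wrong, which is the main weakness of doing this by hand.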

The typeinfo string (it doesn't need to be a string, and it doesn't need to have one character per field either; you could go for a more complex typeinfo struct or string) can also contain info on which fields you want or don't want to be serialized. For example, a '|' before a character could mean "don't serialize this field" (you still need the character, in order to skip the field when fetching the fields from memory). As for unions, they're more complex. You could say that there's always an int before the union saying which field of the union is currently active, like this:

struct mystruct
{
    char* typeinfo;
    int type;
    union
    {
        char* string;
        int integer;
        float floatnumber;
    } u;
};

You'd need to expand the typeinfo string syntax as well, so for the above struct:

i = int
f = float
P = null terminated string
u(...) = union; this implies an int first, with the elements of the union within the parentheses

char* typeinfo_mystruct = "u(Pif)";
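A serializer would then use the leading int to pick one code out of the parentheses. The helper below is a small sketch of that lookup. The name `union_member_code` is invented here, and it assumes the int is a zero-based index into the member list, which the description above does not actually specify.

```c
#include <string.h>

/* Returns the type code of the active union member for a typeinfo
   string of the form "u(...)", or '\0' if the string or tag is invalid.
   Assumption: tag 0 selects the first code inside the parentheses. */
char union_member_code(const char *typeinfo, int tag)
{
    const char *open, *close;

    if (typeinfo[0] != 'u' || typeinfo[1] != '(')
        return '\0';
    open = typeinfo + 2;
    close = strchr(open, ')');
    if (close == NULL || tag < 0 || tag >= (int)(close - open))
        return '\0';
    return open[tag];
}
```

With `"u(Pif)"` a tag of 1 selects `'i'`, so only the `integer` member would be written after the tag.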

--
Sigurd Lerstad

sigler · « **Reply #20 on:** December 21, 2006, 09:15:50 PM »

Also, and this is valid for both the pointer to Serialize and the pointer to typeinfo string: a good way to serialize pointers, and a necessary one if the pointers go in cycles, is to store each pointer as an int (an object id).

struct Object
{
char* typeinfo;
};

You need some map helper stuff. In C++ this is easy with std::map<..>

std::map<struct Object*, int> mapStore;
std::map<int, struct Object*> mapLoad;

// pseudocode
StorePointer(struct Object* pObject)
{
    int objectid;

    if (pObject == NULL)
    {
        objectid = 0;   // id 0 is reserved for NULL
        write objectid;
        return;
    }

    objectid = mapStore.find(pObject)   // 0 if not stored yet

    if (objectid > 0)
    {
        write objectid;   // already saved, the id alone is enough
    }
    else
    {
        objectid = mapStore.size()+1;

        mapStore.insert(pObject, objectid);

        write objectid;

        // the following is when using the pointer to typeinfo method

        write pObject->typeinfo string or something here

        for each field in pObject->typeinfo
            if field is pointer
                StorePointer(pFieldPointer) // recurse
    }
}

// pseudocode
struct Object* LoadObject()
{
    int objectid;
    read objectid;
    if (objectid == 0) return NULL;

    pObject = mapLoad.find(objectid);   // NULL if not loaded yet

    if (pObject)
    {
        return pObject;
    }
    else
    {
        objectid = mapLoad.size()+1;

        // the following is when using the pointer to typeinfo method

        read typeinfo string or something
        calculate size of struct based on typeinfo string
        pObject = allocate memory

        mapLoad.insert(objectid, pObject)

        for each field in typeinfo
            if field is pointer
                fieldvalue = LoadObject() // recurse

        return pObject;
    }
}

--
Sigurd Lerstad

Jose · « **Reply #21 on:** December 24, 2006, 12:17:39 AM »

@sigler
Thanks for taking the time but I never bothered with C++ :-) Maybe one of these days...

@All

I've been thinking about the best way to implement a description of data and came up with the framework below. But I'd still like your suggestions/criticism on it before sitting down and trying to write the code that uses it.

Another important doubt: a string would apparently be better, because unnecessary fields for some options could simply be omitted, thus making the thing simpler.

But at the same time, if I'm to use nested data descriptions, how am I to put a pointer in them if they're inside the string? Even if it's possible, it probably won't be very practical. Another problem with strings is that their numbers have to be parsed, which slows things down. I could parse them on startup, but that would be a bit weird...

Here it is: :-D

Code:

/* Description of a block of data in memory. The description
   of a struct is a zero-terminated array (Dscrpt->Type == 0)
   of these. */
struct Dscrpt
{
  WORD Type;
  UWORD Size;    /* Size of the block of variables with this Type,
                    or variable length block definition (see below) */
  LONG AddtInfo;
};
/* Some considerations about struct Dscrpt:
   - At first it was to be a one-value approach, but it was
     changed for the following reasons:
     - The Size field allows a description to cover a very big
       sequence of variables with similar attributes, so it
       effectively makes descriptions smaller in most cases.
     - The AddtInfo field was added to allow the use of pointers
       in description nesting, which couldn't be done any other
       way, but it now also provides other features.
   - All in all, this is probably one of the (or just the? ;))
     best possible compromises between size and simplicity
     versus features, power and flexibility.
*/

/* Type definition field */
#define TYPEDEFFLD 0x0000ffff
/* Type extension field */
#define TYPEFLGFLD 0xffff0000
/* Possible values in Type. If not mentioned, Size and
   AddtInfo do not apply (are ignored) */
#define NS 0 /* Do not save following block of data */
#define NP 1 /* Block of non pointer variables */

/* Block of pointer variables. Indirection level given by c */

/* Block of pointer variables (used if != NULL), whose object
   Dscrpt is at the object pointed to + offset given in AddtInfo */
#define PO(c) (2|((c)<<8))
/* Block of pointer variables (used if != NULL), whose object
   Dscrpt is pointed to by AddtInfo */
#define PP(c) (3|((c)<<8))

/* Block of structs */

/* Struct inside this one whose Dscrpt is at its 1st element
   + offset given in AddtInfo */
#define SO 4
/* Struct inside this one whose Dscrpt is pointed to by AddtInfo */
#define SP 5

/** Block of arrays (fixed or variable length) **/
/* IMPORTANT: for these, the upper 8 bits of Size == variable's
   block size, lower 24 bits == extra info (see type
   descriptions).
   The following macro must be used to define Size
   (a - this variable's block size, b - extra info): */
#define SZ(a, b) (((a)<<24)|(b))

/* Fixed length arrays. Nr. of elements given in b */

/* Array's base type description pointed to by AddtInfo */
#define AO 6
/* Array's base type description pointed to by each element
   at the offset given in AddtInfo */
#define AP 7

/* Variable length arrays */

/* Size memo based. Same as AO but the array's size is given by
   the value (size memo) at the address of the 1st element
   + offset in b */
#define AMO 8
/* Size memo based. Same as AP but "" */
#define AMP 9
/* Zero terminated. Same as AO but the array is zero terminated
   by the type given in c (type definitions below). If the base
   type is complex, b is the offset to the terminating field
   on each element */
#define AZO(c) (10|((c)<<8))
/* Zero terminated. Same as AP but "" */
#define AZP(c) (11|((c)<<8))
  /* Possible types in c */
  #define BT 1 /* BYTE */
  #define WR 2 /* WORD */
  #define LN 3 /* LONG */
  #define FL 4 /* float */
  #define DB 5 /* double */
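The code that walks a description has to split a Type value back into its parts. A minimal sketch of that step follows, assuming the low byte holds the type code and the next byte holds `c`, as the `c<<8` in the macros suggests. The helper names `dscrpt_base_type` and `dscrpt_extra` are invented for this example, and the macros are written with full parentheses so they stay correct inside larger expressions such as comparisons.

```c
/* Fully parenthesised forms of the two pointer-block macros. */
#define PO(c) (2 | ((c) << 8))
#define PP(c) (3 | ((c) << 8))

/* Low byte of Type: the type code itself (2 for PO, 3 for PP, ...). */
int dscrpt_base_type(int type)
{
    return type & 0xff;
}

/* Next byte of Type: the value passed as c, i.e. the indirection
   level for PO/PP (assumed layout, see the text above). */
int dscrpt_extra(int type)
{
    return (type >> 8) & 0xff;
}
```

A saver loop could then switch on `dscrpt_base_type(d->Type)` and follow `dscrpt_extra(d->Type)` levels of pointers for the pointer block types.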

Jose · « **Reply #22 on:** December 24, 2006, 12:24:45 AM »

These are the features I tried to keep in mind (my notes from before I made the actual "standard" sketch).

- Fast and relatively easy to use:
  - No need to type too many fancy, time-consuming, multi-flagged masked definitions. When needed, those are done by a macro
  - Descriptions are written only once, then referred to using pointers
- Flexible:
  - Support external or internal (pointed to inside the data itself) data block descriptions in any place, to best fit any particular needs
  - Description nesting (support for nested complex variables)
  - Support for multi indirection pointers
  - 2 ways to support variable length arrays: size memo and zero terminated
  - Support for multidimensional arrays (using nested descriptions, see provided example)
  - Support for multiple unpredictable datatype trees and arrays using internal descriptions
- General: practically any complex/variable length data type combination should be describable

Jose · « **Reply #23 on:** December 24, 2006, 12:41:47 AM »

A drawback of this is that a simple NULL terminated string description would be:
struct Dscrpt NullTermStr_Desc[] = {AZO(1), 1, 0};

Maybe in trying to make it powerful and feature-rich, it ended up too complicated/counterproductive :-? :-(
If there's a big sequence of strings in a struct, say 20, I think it rules though; it would simply become {AZO(1), 20, 0}; :-D

A string-based description could be simpler for simple pieces of data, but because of what I said above, I don't know if it would be adequate for these features.
If you have a good idea for implementing these features using a string, let me know...

Waiting for comments...
.....

8-)

Cymric · « **Reply #24 on:** December 24, 2006, 12:15:07 PM »

Quote

Waiting for comments...

Yes, start coding already, dagnabbit! Unless you want us to come up with the program ourselves, of course. Try it out, see what works and what doesn't.

Another hint I can give you is that you should avoid falling into the trap of thinking 'oooo, that's a lot of C, that's going to take a while'. On a moderately fast CPU (which you are surely using), such routines hardly take more than a few milliseconds, even if you add a small parser. Don't worry about speed. You'll be spending far more time coding the routines than can be made up by code optimisations or clever algorithms.

And finally, could you please cut long comments in two? It is so damn annoying to have to scroll the window sideways in order to read the main text.

sigler · « **Reply #25 on:** December 27, 2006, 10:25:03 PM »

Quote

Jose wrote:
@sigler
Thanks for taking the time but I never bothered with C++ :-) Maybe one of these days...

The one thing that was C++ specific was the use of the map class; the rest can be used in C as well. The map class simply associates a key with some value, and if you search for the key, it returns the value. The map class implements this as a binary tree, so searching for a key is very quick, but you can implement it using a simple list as well, doing a linear search for the key in the list and, once found, returning the value.

Anyway, the pseudocode I wrote was how to deal with cyclic pointers, e.g

struct myclass1
{
struct myclass2* pointer;
};

struct myclass2
{
struct myclass1* pointer;
};
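In plain C, the list-based map and the cycle handling could be sketched as below. The names `PtrMap`, `ptrmap_find`, `ptrmap_add`, `Node` and `assign_ids`, and the fixed capacity, are all made up for this illustration. A real saver would write each object's data at the point where it gets its id.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS 256   /* assumed fixed capacity for the sketch */

/* Pointer -> id map as a plain array, searched linearly. */
struct PtrMap {
    const void *ptr[MAX_OBJECTS];
    size_t count;
};

/* Returns the id (counting from 1) already given to p, or 0 if unseen. */
int ptrmap_find(const struct PtrMap *m, const void *p)
{
    size_t i;
    for (i = 0; i < m->count; i++)
        if (m->ptr[i] == p)
            return (int)(i + 1);
    return 0;
}

/* Registers p and returns its new id. Ids start at 1 so that 0 can
   stand for NULL in the file. */
int ptrmap_add(struct PtrMap *m, const void *p)
{
    assert(m->count < MAX_OBJECTS);
    m->ptr[m->count++] = p;
    return (int)m->count;
}

/* Stand-in for myclass1/myclass2: an object pointing at another one. */
struct Node {
    struct Node *other;
};

/* Follows the pointers from n and gives every object exactly one id.
   The map lookup is what stops the recursion when a cycle leads back
   to an object that was already visited. */
void assign_ids(struct PtrMap *m, const struct Node *n)
{
    if (n == NULL || ptrmap_find(m, n) != 0)
        return;
    ptrmap_add(m, n);
    assign_ids(m, n->other);
}
```

For two nodes pointing at each other, `assign_ids` visits each one once and returns, where a naive recursive save would never terminate.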

Also, maybe you should follow your original idea, extract the debug info in the exe file, parsing the hunks, and the stabs debug info in the debug hunk. I once did this, but it wasn't usable for me, since I needed c++ debug info, and stabs doesn't support namespaces (a feature of c++), but if you only do c, then that's okay.

You still need to embed a pointer in each struct to some typeinfo in order to know of what type a struct is dynamically. But this typeinfo is now automatically created from the stabs info, instead of you needing to manually create it.

--
Sigurd Lerstad

Jose · « **Reply #26 on:** January 20, 2007, 04:05:29 PM »

Hi there :-D. I've made the saver part of the code and it has been compiling well for about a week now. I'm now correcting the remaining bugs that make the file save incorrectly. Most types are saving well though, but I've stumbled into something that I hope doesn't make the whole thing useless: arrays (inside structs or not). It seems that SAS/C (I'll try VBCC at the end because I need the SAS/C debugger for now) stores fixed size arrays inside the struct itself. Can I assume this to be the standard, or are there compilers/compiler settings that just add the array's pointer and keep the elements somewhere else? If the answer is yes then we're screwed, it won't be a "general saver" anymore.

Finally, if this turns out to work, do you guys think it's worth it to make a library of it ?

:pint:

sigler · « **Reply #27 on:** January 26, 2007, 12:52:13 AM »

Hi, yes, a struct like

struct x
{
    int arr[5];
};

will have the arr data embedded inside the struct at the point of declaration. If not, it would be a pointer like

struct x
{
    int *arr;
};
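This is a property of the C language itself, not of a particular compiler or setting: an array member is part of the struct object. A small check along those lines is sketched below. The type names `WithArray` and `WithPointer` and the helper `array_is_embedded` are only for illustration.

```c
#include <stddef.h>

struct WithArray   { int arr[5]; };
struct WithPointer { int *arr;   };

/* Returns 1 if all elements of w->arr lie inside the bytes of *w. */
int array_is_embedded(const struct WithArray *w)
{
    const char *base  = (const char *)w;
    const char *first = (const char *)&w->arr[0];

    return first >= base && first + sizeof w->arr <= base + sizeof *w;
}
```

`sizeof(struct WithArray)` is therefore at least `5 * sizeof(int)`, while `struct WithPointer` only holds the pointer and the elements live wherever that pointer points.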

about making it a library: why not.
