DOM future work
Performance optimizations
One major performance optimization can be had by replacing printf and scanf in daeAtomicType with custom XML-aware text parsing functions. This is needed for two reasons:
- The speed increase.
- The standard C string formatting differs from XML string formatting. An example is floating-point infinity and NaN: XML Schema defines these as INF, -INF, and NaN, while standard C printf/scanf use #inf, -#inf, and #nan (a sketch of an XML-aware formatter follows this list).
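As a rough illustration (a minimal sketch, not the DOM's actual code; the function names are invented), a custom XML-aware formatter/parser would handle the special values with the XML Schema spellings and fall back to ordinary conversion for finite numbers:

 #include <cmath>
 #include <cstdio>
 #include <cstdlib>
 #include <cstring>
 #include <string>

 // Sketch only: write a double using the XML Schema spellings for the
 // special values, falling back to ordinary formatting for finite numbers.
 std::string xmlFormatDouble(double v) {
     if (std::isnan(v)) return "NaN";
     if (std::isinf(v)) return v > 0 ? "INF" : "-INF";
     char buf[64];
     std::snprintf(buf, sizeof(buf), "%g", v);
     return buf;
 }

 // Sketch only: parse a double, accepting INF, -INF and NaN as XML Schema
 // defines them, rather than the #inf/#nan forms some C runtimes produce.
 double xmlParseDouble(const char* s) {
     if (std::strcmp(s, "INF") == 0)  return  HUGE_VAL;
     if (std::strcmp(s, "-INF") == 0) return -HUGE_VAL;
     if (std::strcmp(s, "NaN") == 0)  return std::nan("");
     return std::strtod(s, NULL);
 }

A replacement aimed mainly at speed would also hand-roll the digit conversion for finite values instead of calling snprintf/strtod.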
It may be possible to add accelerator functions to the metaCMPolicy objects. After scanf, I believe placeElement is the next performance bottleneck.
Class hierarchy reorganization
Many classes inherit from daeElement just because of the smartRef reference counting. This is incredibly ugly! It bloats and complicates those subclasses (other reference-counted objects). It would be nice to change that, but it may be a lot of work requiring a minor release, such as DOM 1.3.0.
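One possible direction, shown only as a hypothetical sketch (the class name and members are invented, not an agreed design): pull the reference counting out into a small base class so that smartRef can manage objects without forcing them to derive from daeElement.

 // Sketch only: reference counting lives in a small base class, so smartRef
 // can manage objects that are not elements.
 class daeRefCountedObj {
 public:
     daeRefCountedObj() : _refCount(0) {}
     virtual ~daeRefCountedObj() {}
     void ref()     { ++_refCount; }
     void release() { if (--_refCount == 0) delete this; }
 private:
     int _refCount;
 };

 // daeElement and the other smartRef-managed classes would then derive from
 // daeRefCountedObj and share only the counting, not the element machinery.
 class daeElement : public daeRefCountedObj { /* element-specific API */ };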
HexBinary type
The xs:hexBinary type is not implemented correctly.
Currently nobody uses it so it’s not a big problem. But if someone were to provide <image><data>, it would not be read or written correctly.
Currently HexBinary is defined as a daeCharArray. But it needs to be a two-dimensional array:
daeTArray< daeTArray< daeUChar > * >
This is because hexBinary is a string of characters encoding binary data in hex: “1A2B3C” is six characters encoding 3 bytes. The COLLADA Schema uses a list of hexBinary values, so “1A2B3C 4D5E6F” requires two arrays of three bytes each.
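For illustration only (a sketch using std containers and invented helper names; a DOM version would fill a daeTArray< daeTArray< daeUChar > * > instead), decoding such a list looks roughly like this:

 #include <sstream>
 #include <string>
 #include <vector>

 // Sketch only: decode one hex digit. Invalid characters are ignored here;
 // real code would report an error.
 static unsigned char hexNibble(char c) {
     if (c >= '0' && c <= '9') return c - '0';
     if (c >= 'A' && c <= 'F') return c - 'A' + 10;
     if (c >= 'a' && c <= 'f') return c - 'a' + 10;
     return 0;
 }

 // "1A2B3C 4D5E6F" -> two arrays of three bytes each.
 std::vector< std::vector<unsigned char> > parseHexBinaryList(const std::string& text) {
     std::vector< std::vector<unsigned char> > result;
     std::istringstream in(text);
     std::string token;
     while (in >> token) {                       // one token per hexBinary value
         std::vector<unsigned char> bytes;
         for (size_t i = 0; i + 1 < token.size(); i += 2)
             bytes.push_back((hexNibble(token[i]) << 4) | hexNibble(token[i + 1]));
         result.push_back(bytes);
     }
     return result;
 }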
The major setback is that the DOM would need a new metaAttributeArrayArray type, and its logic would be different from the current metaAttribute and metaAttributeArray classes.
I don’t know what would need to be done for this to work.
Raw format
I have written a proposal for standardization of external floating point and integer data.
As part of that I added the daeRawResolver, which gives DOM users that extra functionality without requiring clients to do any extra work.
The problem is that the URI specifying the .raw file where the data lives requires a query string to store some information, and daeURI does not support query strings. Query-string support needs to be added for the .raw format and the daeRawResolver to work correctly.
Currently the libxml raw saver and the daeRawResolver only support 32-bit numbers, but the “?precision=” query string needs to be supported to allow for arbitrary precisions.
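As a rough sketch of the missing piece (the helper function and the example URI are hypothetical, not existing daeURI API), the resolver would need to be able to pull a parameter such as precision out of a query string:

 #include <string>

 // Sketch only: extract one key's value from a URI query string, e.g.
 // "mesh.raw?precision=64" -> "64".
 std::string getQueryParam(const std::string& uri, const std::string& key) {
     std::string::size_type q = uri.find('?');
     if (q == std::string::npos) return "";
     std::string query = uri.substr(q + 1);
     std::string::size_type pos = 0;
     while (pos <= query.size()) {
         std::string::size_type amp = query.find('&', pos);
         std::string::size_type end = (amp == std::string::npos) ? query.size() : amp;
         std::string pair = query.substr(pos, end - pos);
         std::string::size_type eq = pair.find('=');
         if (eq != std::string::npos && pair.compare(0, eq, key) == 0)
             return pair.substr(eq + 1);
         if (amp == std::string::npos) break;
         pos = amp + 1;
     }
     return "";
 }

For example, getQueryParam("mesh.raw?precision=64", "precision") would return "64".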
I/O plug-in and resolver reorganization
Working on the Verse asset management database I/O plug-in and the COLLADA RT, I realized that the current structure for I/O plug-ins is insufficient.
The COLLADA DOM should allow multiple I/O plug-ins to be “registered” with the DOM to allow loading from different sources (similar to the way OSG I/O plug-ins work).
When that is done, the way resolvers and plug-ins currently interact will need to be reversed.
Currently there is a list of resolvers. Each resolver can resolve only specific URI schemes and file extensions. If a resolver qualifies to resolve a URI, it may (the default one does) call the I/O plug-in to load the document if it is not already loaded into the database.
The better way would be a single resolver class that queries the database for a specific element. The database would then have a list of I/O plug-ins, each of which can load only from specific URI schemes and file extensions. If the database doesn’t have the document the resolver is searching for, it can load it, with the loading handled by the most appropriate plug-in, e.g. http and file schemes handled by the libxml plug-in, the verse scheme handled by the verse plug-in, and so on.
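A hypothetical sketch of that arrangement (all class and method names here are invented for illustration, not the DOM's current API):

 #include <string>
 #include <vector>

 // Sketch only: each plug-in advertises the URI schemes and file extensions
 // it handles; the database picks the best match when a requested document
 // is not loaded yet.
 class daeIOPluginBase {
 public:
     virtual ~daeIOPluginBase() {}
     virtual bool handlesScheme(const std::string& scheme) const = 0;
     virtual bool handlesExtension(const std::string& ext) const = 0;
     virtual bool loadDocument(const std::string& uri) = 0;
 };

 class daeDatabaseSketch {
 public:
     void registerPlugin(daeIOPluginBase* plugin) { _plugins.push_back(plugin); }

     // Called by the single resolver when the requested document is missing.
     bool loadMissingDocument(const std::string& uri,
                              const std::string& scheme,
                              const std::string& ext) {
         for (size_t i = 0; i < _plugins.size(); ++i)
             if (_plugins[i]->handlesScheme(scheme) && _plugins[i]->handlesExtension(ext))
                 return _plugins[i]->loadDocument(uri);
         return false;   // no registered plug-in can handle this source
     }

 private:
     std::vector<daeIOPluginBase*> _plugins;
 };

The single resolver would then only ask the database for the element; it would never talk to a plug-in directly.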
SID resolvers
The SID resolver as it stands works.
The COLLADA schema needs to be pushed to give more semantic meaning to the types it uses. Often there are xs:NCName values with semantic meaning, but there is no way to know that from the type, only from the context.
The data types should be named SIDType and SIDRefType to give these NCNames a semantic meaning.
When that happens, the SIDResolver can be made to resolve SIDRef types automatically similar to the way URI and IDRef are resolved automatically upon load.
String table and memory system
The string table and memory system need to be implemented so that they actually do what they should.
They would both drastically improve DOM memory usage; the stringTable should most likely help a lot more than the memorySystem.
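As a sketch of what a working string table buys (string interning; the class and method names here are invented, not the DOM's stringTable interface), each distinct string is stored once and callers receive pointers to the shared copy:

 #include <set>
 #include <string>

 // Sketch only: intern() returns a pointer to the single shared copy of a
 // string, inserting it on first use.
 class StringTableSketch {
 public:
     const char* intern(const std::string& s) {
         return _strings.insert(s).first->c_str();
     }
 private:
     std::set<std::string> _strings;   // node-based, so the pointers stay valid
 };

Element and attribute names repeat constantly across a document, so sharing one copy of each distinct string can cut memory use substantially.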