This is a TODO file for XSL-T 2.0 support.

- LHF:
    * Warning bug, last parameter is always whined about.
    * Box in comment/PI/text/ws(?) handling -- pending Matthias
    * type036 -- namespace on top element isn't copied
    * XTDE0865
    * Attend XSLTTokenizer::isXSLT()
    * Remove redundant set() calls in setFocusHelper().

- Missing features:
    General                                         Priority
    ---------------------
    * 1.0 QXmlQuery::evaluateTo(QIODevice *)        P1 DONE
    * 1.0 Test suite integration                    P1 DONE
    * 1.0 xsl:key                                   P1
    * 1.0 fn:key()                                  P1
    * 1.0 2.0 Compatibility mode                    P1
    * 1.0 Regular parameters in templates           P1
    * 1.0 xsl:include                               P1
    * 1.0 xsl:copy-of                               P1
    * 1.0 xsl:copy                                  P1
    * 1.0 xsl:import                                P1
    * 1.0 fn:format-number                          P1
    * 1.0 xsl:message                               P2
    * 1.0 fn:current()                              P1 DONE
    * 2.0 fn:type-available()                       P3 DONE
    * 2.0 xsl:use-when                              P3
    * 2.0 fn:unparsed-entity-uri()                  P3
    * 2.0 fn:unparsed-entity-public-id()            P3
    * 2.0 Tunnel Parameters                         P3
    * 2.0 xsl:attribute-set                         P3
    * 1.0 xsl:decimal-format                        P2
    * 1.0 xmlpatterns: initial template             P1 DONE
    * 1.0 xsl:number                                P1
    * 1.0 Complete handling of xsl:sort             P2
    * 2.0 Grouping
        - fn:current-group()
        - fn:grouping-key()
        - xsl:for-each-group
    * 2.0 Regexp
        - xsl:analyze-string
        - xsl:matching-substring
        - xsl:non-matching-substring
        - fn:regex-group()
    * 2.0 Date & Time formatting
        - fn:format-dateTime()
        - fn:format-date()
        - fn:format-time()

    Serialization & Output:
    ----------------------
    * 1.0 xsl:output
        --- Tie together serialization. Should we add QXmlQuery::evaluateTo(QIODevice 1.0 const) ?
    * 2.0 xsl:character-maps
    * 2.0 xsl:character-map
    * 2.0 xsl:result-document
        --- Should the "default output" be handled with xsl:result-document? Would depend on compilation.

Optimizations:
    * Remove adjacent text node constructors
    * Remove string-join when first arg's static cardinality is not more than one
    * Remove string-join when the second arg is statically known to be the empty string.
    * Remove string-join when the second arg is a single space and the parent is a text node ctor.
    * Rewrite to operand if operands are one.
      What about type conversions?
    * Replace lookups with xml:id with calls on id().
    * Recognize that a/(b, c) is equal to a/(b | c). The latter does selection and node sorting in one step.
    * Remove LetClause which has empty sequence as return clause, or no variable dependencies at all.
    * Do a mega test for rewriting /patterns/:
        "node() | element()" => element()
        "comment() | node()" => comment()
      and so forth. This sometimes happens in poorly written patterns. How does this rewrite affect priority calculation?

Tests:
    * xml:id
        - Come on, the stuff needs to be reorganized xml:id.
        - Read in xml:id document with whitespace in attrs, write the doc out. Attrs should be normalized.
        - Do lookups of IDs with xml:id attrs containing whitespace.
    * current()
        - Use current() inside each instruction
        - In a template pattern
        - Several invocations: current()/current()/current()
    * Diagnostics:
        - See http://www.w3.org/Bugs/Public/show_bug.cgi?id=5643 . Comments should be taken into account when comparing. This suggests that we don't have any test which produces a document with XML comments.
    * element-available()
        - Review the tests.
        - Try using declarations in XSL-T, should return false
        - Use xsl:variable (both instr and decl)
        - invoke with all the XSL-T instructions.
        - Should return false for when, otherwise, matching-substring, non-matching-substring, etc?
        - Supply the namespace in the name via the default namespace, no prefix.
    * unparsed-text()
        - Load an empty file
        - Use a fragment in the URI
        - Use an invalid URI
        - Use device bindings and a QRC to ensure that we're not using a generic network manager.
        - Duplicate all the network tests. Same as for doc()
    * unparsed-text-available()
        - Same as for unparsed-text()
    * Sequence constructor that contains only:
        - XML comment
        - whitespace text node
        - processing instruction
        - a mix of the three
    * xsl:function
        - Ensure that it's not in scope for use-when.
        - xsl:function/xsl:param: use processing instructions, whitespace and comments as child: should be stripped
        - Use : @name missing.
        - Don't strip ws, and have ws between two xsl:param, and between xsl:function and xsl:param.
        - Use xsl:function with no body.
        - use xsl:param/@tunnel = no
        - use xsl:param/@tunnel = yes
        - use an invalid value for xsl:param/@tunnel = yes
        - Have a non-WS text node in xsl:function/xsl:param/
        - Have a WS text node in xsl:function/xsl:param/
        - Have a WS text node in xsl:function/xsl:param/ while preserving WS.
        - use a comment as child of xsl:param
        - use a PI as child of xsl:param
        - XTSE0770 with import precedence and all that.
        - have two identical functions in the stylesheet. The last has override=no. Should still report XTSE0770.
        - have @override with invalid value.
        - have whitespace inside xsl:param with different strip modes.
        - Have @select => error
        - Have body => error
        - call current() inside body. XPDY0002?
    * Do xml:base/StaticBaseURI and StaticCompatibilityStore prevent proper type checking, since expectedOperandTypes() returns item()*?
    * xsl:template/xsl:param
        - Have @required=yes, and have @select => error
        - Have @required=yes, and have body => error
        - Have a variable reference in a template after another, which has param, to ensure they aren't in scope.
    * xsl:template/@match
        - Have a pattern with unions, and have a body which relies on its static type.
    * @version: Have @version on *all* attributes.
    * xsl:call-template
        - Have a variable reference just after an xsl:call-template which has with-param, to ensure they aren't in scope.
        - Have an xsl:with-param which isn't used in the template. Error?
        - Have an xsl:with-param that has a type error.
        - an xsl:with-param is not in scope for the next one. Test this => error.
        - Have an xsl:call-template, whose with-param computes its value by calling another template, while using a with-param too.
    * XQuery:
        - DONE Ensure namespace {expr} {expr} is flagged as invalid
        - Use all XSL-T functions: error.
          Or do we do that already?
        - Ensure order by collation 1 + 1 is an error
        - Ensure order by collation {1 + 1} is an error
    * document()
        - Basic node deduplication, no test exists for that.
    * xsl:perform-sort
        - Have no xsl:sort. Error. Must be at least one.
        - have xsl:sort with invalid value.
        - sort atomic values.
        - Trigger "The stable attribute is permitted only on the first xsl:sort element within a sort key specification"
        - have xsl:sort with no select and no seq ctor.
        - trigger the delegated queueing. All instructions inside.. xsl:sort?
        - have multiple sort statements, with the last being only.
        - have WS between xsl:sort that is not ignorable.
        - Use a variable reference whose name is equal to our synthetic name. This should be XPST0008, but probably isn't.
        - Have an invalid value in xsl:sort/order. Use space
        - have xsl:sort return numbers, but data-type specify string.
        - have an AVT in xsl:sort/@lang
        - have an AVT in xsl:sort/@case-order
        - have an AVT in xsl:sort/@data-type
        - have an AVT in xsl:sort/@stable
        - Have mixed result, and hence incorrectly trigger XPTY0018 which the code currently raises.
        - Depend on the context position inside xsl:sort, when being child of perform-sort. Currently we create singleton focuses (I think), while we want the focus to be over the whole input sequence, not on individual items.
        - Have : xsl:sort is missing
        - Use current() in the xsl:sort and the body, to ensure the right scope is picked up
    * xsl:copy-of
        - Have a text node. It's not allowed.
        - Have PIs, comments, and ignorable whitespace as children. Sigh.
    * xsl:namespace
        - Use xsl:fallback.
        - Use xsl:namespace inside xsl:variable and introspect the result in various ways. This is a big area, we don't have namespace nodes in XQuery. Yes, calling evaluateSingleton() will probably crash.
        - Use no select and no body, error: XTSE0910
        - Have name expression evaluate to the empty sequence.
    * Sequence ctor that:
        - Has invalid element in XSL namespace.
          E.g, xsl:foo
    * xsl:import
        - Have element as child of xsl:import: disallowed.
        - Have text as child of xsl:import: disallowed.
        - Have PIs and comments as child of xsl:import: allowed.
    * xsl:include
        - Have element as child of xsl:include: disallowed.
        - Have text as child of xsl:include: disallowed.
        - Have PIs and comments as child of xsl:include: allowed.
    * xsl:strip-space
        - Have PIs, comments, whitespace as child.
    * xsl:element
        - Extract EBV from result.
        - Use space in validation element.
    * xsl:perform-sort
        - Have PIs and comments in between xsl:sort elements.
    * xml:space
        - We never pop our stack. Fix the bug, and ensure we have tests for it.
    * fn:unparsed-entity-uri
        - Check type of return value
        - Do basic unparsed-entity-uri("does-not-exist")
    * fn:unparsed-entity-public-id
        - Do basic unparsed-entity-uri("does-not-exist"), two permutations, check the spec
    * xsl:element
        - Use disallowed attribute: select
        - use unknown type in @type
        - Use @namespace, but be not in the lexical space of xs:anyURI
        - use disallowed enumeration in @validation
        - have a name expression that evaluates to a xs:QName value as opposed to a string.
        - have a name expression that evaluates to a xs:QName value as opposed to a string. but also have the namespace attribute
    * xsl:attribute
        - Use disallowed attribute: match
        - use unknown type in @type
        - Use @namespace, but be not in the lexical space of xs:anyURI
        - use disallowed enumeration in @validation
        - have a name expression that evaluates to a xs:QName value as opposed to a string.
        - have a name expression that evaluates to a xs:QName value as opposed to a string. but also have the namespace attribute
    * xsl:template
        - Use the union keyword, it's forbidden, only "|" is allowed
        - Use an expression other than Literal and VarRef in KeyValue[8]
        - use a function other than key().
        - have a declaration that can only appear as a child of xsl:stylesheet.
        - Have an element in the XSL-T namespace, but which is invalid, e.g "bar"
        - Use an axis other than child or attribute in pattern.
        - Have a template that has no match and no name attribute: XTSE0500
        - use child::document-node() in pattern
        - use @foo/child in pattern
        - apply templates to parentless attributes.
        - Have 3e3 in @priority
        - Have a @match with more than two alternatives, e.g "a | b | c", and have them all actually matching.
        - Use an XML name in the mode so we trigger NCNameConstructor::validateTargetName()
        - A template which only has a non-WS text node.
        - A template with param, followed by text node.
    * Simplified stylesheet
        - Use @version attribute only on doc element. Should fail, since @{XSL-T}version must be present
    * fn:current()
        - Have
    * xsl:variable have a variable reference appearing before its global declaration, and then somehow trigger recursion.
    * xsl:choose
        - elements/nodes intermixed with xsl:choose/xsl:when
        - invalid attribute on xsl:choose
        - invalid attribute on xsl:when
        - invalid attribute on xsl:otherwise
        - invalid attribute on xsl:if
        - invalid attribute on xsl:template
        - invalid attribute on xsl:stylesheet
        - invalid attribute on xsl:transform
        - xsl:otherwise in the middle between xsl:when elements.
        - use namespace declarations on xsl:when
        - use namespace declarations on xsl:otherwise
        - use namespace declarations on xsl:choose
    * Namespaces:
        - Have:
    * XPath
        - For each XQuery-specific expression, add a test using that expression:
            - typeswitch
            - let
            - validate
            - extension expression
            - unordered
            - ordered
            - for
            - computed text node constructor
            - computed attribute constructor
            - computed comment constructor
            - computed PI constructor
            - computed element constructor
            - computed document constructor
            - direct element constructor
            - direct comment constructor
            - direct PI constructor
            - all declarations
        - Use all the predefined prefixes in XQuery; none are in XSL-T.
    * xsl:when
        - Use xml:space on it
    * xsl:otherwise
        - Use xml:space on it
    * xsl:version
        - Use letters, XTSE0110
        - Use a float: 2e3, XTSE0110
        - Use a weird number, 2.00000001
    * xsl:document
        - use disallowed attribute: select.
        - use unknown type in @type
        - use disallowed enumeration in @validation
        - What happens if the type in @type is unknown?
        - Use xml:base attr and check URI.
    * xsl:sequence
        - use match attribute
    * xsl:stylesheet
        - Use @xsl:default-collation on xsl:stylesheet. Shouldn't have any effect. Or?
        - Use an XSL-T instruction as child -- invalid.
        - Have an element in the XSL-T namespace, but which is invalid, e.g "foo"
        - Have xsl:default-collation="http://example.com/" on xsl:stylesheet
        - Use prefix local: in say a function name. Not allowed.
        - Use comments after document element.
        - XTSE0010:
        - Change the version with @xsl:version on all elements that we have.
    * Patterns.
        - Basic */* test:
        - Basic a/b test:
    * xsl:strip-whitespace
        - Use a namespace prefix which is not unbound
        - have a syntax error in one of the node tests
    * xsl:preserve-whitespace
        - Use a namespace prefix which is not unbound
        - have a syntax error in one of the node tests
    * xsl:value-of
        - select attribute, and comment in body (error, XTSE0870)
        - select attribute, and processing instruction in body (error, XTSE0870)
        - select attribute, and CDATA in body (error, XTSE0870)
        - select attribute, and element in body (error, XTSE0870)
        - use xsl:sequence in body. Default separator should be none.
        - use match attribute
        - use double apostrophes/quotes. How are they dealt with?
    * xsl:apply-templates
        - use match attribute
        - apply in a mode for which no templates are declared
        - apply in a mode which is misspelled for another.
        - Have: We CRASH
    * xsl:for-each
        - No body:
        - No select attribute: text
        - Have mixed result, and hence incorrectly trigger XPTY0018 which the code currently raises.
        - Have:
    * xsl:variable
        - Test that function conversion rules are invoked
        - For what is an xsl:variable in scope?
          Where does the spec state it? Test that it is not in scope where applicable.
        - Have:
    * xsl:text
        - count the result of a template that has text node(non-ws), xsl:text(content), xsl:content(zero content), text node(non-ws)
        - Have an element inside xsl:text: XTSE0010.
        - Use comments and PIs intermixed with text inside.
    * xsl:for-each
        - use match attribute
        - What should this produce? Saxon produces "123" but with xsl:text removed, "1 2 3".
    * xsl:if
        - Have body. Error
        - Have . That is, empty body.
    * xsl:sequence
        - select attribute missing:
        - content other than xsl:fallback, e.g text node.
        - How do we sort?
    * for every state for XSL-T parsing:
        - Use invalid element that is in the XSL-T namespace.
    * In all cases expressions are queued/generated:
        - Trigger expression precedence bugs, due to lack of parentheses.
    * Use xml:space in stylesheet that doesn't have value preserve nor default.
    * For each case we have while(!reader.atEnd()):
        - test that triggers parser error and that we detect it properly.
    * for every element that allows text:
        * Use CDATA. QXmlStreamReader distinguishes between the two. text before and after.
        * Use XML Comments and split up text nodes.
    * Patterns:
        * Ensure node() doesn't match document nodes.
        * "//" is an invalid pattern
        * Is there some expression which has no children? findAxisStep()
        * Use @*/asdo
        * XPST0003: key(concat("abc", "def"), "abc")
        * XPST0003: id(concat("abc", "def"))
        * XPST0003: concat('abc', 'def') // WILL CRASH
        * XPST0003: unknownFunction()
        * Use double: key("abc", 3e3)
        * Use three argument key() in pattern.
    * Simplified stylesheet modules:
        * Omit the xsl:version attribute. XTSE0010
    * type-available()
        * We have no tests at all?
    * have xml:base on the following elements and check them with static-base-uri():
        - all instructions
        - all declarations, eg:
            - xsl:choose, xsl:choice, xsl:otherwise
            - xsl:template
            - xsl:function
            - etc

Embedded stylesheet modules
    - Verify that we don't choke on xsl attrs with invalid attributes outside; "In an embedded stylesheet module, standard attributes appearing on ancestors of the outermost element of the stylesheet module have no effect."

Parsing:
    - Error message for:
    - Use the document "

*(
    ((/)/call-template(t0))
    (*/call-template(t1))
    (element(asd)/call-template(t2))
    (comment()/call-template(t3))
    (a/b/call-template(t4))
)

==>

*/typeswitch(.)
    case $g0 as document-root() return call-template(t0)
    case $g0 as element() return call-template(t1)
    case $g0 as element(asd) return call-template(t2)
    case $g0 as comment() return call-template(t3)
    case $g0 as a/b return call-template(t4)

Patterns are used in:
    - xsl:for-each-group/@group-starting-with
    - xsl:key/@match
    - xsl:number/(@count, @from)
    - xsl:template/@match

c/b => child-or-self::element(b)[parent::element(c)]
c/b/a => child-or-self::element(a)[parent::element(b)[parent::element(c)]]
d/c/b/a => child-or-self::element(a)[parent::element(b)[parent::element(c)[parent::element(d)]]]

-----------------------------------
=>
child::element(foo) map apply-template(#default)
-----------------------------------

-----------------------------------
=>
let $g0 := for $g1 in child::element(foo) order by @bar return $g1 return apply-template(yo)
-----------------------------------

-----------------------------------
=>
sort $in/order by @sortKey
-----------------------------------

-----------

John Snelson of Oracle Berkeley DB XML & XQilla writes in private mail:

    I'd had the same thought myself, too - you must be reading my mind ;-)

What is he referring to?

If one spends some time on fancy diagrams, the architecture of QtXmlPatterns[LINK], the XQuery engine that shipped in Qt 4.4, looks like this:

Recently I've started implementing XSL-T 2.0 (hopefully to be done for Qt 4.5), and the whole approach is to modify the existing codebase as follows:

Put differently, when QtXmlPatterns deals with XSL-T stylesheets, it replaces the XQuery tokenizer with a tokenizer that translates the stylesheet into XQuery tokens. These are consumed by the existing XQuery parser, which is extended with both grammar non-terminals and tokens to accommodate the XSL-T features that XQuery doesn't have. In compiler terms, it can be seen as an "extreme" frontend: not only is the same intermediate representation used, the grammar is too.

What is the point of this? The functional overlap between XQuery, XSL-T and others is of course widely established. Even the specifications are at times generated from the same source documents, and that implementations subsequently modularize code is second nature to any engineer, and seen to one degree or another in contemporary implementations. Typically this happens in a traditional fashion: classes are re-used, and their functionality is widened to cover both/more languages. However, I believe doing it directly on the grammar level introduces several advantages.

The parser is based on Bison, and since it's not running in the experimental push mode, it uninterruptedly calls the tokenizer. The tokenizer, class XSLTTokenizer, in turn calls a pull-based XML parser: QXmlStreamReader. What often complicates matters on occasions like this is who gets the right to call whom, and who gets the convenience of tracking state in a natural way through a call stack.
XSLTTokenizer is conveniently implemented: as it encounters declarations and instructions in the stylesheet, it recursively descends in the XSL-T grammar through its own functions, adding tokens to a queue, which is delivered to the parser when asked -- and when the queue is empty it resumes queuing tokens. The tokenizer is fairly crude: it queues up tokens for instructions uninterrupted, and only has states between declarations. Hence, XSLTTokenizer queues up tokens for each template and function body, but enters "delivery mode" in between. This of course periodically breaks streaming since it's buffering up tokens, but considering that the memory usage for tokens is low and that a finer granularity for states (say, being able to pop the stacks when being in between two xsl:when elements) requires a significant effort, this is fine until proven otherwise.

Advantages
---------------
discuss analysis.

XSLTTokenizer rewrites XSL-T to XQuery as follows:

Instructions
-------------
xsl:if: an if/then/else expression whose else branch is the empty sequence.

xsl:choose: again, a nesting of if/then/else expressions.

xsl:value-of: a computed text node constructor. Its body contains a call to string-join() involving the separator attribute.

xsl:variable: a let/return binding. Since XSL-T is statement-like in its sequence constructors, parentheses are used to ensure the variable binding is in scope for all subsequent statements.

for-each: it is the iteration/mapping mechanism XQuery fails to supply, despite path steps and the FLWOR machinery. for-each iterates using a focus (which for doesn't, but paths do), but can do so over atomic values and unconditionally without sorting the result by document order (which paths can't/don't, but for does). For implementations that normalize paths into for loops as the formal semantics do, the approach is straightforward.
In QtXmlPatterns' case, a synthetic token is queued which signals the creation of a "relaxed" path expression, one that skips halting on atomic values in its operands (XPTY0019) and also skips node sorting.

All "direct" node constructors, like , and "computed" node constructors, like xsl:element, are rewritten into the corresponding XQuery computed node constructors. In some cases direct node constructors could have been used, but in any case the IR yielded is the same, and computed constructors happen to use fewer tokens.

A particular case is xsl:namespace, an instruction which doesn't have any corresponding expression in XQuery. In the case of QtXmlPatterns, the code obviously already has a notion of "create this namespace on this element", and an AST node was trivially added for fetching the namespace components computationally. However, the introduction of xsl:namespace in an XQuery implementation is not to be taken lightly wrt. for instance testing, since it introduces a new node type.

perform-sort: surprisingly, this expression of all complicates matters, for the simple reason that its operands occur in the opposite order compared to XQuery when the input sequence is supplied through a sequence constructor, hence breaking the streamed approach. XSLTTokenizer solves this by buffering: the attributes of the xsl:perform-sort element are stored, the xsl:sort elements are queued onto a temporary queue, subsequently either the select attribute or the sequence constructor is queued, and the tokens for xsl:sort are appended afterwards. This complicated the code greatly, since XSLTTokenizer had to be able to "move around" sequences of tokens.

In addition, perform-sort has the same problem as for-each: the iteration mechanism falls in between paths and the for loop. The focus for perform-sort is also the focus for the sequence constructor and the select attribute, but the focus for the xsl:sort elements is the initial sequence.
This is approached by having a for loop, where the expression in each order by clause has a relaxed path expression whose left operand is a variable reference to what the for loop bound. TODO Doesn't work. Focus size wrong.

This is an approach that implementations of the "second generation" of the technologies can take. The big difference is that XSL-T 2.0 doesn't have the restrictions of 1.0, more evident in XQuery's syntax.

xsl:sort

XSL-T is much more dynamic than XQuery through the use of templates, but also because more decisions can be taken at runtime through all the attribute value templates. xsl:sort is surely a good example of this, with its AVTs for language, order, collation, stability and what not. XQuery's order by stands in strong contrast, having these coded in the grammar. In QtXmlPatterns' case, the AST node corresponding to order by was generalized to take things such as stability and order from operands. The XQuery code paths pay for this, since for them constants are generated and inserted as operands even though it's known at compile time what is needed. However, considering that these evaluations are not inside the actual sort loop, but instead only computed on each sort invocation, it shouldn't be too bad.

xsl:message

Templates
-------------------------
A big bucket of questions for an XQuery implementation is of course the introduction of templates. In this case too, it is of great interest to rewrite relevant code into primitive XQuery expressions.

Templates' drawback is often said to be their dynamic nature, which makes static inferences hard or impossible. However, by again rewriting in clever ways and making the code visible in a standard way, existing analysis code can operate upon it.

For the purposes of this discussion, templates can be broken down into three distinct problems:

A Finding what nodes to invoke upon.
  This is the expression found on xsl:apply-templates/@select, in the case of template rules.

B Concluding what template to invoke. This is the analysis and evaluation of patterns, as found on xsl:template/@match, in the case of template rules. This is seen as critical, as for instance Michael Kay emphasizes in Saxon: Anatomy of an XSLT processor [LINK http://www.ibm.com/developerworks/library/x-xslt2/]

C Invoking the template for the given context node.

Of these three steps, the first two are specific to template rules, while the last, invoking templates, can be treated identically regardless of kind: template rules as well as named templates.

With this perspective as background, let's try to write it into XQuery primitives.

First, all templates regardless of kind are instantiated by name. In the case of template rules, a synthetic name is given. They are invoked by an XPath function named call-template() that takes the name of the template as its first argument, and also handles template parameters. This "template callsite", which is separated from what it is invoked with and whether it is invoked, knows its target template statically, and hence can be subject to inlining and the usual functional analysis.

Focus and concatenation of output handled.

One should consider whether templates couldn't be considered as functions, with specialized arguments in the case of tunnel parameters.

Knowing what templates will be invoked could be used to conclude that node sorting is not necessary.

Mention how we do builtin templates

Attribute Value Templates
-------------------------
XSL-T makes extensive use of Attribute Value Templates (AVTs), which are handled by turning the closest grammar piece in XQuery into an expression. Simply, ExprSingle[32] is extended with the branch:

    AVT LPAREN AttrValueContent RPAREN

where AVT is a synthetic token XSLTTokenizer generates.
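
As a rough illustration of what treating an AVT as an expression amounts to, here is a minimal Python sketch. It is not QtXmlPatterns code: split_avt() and to_concat() are hypothetical helper names, the scanner is naive (it does not cope with braces inside string literals in the embedded expression), and it ignores that XSL-T 2.0 joins the items of each expression's result with a space.

```python
def split_avt(avt):
    """Split an attribute value template into ("lit", text) / ("expr", text) parts.

    Hypothetical helper for illustration. "{{" and "}}" are the escapes for
    literal braces; a single "{" opens an embedded expression.
    """
    parts = []
    literal = []
    i = 0
    while i < len(avt):
        ch = avt[i]
        if avt.startswith("{{", i) or avt.startswith("}}", i):
            literal.append(ch)          # escaped brace becomes a literal brace
            i += 2
        elif ch == "{":
            end = avt.find("}", i + 1)  # naive: first closing brace wins
            if end == -1:
                raise ValueError("unterminated expression in AVT")
            if literal:
                parts.append(("lit", "".join(literal)))
                literal = []
            parts.append(("expr", avt[i + 1:end]))
            i = end + 1
        elif ch == "}":
            raise ValueError("unescaped '}' in AVT")
        else:
            literal.append(ch)
            i += 1
    if literal:
        parts.append(("lit", "".join(literal)))
    return parts


def to_concat(avt):
    """Render the parts as the operands of a concat() call (as text)."""
    operands = []
    for kind, text in split_avt(avt):
        if kind == "lit":
            # XPath string literal; embedded quotes are doubled.
            operands.append('"' + text.replace('"', '""') + '"')
        else:
            operands.append(text.strip())
    while len(operands) < 2:            # concat() needs at least two operands
        operands.append('""')
    return "concat(" + ", ".join(operands) + ")"


print(to_concat("item-{position()}"))
```

The literal parts become string operands and the brace-enclosed parts are passed through as expressions, which is the shape the parser needs in order to handle the attribute value like any other expression.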
This means that the code handling AVTs in XQuery's direct attribute constructors handles AVTs as generic expressions. AttrValueContent creates a call to the concat() function, over the operands.

Deal with fn:current by using let $current := . return instruction.

Another thing related to order and parsing is that XSL-T has more freedom wrt. where variables are in scope. For instance, a variable declaration appearing after a user function declaration is in scope for the function in XSL-T, but that's not the case in XQuery. This means that delayed variable resolution must be added, something which wasn't, and cannot be, active for the XQuery code. See 9.7 Scope of Variables.

The parser generates for the builtin template rules:

    declare template matches (text() | @*) mode #all { text{.} };

* Having template invocations essentially expressed as a callsite, or branching, allows control flow analysis in a traditional manner, and hence the possibility to conclude what templates are possibly invoked in various contexts (or not invoked). One good example where this could improve template matching is patterns containing predicates: let's say a template matches text nodes with a predicate, but , doh I'm wrong.

The problem with expressing template invocation with if expressions is finding ambiguous matches.

Although normalizing down to a small set of primitives has its advantages, one problem is with doing it too early. When doing it directly at tokenization, the higher-level perspective is lost and therefore must be restored again (example?). For instance, if an error is reported in a primitive, it must not appear as originating from that primitive. It's not constrained to error reporting (example?). However, this is a general problem when compilers shift between different representations.

One effect this parsing approach has is that the stylesheet cannot be used as an input document (e.g, what document("") would evaluate to); in that case it has to be parsed again.
I think this is for the better; in the case that the stylesheet has this dual role, it means representations are used which are designed specifically for these respective roles. Although doing a dual parsing step is costly, it's somewhat relieved by the input typically being cached at the byte level (file system and higher layers such as web/application caches) in the case of traditional file loading.

Another problem is that the grammar is used to solve implementation details, and this might show when the parser does error reporting. If one decides not to send XSL-T through the XQuery parser, it can be an advantage to have as little business logic as possible in the XQuery parser such that it can be reused.

Some parts of XSL-T's syntax don't translate well to XQuery syntax. Some parts don't follow structure very strongly, surely not the structures that map well to XQuery's syntax. These are xml:base, @version and other attributes that can appear on any element. Their information needs to be preserved and needs to affect the output code, but this cannot be done in a way which fits naturally with the XQuery syntax, and hence leads to workarounds. Have whole section on how we do @version and @xml:base. Another problem is namespace declarations on the top document element.

What largely makes me believe this technique fails is that the large and most important parts, templates and instructions, map well to XQuery, but the small yet not ignorable details like @version and @xml:base do not, to the degree that the approach at large fails.

fn:document()
------------------------
See class documentation for DocumentFN. Document what optimizations one typically wants to implement (const-fold on card 1, constant propagate).

In other words, it's reasonable to believe that it's possible to extend the XQuery grammar such that it functionality-wise is able to do the same as XSL-T, but this doesn't mean that it is a good way to reach every gritty corner of the XSL-T specification.

Patterns
--------------------
The up-side-down turning, discuss id/key().

Declarations
---------------------
xsl:function: the 'declare function' declaration. TODO override

XSL-T's error codes go against good refactoring. Its codes are specific to each usage, compared to for instance XPTY0004.

Optimizations:
    string-join()/value-of

=> document-node()/child::element(doc) map apply-template

matches child-or-top::element(doc)

=> N/root(.)//(EE)
    N == document-node()
    EE == child::element(doc)

=> document-node()/root(.)/descendant-or-self::node()/child::element(doc)

Optimize out already in createCopyOf()

Bugs:
    - DynamicContextStore and CurrentItemStore need to implement evaluateToReceiver().
    - Don't we have a parsing bug in each place where we call insideSequenceConstructor(), and don't wrap the result in parentheses? E.g, a whitespace node followed by an instruction will lead to parse error if the parent is for instance xsl:when.

In patterns we find:
    - Function :id()
    - Function :key()
    - AxisStep
    - GenericPredicate. Also used for paths.
    - (CombineNodes)
    - empty sequence; attribute::foo()/child::asd

Test case, tokenizer asserts (fixed in 2a0e83b):
Typing code:
Compat mode in attribute sets:
Space in mode:
Type error in global template:
Variables are not in scope before its siblings:
Crashes:
Whitespace handling, the important part is WS after xsl:template:
Whitespace handling, preserve, but not inside xsl:apply-templates: MATCH
Have top-level xml:space, ensure whitespace as child of xsl:stylesheet is ignored: MATCH
Compat mode, Saxon & QtXmlPatterns fail:
Compat mode, this is not in the suite:
Crashes:
Incorrectly yields compile error, XPST0003:
Have a basic simplified stylesheet module:
Have no @version:
Is valid:
Is valid: TEXT
XTSE0020:
XTSE0020:
XTSE0805:
not XPST0003, not in test suite:
Parsing of many exprs in xsl:value-of (with separator):
Parsing of many exprs in xsl:value-of (without separator):
Check type of empty variables:
Crashes: invalid standard attributes on a simplified stylesheet module.
Asserts (not wellformed):
From within a function, use the focus /through/ a variable reference:
Loops infinitely:
Gives crash in coloring code:
Stylesheet:
Focus: <
Should evaluate to true:
Crashes, should be XTTE0570: def

* Parse error:
* Write tests with xsl:with-param whose body is empty. That's effectively an empty sequence(?) which needs to be handled properly, and (dynamically) type checked correctly.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
-------------------------------------------------------------
/a/b
=>
b[parent::a[parent::document()]]

but we currently have:
(b[parent::a])[parent::document()]
-------------------------------------------------------------
a/b
=>
b[parent::a]
-------------------------------------------------------------
a/b/c
=>
c[parent::b[parent::a]]
-------------------------------------------------------------
a/b/c/d
=>
d[parent::c[parent::b[parent::a]]]
-------------------------------------------------------------
/a/b/c/d
=>
d[parent::c[parent::b[parent::a[parent::document()]]]]

This is handled specially; see | SLASH RelativePathPattern

b/c rewrites to:
    TruthPredicate
        AxisStep self::element(c)
        AxisStep parent::element(b)

For a/b/c we get:
    TruthPredicate
        TruthPredicate
            AxisStep self::element(c)
            AxisStep parent::element(b)
        AxisStep parent::element(a)

But we want:
    TruthPredicate
        AxisStep child-or-top::element(c)
        TruthPredicate
            AxisStep parent::element(b)
            AxisStep parent::element(a)

For a/b/c/d we get:
    TruthPredicate
        TruthPredicate
            TruthPredicate
                AxisStep self::element(d)
                AxisStep parent::element(c)
            AxisStep parent::element(b)
        AxisStep parent::element(a)

For a/b/c/d we want:
    TruthPredicate
        AxisStep self::element(d)
        TruthPredicate
            AxisStep parent::element(c)
            TruthPredicate
                AxisStep parent::element(b)
                AxisStep parent::element(a)

For /a/b we get:
    TruthPredicate
        TruthPredicate:
            AxisStep self::element(b)
            AxisStep parent::element(a)
        AxisStep parent::document()

but we want:
    TruthPredicate
        AxisStep self::element(b)
        TruthPredicate: // PREDICATE
            AxisStep parent::element(a)
            AxisStep parent::document() // PREDICATE

--------------------------------------------------------------
For a/b/c we get:
    TruthPredicate
        AxisStep self::element(c)
        TruthPredicate
            parent::element(b)
            parent::element(a)
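
The wanted right-nested form can be sketched as a small string rewrite. This is a hedged illustration in Python, not the parser's actual code: rewrite_pattern() is a hypothetical name, and it only handles patterns made of plain child steps separated by "/", with an optional leading "/".

```python
def rewrite_pattern(pattern):
    """Rewrite e.g. "a/b/c" into "c[parent::b[parent::a]]".

    Hypothetical helper for illustration. Predicates, "//", id() and key()
    are deliberately not supported.
    """
    if "//" in pattern:
        raise ValueError("'//' is not handled by this sketch")
    absolute = pattern.startswith("/")
    steps = [s for s in pattern.strip("/").split("/") if s]
    if not steps:
        raise ValueError("pattern has no steps")
    # Start with the outermost ancestor test and wrap inwards, so that each
    # predicate nests inside the predicate of the step to its right.
    chain = "[parent::document()]" if absolute else ""
    for step in steps[:-1]:
        chain = "[parent::" + step + chain + "]"
    return steps[-1] + chain


for p in ("a/b", "a/b/c", "/a/b", "/a/b/c/d"):
    print(p, "=>", rewrite_pattern(p))
```

Building the chain from the leftmost step means the document test ends up innermost, which is the difference between the wanted b[parent::a[parent::document()]] and the currently produced (b[parent::a])[parent::document()].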