=head1 NAME

XML::SAX::Intro - An Introduction to SAX Parsing with Perl

=head1 Introduction

XML::SAX is a new way to work with XML parsers in Perl. In this article
we'll discuss why you should be using SAX, why you should be using
XML::SAX, and we'll look at some of the finer implementation details.
The text below assumes some familiarity with callback-based (or
push-based) parsing; if you are unfamiliar with these techniques, a good
place to start is Kip Hampton's excellent series of articles on XML.com.

=head1 Replacing XML::Parser

The de facto way of parsing XML in Perl is Larry Wall and Clark
Cooper's XML::Parser. This module is a Perl and XS wrapper around the
expat XML parser library by James Clark. It has been a hugely
successful project, but it suffers from a couple of rather major flaws.
First, it has a proprietary API, designed before the SAX API was
conceived, which means that it cannot easily be replaced by other
streaming parsers. Second, its callbacks are subrefs. This doesn't
sound like much of an issue, but unfortunately it leads to code like:

  sub handle_start {
      my ($e, $el, %attrs) = @_;
      if ($el eq 'foo') {
          $e->{inside_foo}++; # BAD! $e is an XML::Parser::Expat object.
      }
  }

As you can see, we're using the $e object to hold our state
information, which is a bad idea because we don't own that object; we
didn't create it. It's an internal object of XML::Parser that happens
to be a hashref.
We could all too easily overwrite XML::Parser's internal
state variables this way, or Clark could change it to an arrayref
(not that he would, because it would break so much code, but he could).

Currently, the only way to maintain state safely with XML::Parser is to
use a closure:

  my $state = MyState->new();
  $parser->setHandlers(Start => sub { handle_start($state, @_) });

This closure traps the $state variable, which now gets passed as the
first parameter to your callback. Unfortunately, very few people use
this technique, because it is not documented in the XML::Parser POD
files.

Another reason you might not want to use XML::Parser is that you need
some feature it doesn't provide (such as validation), or that you need
a parser that doesn't depend on expat, because expat isn't installed on
your system or your ISP is restrictive. Using SAX allows you to work
around these restrictions.

=head1 Introducing SAX

SAX stands for the Simple API for XML. And simple it really is.
Constructing a SAX parser and passing events to handlers is as simple
as:

  use XML::SAX;
  use MySAXHandler;

  my $parser = XML::SAX::ParserFactory->parser(
      Handler => MySAXHandler->new
  );

  $parser->parse_uri("foo.xml");

The important concept to grasp here is that SAX uses a factory class
called XML::SAX::ParserFactory to create a new parser instance. This
lets you plug in other underlying parser implementations with different
feature sets, something XML::Parser has always sorely lacked.

In the code above we used the parse_uri method, but we could equally
well have called parse_file, parse_string, or parse(). Please see
XML::SAX::Base for the parameters these methods take, but don't be
fooled into believing that parse_file takes a filename. It doesn't: it
takes a file handle, a glob, or a subclass of IO::Handle. Beware.
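
The parse methods can be tried out with a short script like the one
below. It is only a sketch under a few assumptions: XML::SAX must be
installed (its distribution includes the pure-Perl parser
XML::SAX::PurePerl, so ParserFactory should always find a parser),
ElementCounter is a hypothetical handler class made up for this
example, and the temporary file created with the core File::Temp
module simply stands in for a real XML file. The same parser and
handler are used first with parse_string and then with parse_file,
which is given an open file handle rather than a filename:

  use strict;
  use warnings;

  # ElementCounter is a hypothetical example handler, not part of
  # XML::SAX. It keeps its state in $self, not in the parser.
  package ElementCounter;
  use base qw(XML::SAX::Base);

  sub start_document {
      my ($self, $doc) = @_;
      $self->{count} = 0;      # reset at the start of every parse
  }

  sub start_element {
      my ($self, $el) = @_;
      $self->{count}++;        # one call per opening (or empty) tag
  }

  package main;
  use File::Temp qw(tempfile);
  use XML::SAX::ParserFactory;

  my $handler = ElementCounter->new;
  my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);

  # parse_string takes the XML text itself.
  $parser->parse_string('<root><a/><b/></root>');
  my $from_string = $handler->{count};

  # parse_file wants a handle, so write a temporary file (a stand-in
  # for real data) and pass the opened handle, not its name.
  my ($out, $filename) = tempfile(UNLINK => 1);
  print {$out} '<root><c/></root>';
  close $out or die "Cannot close $filename: $!";

  open my $in, '<', $filename or die "Cannot open $filename: $!";
  $parser->parse_file($in);
  close $in;
  my $from_file = $handler->{count};

  print "parse_string saw $from_string elements\n";
  print "parse_file saw $from_file elements\n";

It should report three elements for the string and two for the file.
Because start_document resets the counter at the start of each parse,
one handler and one parser can be reused across both calls, and all of
the state lives in the handler object rather than in the parser's
internals.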

SAX works much like XML::Parser's default callback style, with one
major difference: rather than setting individual callbacks, you create
a new class to receive the callbacks. Each callback is invoked as a
method on an instance of that handler class. An example will best
demonstrate this:

  package MySAXHandler;
  use base qw(XML::SAX::Base);

  sub start_document {
      my ($self, $doc) = @_;
      # process document start event
  }

  sub start_element {
      my ($self, $el) = @_;
      # process element start event
  }

  1;  # a module loaded with "use" must return a true value

Now, when we instantiate this as above and parse some XML with it as
the handler, start_document and start_element will be invoked as
methods, so this is equivalent to directly calling:

  $object->start_element($el);

Notice how this differs from XML::Parser's calling style, which calls:

  start_element($e, $name, %attribs);

It is this difference between function calls and method calls that
lets you subclass SAX handlers, and that is a large part of what makes
SAX a powerful solution.

As you can see, unlike with XML::Parser, we have to define a new package
in which to do our processing (there are hacks you can use to make this
unnecessary, but I'll leave figuring those out to the experts). The
biggest benefit of this is that you maintain your own state variable
($self in the above example), which frees you from the concerns listed
above. It also improves maintainability: you can place the code in a
separate file if you wish, and your callback methods always have the
same names, rather than you having to choose suitable names for them
as you did with XML::Parser. This is an obvious win.

SAX parsers are also very flexible in how you pass a handler to them.
You can use a constructor parameter, as we saw above, or you can pass
the handler directly in the call to one of the parse methods:

  $parser->parse(Handler => $handler,
                 Source  => { SystemId => "foo.xml" });
  # or...
  $parser->parse_file($fh, Handler => $handler);

This flexibility allows one parser to be used in many different
scenarios throughout your script (though you shouldn't feel pressure to
use this approach, as constructing a parser is generally not a
time-consuming process).

=head1 Callback Parameters

The only other thing you need to know to understand basic SAX is the
structure of the parameters passed to each of the callbacks. In
XML::Parser, parameters are passed to the callbacks as separate
arguments, so, for example, the Start callback would be called as
my_start($e, $name, %attributes), and the PI callback would be called
as my_processing_instruction($e, $target, $data). In SAX, every
callback is passed a hash reference containing entries that define our
"node". The key callbacks and the structures they receive are:

=head2 start_element

The start_element handler is called whenever a parser sees an opening
tag. It is passed an element structure consisting of:

=over 4

=item LocalName

The name of the element minus any namespace prefix it may
have come with in the document.

=item NamespaceURI

The URI of the namespace associated with this element,
or the empty string for none.

=item Attributes

A set of attributes as described below.

=item Name

The name of the element as it was seen in the document (i.e.,
including any prefix associated with it).

=item Prefix

The prefix used to qualify this element's namespace, or the
empty string if none.

=back

The B<Attributes> are a hash reference, keyed by what we have called
"James Clark" notation.
This means that the attribute name has been
expanded to include any associated namespace URI and put together as
{ns}name, where "ns" is the expanded namespace URI of the attribute if
and only if the attribute had a prefix (otherwise the braces are empty,
giving {}name), and "name" is the LocalName of the attribute.

The value of each entry in the attributes hash is another hash
structure consisting of:

=over 4

=item LocalName

The name of the attribute minus any namespace prefix it may have
come with in the document.

=item NamespaceURI

The URI of the namespace associated with this attribute. If the
attribute had no prefix, then this is just the empty string.

=item Name

The attribute's name as it appeared in the document, including any
namespace prefix.

=item Prefix

The prefix used to qualify this attribute's namespace, or the
empty string if none.

=item Value

The value of the attribute.

=back

So a full example, as output by Data::Dumper, might be:

  ....

=head2 end_element

The end_element handler is called either when a parser sees a closing
tag, or after start_element has been called for an empty element. Note,
however, that a parser may, if it is so inclined, call characters with
an empty string when it sees an empty element. There is no simple way
in SAX to determine whether the parser in fact saw an empty element
(E<lt>foo/E<gt>) or a start and end tag with no content between them
(E<lt>fooE<gt>E<lt>/fooE<gt>).

The end_element handler receives exactly the same structure as
start_element, minus the Attributes entry. Note, though, that it
should not be a reference to the same data that start_element
received, so changing the values in start_element will not affect the
values later seen by end_element.

=head2 characters

The characters callback may be called in several circumstances. The
most obvious one is when the parser sees ordinary character data in the
markup, but it is also called for text in a CDATA section, and in other
situations as well.
A SAX parser makes no guarantees whatsoever
about how many times it may call characters for a stretch of text in an
XML document: it may call it once, or once for every character in the
text. To work around this, it is often important for the SAX developer
to use a bundling technique, where text is gathered up and processed in
one of the other callbacks. This is not always necessary, but it is a
worthwhile technique to learn, which we will cover in
XML::SAX::Advanced (when I get around to writing it).

The characters handler is passed a very simple structure, a hash
reference consisting of just one entry:

=over 4

=item Data

The text data that was received.

=back

=head2 comment

The comment callback is called for comment text. Unlike with
C<characters()>, the comment callback I<must> be invoked just once for
an entire comment string. It receives a single simple structure, a hash
reference containing just one entry:

=over 4

=item Data

The text of the comment.

=back

=head2 processing_instruction

The processing instruction handler is called for all processing
instructions in the document. Note that, according to the XML
specification, these processing instructions may appear before the
document root element, after it, or anywhere within the document where
text and elements would normally appear.

The handler is passed a structure containing just two entries:

=over 4

=item Target

The target of the processing instruction.

=item Data

The text data in the processing instruction. This can be an empty
string for a processing instruction that has no data part; for
example, E<lt>?wiggle?E<gt> is a perfectly valid processing instruction.

=back

=head1 Tip of the iceberg

What we have discussed above is really the tip of the SAX iceberg, and
so far it may look as though SAX offers little of interest beyond what
we have seen with XML::Parser.
But it does go much further than that, I
promise.

People who hate object-oriented code for the sake of it may be thinking
that creating a new package just to parse something is a waste, when
they've been parsing things just fine up to now using procedural code.
But there's a reason for all this madness, and that reason is SAX
filters.

As you saw right at the very start, to let the parser know about our
class, we pass it an instance of our class as its Handler. But now
imagine what would happen if our class could also take a Handler
option, do some processing, and pass our data further down the line.
That, in a nutshell, is how SAX filters work. It's Unix pipes for the
21st century!

There are two downsides to this. First, writing SAX filters can be
tricky. If you look into the future and read the advanced tutorial I'm
writing, you'll see that a Handler can come in several shapes and
sizes, so making sure your filter does the right thing takes care.
Second, constructing complex filter chains can be difficult, and
simple thinking tells us that we only get one pass at our document,
when often we'll need more than that.

Luckily, those downsides have been fixed by the release of two very
cool modules. What's even better is that I didn't write either of
them!

The first module is XML::SAX::Base. This is a VITAL SAX module that
acts as a base class for all SAX parsers and filters. It provides an
abstraction over calling the handler methods that makes sure your
filter or parser does the right thing, and it does it FAST. So if you
ever need to write a SAX filter (and if you're processing XML to XML,
or XML to HTML, you probably do), you should write it as a subclass of
XML::SAX::Base. Really, this is advice not to ignore lightly. I will
not go into the details of writing a SAX filter here.
Kip Hampton, the author of XML::SAX::Base, has covered this nicely in
his article on XML.com here <URI>.

To construct SAX pipelines, Barrie Slaymaker, a longtime Perl hacker
whose modules you have probably heard of or used, wrote a very clever
module called XML::SAX::Machines. It combines some really useful SAX
filter-type modules with a construction toolkit for filters that makes
building pipelines easy. But before we see how it makes things easy,
let's first see how tricky it looks to build complex SAX filter
pipelines by hand.

  use XML::SAX::ParserFactory;
  use XML::Filter::Filter1;   # stand-ins for your own filter classes
  use XML::Filter::Filter2;
  use XML::SAX::Writer;

  my $output_string;
  my $writer  = XML::SAX::Writer->new(Output => \$output_string);
  my $filter2 = XML::Filter::Filter2->new(Handler => $writer);
  my $filter1 = XML::Filter::Filter1->new(Handler => $filter2);
  my $parser  = XML::SAX::ParserFactory->parser(Handler => $filter1);

  $parser->parse_uri("foo.xml");

This is a lot easier with XML::SAX::Machines:

  use XML::SAX::Machines qw(Pipeline);

  my $output_string;
  my $parser = Pipeline(
      'XML::Filter::Filter1' => 'XML::Filter::Filter2' => \$output_string
  );

  $parser->parse_uri("foo.xml");

One of the main benefits of XML::SAX::Machines is that the pipelines
are constructed in natural order, rather than the reverse order we saw
with manual pipeline construction. XML::SAX::Machines takes care of all
the internals of pipe construction, giving you at the end just a parser
you can use (and you can re-use the same parser as many times as you
need to).

Just a final tip: if you ever get stuck and are confused about what is
being passed from one SAX filter or parser to the next, then
Devel::TraceSAX will come to your rescue. This Perl debugger plugin
lets you dump the SAX stream of events as it goes by.
Usage is
very simple: just run your Perl script that uses SAX as follows:

  $ perl -d:TraceSAX <scriptname>

Preferably, pipe the output to a pager of some sort, such as more or
less. The output is extremely verbose, but should help clear some
issues up.

=head1 AUTHOR

Matt Sergeant, matt@sergeant.org

$Id: Intro.pod,v 1.3 2002/04/30 07:16:00 matt Exp $

=cut