Converting ICalendar to XML
I've started working on a CalDAV implementation, which also requires analysis of ICalendar (rfc 2554) objects.
ICalendar objects have properties, components (such as VEVENT, VTODO) and attributes. This is awfully familiar to XML. So instead of trying to come up with a complicated parser and object structure, I decided to just convert it to XML and use PHP's simplexml.
This is my current script:
<?php
function iCalendarToXML($icalendarData) {
// Detecting line endings
if (strpos($icalendarData,"\r\n")) $lb = "\r\n";
elseif (strpos($icalendarData,"\n")) $lb = "\n";
else $lb = "\r\n";
// Splitting up items per line
$lines = explode($lb,$icalendarData);
// Properties can be folded over 2 lines. In this case the second
// line will be preceeded by a space or tab.
$lines2 = array();
foreach($lines as $line) {
if ($line[0]==" " || $line[0]=="\t") {
$lines2[count($lines2)-1].=substr($line,1);
continue;
}
$lines2[]=$line;
}
$xml = '<?xml version="1.0"?>' . "\n";
$spaces = 0;
foreach($lines2 as $line) {
$matches = array();
// This matches PROPERTYNAME;ATTRIBUTES:VALUE
if (preg_match('/^([^:^;]*)(?:;([^:]*))?:(.*)$/',$line,$matches)) {
$propertyName = strtoupper($matches[1]);
$attributes = $matches[2];
$value = $matches[3];
// If the line was in the format BEGIN:COMPONENT or END:COMPONENT, we need to special case it.
if ($propertyName == 'BEGIN') {
$xml.=str_repeat(" ",$spaces);
$xml.='<' . strtoupper($value) . ">\n";
$spaces+=2;
continue;
} elseif ($propertyName == 'END') {
$spaces-=2;
$xml.=str_repeat(" ",$spaces);
$xml.='</' . strtoupper($value) . ">\n";
continue;
}
$xml.=str_repeat(" ",$spaces);
$xml.='<' . $propertyName;
if ($attributes) {
// There can be multiple attributes
$attributes = explode(';',$attributes);
foreach($attributes as $att) {
list($attName,$attValue) = explode('=',$att,2);
$xml.=' ' . $attName . '="' . htmlspecialchars($attValue) . '"';
}
}
$xml.='>'. htmlspecialchars($value) . '</' . $propertyName . ">\n";
}
}
return $xml;
}
?>
This will convert:
BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Example Corp.//CalDAV Client//EN
BEGIN:VTIMEZONE
LAST-MODIFIED:20040110T032845Z
TZID:US/Eastern
BEGIN:DAYLIGHT
DTSTART:20000404T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
BEGIN:STANDARD
DTSTART:20001026T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DESCRIPTION:Hello Im evert
Next line also
Blabla
ATTENDEE;PARTSTAT=ACCEPTED;ROLE=CHAIR:mailto:cyrus@example.com
ATTENDEE;PARTSTAT=NEEDS-ACTION:mailto:lisa@example.com
DTSTAMP:20060206T001220Z
DTSTART;TZID=US/Eastern:20060104T100000
DURATION:PT1H
LAST-MODIFIED:20060206T001330Z
ORGANIZER:mailto:cyrus@example.com
SEQUENCE:1
STATUS:TENTATIVE
SUMMARY:Event #3
UID:DC6C50A017428C5216A2F1CD@example.com
X-ABC-GUID:E1CX5Dr-0007ym-Hz@example.com
END:VEVENT
END:VCALENDAR
To:
<?xml version="1.0"?>
<VCALENDAR>
<VERSION>2.0</VERSION>
<PRODID>-//Example Corp.//CalDAV Client//EN</PRODID>
<VTIMEZONE>
<LAST-MODIFIED>20040110T032845Z</LAST-MODIFIED>
<TZID>US/Eastern</TZID>
<DAYLIGHT>
<DTSTART>20000404T020000</DTSTART>
<RRULE>FREQ=YEARLY;BYDAY=1SU;BYMONTH=4</RRULE>
<TZNAME>EDT</TZNAME>
<TZOFFSETFROM>-0500</TZOFFSETFROM>
<TZOFFSETTO>-0400</TZOFFSETTO>
</DAYLIGHT>
<STANDARD>
<DTSTART>20001026T020000</DTSTART>
<RRULE>FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10</RRULE>
<TZNAME>EST</TZNAME>
<TZOFFSETFROM>-0400</TZOFFSETFROM>
<TZOFFSETTO>-0500</TZOFFSETTO>
</STANDARD>
</VTIMEZONE>
<VEVENT>
<DESCRIPTION>Hello Im evertNext line also Blabla</DESCRIPTION>
<ATTENDEE PARTSTAT="ACCEPTED" ROLE="CHAIR">mailto:cyrus@example.com</ATTENDEE>
<ATTENDEE PARTSTAT="NEEDS-ACTION">mailto:lisa@example.com</ATTENDEE>
<DTSTAMP>20060206T001220Z</DTSTAMP>
<DTSTART TZID="US/Eastern">20060104T100000</DTSTART>
<DURATION>PT1H</DURATION>
<LAST-MODIFIED>20060206T001330Z</LAST-MODIFIED>
<ORGANIZER>mailto:cyrus@example.com</ORGANIZER>
<SEQUENCE>1</SEQUENCE>
<STATUS>TENTATIVE</STATUS>
<SUMMARY>Event #3</SUMMARY>
<UID>DC6C50A017428C5216A2F1CD@example.com</UID>
<X-ABC-GUID>E1CX5Dr-0007ym-Hz@example.com</X-ABC-GUID>
</VEVENT>
</VCALENDAR>
I hope this is useful to anyone else.
CDATA in xml.. bad idea?
While working on a simple feed parser, I hit upon some wordpress feeds.
I noticed that wordpress feeds make heavy usage of CDATA to encode content. I always figured this was a bad idea if you cannot control what ends up in the xml feed. (Example here.).
Doing some googling to see if I'm not just kicking dust brought me to an xml.com article titled 'Escaped Markup Considered Harmful, which seems to agree with my standpoint for the following reason:
Escaping markup, particularly with CDATA sections, just doesn't work. There are other things that might be wrong that would make the documents not well formed. There are Unicode characters that are forbidden, there are encoding issues for the characters that are allowed, and there are sequences of characters that must be avoided. (e.g., "]]>"). Not to mention the fact that CDATA sections don't nest.
CDATA can't be used to just dump in any type of content that won't work in normal XML sections.. You're still obligated to make your data valid unicode. In fact, it's the opposite; There's no way you could ever escape the ]]> character sequence.







