Formatting XML Output using PowerShell

Like many people I have to deal with XML output from commands. My main source of pain has been the ToXml method of Windows Event Log events, which gives lots of useful information but is hard on the eye. Using the following PowerShell:

> $e = Get-WinEvent -LogName Security -FilterXPath "*[EventData[Data[@Name='TargetUserName']='Angelique.Cortez']]" -MaxEvents 20
> $e[-1].toXML()

Gives the following output:

<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider
 Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-
3E3B0328C30D}'/><EventID>4738</EventID><Version>0</Version><Level>0</Level>
<Task>13824</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated
 SystemTime='2019-10-08T08:45:56.119393800Z'/><EventRecordID>363027</EventRecordID>
<Correlation/><Execution ProcessID='600' ThreadID='2176'/><Channel>Security</Channel>
<Computer>WIN-90CID1J2CS5.carisbrookelabs.local</Computer><Security/></System><EventData>
<Data Name='Dummy'>-</Data><Data Name='TargetUserName'>Angelique.Cortez</Data><Data
 Name='TargetDomainName'>CARISBROOKELABS</Data><Data Name='TargetSid'>S-1-5-21-447422785-
3715515833-3878445295-1308</Data><Data Name='SubjectUserSid'>S-1-5-21-447422785-
3715515833-3878445295-500</Data><Data Name='SubjectUserName'>Administrator</Data><Data
 Name='SubjectDomainName'>CARISBROOKELABS</Data><Data
 Name='SubjectLogonId'>0x8eb4d</Data><Data Name='PrivilegeList'>-</Data><Data
 Name='SamAccountName'>-</Data><Data Name='DisplayName'>-</Data><Data
 Name='UserPrincipalName'>-</Data><Data Name='HomeDirectory'>-</Data><Data
 Name='HomePath'>-</Data><Data Name='ScriptPath'>-</Data><Data Name='ProfilePath'>-
</Data><Data Name='UserWorkstations'>-</Data><Data Name='PasswordLastSet'>-</Data><Data
 Name='AccountExpires'>-</Data><Data Name='PrimaryGroupId'>-</Data><Data
 Name='AllowedToDelegateTo'>-</Data><Data Name='OldUacValue'>-</Data><Data
 Name='NewUacValue'>-</Data><Data Name='UserAccountControl'>-</Data><Data
 Name='UserParameters'>-</Data><Data Name='SidHistory'>-</Data><Data Name='LogonHours'>-
</Data></EventData></Event>

It’s not easy to find the particular XML node or attribute you are looking for. One great shortcut is to coerce the output (which is a string object) into an XML object, then use the Save method to output to the console.

([xml]$e[-1].toXML()).Save([Console]::Out)

This produces nicely-formatted output. Unfortunately this doesn’t work when you are connected to a computer using PowerShell remoting. The [Console] device isn’t there for you. If you spend any amount of time in remote PowerShell sessions then this can become a little annoying. Another solution is required.
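One simple variation, sketched here as an assumption rather than the approach taken later in this post, is to call the same Save method with a System.IO.StringWriter in place of [Console]::Out. The sample XML string and the variable names are made up for illustration, and an XML declaration line may appear at the top of the result.

```powershell
# Hypothetical sample XML standing in for the output of ToXml()
$rawXml = "<Event><System><EventID>4738</EventID></System></Event>"

# Save() accepts any TextWriter, so a StringWriter can stand in for the console
$stringWriter = New-Object System.IO.StringWriter
([xml]$rawXml).Save($stringWriter)

# The indented text comes back as an ordinary string
$formatted = $stringWriter.ToString()
$formatted
```

Because the result is a plain string, it should not depend on a console device being present.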

Some examples on the web show the use of the System.Xml.XmlTextWriter class to output to the screen. Interestingly, the Microsoft Docs entry for XmlTextWriter carries the following note:

Starting with the .NET Framework 2.0, we recommend that you create XmlWriter instances by using the XmlWriter.Create method and the XmlWriterSettings class to take advantage of new functionality.

Microsoft Docs – XmlTextWriter

So here’s an updated function to do just that, Format-XMLText (also available as a gist):

Function Format-XMLText {
    Param(
        [Parameter(ValueFromPipeline=$true,Mandatory=$true)]
        [xml[]]
        $xmlText
    )
    Process {
        # Format each document separately, so an array passed as a parameter works too
        foreach ($xmlDocument in $xmlText) {
            # Use a StringWriter, an XmlWriter and an XmlWriterSettings to format XML
            $stringWriter = New-Object System.IO.StringWriter
            $xmlWriterSettings = New-Object System.Xml.XmlWriterSettings

            # Turn on indentation
            $xmlWriterSettings.Indent = $true

            # Turn off XML declaration
            $xmlWriterSettings.OmitXmlDeclaration = $true

            # Create the XmlWriter from the StringWriter
            $xmlWriter = [System.Xml.XmlWriter]::Create($stringWriter,$xmlWriterSettings)

            # Write the XML using the XmlWriter
            $xmlDocument.WriteContentTo($xmlWriter)

            # Don't forget to flush!
            $xmlWriter.Flush()
            $stringWriter.Flush()

            # Output the text
            # This works in a remote session, when [Console]::Out doesn't
            $stringWriter.ToString()
        }
    }
}

So now we can pipe the results of ToXml to Format-XMLText:

> $e = Get-WinEvent -LogName Security -FilterXPath "*[EventData[Data[@Name='TargetUserName']='Angelique.Cortez']]" -MaxEvents 20
> $e[-1].toXML() | Format-XMLText

The output from Format-XMLText looks like this:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing" Guid="{54849625-5478-4994-A5BA-3E3B0328C30D}" />
    <EventID>4738</EventID>
    <Version>0</Version>
    <Level>0</Level>
    <Task>13824</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8020000000000000</Keywords>
    <TimeCreated SystemTime="2019-10-08T08:45:56.119393800Z" />
    <EventRecordID>363027</EventRecordID>
    <Correlation />
    <Execution ProcessID="600" ThreadID="2176" />
    <Channel>Security</Channel>
    <Computer>WIN-90CID1J2CS5.carisbrookelabs.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data Name="Dummy">-</Data>
    <Data Name="TargetUserName">Angelique.Cortez</Data>
    <Data Name="TargetDomainName">CARISBROOKELABS</Data>
    <Data Name="TargetSid">S-1-5-21-447422785-3715515833-3878445295-1308</Data>
    <Data Name="SubjectUserSid">S-1-5-21-447422785-3715515833-3878445295-500</Data>
    <Data Name="SubjectUserName">Administrator</Data>
    <Data Name="SubjectDomainName">CARISBROOKELABS</Data>
    <Data Name="SubjectLogonId">0x8eb4d</Data>
    <Data Name="PrivilegeList">-</Data>
    <Data Name="SamAccountName">-</Data>
    <Data Name="DisplayName">-</Data>
    <Data Name="UserPrincipalName">-</Data>
    <Data Name="HomeDirectory">-</Data>
    <Data Name="HomePath">-</Data>
    <Data Name="ScriptPath">-</Data>
    <Data Name="ProfilePath">-</Data>
    <Data Name="UserWorkstations">-</Data>
    <Data Name="PasswordLastSet">-</Data>
    <Data Name="AccountExpires">-</Data>
    <Data Name="PrimaryGroupId">-</Data>
    <Data Name="AllowedToDelegateTo">-</Data>
    <Data Name="OldUacValue">-</Data>
    <Data Name="NewUacValue">-</Data>
    <Data Name="UserAccountControl">-</Data>
    <Data Name="UserParameters">-</Data>
    <Data Name="SidHistory">-</Data>
    <Data Name="LogonHours">-</Data>
  </EventData>
</Event>

I think you’ll agree that the output is much easier for the human eye to parse. This function works inside a remoting session and can handle multiple XML objects via the pipeline.
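Once the event is an XML object, a particular node can also be picked out directly rather than read by eye. The following is a minimal sketch using a made-up, cut-down event string in place of real ToXml output; the variable names are illustrative only.

```powershell
# Hypothetical, cut-down event XML; a real event would come from $e[-1].ToXml()
$rawXml = @'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="TargetUserName">Angelique.Cortez</Data>
    <Data Name="SubjectUserName">Administrator</Data>
  </EventData>
</Event>
'@

$eventXml = [xml]$rawXml

# PowerShell exposes child elements as properties, so we can walk down to the Data nodes
$targetUser = $eventXml.Event.EventData.Data |
    Where-Object { $_.GetAttribute('Name') -eq 'TargetUserName' } |
    ForEach-Object { $_.InnerText }

$targetUser
```

GetAttribute and InnerText are used here instead of property shortcuts so that it is clear the attribute called Name is being read, not the element's own name.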

Let me know if this post has helped you in any way.

Posted in PowerShell

Who’s The Host?

A question was asked on r/PowerShell about how to tell whether a script was running within an instance of the Console or the Integrated Scripting Environment.

A little research identified that the Get-Host cmdlet can provide the required information via the Name property.

Interestingly, Get-Host can also identify when a script is running under Visual Studio Code or via the System.Management.Automation.PowerShell class.

Host                                      Host Name
Console                                   ConsoleHost
Integrated Scripting Environment          Windows PowerShell ISE Host
Visual Studio Code Integrated Console     Visual Studio Code Host
System.Management.Automation.PowerShell   Default Host
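As a rough sketch of how a script might act on this, the Name property can be fed into a switch statement. The descriptions are taken from the table above; the $environment variable and the fallback text are my own additions for illustration.

```powershell
# Map the host names from the table above to a description of the environment
$hostName = (Get-Host).Name
switch ($hostName) {
    'ConsoleHost'                 { $environment = 'Console' }
    'Windows PowerShell ISE Host' { $environment = 'Integrated Scripting Environment' }
    'Visual Studio Code Host'     { $environment = 'Visual Studio Code Integrated Console' }
    'Default Host'                { $environment = 'System.Management.Automation.PowerShell' }
    default                       { $environment = "Unknown host: $hostName" }
}
$environment
```

Other hosts exist beyond these four, which is why the default branch reports the raw name rather than guessing.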

Posted in PowerShell

Validating XHTML using PowerShell and .Net

XML and HTML both originally derived from Standard Generalized Markup Language (SGML) but they have a slightly difficult relationship. It is possible to create HTML that isn’t well-formed XML, yet web browsers will still read and accept it.

Later versions of HTML (or XHTML) were developed to be more easily processed by XML tools. HTML5, the most recent version of HTML, is now its own entity, not necessarily conforming to previous XML or HTML standards. W3 Schools has an interesting history page for HTML5.

Using XHTML allows you to use automation techniques when building HTML artefacts, such as validating them. These artefacts don’t need to be HTML pages as such; they could be emails or content streams for other applications.

My interest in all of this stems from the fact that Microsoft OneNote uses XHTML when you interact with Notes using the Microsoft Graph API. I wanted to be able to validate the XHTML before I sent it to OneNote using my PowerShell module, and I assumed that it would be easy to use a schema file using PowerShell and the .Net Framework (.Net).

I was nearly right.

HTML as XML

To make HTML act as if it’s XML you need to make it well formed. This means that tags can’t overlap, and must always be closed (including ‘empty’ tags). Element and attribute names must be in lower case. Attribute values must always be quoted. For a full list of criteria see the W3C page on the differences between XHTML and HTML4. Once these issues are sorted, you can use XML tools to manipulate HTML.
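A quick way to check the well-formedness part is to let the [xml] cast do the parsing and catch the failure. This is only a sketch: Test-WellFormedXml is a hypothetical helper name, and it checks well-formedness only, not XHTML-specific rules such as lower-case names.

```powershell
function Test-WellFormedXml {
    param(
        [Parameter(Mandatory=$true)]
        [string]
        $Text
    )
    try {
        # The [xml] cast throws if the text is not well-formed XML
        [Void][xml]$Text
        $true
    }
    catch {
        $false
    }
}

Test-WellFormedXml '<p>some <b>bold</b> text<br/></p>'   # nested and closed correctly
Test-WellFormedXml '<p>some <b>bold text</p></b>'        # overlapping tags
Test-WellFormedXml '<p>line one<br>line two</p>'         # 'empty' tag left unclosed
```

Only the first call should report success; a browser would probably accept all three fragments.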

OneNoteUtilitiesGraph

OneNoteUtilitiesGraph (OUG) is a project of mine on GitHub. It’s a PowerShell module designed to manipulate data stored in OneNote using the Microsoft Graph.

One of the capabilities I’ve been developing is creating content using OUG and accessing it consistently.

OneNote uses what it calls input and output HTML. This is a subset of HTML tags with additional attributes designed specifically for use managing OneNote content.

My interest in validation came from wanting to sanity check my OneNote input HTML.

Schemas and Validation

There are a number of XHTML schemas you can use to validate your XHTML.
These are available from the World Wide Web Consortium (W3C) web site, so you can access them from your PowerShell scripts.

As usual PowerShell works with .Net to help us do the validation. We can build an XmlDocument object $document and add a schema to it, then call the Validate method. Here’s a sample XHTML document:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/1999/xhtml
                          http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd">
  <head>
    <title>Title of document</title>
  </head>
  <body>
    <p>some content</p> <div>hello</div>
    <p data-id='hello' id=''></p>
    <div>
        <img src='img00100.jpg' alt='image'/>
    </div>
    <h7></h7>
  </body>
</html>

It has a few issues that validation may help to identify.

Notice that in the following script I’ve supplied a script block to the Validate method that handles each error as it occurs. This is known as a callback or a delegate. I’ve learned that in C# you would provide the name of a function to be called each time a validation error occurs. PowerShell allows us to use a script block. This can be inline, as below.

I’ve cast the Schemas.Add method to [Void], so that its output doesn’t appear on screen.

$document = New-Object System.Xml.XmlDocument
$document.Load('C:\Users\Stuart\Desktop\xhtml.html')
[Void]$document.Schemas.Add("http://www.w3.org/1999/xhtml", "http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd")
$document.Validate({Write-Host $args[1].Message})

When the above code is run in a script I get the following results:

The 'data-id' attribute is not declared.
The 'id' attribute is invalid - The value '' is invalid according to its datatype 'http://www.w3.org/2001/XMLSchema:ID' - The empty string '' is not a valid name.
The element 'body' in namespace 'http://www.w3.org/1999/xhtml' has invalid child element 'h7' in namespace 'http://www.w3.org/1999/xhtml'. List of possible elements expected: 'p, h1, h2, h3, h4, h5, h6, div, ul, ol, dl, pre, hr, blockquote, address, fieldset, table, form, noscript, ins, del, script' in namespace 'http://www.w3.org/1999/xhtml'.

All of this error text has been generated by the validation process, from the schema file. The result is quite verbose, but also helpful. data-id is an attribute used by OneNote, so using this schema may not be as straightforward as I thought.
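To experiment with the same pattern without touching the network, a small schema can be supplied inline. Everything in this sketch is made up for illustration (the urn:example:note namespace, the tiny schema and the document); it also collects the messages in a list instead of writing them to the host.

```powershell
# Hypothetical schema: a <note> element that may contain only a <title>
$xsd = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:note"
           xmlns="urn:example:note"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Hypothetical document with an <extra> element the schema does not allow
$document = New-Object System.Xml.XmlDocument
$document.LoadXml('<note xmlns="urn:example:note"><title>Hi</title><extra/></note>')

# Add the schema from an XmlReader over the string instead of a URL
$schemaReader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $xsd))
[Void]$document.Schemas.Add('urn:example:note', $schemaReader)

# Collect each validation message rather than printing it
$messages = New-Object 'System.Collections.Generic.List[string]'
$document.Validate({ $messages.Add($args[1].Message) })
$messages
```

Collecting the messages makes it easier to filter out errors you expect, such as attributes that a particular application adds on purpose.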

If I had wanted to have a more complex ValidationEventHandler routine I could have created a script block using a here-string like the one below:

# Hold the handler source in $code, then turn it into a script block
$code = @'
param($sender, [System.Xml.Schema.ValidationEventArgs]$evtArgs)
if ($evtArgs.Severity -eq [System.Xml.Schema.XmlSeverityType]::Warning) {
    Write-Host "`nWARNING: "
}
elseif ($evtArgs.Severity -eq [System.Xml.Schema.XmlSeverityType]::Error) {
    Write-Host "`nERROR: "
}
Write-Host $evtArgs.Message
'@
$validationEventHandler = [ScriptBlock]::Create($code)

I would then put a reference to $validationEventHandler in the Validate method call, as in:

$document.Validate($validationEventHandler)

The script block stored in the $validationEventHandler variable is called every time a ValidationEvent occurs.

Performance Issues

Performance is an issue using the primary source schema documents. Downloading and processing them can take seconds. This is fine for one-off validations but users get twitchy if things appear to ‘hang’. Maybe there’s a better way?

A Possible Workaround

I can download the XHTML schema using:

Invoke-WebRequest 'http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd' -OutFile xhtml1-strict.xsd

and load it from the local file system using:

$document.Schemas.Add("http://www.w3.org/1999/xhtml", "xhtml1-strict.xsd")

This works OK, but now I’ve got a local dependency on an .xsd file that I don’t own.
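One detail worth handling when loading from disk is that .NET resolves a relative path against the process working directory, which is not always the same as PowerShell's current location. The sketch below downloads the schema only when it is missing and returns a full path; Get-CachedSchemaPath is a hypothetical helper name, not part of any module.

```powershell
function Get-CachedSchemaPath {
    param(
        [Parameter(Mandatory=$true)]
        [string]
        $Uri,

        [Parameter(Mandatory=$true)]
        [string]
        $Path
    )
    # Download the schema only if a local copy does not exist yet
    if (-not (Test-Path -LiteralPath $Path)) {
        Invoke-WebRequest -Uri $Uri -OutFile $Path
    }
    # Return the full file system path so .NET does not have to guess the folder
    (Resolve-Path -LiteralPath $Path).ProviderPath
}

# Example use, assuming $document has been loaded as shown earlier:
# $xsdPath = Get-CachedSchemaPath -Uri 'http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd' -Path '.\xhtml1-strict.xsd'
# [Void]$document.Schemas.Add('http://www.w3.org/1999/xhtml', $xsdPath)
```

A schema file may itself import other schemas by URL, so a local copy does not necessarily remove every network request.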

A Solution?

Having worked out how to download and use schema files from the web, it now looks like a local schema file might be the best solution. This could be tailored for the OneNote version of HTML.

This could be a longer project! Expect a further blog post on this.

Going through the process of working out how schemas and validation work has been educational.

What have I learned?

New knowledge I have gained includes:

  • Schemas
    How to download them, add them and use them to validate.
  • Validation
    How to make it work for XML/XHTML documents.
  • Callbacks and Delegates
    How to specify a delegate or callback using an inline script block or a reference to one created from a here-string.
  • HTML5
    Isn’t SGML or XML or old-style HTML.

Wrapping up

So you can use PowerShell and .Net to validate XML or XHTML documents using schemas. Unfortunately I can’t use the publicly available schemas to validate OneNote input HTML directly. Hopefully these notes will help anyone else looking to use PowerShell to validate XHTML or XML documents using schemas. Please let me know how you get on.

References

OneNoteUtilitiesGraph on GitHub
HTML5 Intro on W3 Schools
XML Document Class – Validate Method on Microsoft Docs
XHTML Schemas on the W3C site
Creating OneNote Pages on Microsoft Docs

Posted in Uncategorized

Adam the Automator and Friends

I’ve recently joined Adam Bertram on his Adam the Automator site as a contributor. My ATA page has a list of my articles which is automatically updated as I produce new content. This doesn’t mean the end of the Mad Scientist blog. I will be producing more commercial content for ATA, while my personal musings, e-books etc. will continue to appear here.

Posted in Uncategorized

I have published a Kindle book – PowerShell and Windows Event Logs

I’m happy to announce that my second Kindle book is published!

As the title suggests, PowerShell and Windows Event Logs covers accessing and managing Windows Event Logs using PowerShell.

As well as covering the two families of eventlog cmdlets available in Windows I also introduce the underlying .Net classes and how they can be used, along with our old friend WMI. I also talk a little bit about the history of Windows Event Logs (yes there is a screenshot of Windows NT 3.51).

There is information on using XML and XPath to filter for the records you are interested in, as well as an approach to extracting data from event logs – a task that is not so easy as it might first appear.

There are also a number of ‘Task Helpers’ that cover the use of .Net from PowerShell where the cmdlets cannot be used directly to achieve a given result, as well as some common XPath patterns you can use to build your own event log queries.

PowerShell and Windows Event Logs is available now from the Amazon Kindle book store. You can preview the book or view the book page on Amazon.

Posted in My Books, PowerShell

PowerShell and OneNote for Windows 10

I have long been a fan of OneNote. I have always wanted a way of using it with PowerShell. My initial work in this area focused on the Windows desktop version, but there is now a REST API available via the Microsoft Graph.

OneNoteUtilitiesGraph is a PowerShell module designed to be used either in scripts or at the command line. Its purpose is to allow both read and write access to your OneNote data. For example:

Get-ONPages "startswith(title,'August')"

Will return a list of pages whose title starts with the word ‘August’.

The module also has the capability to create new NoteBooks, Sections, SectionGroups and Pages.

This module is available at https://github.com/wightsci/onenoteutilitiesgraph

Future plans:

  • Page templates (script-side)
  • Support for Tags
  • HTML Templates for Pages
  • Merge-to-OneNote
  • Out-ONPage
  • New Graph REST features as they are released
Posted in My Software, PowerShell

Liverpool Football Club Season Review 2018-19

My history with Liverpool Football Club (LFC) is documented elsewhere on this blog.

This season ended as did the last one, with a Champions League final, but first things first…

The years since my previous report have seen the development of Jürgen Klopp’s vision for LFC and also the departure of some significant players, including Philippe Coutinho and Raheem Sterling. Finishing 8th in the English Premier League (EPL) for the 2015-16 season was a low point, but there has been gradual improvement, with successive 4th place finishes in 2016-17 and 2017-18. The arrival of Mané and Salah has boosted our attacking threat, and the development of a world class defence around van Dijk, Matip, Robertson and Alexander-Arnold has made us difficult to beat, home or away. The 2018-19 season seemed unlikely to replicate the ‘Heavy Metal’ football of the previous season which saw LFC scoring many goals – Salah being top scorer with an amazing 44 goals in all competitions – but confidence was high that we could achieve something special, something to counter the heartbreak of the Champions League final loss of the previous year.

Liverpool were different this year – not perfect, but almost so. They lost only one EPL game, drew 7 and won the remaining 30 for a total of 97 points. The team’s approach seemed more clinical, more determined, calmer. In the end they scored more goals (89) than the previous season, and conceded fewer. The team’s style of football was praised as exciting by pundits (some more grudging in their praise than others).

Unfortunately there was another team enjoying (another) great season. Manchester City’s all-stars finished one point ahead of Liverpool, 98 points to 97. The race to the end of the season was the most enthralling for many years, but again Liverpool fell short.

At the same time as the EPL campaign Liverpool were taking part in the 27th season of the UEFA Champions’ League. In honesty Liverpool struggled in Group C, winning three games and losing three and coming second in the group behind Paris Saint-Germain. Once in the knock-out stages though, things were different. Bayern Munich and FC Porto were brushed aside. Barcelona awaited in the Semi-final.

The first leg at the Camp Nou would have been considered a disaster in previous seasons. Liverpool were defeated 3-0, including the obligatory magical free-kick from Lionel Messi. Somehow it didn’t feel like the end of the world, not this season, not with this team. The second, home leg was still to come.

The 7th of May 2019 saw one of the most remarkable Liverpool performances I have ever seen, topped only by that night in Istanbul. Braces of goals from Divock Origi and Georginio Wijnaldum, and one of the quickest-thinking corners I have ever seen resulted in Messi, Coutinho, Suarez and associates leaving the competition and Liverpool advancing to the final. They would be joined by Tottenham Hotspur, who managed their own recovery from a losing position the following day.

June the 1st 2019 at the Estadio Metropolitano, Madrid, Spain was the venue for the final, the first to be contested between two English teams for a decade. Confidence was high that Liverpool could win, given that they had defeated Spurs at both meetings during the EPL season. All expectations were exceeded early on, following a penalty awarded only thirty seconds or so into the game, which Salah duly dispatched. From then on the game settled into a strange pattern. Both teams were clearly lacking sharpness, possibly due to the three-week break from the end of the EPL, possibly due to the heat and humidity in Spain. Spurs retained the majority of the possession. The first half finished at 1-0 to Liverpool.

The second half saw Spurs increase their intensity – forcing Alisson Becker to make a number of excellent saves, but the final word fell to Liverpool, via the boot of Divock Origi, scorer of the decisive goal against Barcelona. His quick-witted finish, a low driven shot across the keeper in the 87th minute, meant that the contest was effectively over. Liverpool were crowned European Champions for the 6th time, placing them third on the all-time list of winners, and the top English team. Once again an English team, captained by an English player, raised the European Cup in triumph.

This season in statistics:

  • Beat every EPL team at least once except Manchester City
  • Beat French Ligue 1 Champions 2018-19 – Paris Saint-Germain
  • Beat German Bundesliga Champions – Bayern Munich
  • Beat Spanish La Liga Champions 2018-19 – Barcelona
  • Beat Italian Serie A 2nd Placed Team 2018-19 – Napoli
  • Two players shared the EPL Golden Boot – Sadio Mané and Mohamed Salah (22 goals each)
  • EPL Golden Glove – Alisson Becker (21 clean sheets)
  • Premier League Player of the Season – Virgil van Dijk
  • PFA Players’ Player of the Year – Virgil van Dijk

After all the statistics are analysed, all of the emotions have subsided, one thing that Michael Owen said during the TV commentary stuck in my mind – I’ll try to paraphrase it.

Liverpool played brilliant football in the Premier League, gained 97 points and ended up with nothing. Liverpool played poorly today and still came away with the trophy.

Michael Owen, BT Sport

That’s the difference between Liverpool 2017-18 and 2018-19: the ability to take a difficult situation and still come through triumphant.

Here’s to the Red Men:

Fabinho, Virgil van Dijk, Georginio Wijnaldum, Dejan Lovren, James Milner, Naby Keïta, Roberto Firmino, Sadio Mané, Mohamed Salah, Joe Gomez, Alisson Becker, Jordan Henderson, Daniel Sturridge, Alberto Moreno, Adam Lallana, Alex Oxlade-Chamberlain, Simon Mignolet, Xherdan Shaqiri, Andrew Robertson, Divock Origi, Joël Matip, Curtis Jones, Ki-Jana Hoever, Rafael Camacho, Trent Alexander-Arnold, Nathaniel Clyne

and of course

Jürgen Klopp

See you next season.

#YNWA #SixTimes

Posted in Football