The Treasury Board's policy suite is conceptually a giant graph, but it is frustratingly resistant to automated analysis.
Some annoyances:
The policy suite straddles tbs-sct.gc.ca and canada.ca, and policies often draw their authorities from material on laws-lois.justice.gc.ca.
There is very little common structure you can rely on. If you think you have found one, you only need to look at a few more policies to find the exception.
Links between policies, or from policies to laws, rarely point to the relevant section.
Only a few policies have an XML data representation; most are available only as HTML, which makes web scraping the most reliable approach.
Markers indicating sections, clauses, etc. are not consistent across HTML documents, which makes web scraping extremely annoying.
Multiple requirements often occur in a single sentence (joined by "and").
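To make the graph idea concrete, here is a rough sketch of pulling edges out of one policy page: collect every link, keep the ones that land on the three domains mentioned above, and flag whether each link carries a section anchor. The names `LinkCollector`, `policy_edges` and `POLICY_DOMAINS` are made up for illustration, and treating exactly those three domains as "the policy suite" is my assumption. It uses only the Python standard library and does no fetching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Domains named in this post. Treating exactly these as the policy
# suite is an assumption made for this sketch.
POLICY_DOMAINS = ("tbs-sct.gc.ca", "canada.ca", "laws-lois.justice.gc.ca")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def on_policy_domain(hostname):
    """True if hostname is one of POLICY_DOMAINS or a subdomain of one."""
    if hostname is None:
        return False
    return any(hostname == d or hostname.endswith("." + d) for d in POLICY_DOMAINS)


def policy_edges(source_url, html):
    """Return (source, target, has_section_anchor) for each in-suite link."""
    collector = LinkCollector()
    collector.feed(html)
    edges = []
    for href in collector.hrefs:
        target = urljoin(source_url, href)  # resolve relative links
        parts = urlparse(target)
        if on_policy_domain(parts.hostname):
            edges.append((source_url, target, bool(parts.fragment)))
    return edges
```

Running `policy_edges` over every page would give an edge list you could load into any graph tool, and the third field gives a quick count of how many links actually point at a section. It does nothing about the inconsistent section markers or the compound requirements, which are the harder problems.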
Enabling programmatic analysis of policy would be broadly valuable both inside and outside government.
This should be an #opendata #dataproduct, but it seems like these documents are largely treated like marketing material: if it looks OK in the browser, it's done.
#gcdigital