Geekzone: technology news, blogs, forums


mdf




Topic # 205672 22-Nov-2016 20:35

I'm picking there's either a really easy or a really hard answer to this query.

 

I want to add custom XML or some kind of other metadata tag to parts of a document - ideally in Word but I'm open to other text editing programs if necessary. For example:

 

 

<topic1> text text text text </topic1>

 

<topic2> text text text text text </topic2>

 

text text text text

 

<topic1> text text text </topic1>

 

 

Ideally I'd like to be able to toggle viewing these tags off and on.

 

Ultimately I'm aiming to be able to pull the XML information into a consolidated database so you could see at a glance all the items tagged <topic1> across multiple documents.

 

I've considered using Word's References tools (especially Mark Entry for indexing) and Styles, but neither does quite what I want. Styles are too easy to break between (cough, within) documents. Ideally, I'd also be able to apply formatting styles and my custom XML tags to the same pieces of text. The References tools seem to be intra-document only.

 

From the looks of things, older versions of Word had the ability to mark up custom XML, but it was removed following a patent infringement claim.

 

Unzipping the .docx and manually adding XML tags might work, but is too cumbersome and I'll probably end up breaking something.




  Reply # 1675700 22-Nov-2016 21:45

Using the Developer tab, you can add form controls (which are embedded in the Office XML and its XML parts), or you can add custom XML parts. You can easily write PowerShell or C# code to process the documents. The document can also be protected so that data can only be entered into the form controls.

 

It is also possible to achieve something similar using bookmarks.
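As a rough illustration of the "write code to process the documents" idea, here is a minimal Python sketch (Python rather than PowerShell or C#, purely for brevity). It assumes each passage has been wrapped in a content control whose Tag property holds the label, which Word stores as a w:tag element inside w:sdtPr. The names tagged_text and make_sample are made up for this example, and the sample file is a stand-in zip containing only word/document.xml, not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def tagged_text(docx_file):
    """Return (tag, text) pairs for every tagged content control in a .docx."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    results = []
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is None or content is None:
            continue  # untagged or empty control: nothing to report
        text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
        results.append((tag.get(f"{{{W}}}val"), text))
    return results


def control(tag, text):
    """Hypothetical helper: XML for one block-level content control."""
    return (
        f'<w:sdt><w:sdtPr><w:tag w:val="{tag}"/></w:sdtPr>'
        f"<w:sdtContent><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:sdtContent></w:sdt>"
    )


def make_sample():
    """Build an in-memory stand-in for a .docx (only word/document.xml)."""
    body = (
        control("topic1", "first block")
        + "<w:p><w:r><w:t>untagged text</w:t></w:r></w:p>"
        + control("topic2", "second block")
        + control("topic1", "third block")
    )
    xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


pairs = tagged_text(make_sample())
for tag, text in pairs:
    print(tag, "->", text)
```

Note that the same tag value ("topic1" here) appears on two separate controls; as far as I know, content control tags do not have to be unique within a document.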





Software Engineer

 



  Reply # 1675711 22-Nov-2016 21:57

I'd just say to follow the standard to ensure consistency and interoperability:

 

http://www.ecma-international.org/publications/standards/Ecma-376.htm

 

Open XML SDK 2.5 for Microsoft Office

 

 


 
 
 
 


mdf




  Reply # 1675954 23-Nov-2016 12:05

Thanks for that, very useful first steer - I didn't want to bang my head against Word for weeks on end only to figure out that what I want to do isn't possible.

 

I will have a play with the Developer XML tools. I've used bookmarks before for something related, and the downfall of those is that each bookmark has to be unique. You can't tag separated paragraphs with the same tag.




  Reply # 1676266 23-Nov-2016 17:18

Once you have gotten familiar with it, take the time to re-think the problem you are trying to solve.

For example, paragraphs in a document are encapsulated in a paragraph tag, but they don't have unique identifiers, and they contain different types of sub-elements. Annotations (such as bookmarks) are one of these sub-elements that sit inside paragraphs, and they do have unique identifiers. Text doesn't sit directly inside a paragraph; instead it sits inside a run element, which again doesn't typically carry a unique identifier.

Bookmarks can be a way of identifying an important block of text that you want to work with.

Also, content controls can be used with the xml mapping feature and custom xml.
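To make the paragraph / run / bookmark structure described above a bit more concrete, here is a small Python sketch. The fragment is hand-written WordprocessingML (w:p is a paragraph, w:r a run, w:t the text, and w:bookmarkStart / w:bookmarkEnd mark a bookmark). The function name bookmark_text and the bookmark name "term" are invented for illustration; a real document.xml carries much more markup than this.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(name):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{name}"


# One paragraph holding three runs; the middle run sits inside a bookmark.
FRAGMENT = f"""<w:body xmlns:w="{W}">
  <w:p>
    <w:r><w:t xml:space="preserve">This agreement runs for </w:t></w:r>
    <w:bookmarkStart w:id="0" w:name="term"/>
    <w:r><w:t>one year</w:t></w:r>
    <w:bookmarkEnd w:id="0"/>
    <w:r><w:t xml:space="preserve"> from signing.</w:t></w:r>
  </w:p>
</w:body>"""


def bookmark_text(body):
    """Map each bookmark name to the text between its start and end markers."""
    names = {}        # bookmark id -> bookmark name
    parts = {}        # bookmark id -> list of text pieces
    open_ids = set()  # bookmarks started but not yet ended
    for el in body.iter():  # iter() walks elements in document order
        if el.tag == q("bookmarkStart"):
            bid = el.get(q("id"))
            names[bid] = el.get(q("name"))
            parts[bid] = []
            open_ids.add(bid)
        elif el.tag == q("bookmarkEnd"):
            open_ids.discard(el.get(q("id")))
        elif el.tag == q("t"):
            for bid in open_ids:
                parts[bid].append(el.text or "")
    return {names[bid]: "".join(pieces) for bid, pieces in parts.items()}


result = bookmark_text(ET.fromstring(FRAGMENT))
print(result)
```

The paragraph and the runs have nothing to identify them, while the bookmark has both an id and a name, which is why it can be used to pick out a block of text.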




Software Engineer

 


mdf




  Reply # 1676936 24-Nov-2016 19:20

TwoSeven: Once you have gotten familiar with it, take the time to re-think the problem you are trying to solve.

For example, paragraphs in a document are encapsulated in a paragraph tag, but they don't have unique identifiers, and they contain different types of sub-elements. Annotations (such as bookmarks) are one of these sub-elements that sit inside paragraphs, and they do have unique identifiers. Text doesn't sit directly inside a paragraph; instead it sits inside a run element, which again doesn't typically carry a unique identifier.

Bookmarks can be a way of identifying an important block of text that you want to work with.

Also, content controls can be used with the xml mapping feature and custom xml.

 

Sorry, not sure I really followed some of this. I should probably just have explained what I was trying to do.

 

I have a bunch of contracts. All contain the same basic types of clauses you'd expect to see in any contract (Party A, Party B, term, price etc.). But the content of each contract and each clause is different. So one might have a term of one year from the date of signing, another might have a term of two years from 1 January 2016 to 31 December 2017. Completely different wording, but both relate to the duration of the contract.

 

I'd like to (manually) tag all the term clauses in this bunch of contracts with some kind of machine-readable metadata. So if I want to know the term/expiry date of a whole lot of contracts, I can run a script that will automatically show me all the term clauses (or at least, all the clauses tagged as "term") without having to manually open and review each document.

 

The problem with bookmarks is that some contracts might have a start date and an end date in separate clauses. I'd need to tag both as "term" to get meaningful automation out of my script. AFAIK, each bookmark has to have a unique name (which makes sense), so I couldn't tag two clauses with the same bookmark.
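For what it's worth, a script along these lines could be sketched as follows. This is only a sketch under assumptions: it supposes the clauses are wrapped in content controls (suggested earlier in the thread) with the Tag property set to a label such as "term", and it collects everything into an in-memory SQLite table. The function names, the table layout, and the two fake contracts are all invented for the example; the fake files contain only word/document.xml and are not complete Word documents.

```python
import io
import sqlite3
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def clauses(docx_file):
    """Yield (tag, text) for each tagged content control in one .docx."""
    with zipfile.ZipFile(docx_file) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    for sdt in root.iter(f"{{{W}}}sdt"):
        tag = sdt.find("w:sdtPr/w:tag", NS)
        content = sdt.find("w:sdtContent", NS)
        if tag is not None and content is not None:
            text = "".join(t.text or "" for t in content.iter(f"{{{W}}}t"))
            yield tag.get(f"{{{W}}}val"), text


def build_index(documents):
    """documents maps a display name to a path or file object."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clause (doc TEXT, tag TEXT, body TEXT)")
    for name, source in documents.items():
        conn.executemany(
            "INSERT INTO clause VALUES (?, ?, ?)",
            [(name, tag, text) for tag, text in clauses(source)],
        )
    return conn


def fake_contract(*tagged):
    """Hypothetical stand-in for a real contract, built from (tag, text) pairs."""
    body = "".join(
        f'<w:sdt><w:sdtPr><w:tag w:val="{tag}"/></w:sdtPr>'
        f"<w:sdtContent><w:p><w:r><w:t>{text}</w:t></w:r></w:p></w:sdtContent></w:sdt>"
        for tag, text in tagged
    )
    xml = f'<w:document xmlns:w="{W}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf


conn = build_index({
    "contract_a.docx": fake_contract(
        ("party", "Acme Ltd"),
        ("term", "One year from the date of signing."),
    ),
    "contract_b.docx": fake_contract(
        ("term", "Starts 1 January 2016."),
        ("price", "100 dollars"),
        ("term", "Ends 31 December 2017."),
    ),
})
rows = conn.execute(
    "SELECT doc, body FROM clause WHERE tag = 'term' ORDER BY rowid"
).fetchall()
for doc, body in rows:
    print(doc, "->", body)
```

The second fake contract has two separate clauses carrying the same "term" tag, which is the case bookmarks can't handle; both come back from the one query.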

