Simple Tool for text substitution plus Design questions

I wrote a simple C# app to do text substitution. It takes a (key,value) mapping and then replaces any keys in between {% ... %} with their value.

It's a really trivial app. But it also quickly opens a Pandora's box of design questions.

 

It takes an XML file that provides the (key,value) pairs, such as values.xml:

 <?xml version="1.0" encoding="utf-8" ?>
<Data>
  <entry key="Name">Mike</entry>
  <entry key="Blog">https://blogs.msdn.com/jmstall</entry>
  <entry key="Company">Microsoft</entry>
</Data>

And then an input template file (form.txt in this example). Example usage is:

    C:\temp\8> type form.txt
    {%Name%} has a blog, {%Blog%},
    and works at {%Company%}

    C:\temp\8> %r% values.xml form.txt output.txt

    C:\temp\8> type output.txt
    Mike has a blog, https://blogs.msdn.com/jmstall,
    and works at Microsoft

 

Very simple. Sometimes it's fun to write something simple.

However, it does raise an interesting design issue about how to properly generalize the key --> value mapping. In this case, I have a simple ILookup interface with a single method, 'string Lookup(string)'. I have one implementation of it, XmlStorage, which performs the mapping based on the contents of an XML file.

Here's the source for my simple implementation. The design questions are at the end.

 

 // Mike Stall
// https://blogs.msdn.com/jmstall

// Take in a template file and a map of keys --> values, and replace each {%key%} with Map[key].

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.IO;
using System.Xml;
using System.Diagnostics;

namespace Sample
{
    // Interface for mapping a string to a replacement string.
    interface ILookup
    {
        string Lookup(string value);
    }
        

    // Create a key,value around an XML file.
    // Sample xml file would be:
    //  <Data>
    //    <entry key="abc">Goo!</entry>
    //    <entry key="def">Bar!</entry>
    //  </Data>
    class XmlStorage : ILookup
    {
        XmlDocument m_doc;

        public XmlStorage(string filename)
        {
            m_doc = new XmlDocument();
            m_doc.Load(filename);
        }

        // Substitute in {%variable%} for value.
        // Lookup 'variable' and return 'value'.
        public string Lookup(string input)
        {
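            // Use XPath to find a node with key=input. 
            // Value is the inner text of that node.
            // Note: the key is spliced into the XPath as-is, so a key containing
            // a double quote will break the expression.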
            XmlNode n = m_doc.SelectSingleNode("Data/entry[@key=\"" + input + "\"]");
            if (n == null)
                return null;
            
            string val = n.InnerText;
            return val;
        }
    }


    // Main entry point.
    class Program
    {
        static void Main(string[] args)
        {
            // Read in values.
            string xmlInput = args[0];            
            string fileTemplate = args[1];
            string fileOutput = args[2];

            ILookup lookup = new XmlStorage(xmlInput);
            Worker2(lookup, fileTemplate, fileOutput);
        }


        static void Worker2(ILookup fpMapper, string fileTemplate, string fileOutput)
        {
            // Read in
            string template = File.ReadAllText(fileTemplate);

            // Do the replacement
            Regex r = new Regex(@"{%\s*(.+?)\s*%}");
            string output = r.Replace(template, delegate (Match m)
                {
                    string input = m.Groups[1].ToString();
                    string result = fpMapper.Lookup(input);
                    if (result == null)
                        throw new Exception("Value file does not contain a value for '" + input + "'.");
                    return result;
                });

            // Write out
            File.WriteAllText(fileOutput, output);
        }

    }
} // end namespace Sample

 

--

Other implementations:

So let's say you wanted to extend it with other storage types. XML files are nice for simple static storage, but let's say you needed something more dynamic, like auto-generating GUIDs.

You could have a GuidStorage class:

     // A store to map any string starting with "guid" to a unique guid.
    // Passing the same input yields the same guid back, so this allows
    // guids that are used in multiple places.
    class GuidStorage : ILookup
    {
        Dictionary<string, Guid> m_guids = new Dictionary<string, Guid>();
        public string Lookup(string input)
        {
            input = input.ToLowerInvariant();
            if (!input.StartsWith("guid")) return null;

            Guid g;
            if (!m_guids.TryGetValue(input, out g))
            {
                g = Guid.NewGuid();
                m_guids[input] = g;
            }
            return g.ToString("b");            
        }
    }

 

So if you wired that up and ran it, you'd get:

    C:\temp\8> type guid.txt
    First: {%guid #1%}
    Second: {%guid #2%}
    First again: {%guid #1%}

    C:\temp\8> %r% values.xml guid.txt output.txt

    C:\temp\8> type output.txt
    First: {6e148fd7-bc33-4ded-b8c1-0d23a80f6e2e}
    Second: {c3a13466-1d3e-4bd0-b18b-9a9db3ceda3e}
    First again: {6e148fd7-bc33-4ded-b8c1-0d23a80f6e2e}

 

Other storages could include: IDictionary<string, object>, Reflection for field/property lookup, database lookup, environment var lookup... The list could go on forever.
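
As a rough sketch of two of these, here is what a dictionary-backed store and an environment-variable store might look like. The class names DictionaryLookup and EnvironmentLookup and the "env:" key prefix are made up for illustration; only ILookup comes from the listing above, and it is repeated here so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the ILookup interface in the listing above.
interface ILookup
{
    string Lookup(string value);
}

// Hypothetical store backed by an in-memory dictionary.
class DictionaryLookup : ILookup
{
    readonly IDictionary<string, object> m_values;

    public DictionaryLookup(IDictionary<string, object> values)
    {
        m_values = values;
    }

    public string Lookup(string input)
    {
        object val;
        // Missing keys and null values both report "no value".
        if (!m_values.TryGetValue(input, out val) || val == null)
            return null;
        return val.ToString();
    }
}

// Hypothetical store that maps "env:NAME" to the environment variable NAME.
// The "env:" prefix is an assumption, chosen so it can coexist with other stores.
class EnvironmentLookup : ILookup
{
    const string Prefix = "env:";

    public string Lookup(string input)
    {
        if (!input.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
            return null;

        // GetEnvironmentVariable returns null when the variable is not set.
        return Environment.GetEnvironmentVariable(input.Substring(Prefix.Length));
    }
}
```

Both follow the same convention as XmlStorage and GuidStorage: return null for anything the store does not recognize, and let the caller decide what to do about it.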

 

Design questions:

After you move beyond a single static ILookup, a lot of design questions quickly come at you:

  1. What's the escaping mechanism to output {% %} ?
  2. What sort of derivations of ILookup would be interesting?
  3. What sort of semantics should be on ILookup?
    - Note that the interface for ILookup doesn't require that it have a fixed set of keys (contrast with IDictionary). For example, you could have a lookup that evaluates expressions: {%1+2*3%} --> "7".
    - In fact, you could argue it doesn't have to return the same value every time. For example, maybe the first lookup returns a verbose value ("Mr. Bob Bobson, hereafter referred to as 'the victim'"), and subsequent lookups return a short value ("The Victim"). (Although this then raises the question of evaluation order.)
    - Since an ILookup implementation could have state, could you have lookups that actually just serve as control words for other lookups?  In other words, it starts off appearing like you have something computationally simple, but as soon as you have the extensibility hook, it quickly degenerates into a full-fledged Turing machine.
  4. Once you have multiple ILookup stores, how do they cooperate together? Do the lookups' keys have to be disjoint? Or can they overlap and have precedence? What's the precedence policy?
  5. What about error handling? If a key is not found, how do you report the error? How does the user know which ILookup they should have expected the key to come from?
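
One possible answer to question 4, sketched under the assumption that keys may overlap and the first store to return a non-null value wins. ChainedLookup, DescribeStores and SingleEntryLookup are hypothetical names, not part of the original tool; ILookup is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the ILookup interface in the listing above.
interface ILookup
{
    string Lookup(string value);
}

// Hypothetical composite: asks each store in order; the first non-null answer wins.
class ChainedLookup : ILookup
{
    readonly List<ILookup> m_stores = new List<ILookup>();

    public ChainedLookup(params ILookup[] stores)
    {
        m_stores.AddRange(stores);
    }

    public string Lookup(string input)
    {
        foreach (ILookup store in m_stores)
        {
            string result = store.Lookup(input);
            if (result != null)
                return result;
        }
        return null; // no store recognized the key
    }

    // Names of the stores that get consulted, in precedence order.
    // Intended for error messages when a key is not found (question 5).
    public string DescribeStores()
    {
        List<string> names = new List<string>();
        foreach (ILookup store in m_stores)
            names.Add(store.GetType().Name);
        return string.Join(", ", names.ToArray());
    }
}

// Hypothetical stand-in store that knows exactly one key; only here for demos.
class SingleEntryLookup : ILookup
{
    readonly string m_key;
    readonly string m_value;

    public SingleEntryLookup(string key, string value)
    {
        m_key = key;
        m_value = value;
    }

    public string Lookup(string input)
    {
        return input == m_key ? m_value : null;
    }
}
```

With the classes from this post, the wiring in Main might read `ILookup lookup = new ChainedLookup(new XmlStorage(xmlInput), new GuidStorage());`, which puts the XML file ahead of the GUID generator. This is only one policy; requiring disjoint keys and failing on overlap would be equally defensible.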

 

Could this become complex enough that you'd want debug support? You laugh, but:

  1. If {% ... %} lookups have side-effects (or if you have a lookup that uses reflection to invoke a C# property), a debugger could be handy.
  2. Or if you need to diagnose which ILookup was supposed to have a certain key.
  3. Or if you want to figure out where text is coming from. For example, debugging how "{%a%}{%b%}{%c%}" produced "123456" could be pretty challenging.  Perhaps you want to set a write-breakpoint to fire whenever a certain word is output so that you can track which key it's coming from.
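
Short of a real debugger, a logging wrapper gets part of the way there. The sketch below assumes a hypothetical TracingLookup decorator that records which named store was asked for which key and what it answered; MapLookup is a stand-in store for the demo, and ILookup is repeated so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the ILookup interface in the listing above.
interface ILookup
{
    string Lookup(string value);
}

// Hypothetical decorator that records every lookup made through it.
class TracingLookup : ILookup
{
    readonly string m_name;      // label for the wrapped store
    readonly ILookup m_inner;    // the store doing the real work
    readonly List<string> m_log; // shared log, one line per lookup

    public TracingLookup(string name, ILookup inner, List<string> log)
    {
        m_name = name;
        m_inner = inner;
        m_log = log;
    }

    public string Lookup(string input)
    {
        string result = m_inner.Lookup(input);
        m_log.Add(m_name + ": '" + input + "' -> " + (result ?? "(no value)"));
        return result;
    }
}

// Hypothetical stand-in store over a plain dictionary; only here for demos.
class MapLookup : ILookup
{
    readonly Dictionary<string, string> m_values;

    public MapLookup(Dictionary<string, string> values)
    {
        m_values = values;
    }

    public string Lookup(string input)
    {
        string val;
        return m_values.TryGetValue(input, out val) ? val : null;
    }
}
```

If a, b and c map to "1", "2345" and "6", the output "123456" on its own says nothing about where the boundaries fall, but the log has one line per key with the text it produced, in the order the lookups ran.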

 

The point is that, without even adding new features (like a more complex ILookup interface, more complex control patterns, etc.), just fleshing out the computational power of this initial design is actually pretty far-reaching.

 

Takeaway:

This ILookup {% key %} = value replacement initially appears to be a very simple feature. Think about how much work it would be to properly spec this feature and fill all the spec holes. 

Now appreciate how a system a million times more complex (like anything in the real world) has so many spec holes.

Comments

  • Anonymous
    October 31, 2007
    I used a very similar approach for automating generation of IDL definitions for a C# COM application. Indeed, the basic idea (yet a very handy utility) can be extended to a full-blown language grammar, if you like. But the further you go, the more you lose the virtue of simplicity. :)

  • Anonymous
    November 01, 2007
    Ruslan, yeah, it's the classic tradeoff of simplicity vs. power.