Saturday 17 January 2015

Recursive XML path parsing

I was asked yesterday for some help with PHP at work... this was something of a shock! Not only was it a shock because someone was asking me for help with something other than javascript, but they wanted help with PHP!

Anyway - the requirement was to list all possible routes through an XML file and at the time I suggested an iteration over each branch elements of a given trunk... but I did say that it would be better to use a recursive function. Then the pattern of the recursive function came to me about 05:00 this morning so I wrote a simple example to get all possible routes through an XML file:

<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<movies>
    <movie>
        <title>PHP: Behind the Parser</title>
        <characters>
            <character>
                <name>Ms. Coder</name>
                <actor>Onlivia Actora</actor>
            </character>
            <character>
                <name>Mr. Coder</name>
                <actor>El Act&#211;r</actor>
            </character>
        </characters>
        <plot>
            So, this language. It's like, a programming language. 
            Or is it a scripting language? All is revealed in this 
            thrilling horror spoof of a documentary. 
        </plot>
        <great-lines>
            <line>PHP solves all my web problems</line>
        </great-lines>
        <rating type="thumbs">7</rating>
        <rating type="stars">5</rating>
    </movie>
</movies>
XML;

$movies = new SimpleXMLElement($xmlstr);

getPaths($movies->getName(), $movies);

function getPaths($str, $xml){
    $children = $xml->children();
    if($children->count() > 0){
        foreach($children as $child){
            getPaths($str."/".$child->getName(), $child);
        }
    }else{
        echo $str."<br/>";
    }
}

I do like recursion but it does sometimes need me to think about it a wee bit before clocking what it is I need to do and how to do it.

Tuesday 13 January 2015

Hierarchical data in select form elements - Self defined ethnicity

I’ve recently been doing quite a bit with self defined ethnicity, both at home and at work, and I’ve come up with a really rather cunning way of storing and retrieving it. This technique has saved me a lot of time and I’ve implemented it in other areas - other than self defined ethnicity - so I’m presenting it here so it might help you too. I’m using PHP but with a couple of extra twists (Medoo and Twig); the same approach would work just as well with other technologies.

I know that this is applicable to other areas where we have a hierarchy of data… where we have a general and specific thing which we need to both display and save. For instance we might want to look at cars: we might be interested in the make and colour of cars. Should a car manufacturer make a model painted the same colour as a car made by another manufacturer we’d need some way of distinguishing the the cars within our database. We could add two field which reference the make and colour or we could lump the data in the same way as the Home Office in the UK does for self defined ethnicity and which is illustrated in the table below:


Code
Group
Value
A1
Asian or Asian British
Indian
A2
Asian or Asian British
Pakistani
A3
Asian or Asian British
Bangladeshi
A9
Asian or Asian British
Any other Asian ethnic background
B1
Black or Black British
Caribbean
B2
Black or Black British
African
B9
Black or Black British
Any other Black ethnic background
M1
Mixed
White and Black Caribbean
M2
Mixed
White and Black African
M3
Mixed
White and Asian
M9
Mixed
Any other Mixed ethnic background
O1
Chinese or other ethnic group
Chinese
02
Chinese or other ethnic group
Any other ethnic group
W1
White
British
W2
White
Irish
W9
White
Any other White ethnic background
NS
Not Stated
Not Stated


This means that the Code for a Blue Ford would be different from the Code for a Blue BMW and we would just need to reference the original table in order to distinguish between the two. This approach is perhaps not the most elegant and it doesn’t follow the dictates of proper database normalization but in the use case which I have - that of saving the self defined ethnicity against a persons record - it is fine. Besides, the Home Office have allowed themselves some wiggle room in the codes… one wonders what ethnicity the code W3 will be used for.

This approach to the codification of things is quite widespread amongst statutory bodies within the UK and I’ve done something similar with a subset of the NHS’s Read codes.

I use Medoo to retrieve the data but a simple PDO query would suffice… I just like Meddo as it’s relatively new and shiny (what’s the point of being a developer if you can not play now and again):

$self_defined_ethnicities = $db->select(
   "self_defined_ethnicity", // Table name
   "*" // All rows
);

This gives me an array which I can iterate over in PHP like this:

echo "<select>";
foreach($ethnicities as $e){
   echo "<option value='" . $e["code"] . "'>";
   echo $e["group"] . " - " . $e["value"];
   echo "</option>";
}
echo "</select>";

Which produces this HTML markup for the page:

<select>
   <option value='A1'>Asian or Asian British - Indian</option>
   <option value='A2'>Asian or Asian British - Pakistani</option>
   <option value='A3'>Asian or Asian British - Bangladeshi</option>
   <option value='A9'>Asian or Asian British - Any other Asian ethnic background</option>
   <option value='B1'>Black or Black British - Caribbean</option>
   <option value='B2'>Black or Black British - African</option>
   <option value='B9'>Black or Black British - Any other Black ethnic background</option>
   <option value='E1'>Chinese or other ethnic group - Eastern European</option>
   <option value='M1'>Mixed - White and Black Caribbean</option>
   <option value='M2'>Mixed - White and Black African</option>
   <option value='M3'>Mixed - White and Asian</option>
   <option value='M9'>Mixed - Any other Mixed background</option>
   <option value='NS'>Not stated - Not Stated</option>
   <option value='O1'>Chinese or other ethnic group - Chinese</option>
   <option value='O9'>Chinese or other ethnic group - Any other ethnic group</option>
   <option value='W1'>White - British</option>
   <option value='W2'>White - Irish</option>
   <option value='W9'>White - Any other White ethnic background</option>
</select>

But that is ugly and not particularly user friendly and besides, within HTML, we have access to the optgroup tag. So if we process the initial query result like this once we’ve retrieved it from the database:

foreach($ethnicities as $ethnicity){
   $sdes[$ethnicity["group"]][$ethnicity["code"]] = $ethnicity["value"];
}

We have got the specific Values stored within an array which has the Group as it’s index within the enclosing array and we can get to work on generating the HTML. To an extent we really don’t want to worry about the Group which the specific Value belongs to - except that we can use it as the label for the optgroup:

echo "<select>";
foreach($sdes as $key => $group){
   echo "<optgroup label='" . $key . "'>";
   foreach($group as $code => $value){
       echo "<option value='" . $code . "'>" . $value . "</option>";
   }
   echo "</optgroup>";
}
echo "</select>";

This produces much nicer markup:

<select>
   <optgroup label='Asian or Asian British'>
       <option value='A1'>Indian</option>
       <option value='A2'>Pakistani</option>
       <option value='A3'>Bangladeshi</option>
       <option value='A9'>Any other Asian ethnic background</option>
   </optgroup>
   <optgroup label='Black or Black British'>
       <option value='B1'>Caribbean</option>
       <option value='B2'>African</option>
       <option value='B9'>Any other Black ethnic background</option>
   </optgroup>
   <optgroup label='Chinese or other ethnic group'>
       <option value='E1'>Eastern European</option>
       <option value='O1'>Chinese</option>
       <option value='O9'>Any other ethnic group</option>
   </optgroup>
   <optgroup label='Mixed'>
       <option value='M1'>White and Black Caribbean</option>
       <option value='M2'>White and Black African</option>
       <option value='M3'>White and Asian</option>
       <option value='M9'>Any other Mixed background</option>
   </optgroup>
   <optgroup label='Not stated'>
       <option value='NS'>Not Stated</option>
   </optgroup>
   <optgroup label='White'>
       <option value='W1'>British</option>
       <option value='W2'>Irish</option>
       <option value='W9'>Any other White ethnic background</option>
   </optgroup>
</select>

And as well as producing better markup it is also much more aesthetically pleasing to the user and it means that we can now store the Self defined ethnicity code against a record. I said in my introduction that I’ll be using Twig and this is the snippet of code I use to generate my select form control:

<div class="form-group">
   <label for="ethnicityValue">Ethnicity</label>
   <select class="form-control" id="ethnicityValue">
       {% for key, group in sdes %}
           <optgroup label="{{ key }}">
               {% for code, value in group %}
                   <option value="{{ code }}">{{ value }}</option>
               {% endfor %}
           </optgroup>
       {% endfor %}
   </select>
</div>

In the initial markup we produced we display both the general and the specific within the same option… this can be helpful but select boxes tend to flow to the width of their widest text element (59 characters in that example), in our second example of markup we reduced the potential width of the select box (34 characters) as well as making the select easier to navigate. In my use case I display the select box within a modal window so that the user can edit it and I use jQuery to pre-select the existing self defined ethnicity like this:

$("#ethnicityEdit").on("click", function(){
   var current = $("#currentEthnicity").data("sde");
   $("#ethnicityValue").find("option").each(function(k, v){
       if($(v).val() === current){
           $(v).prop("selected", true);
       }else{
           $(v).prop("selected", false);
       }
   });
});

I’ve noted that I have used this method a few times now so I do hope it helps someone else. It is simple to implement and I dare say I’m not the first to use it but I do find it very helpful for the storage of such hierarchical statutory data. I’m moving to a more NoSQL approach to the storage of data and this lends itself very well to this method as I generally don’t need the extra step of processing the query results as the data is already stored in an appropriate format:

{
   "Asian or Asian British": [
       {
           "A1": "Indian"
       },
       {
           "A2": "Pakistani"
       },
       {
           "A3": "Bangladeshi"
       },
       {
           "A9": "Any other Asian ethnic background"
       }
   ],
   "Black or Black British": [
       {
           "B1": "Caribbean"
       },
       {
           "B2": "African"
       },
       {
           "B9": "Any other Black ethnic background"
       }
   ],
   "Chinese or other ethnic group": [
       {
           "E1": "Eastern European"
       },
       {
           "O1": "Chinese"
       },
       {
           "O9": "Any other ethnic group"
       }
   ],
   "Mixed": [
       {
           "M1": "White and Black Caribbean"
       },
       {
           "M2": "White and Black African"
       },
       {
           "M3": "White and Asian"
       },
       {
           "M9": "Any other Mixed background"
       }
   ],
   "Not stated": [
       {
           "NS": "Not Stated"
       }
   ],
   "White": [
       {
           "W1": "British"
       },
       {
           "W2": "Irish"
       },
       {
           "W9": "Any other White ethnic background"
       }
   ]
}

The only drawback I can see to this way of selecting hierarchical data is that only one level of optgroup is allowed within a select so we are limited to a shallow hierarchy. This might mean looking at something like the Shadow DOM in order to create selects for deeper hierarchies, perhaps by placing increasing numbers of non-breaking spaces before the text of the sub-hierarchies labels. I’ll leave that for you to ponder though.