Exporting from CSV format into Markdown – the long way.

Posted by on Apr 03 2020, in kraxn

{… or how to export from the texpattern cms the hard way}

Kraxn.io recently migrated from textpattern to a static site generator. *

This was a journey, that didn’t happen overnight. In fact, as I couldn’t find any out of the box solutions, it took a couple of weeks before I took a stab at it again. I had over 350 posts, so I could do it manually, or figure out a more automated way to do it. Here are the steps I took to do this:

  • Export a csv from mysql (you can use mysql command line, but used phpMyAdmin export function since I had it installed).
  • Convert each row in of csv into an markdown (.md) file. Each row represents a blog entry.

Before I decided this path, I decide if I could just do a quick & dirty download with wget eg:

wget -mkEpnp http://example.org

Unfortunately – there was too much “dirty html” to clean by running markdown. There are probably ways of doing this using Regex, and perhaps even the php striptags function (in hindsight this could have been a possibility). I would have to put in some effort to know all the html tags I used on the site. So I opted for the export csv method.

Exporting blog data as CSV.

From PHP myadmin:

  • login to phpmyadmin
  • open the database you use click on the table you wish to use
  • click export: method = quick; format = csv ** and download

If you want export from mysql command line, I suggest you try this link.

Generate md files from CSV rows using PHP

Now that we’ve got a csv file with our content, let’s output it as as a .

Below is the script I used, some notes on data hygiene:

  • Download Markdownify and references the scripts.
  • Copy csv to php array for this script.
  • Posted – This is the date posted – this is essential for retaining the post date; I use php touch() to revert to the original post date.
  • Body_html – this the main content
  • Title – this is the title also used for URL, it had some funky characters, therefore the long list of php string functions. Ideally a regex would be faster!
  • Image – the image file which is numbers.

An aside: I had a mix of png and jpg images. I actually wanted all of them to be jpgs. Luckily I had just installed ImageMagick, which could bulk convert images. By converting them to jpgs, my script would also work. Here is the command I used inside my images folder:

sudo mogrify -format jpg *.png

So is here is the php script:

<?php
declare(strict_types = 1);

# https://github.com/Elephant418/Markdownify
require_once ("/Markdownify-master/src/Converter.php");
require_once ("/Markdownify-master/src/Parser.php");


/**

 * Convert a comma separated file into an associated array.
 * The first row should contain the array keys.
 * 
 * Example:
 * 
 * @param string $filename Path to the CSV file
 * @param string $delimiter The separator used in the file
 * @return array
 * @link http://gist.github.com/385876
 * @author Jay Williams <http://myd3.com/>
 * @copyright Copyright (c) 2010, Jay Williams
 * @license http://www.opensource.org/licenses/mit-license.php MIT License
   */


function csv_to_array(string $filename='', string $delimiter=',')
{
    if(!file_exists($filename) || !is_readable($filename))
        return FALSE;

    $header = NULL;
    $data = array();
    if (($handle = fopen($filename, 'r')) !== FALSE)
    {
        while (($row = fgetcsv($handle, 1000, $delimiter)) !== FALSE)
        {
            if(!$header)
                $header = $row;
            else
                $data[] = array_combine($header, $row);
        }
        fclose($handle);
    }
    return $data;

}


$csv = csv_to_array('textpattern.csv',",");


$counter = 0;



    foreach( $csv as $item){

    //collect all data for html body: heading, body html, and image.
    $file_data =
                "<h1>" . $item['Title'] . "</h1>" .
                $item['Body_html'] .
                "<img src='/inc/images/" . $item['Image'] ."t.jpg'>"; 


​    

    $converter = new MarkdownifyConverter;
    $file_data = $converter->parseString($file_data);    

    //title is a bit janky, but working.     
    if($item['Title']){
     $file_name = 
            strtolower ( str_replace ( " ", "-", strip_tags (  str_replace("/", "-", trim ($item['Title']) ) ) ) ).".md" ;    
    }  


​        

        if($file_name != false){

            $file_handler    = fopen($file_name, 'w');

            fwrite($file_handler, $file_data);


            fclose($file_handler);  

             $get_timestamp  = strtotime( $item['Posted'] );   //important  

            touch($file_name, $get_timestamp);

        }    


​         

    }       


?>

Using declare(strict_types = 1); may issue some warnings on on $data[] = array_combine($header, $row);. These can be ignored, or they can be fixed by deleting any columns you don’t in the csv file.


* Of all database driven cms’s, I prefer textpattern over wordpress. In fact I had accumulated over 300 posts on textpattern, and had virtually no issues. I decided to move because I used luapress on a demo site, and fell in love with it. My workflow is much better with Typora and luapress. It just felt natural. I would still recommend textpattern over wordpress.

** alternatively – I noticed there was php_array here too. I haven’t tested it, but it theoretically might have saved some steps.

Update 2020-09-30:  I’ve recently learned of a much easier to export files from MySQL directly into a file just by issuing a few simple SQL commands. See how to save mysql query output to a file.