String Parsing Test Code

captainkuro edited this page Jan 10, 2013 · 6 revisions

This script benchmarks several ways of cutting a string at a marker: preg_replace(), preg_split(), strpos() with substr(), and explode(). It is posted here in the wiki because the forum does not accept PHP files. It was linked from this thread: http://codeigniter.com/forums/viewthread/83428/P15/
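As a quick illustration of what is being compared, here is a minimal sketch of the four cutting techniques applied to one short string. The sample text and the variable names are assumptions made up for this example; only the marker is taken from the script, and every function called is standard PHP.

```php
<?php
// Illustrative only: the sample text is invented, the marker is the one the script uses.
$marker = '<|more|>';
$sample = "Intro paragraph.<|more|>Rest of the article.";

// 1. preg_replace(): delete the marker and everything after it.
$by_preg_replace = preg_replace('/<\|more\|>.*+/s', '', $sample, 1);

// 2. preg_split(): split once at the marker and keep the first piece.
$pieces = preg_split('/<\|more\|>/', $sample, 2);
$by_preg_split = $pieces[0];

// 3. strpos() + substr(): find the marker, then copy the text before it.
$pos = strpos($sample, $marker);
$by_substr = ($pos === false) ? $sample : substr($sample, 0, $pos);

// 4. explode(): split once at the marker and keep the first piece.
$parts = explode($marker, $sample, 2);
$by_explode = $parts[0];

// Print the four results, one per line, so they can be compared by eye.
echo $by_preg_replace, "\n", $by_preg_split, "\n", $by_substr, "\n", $by_explode, "\n";
```

All four approaches are meant to yield the same text before the marker; the full script measures how long each one takes as the input grows.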

<?php
error_reporting(E_ALL);

// set to 0 to benchmark a string that does not contain the <|more|> marker
$marker_in_string = 1;

if ($marker_in_string) {
$string = "When in the Course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.<|more|>
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. — That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, — That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn that mankind are more disposed to suffer, while evils are sufferable than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security. — Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.";
}else{
$string = "When in the Course of human events it becomes necessary for one people to dissolve the political bands which have connected them with another and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness. — That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, — That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn that mankind are more disposed to suffer, while evils are sufferable than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security. — Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world.";
}

// start off with a huge string we will cut down as needed
$string .= $string;
$string .= $string;
$string .= $string;
$string .= $string;


echo (str_word_count($string).' words total. <br>');
$repeat_tot = 1000;
$splitat = '<|more|>';
$preg_chop_off_after = '/\<\|more\|\>.*+/si';
$preg_split_precut_at = '/\<\|more\|\>/s';

// these determine how many words will be sent to the functions
// varied string sizes
$a = array(0, 10, 200, 400, 800, 1600, 3200);
// varied but mostly short string sizes
//$a = array(10,20,30,40,50,60,70,80,90,100);
// progressively larger string sizes, run one at a time;
// in the author's runs these suggested that substr() is best overall
// because its time stayed about the same regardless of string size
//$a = array(500);
//$a = array(1000);
//$a = array(2000);
//$a = array(3000);

// collects the timings, keyed by technique and then by word limit
$time_bottle = array();
foreach ($a as $limit) {
    // trim the big string to each of the smaller word counts as listed in the array
    $str_limited = word_limiter_optimized_regex_3($string, $limit);
    
    // test preg_replace
    $i = 1;
    $time_start = microtime_float();
    while ($i <= $repeat_tot) 
    {
        $string8 = preg_rep_cut($str_limited, $preg_chop_off_after); $i++;
    }
    $time_end = microtime_float();
    $time = $time_end - $time_start;
    $time_bottle[$preg_chop_off_after][$limit] = round($time,4);
    
    
    
    // test preg_split pre cut
    $i = 1;
    $time_start = microtime_float();
    while ($i <= $repeat_tot) 
    {
        $string9 = preg_split_pre_cut($str_limited, $preg_split_precut_at); $i++;
    }
    $time_end = microtime_float();
    $time = $time_end - $time_start;
    $time_bottle[$preg_split_precut_at][$limit] = round($time,4);
    
    
    
    // test substr
    $i = 1;
    $time_start = microtime_float();
    while ($i <= $repeat_tot) 
    {
        $string10 = substr_cut($str_limited, $splitat); $i++;
    }
    $time_end = microtime_float();
    $time = $time_end - $time_start;
    $time_bottle['substr'][$limit] = round($time,4);


    
    // test explode
    $i = 1;
    $time_start = microtime_float();
    while ($i <= $repeat_tot) 
    {
        $string11 = explode_cut($str_limited, $splitat); $i++;
    }
    $time_end = microtime_float();
    $time = $time_end - $time_start;
    $time_bottle['explode'][$limit] = round($time,4);

    // make sure we are parsing correctly, at least with the last one
    p('string:'.word_limiter_optimized_regex_3($string8, 100));
    p('string:'.word_limiter_optimized_regex_3($string9, 100));
    p('string:'.word_limiter_optimized_regex_3($string10, 100));
    p('string:'.word_limiter_optimized_regex_3($string11, 100));
}

// show the results
p($time_bottle);

$time_avgs = array();
foreach ($time_bottle as $key => $time_set) 
{
    $time_avgs[$key] = array_sum($time_set)/count($time_set);
}

asort($time_avgs);
p($time_avgs);


function preg_rep_cut($in,$split_at_regex)
{
    return preg_replace($split_at_regex, '', $in, 1);
}

function preg_split_pre_cut($in,$split_at_regex)
{
    return current(preg_split($split_at_regex, $in, 2));
}

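function substr_cut($in, $splitat)
{
    // strpos() returns false when the marker is absent; compare strictly so that
    // a marker at position 0 is not mistaken for "not found"
    $pos = strpos($in, $splitat);
    return ($pos === false) ? $in : substr($in, 0, $pos);
}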

function explode_cut($in, $splitat)
{
    return current(explode($splitat, $in, 2));
}

function word_limiter_optimized_regex_3($str, $limit = 100, $end_char = '&#8230;') {
    
    // Don't bother about empty strings.
    // Get rid of them here because the regex below would match them too.
    if (trim($str) == '')
        return $str;

    // Return an empty string if the limit is zero.
    if ($limit === 0)
        return '';
        
    // Added the initial \s* in order to make the regex work in case $str starts with whitespace.
    // Without it a string like " test" would be counted for two words instead of one.
    preg_match('/\s*(?:\S+\s*){1,'. (int) $limit .'}+/', $str, $matches);
    
    // Only add end character if the string got chopped off.
    if (strlen($matches[0]) == strlen($str))
        $end_char = '';
    
    // Chop off trailing whitespace and add the end character.
    return rtrim($matches[0]) . $end_char;
}

/**
 * Returns the current Unix timestamp with microseconds as a float.
 * Replicates PHP 5's microtime(true) for older PHP versions.
 */
function microtime_float()
{
   list($usec, $sec) = explode(" ", microtime());
   return ((float)$usec + (float)$sec);
}

// function to print with pre
function p($x){echo '<pre>';print_r($x);echo '</pre>';}

?>
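
The four timing loops in the script all follow the same pattern: record a start time, call one function repeatedly, record the elapsed time. A minimal sketch of how that pattern could be factored into one helper is shown here. The names time_calls() and demo_explode_cut() are hypothetical and not part of the original script; the sketch also assumes PHP 5 or later, where microtime(true) returns a float directly.

```php
<?php
/**
 * Hypothetical helper: call $fn($arg1, $arg2) $repeat times
 * and return the elapsed time in seconds, rounded to 4 decimals.
 */
function time_calls($fn, $arg1, $arg2, $repeat = 1000)
{
    $start = microtime(true);
    for ($i = 0; $i < $repeat; $i++) {
        call_user_func($fn, $arg1, $arg2);
    }
    return round(microtime(true) - $start, 4);
}

// Stand-in for one of the benchmarked functions (same logic as explode_cut()).
function demo_explode_cut($in, $splitat)
{
    $parts = explode($splitat, $in, 2);
    return $parts[0];
}

// Time 1000 calls on a short, made-up input string.
$elapsed = time_calls('demo_explode_cut', "Intro.<|more|>Rest.", '<|more|>', 1000);
echo 'explode: ' . $elapsed . " s\n";
```

With a helper like this, each technique would need only one call per word limit, which makes it harder for the timing code itself to differ between the techniques being compared.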