Home Page
Archive > Posts > 2016 > September
Search:

Monitoring PHP calls

I recently had a Linux client that was, for whatever odd reason, making infinite recursive HTTP calls to a single script, which was making the server process count skyrocket. I decided to use the same module as I did in my Painless migration from PHP MySQL to MySQLi post, which is to say, overriding base functions for fun and profit using the PHP runkit extension. I did this so I could gather, for debugging, logs of when and where the calls that were causing this to occur.


The below code overrides all functions listed on the line that says “List of functions to intercept” [Line 9]. It works by first renaming these built in functions to “OVERRIDE_$FuncName[Line 12], and replacing them with a call to “GlobalRunFunc()” [Line 13], which receives the original function name and argument list. The GlobalRunFunc():

  1. Checks to see if it is interested in logging the call
    • In the case of this example, it will log the call if [Line 20]:
      • Line 21: curl_setopt is called with the CURLOPT_URL parameter (enum=10002)
      • Line 22: curl_init is called with a first parameter, which would be a URL
      • Line 23: file_get_contents or fopen is called and is not an absolute path
        (Wordpress calls everything absolutely. Normally I would have only checked for http[s] calls).
    • If it does want to log the call, it stores it in a global array (which holds all the calls we will want to log).
      The logged data includes [Line 25]:
      • The function name
      • The function parameters
      • 2 functions back of backtrace (which can often get quite large when stored in the log file)
  2. It then calls the original function, with parameters intact, and passes through the return [Line 27].

The “GlobalShutdown()” [Line 30] is then called when the script is closing [Line 38] and saves all the logs, if any exist, to “$GlobalLogDir/$DATETIME.srl”.

I have it using “serialize()” to encode the log data [Line 25], as opposed to “json_encode()” or “print_r()” calls, as the latter were getting too large for the logs. You may want to have it use one of these other encoding functions for easier log perusal, if running out of space is not a concern.

<?
//The log data to save is stored here
global $GlobalLogArr, $GlobalLogDir;
$GlobalLogArr=Array();
$GlobalLogDir='./LOG_DIRECTORY_NAME';

//Override the functions here to instead have them call to GlobalRunFunc, which will in turn call the original functions
foreach(Array(
        'fopen', 'file_get_contents', 'curl_init', 'curl_setopt', //List of functions to intercept
) as $FuncName)
{
        runkit_function_rename($FuncName, "OVERRIDE_$FuncName");
        runkit_function_add($FuncName, '', "return GlobalRunFunc('$FuncName', func_get_args());");
}

//This optionally 
function GlobalRunFunc($FuncName, $Args)
{
        global $GlobalLogArr;
        if(
                ($FuncName=='curl_setopt' && $Args[1]==10002) || //CURLOPT enumeration can be found at https://curl.haxx.se/mail/archive-2004-07/0100.html
                ($FuncName=='curl_init' && isset($Args[0])) ||
                (($FuncName=='file_get_contents' || $FuncName=='fopen') && $Args[0][0]!='/')
        )
                $GlobalLogArr[]=serialize(Array('FuncName'=>$FuncName, 'Args'=>$Args, 'Trace'=>array_slice(debug_backtrace(), 1, 2)));

        return call_user_func_array("OVERRIDE_$FuncName", $Args);
}

function GlobalShutdown()
{
        global $GlobalLogArr, $GlobalLogDir;
        $Time=microtime(true);
        if(count($GlobalLogArr))
                file_put_contents($GlobalLogDir.date('Y-m-d_H:i:s.'.substr($Time-floor($Time), 2, 3), floor($Time)).'.srl', implode("\n", $GlobalLogArr));

}
register_shutdown_function('GlobalShutdown');
?>
PHP String Concatenation - Stringbuilder results

I wrote the code at the end of this post to test the different forms of string concatenation and they really are all almost exactly equal in both memory and time footprints.


The two primary methods I used are concatenating strings onto each other, and filling an array with strings and then imploding them. I did 500 string additions with a 1MB string in PHP 5.6 (so the result is a 500MB string). At every iteration of the test, all memory and time footprints were very very close (at ~$IterationNumber*1MB). The runtime of both tests was 50.398 seconds and 50.843 seconds consecutively which are most likely within acceptable margins of error.

Garbage collection of strings that are no longer referenced seems to be pretty immediate, even without ever leaving the scope. Since the strings are mutable, no extra memory is really required after the fact.


HOWEVER, The following tests showed that there is a different in peak memory usage WHILE the strings are being concatenated.


$OneMB=str_repeat('x', 1024*1024);
$Final=$OneMB.$OneMB.$OneMB.$OneMB.$OneMB;
print memory_get_peak_usage();
Result=10,806,800 bytes (~10MB w/o the initial PHP memory footprint)

$OneMB=str_repeat('x', 1024*1024);
$Final=implode('', Array($OneMB, $OneMB, $OneMB, $OneMB, $OneMB));
print memory_get_peak_usage();
Result=6,613,320 bytes (~6MB w/o the initial PHP memory footprint)

So there is in fact a difference that could be significant in very very large string concatenations memory-wise (I have run into such examples when creating very large data sets or SQL queries).

But even this fact is disputable depending upon the data. For example, concatenating 1 character onto a string to get 50 million bytes (so 50 million iterations) took a maximum amount of 50,322,512 bytes (~48MB) in 5.97 seconds. While doing the array method ended up using 7,337,107,176 bytes (~6.8GB) to create the array in 12.1 seconds, and then took an extra 4.32 seconds to combine the strings from the array.


Anywho... the below is the benchmark code I mentioned at the beginning which shows the methods are pretty much equal. It outputs a pretty HTML table.

<?
//Please note, for the recursion test to go beyond 256, xdebug.max_nesting_level needs to be raised.
//You also may need to update your memory_limit depending on the number of iterations

//Output the start memory
print 'Start: '.memory_get_usage()."B<br><br>Below test results are in MB<br>";

//Our 1MB string
global $OneMB, $NumIterations;
$OneMB=str_repeat('x', 1024*1024);
$NumIterations=500;

//Run the tests
$ConcatTest=RunTest('ConcatTest');
$ImplodeTest=RunTest('ImplodeTest');
$RecurseTest=RunTest('RecurseTest');

//Output the results in a table
OutputResults(
  Array('ConcatTest', 'ImplodeTest', 'RecurseTest'),
  Array($ConcatTest, $ImplodeTest, $RecurseTest)
);

//Start a test run by initializing the array that will hold the results and manipulating those results after the test is complete
function RunTest($TestName)
{
  $CurrentTestNums=Array();
  $TestStartMem=memory_get_usage();
  $StartTime=microtime(true);
  RunTestReal($TestName, $CurrentTestNums, $StrLen);
  $CurrentTestNums[]=memory_get_usage();

  //Subtract $TestStartMem from all other numbers
  foreach($CurrentTestNums as &$Num)
    $Num-=$TestStartMem;
  unset($Num);

  $CurrentTestNums[]=$StrLen;
  $CurrentTestNums[]=microtime(true)-$StartTime;

  return $CurrentTestNums;
}

//Initialize the test and store the memory allocated at the end of the test, with the result
function RunTestReal($TestName, &$CurrentTestNums, &$StrLen)
{
  $R=$TestName($CurrentTestNums);
  $CurrentTestNums[]=memory_get_usage();
  $StrLen=strlen($R);
}

//Concatenate 1MB string over and over onto a single string
function ConcatTest(&$CurrentTestNums)
{
  global $OneMB, $NumIterations;
  $Result='';
  for($i=0;$i<$NumIterations;$i++)
  {
    $Result.=$OneMB;
    $CurrentTestNums[]=memory_get_usage();
  }
  return $Result;
}

//Create an array of 1MB strings and then join w/ an implode
function ImplodeTest(&$CurrentTestNums)
{
  global $OneMB, $NumIterations;
  $Result=Array();
  for($i=0;$i<$NumIterations;$i++)
  {
    $Result[]=$OneMB;
    $CurrentTestNums[]=memory_get_usage();
  }
  return implode('', $Result);
}

//Recursively add strings onto each other
function RecurseTest(&$CurrentTestNums, $TestNum=0)
{
  Global $OneMB, $NumIterations;
  if($TestNum==$NumIterations)
    return '';

  $NewStr=RecurseTest($CurrentTestNums, $TestNum+1).$OneMB;
  $CurrentTestNums[]=memory_get_usage();
  return $NewStr;
}

//Output the results in a table
function OutputResults($TestNames, $TestResults)
{
  global $NumIterations;
  print '<table border=1 cellspacing=0 cellpadding=2><tr><th>Test Name</th><th>'.implode('</th><th>', $TestNames).'</th></tr>';
  $FinalNames=Array('Final Result', 'Clean');
  for($i=0;$i<$NumIterations+2;$i++)
  {
    $TestName=($i<$NumIterations ? $i : $FinalNames[$i-$NumIterations]);
    print "<tr><th>$TestName</th>";
    foreach($TestResults as $TR)
      printf('<td>%07.4f</td>', $TR[$i]/1024/1024);
    print '</tr>';
  }

  //Other result numbers
  print '<tr><th>Final String Size</th>';
  foreach($TestResults as $TR)
    printf('<td>%d</td>', $TR[$NumIterations+2]);
  print '</tr><tr><th>Runtime</th>';
    foreach($TestResults as $TR)
      printf('<td>%s</td>', $TR[$NumIterations+3]);
  print '</tr></table>';
}
?>
Deep object compare for javascript
function DeepObjectCompare(O1, O2)
{
	try {
		DOC_Val(O1, O2, ['O1->O2', O1, O2]);
		return DOC_Val(O2, O1, ['O2->O1', O1, O2]);
	} catch(e) {
		console.log(e.Chain);
		throw(e);
	}
}
function DOC_Error(Reason, Chain, Val1, Val2)
{
	this.Reason=Reason;
	this.Chain=Chain;
	this.Val1=Val1;
	this.Val2=Val2;
}

function DOC_Val(Val1, Val2, Chain)
{
	function DoThrow(Reason, NewChain) { throw(new DOC_Error(Reason, NewChain!==undefined ? NewChain : Chain, Val1, Val2)); }

	if(typeof(Val1)!==typeof(Val2))
		return DoThrow('Type Mismatch');
	if(Val1===null || Val1===undefined)
		return Val1!==Val2 ? DoThrow('Null/undefined mismatch') : true;
	if(Val1.constructor!==Val2.constructor)
		return DoThrow('Constructor mismatch');
	switch(typeof(Val1))
	{
		case 'object':
			for(var m in Val1)
			{
				if(!Val1.hasOwnProperty(m))
					continue;
				var CurChain=Chain.concat([m]);
				if(!Val2.hasOwnProperty(m))
					return DoThrow('Val2 missing property', CurChain);
				DOC_Val(Val1[m], Val2[m], CurChain);
			}
			return true;
		case 'number':
			if(Number.isNaN(Val1))
				return !Number.isNaN(Val2) ? DoThrow('NaN mismatch') : true;
		case 'string':
		case 'boolean':
			return Val1!==Val2 ? DoThrow('Value mismatch') : true;
		case 'function':
			if(Val1.prototype!==Val2.prototype)
				return DoThrow('Prototype mismatch');
			if(Val1!==Val2)
				return DoThrow('Function mismatch');
			return true;
		default:
			return DoThrow('Val1 is unknown type');
	}
}