Checking permissions before updating Flatpaks
AI IS FUCKING CRAZY

I wanted to make a script to check if Flatpak app permissions have changed before updating them. I decided to use ChatGPT’s newest model, o1-preview, for this task, and holy crap am I impressed. I had to give it 14 commands to get to the final product, and it seems to work great. That number could have been reduced quite a bit had I done things a bit differently.

I still had to find the problems, look up reference docs to correct them, and even debug in an IDE a little. But just by telling it where the problems were, it got there in the end, and its user-interaction output is way better than what I was planning on doing.

Anywho, here are the commands I gave it, followed by the code it gave me in the end. I have not thoroughly checked and tested the code, but it seems to work from a few quick trials. For demonstration purposes, I have left all of my original typos uncorrected below.


Write me a program in php that checks if there are any updates for my installed flatpaks. and then for each flatpak check and see if the permissions have changed between my current version and the new version. report to me the changes. then ask me if i want to continue with the updates.
your code doesnt work.
first $command = 'flatpak list --app --updates --columns=application,app,branch,origin,installation'; should be flatpak remote-ls --updates --columns=application,app,branch,origin (installation is an invalid column)
and then flatpak info --show-permissions doesnt give the original permissions. you have to get the original non-user-modified permissions from /var/lib/flatpak
the line if (empty($output)) { is incorrect, it should be if (count($output)) { as $output is an array, not a string
$columns = preg_split('/\s{2,}/', trim($line)); is incorrect. its tab delimited, so it just needs to be explode("\t", trim($line));
you also have a problem with the $metadataPath. its not always in /var/lib/flatpak/app/ . It may be in /var/lib/flatpak/runtime/. You can confirm which directory it should be in with the options column having "runtime" as an option.
in the --columns for your first command, "application" should be "name"
actually, you cant get the options from flatpak remote-ls --updates . you have to get the options from flatpak list
flatpak remote-info does not have an option --show-permissions
in your flatpak remote-info command, there is no --branch option.
there is no --assumeno option
there is no --dry-run permission. perhaps use --no-deploy?
you are incorrect. flatpak remote-info does not have an option --show-permissions . we already went over that. perhaps flatpak remote-info --show-metadata to get the permissions
flatpak remote-info does not have --branch
when checking the metadata, do not compare the info under [Application]
also, move the "Get architecture" line above its for loop
you have a problem with exec($remoteInfoCommand, $remoteOutput); you need to always empty $remoteOutput before running the exec. Otherwise, it appends the data to what is already there.


<?php
/**
 * This script checks for Flatpak updates and reports permission changes.
 */

// Function to parse permissions from Flatpak metadata
function parsePermissions($content) {
    $permissions = [];
    $lines = is_array($content) ? $content : explode("\n", $content);
    $currentSection = '';
    $skipSections = ['Application']; // Sections to skip

    foreach ($lines as $line) {
        $line = trim($line);

        if (empty($line)) {
            continue;
        }

        // Check for section headers
        if (preg_match('/^\[(.*)\]$/', $line, $matches)) {
            $currentSection = $matches[1];

            // Skip the [Application] section
            if (in_array($currentSection, $skipSections)) {
                $currentSection = '';
                continue;
            }

            $permissions[$currentSection] = [];
        } elseif ($currentSection !== '') {
            // Only process lines within non-skipped sections
            $parts = explode('=', $line, 2);
            if (count($parts) == 2) {
                $key = $parts[0];
                $values = explode(';', trim($parts[1], ';'));
                $permissions[$currentSection][$key] = $values;
            } else {
                // Handle keys without '=' (e.g., single permissions)
                $permissions[$currentSection][$line] = [];
            }
        }
    }

    return $permissions;
}

// Function to compare permissions
function comparePermissions($current, $new) {
    $differences = [];

    // Get all sections
    $sections = array_unique(array_merge(array_keys($current), array_keys($new)));

    foreach ($sections as $section) {
        $currentSection = isset($current[$section]) ? $current[$section] : [];
        $newSection = isset($new[$section]) ? $new[$section] : [];

        // Get all keys in this section
        $keys = array_unique(array_merge(array_keys($currentSection), array_keys($newSection)));

        foreach ($keys as $key) {
            $currentValues = isset($currentSection[$key]) ? $currentSection[$key] : [];
            $newValues = isset($newSection[$key]) ? $newSection[$key] : [];

            // Compare values
            $added = array_diff($newValues, $currentValues);
            $removed = array_diff($currentValues, $newValues);

            if (!empty($added) || !empty($removed)) {
                $differences[$section][$key] = [
                    'added' => $added,
                    'removed' => $removed,
                ];
            }
        }
    }

    return $differences;
}

//Dakusan: Added this function and wrapped other echo statements into it after the fact
//Function to output a line with appId (if given), colored red (IsError=true) or green (IsError=false)
function outputColor($appId, $str, $isError=true)
{
    //Determine if coloring should be used
    static $hasColors=null;
    if(!isset($hasColors)) {
        $hasColors=stream_isatty(STDOUT) && (!trim(`command -v tput`) || intval(`tput colors`)>=16);
    }

    echo
        (!$hasColors ? '' : ($isError ? "\e[31m" : "\e[32m")).
        ($appId ? "$appId: " : '').
        $str.
        (!$hasColors ? '' : "\e[0m").
	"\n";
}

// Function to display permission differences
function displayDifferences($appId, $differences) {
    outputColor($appId, 'Permission changes:');
    foreach ($differences as $section => $keys) {
        outputColor(null, "  [$section]");
        foreach ($keys as $key => $changes) {
            if (!empty($changes['added'])) {
                outputColor(null, "    + $key = " . implode(';', $changes['added']));
            }
            if (!empty($changes['removed'])) {
                outputColor(null, "    - $key = " . implode(';', $changes['removed']));
            }
        }
    }
    echo "\n";
}

// Get architecture (moved above the loop)
$archCommand = 'flatpak --default-arch';
exec($archCommand, $archOutput);
$arch = trim(implode('', $archOutput));

// Step 1: Get the list of installed Flatpaks
$installedCommand = 'flatpak list -a --columns=application,branch,origin,options'; //Dakusan: Added -a after the fact
exec($installedCommand, $installedOutput);

// Remove header line if present
if (!empty($installedOutput) && strpos($installedOutput[0], 'Application ID') !== false) {
    array_shift($installedOutput);
}

// Build an associative array of installed applications
$installedApps = [];

foreach ($installedOutput as $line) {
    // The output is tab-delimited
    $columns = explode("\t", trim($line));
    if (count($columns) >= 4) {
        $appId = $columns[0];
        $branch = $columns[1];
        $origin = $columns[2];
        $options = $columns[3];

        $installedApps[$appId] = [
            'appId' => $appId,
            'branch' => $branch,
            'origin' => $origin,
            'options' => $options,
        ];
    }
}

// Get the list of available updates
$updatesCommand = 'flatpak remote-ls --updates --columns=application';
exec($updatesCommand, $updatesOutput);

// Remove header line if present
if (!empty($updatesOutput) && strpos($updatesOutput[0], 'Application ID') !== false) {
    array_shift($updatesOutput);
}

// Build a list of applications that have updates
$updatesAvailable = array_map('trim', $updatesOutput);

if (empty($updatesAvailable)) {
    echo "No updates available for installed Flatpaks.\n";
    exit(0);
}

$permissionChanges = [];

foreach ($updatesAvailable as $appId) {
    if (!isset($installedApps[$appId])) {
        outputColor($appId, 'Installed app not found. Skipping.'); //Dakusan: Added this line after the fact
        continue;
    }

    $app = $installedApps[$appId];
    $branch = $app['branch'];
    $origin = $app['origin'];
    $options = $app['options'];

    // Determine if it's an app or runtime
    $isRuntime = strpos($options, 'runtime') !== false;

    // Paths to the metadata files
    if ($isRuntime) {
        $metadataPath = "/var/lib/flatpak/runtime/$appId/$arch/$branch/active/metadata";
    } else {
        $metadataPath = "/var/lib/flatpak/app/$appId/$arch/$branch/active/metadata";
    }

    // Check if the metadata file exists
    if (!file_exists($metadataPath)) {
        outputColor($appId, 'Metadata file not found. Skipping.');
        continue;
    }

    // Read current permissions from the metadata file
    $metadataContent = file_get_contents($metadataPath);
    $currentPermissions = parsePermissions($metadataContent);

    // Get new metadata from remote
    $ref = $appId . '/' . $arch . '/' . $branch;
    $remoteInfoCommand = 'flatpak remote-info --show-metadata ' . escapeshellarg($origin) . ' ' . escapeshellarg($ref);

    // Clear $remoteOutput before exec()
    $remoteOutput = [];
    exec($remoteInfoCommand, $remoteOutput);

    if (empty($remoteOutput)) {
        outputColor($appId, 'Failed to retrieve remote metadata. Skipping.');
        continue;
    }

    // Parse new permissions from the remote metadata
    $newPermissions = parsePermissions($remoteOutput);

    // Compare permissions
    $differences = comparePermissions($currentPermissions, $newPermissions);

    if (!empty($differences)) {
        $permissionChanges[$appId] = $differences;
        displayDifferences($appId, $differences);
    } else { //Dakusan: Added this condition after the fact
        outputColor($appId, 'No permission changes found.', false);
    }
}

// If there are no permission changes, inform the user
if (empty($permissionChanges)) {
    echo "No permission changes detected in the available updates.\n";
}

// Ask user if they want to proceed
echo "Do you want to proceed with the updates? (y/N): ";
$handle = fopen("php://stdin", "r");
$line = fgets($handle);
$answer = trim(strtolower($line));

if ($answer == 'y' || $answer == 'yes') {
    // Proceed with updates
    echo "Updating Flatpaks...\n";
    passthru('flatpak update -y');
} else {
    echo "Updates canceled.\n";
}

fclose($handle);
?>
Plex recover posters & art from previous backup

My Plex server decided to go through my database and screw with all the posters and art, so I made the below script which can be used to recover them. You should only use this if doing a full revert to backup is not a viable option. Do also note that if you have refreshed the metadata on any tv series since the backup, it could cause issues with the episode posters.


#Run the following commands as root
sudo su

#User variables
SERVICE_NAME='plexmediaserver'
BACKUP_LOCATION='/ROOT_BACKUP_LOCATION/'
PLEX_USER='plex'

#Derived variables (Slashes at end of directory paths are required)
SQLITE3_PATH="/usr/lib/$SERVICE_NAME/Plex SQLite"
DB_PATH="/var/lib/$SERVICE_NAME/Library/Application Support/Plex Media Server/Plug-in Support/Databases/com.plexapp.plugins.library.db"
IMG_PATH="/var/lib/$SERVICE_NAME/Library/Application Support/Plex Media Server/Metadata/"
CACHE_PATH="/var/lib/$SERVICE_NAME/Library/Application Support/Plex Media Server/Cache/PhotoTranscoder/"

#Stop the plex service until we have updated everything
echo "Stopping Plex"
systemctl stop "$SERVICE_NAME"

#Set the (user_thumb_url, user_art_url, and updated_at) from the old database in the new database.
#The updated_at is required to match the cached images
echo "Updating database"
sudo -u "$PLEX_USER" "$SQLITE3_PATH" "$DB_PATH" -cmd "ATTACH DATABASE '$BACKUP_LOCATION$DB_PATH' as OLDDB" "
UPDATE metadata_items AS NMDI
SET user_thumb_url=OMDI.user_thumb_url, user_art_url=OMDI.user_art_url, updated_at=OMDI.updated_at
FROM OLDDB.metadata_items AS OMDI
WHERE NMDI.id=OMDI.id
"

#Copy the posters and art from the old directory to the new directory
echo "Copying posters"
cd "$BACKUP_LOCATION$IMG_PATH"
find -type d \( -name "art" -o -name "posters" \) | xargs -n100 bash -c 'rsync -a --relative "$@" "'"$IMG_PATH"'"' _

#Copy the cached images over
echo "Copying cache"
rsync -a "$BACKUP_LOCATION$CACHE_PATH" "$CACHE_PATH"

#Restart plex
echo "Starting Plex"
systemctl start "$SERVICE_NAME"
Password transfer script for Roboform to BitWarden
I created this script because I wasn’t happy with the native Roboform importer inside Bitwarden. It fixes multiple problems, including:
* Ignoring MatchUrls
* Parent folders weren’t created if they had no items in them (e.g. if I had identities in Financial/Banks, but nothing in /Financial, then /Financial wouldn’t be created)
* Completely ignoring the extra fields (RfFieldsV2)


This fixes all those problems.
This needs to be run via php in a command line interface, so `php convert.php`. It requires the [bitwarden cli “bw” command](https://bitwarden.com/help/cli/). There are 2 optional arguments you can pass: 1) The name of the import file. If not given, `./RfExport.csv` will be used. 2) The “bw” session token. If not given as a parameter, it can be set directly inside this file on line #4. You get this token by running `bw unlock` (after logging in with `bw login`).
This script does the following:
* Reads from a csv file exported by Roboform to import into bitwarden
* Runs `bw sync` before processing
* Imported fields:
  * Name: Becomes the name of the item
  * Url: Becomes the URL for the item
  * MatchUrl: If this is not the same as “Url” then it is added as another URL for the item
  * Login: Becomes the login (user) name
  * Pwd: Becomes the password
  * Note: Becomes the note
  * Folder: Item is put in this folder
  * RfFieldsV2 (multiple):
    * Fields that match the login or password fields are marked as “linked fields” to those. If the field names are the following, they are not added as linked fields: *user, username, login, email, userid, user id, user id$, user_email, login_email, user_login, password, passwd, password$, pass, user_password, login_password, pwd, loginpassword*
    * Fields that are different from the login or password are added as appropriate “custom fields”. Supported types: '', rad, sel, txt, pwd, rck, chk, are
    * Each field has 5 values within it, and 3 of those seem to be different types of names. So as long as they are not duplicates of each other, each name is stored as a separate field.
* If all fields are blank except “Name” and “Note” then the item is considered a “Secure Note”
* While reading the csv file for import, errors and warnings are sent to stdout
* After the csv import has completed, it gives a total number of warnings/errors and asks the user if they want to continue
* Creates missing folders (including parents that have no items in them)
* During export to bitwarden:
  * This process can be quite slow since each instance of running “bw” has to do a lot of work
  * Keeps an active count of items processed and the total number of items
  * If duplicates are found (same item name and folder) then the user is prompted for what to do: either s=skip or o=overwrite. A capital letter performs the same action on all subsequent duplicates.
  * Errors are sent to stdout
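For reference, a typical invocation might look something like this (the session token shown is a placeholder):

bw login
bw unlock    #Prints the session token to use below
php convert.php ./RfExport.csv 'SESSION_TOKEN_FROM_BW_UNLOCK'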


Download script here
<?php
global $BW_SESSION;
/** @noinspection SpellCheckingInspection,RedundantSuppression */
$BW_SESSION=($argv[2] ?? 'FILL_ME_IN');

print RunMe($argv)."\n";

//Make sure “bw” is installed, given session is valid, and is synced
function CheckBWStatus(): ?string
{
	if(is_string($StatusCheck=RunBW('status')))
		return $StatusCheck;
	else if(!is_object($StatusCheck) || !isset($StatusCheck->status))
		return 'bw return does not have status';
	else if($StatusCheck->status==='locked')
		return 'bw is locked';
	else if($StatusCheck->status!=='unlocked')
		return 'bw status is invalid';

	$ExpectedMessage='Syncing complete.';
	if(is_string($SyncCheck=RunBW('sync', null, false)))
		return $SyncCheck;
	else if($SyncCheck[0]!==$ExpectedMessage)
		return "Sync expected “${ExpectedMessage}” but got “${SyncCheck[0]}”";

	return null;
}

//Pull the known folders
function PullKnownFolders(): string|array
{
	$FolderIDs=[];
	if(is_string($FolderList=RunBW('list folders')))
		return $FolderList;
	foreach($FolderList as $Folder)
		$FolderIDs[$Folder->name]=$Folder->id;
	unset($FolderIDs['No Folder']);
	return $FolderIDs;
}

//Get the file handle and the column indexes
function GetFileAndColumns(): string|array
{
	//Prepare the import file for reading
	ini_set('default_charset', 'UTF-8');
	global $argv; //$argv is not a superglobal, so it must be pulled into function scope
	$FileName=$argv[1] ?? './RfExport.csv';
	if(!file_exists($FileName))
		return 'File not found: '.$FileName;
	else if(!($f=fopen($FileName, 'r')))
		return 'Error opening file: '.$FileName;

	//Check the header row values
	if(!($HeadRow=fgetcsv($f)))
		return 'Error opening file: '.$FileName;
	else if(!count($HeadRow) || $HeadRow[0]===NULL)
		return 'Missing head row: '.$FileName;
	if(str_starts_with($HeadRow[0], "\xEF\xBB\xBF")) //Remove UTF8 BOM
		$HeadRow[0]=substr($HeadRow[0], 3);
	unset($FileName);

	$ExpectedCols=array_flip(['Url', 'Name', 'MatchUrl', 'Login', 'Pwd', 'Note', 'Folder', 'RfFieldsV2']);
	$ColNums=[];
	foreach($HeadRow as $Index => $Name) {
		if(!isset($ExpectedCols[$Name]))
			if(isset($ColNums[$Name]))
				return 'Duplicate column title: '.$Name;
			else
				return 'Unknown column title: '.$Name;
		$ColNums[$Name]=$Index;
		unset($ExpectedCols[$Name]);
	}
	if(count($ExpectedCols))
		return 'Required columns not found: '.implode(', ', array_keys($ExpectedCols));
	else if($ColNums['RfFieldsV2']!==count($ColNums)-1)
		return 'RfFieldsV2 must be the last column';

	return [$f, $ColNums];
}

//Process the rows
function ProcessRows($f, array $ColNums, array &$FolderIDs): array
{
	$Counts=['Error'=>0, 'Warning'=>0, 'Success'=>0];
	$RowNum=0;
	$FolderNames=[];
	$Items=[];
	while($Line=fgetcsv($f, null, ",", "\"", "")) {
		//Process the row and add result type to counts
		$RowNum++;
		$Result=ProcessRow($Line, $ColNums);
		$Counts[$Result[0]]++;

		//Handle errors and warnings
		if($Result[0]!=='Success') {
				print "$Result[0]: Row #$RowNum $Result[1]\n";
				continue;
		} else if(isset($Result['Warnings']) && count($Result['Warnings'])) {
			print "Warning(s): Row #$RowNum ".implode("\n", $Result['Warnings'])."\n";
			$Counts['Warning']+=count($Result['Warnings']);
		}

		//Add the folder to the list of folders (strip leading slash)
		$FolderName=($Line[$ColNums['Folder']] ?? '');
		$FolderName=substr($FolderName, ($FolderName[0] ?? '')==='/' ? 1 : 0);
		$Result['Data']->folderId=&$FolderIDs[$FolderName];
		$FolderNames[$FolderName]=1;

		//Save the entry
		$Items[]=$Result['Data'];
	}
	fclose($f);
	return [$Counts, $FolderNames, $Items, $RowNum];
}

//Process a single row
function ProcessRow($Line, $ColNums): array
{
	//Skip blank lines
	if(!count($Line) || (count($Line)===1 && $Line[0]===null))
		return ['Warning', 'is blank'];

	//Extract the columns by name
	$C=(object)[];
	foreach($ColNums as $Name => $ColNum)
		$C->$Name=$Line[$ColNum] ?? '';

	//Check for errors and end processing early for notes
	if($C->Name==='')
		return ['Error', 'is missing a name'];
	else if($C->Url==='' && $C->MatchUrl==='' && $C->Login==='' && $C->Pwd==='') {
		return ['Success', 'Data'=>(object)['type'=>2, 'notes'=>$C->Note, 'name'=>$C->Name, 'secureNote'=>['type'=>0]]];
	}

	//Create a login card
	$Ret=(object)[
		'type'=>1,
		'name'=>$C->Name,
		'notes'=>$C->Note===''  ? null : $C->Note,
		'login'=>(object)[
			'uris'=>[(object)['uri'=>$C->Url]],
			'username'=>$C->Login,
			'password'=>$C->Pwd,
		],
	];
	if($C->MatchUrl!==$C->Url)
		$Ret->login->uris[]=(object)['uri'=>$C->MatchUrl];

	//Create error string for misc fields
	$i=0; //Declared up here so it can be used in $FormatError
	$FormatError=function($Format, ...$Args) use ($ColNums, &$i): string {
		return 'RfFieldsV2 #'.($i-$ColNums['RfFieldsV2']+1).' '.sprintf($Format, ...$Args);
	};

	//Handle misc/extra (RfFieldsV2) fields
	$Warnings=[];
	for($i=$ColNums['RfFieldsV2']; $i<count($Line); $i++) {
		//Pull in the parts of the misc field
		if(count($Parts=str_getcsv($Line[$i], ",", "\"", ""))!==5)
			return ['Error', $FormatError('is not in the correct format')];
		$Parts=(object)array_combine(['Name', 'Name3', 'Name2', 'Type', 'Value'], $Parts);
		if(!trim($Parts->Name))
			return ['Error', $FormatError('invalid blank name found')];

		//Figure out which “Names” to process
		$PartNames=[trim($Parts->Name)];
		foreach([$Parts->Name2, $Parts->Name3] as $NewName)
			if($NewName!=='' && !in_array($NewName, $PartNames))
				$PartNames[]=$NewName;

		//Process the different names
		foreach($PartNames as $PartName) {
			//Determined values for the item
			$LinkedID=null;
			$PartValue=$Parts->Value;
			/** @noinspection PhpUnusedLocalVariableInspection */ $Type=null; //Overwritten in all paths

			//Handle duplicate usernames and password fields
			if(($IsLogin=($PartValue===$C->Login)) || $PartValue===$C->Pwd) {
				if($Parts->Type!==($IsLogin ? 'txt' : 'pwd'))
					$Warnings[]=$FormatError('expected type “%s” but got “%s”', $IsLogin ? 'txt' : 'pwd', $Parts->Type);
				$Type=3;
				$PartValue=null;
				$LinkedID=($IsLogin ? 100 : 101);

				//If a common name then do not add the linked field
				/** @noinspection SpellCheckingInspection */
				if(in_array(
					strToLower($PartName),
					$IsLogin ?
						['user', 'username', 'login', 'email', 'userid', 'user id', 'user id$', 'user_email', 'login_email', 'user_login'] :
						['password', 'passwd', 'password$', 'pass', 'user_password', 'login_password', 'pwd', 'loginpassword']
				))
					continue;
			} else {
				//Convert the type
				switch($Parts->Type) {
					case '': //For some reason some text fields have no type given
					case 'rad':
					case 'sel':
					case 'txt': $Type=0; break;
					case 'pwd': $Type=1; break;
					case 'rck': //Radio check?
						$Type=0;
						if($PartName!==$Parts->Name) //Ignore second names for radio checks
							continue 2;
						if(count($RadioParts=explode(':', $PartName))!==2)
							return ['Error', $FormatError('radio name needs 2 parts separated by a colon')];
						$PartName=$RadioParts[0];
						$PartValue=$RadioParts[1];
						break;
					case 'chk':
						$Type=2;
						if($PartValue==='*' || $PartValue==='1')
							$PartValue='true';
						else if($PartValue==='0')
							$PartValue='false';
						else
							return ['Error', $FormatError('invalid value for chk type “%s”', $PartValue)];
						break;
					case 'are': //This seems to be a captcha
						continue 2;
					default:
						return ['Error', $FormatError('invalid field type “%s”', $Parts->Type)];
				}
			}

			//Create the return object
			if(!isset($Ret->fields))
				$Ret->fields=[];
			$Ret->fields[]=(object)[
				'name'=>$PartName,
				'value'=>$PartValue,
				'type'=>$Type,
				'linkedId'=>$LinkedID,
			];
		}
	}

	//Return finished item and warnings
	if(count($Warnings))
		$Warnings[0]="RfFieldsV2 warnings:\n".$Warnings[0];
	return ['Success', 'Data'=>$Ret, 'Warnings'=>$Warnings];
}

//Ask the user if they want to continue
function ConfirmContinue($Counts, $RowNum): ?string
{
	//Ask the user if they want to continue
	printf("Import from spreadsheet is finished. Total: %d; Successful: %d, Errors: %d, Warnings: %d\n", $RowNum, $Counts['Success'], $Counts['Error'], $Counts['Warning']);
	while(1) {
		print 'Do you wish to continue the export to bitwarden? (y/n): ';
		fflush(STDOUT);
		switch(trim(strtolower(fgets(STDIN)))) {
			case 'n':
				return 'Exiting';
			case 'y':
				break 2;
		}
	}

	return null;
}

//Get folder IDs and create parent folders for children
function GetFolders($FolderNames, &$FolderIDs): ?string
{
	foreach(array_keys($FolderNames) as $FolderName) {
		//Skip “No Folder”
		if($FolderName==='') {
			$FolderIDs[$FolderName]='';
			continue;
		}

		//Check each part of the folder tree to make sure it exists
		$FolderParts=explode('/', $FolderName);
		$CurPath='';
		foreach($FolderParts as $Index => $FolderPart) {
			$CurPath.=($Index>0 ? '/' : '').$FolderPart;

			//If folder is already cached then nothing to do
			if(isset($FolderIDs[$CurPath])) {
				continue;
			}

			//Create the folder
			print "Creating folder: $CurPath\n";
			if(is_string($FolderInfo=RunBW('create folder '.base64_encode(json_encode(['name'=>$CurPath])), 'create folder: '.$CurPath)))
				return $FolderInfo;
			else if (!isset($FolderInfo->id))
				return "bw folder create failed for “${CurPath}”: Return did not contain id";
			$FolderIDs[$CurPath]=$FolderInfo->id;
		}
	}
	return null;
}

//Pull the known items
function PullKnownItems($FoldersByID): string|array
{
	$CurItems=[];
	if(is_string($ItemList=RunBW('list items')))
		return $ItemList;
	foreach($ItemList as $Item) {
		$CurItems[($Item->folderId===null ? '' : $FoldersByID[$Item->folderId].'/').$Item->name]=$Item->id;
	}
	return $CurItems;
}

function StoreItems($Items, $CurItems, $FoldersByID): int
{
	print "\n"; //Give an extra newline before starting the processing
	$NumItems=count($Items);
	$NumErrors=0;
	$HandleDuplicatesAction=''; //The "always do this" action
	foreach($Items as $Index => $Item) {
		//Clear the current line and print the processing status
		printf("\rProcessing item #%d/%d", $Index+1, $NumItems);
		fflush(STDOUT);

		//If the item already exists then request what to do
		$FullItemName=($Item->folderId===null ? '' : $FoldersByID[$Item->folderId].'/').$Item->name;
		if(isset($CurItems[$FullItemName])) {
			//If no "always" action has been set in $HandleDuplicatesAction then ask the user what to do
			if(!($Action=$HandleDuplicatesAction)) {
				print "\n";
				while(1) {
					print "A duplicate at “${FullItemName}” was found. Choose your action (s=skip,o=overwrite,S=always skip,O=always overwrite):";
					fflush(STDOUT);
					$Val=trim(fgets(STDIN));
					if(in_array(strtolower($Val), ['s', 'o']))
						break;
				}
				$Action=strtolower($Val);
				if($Val!==$Action) //Upper case sets the "always" state
					$HandleDuplicatesAction=$Action;
			}

			//Skip the item
			if($Action==='s')
				continue;

			//Overwrite the item
			if(is_string($Ret=RunBW('edit item '.$CurItems[$FullItemName].' '.base64_encode(json_encode($Item)), 'edit item: '.$FullItemName))) {
				$NumErrors++;
				printf("\nError on item #%d: %s\n", $Index+1, $Ret);
			}
			continue;
		}

		//Create the item
		if(is_string($Ret=RunBW('create item '.base64_encode(json_encode($Item)), 'create item: '.$FullItemName))) {
			$NumErrors++;
			printf("\nError on item #%d: %s\n", $Index+1, $Ret);
		}
	}
	return $NumErrors;
}

//Run a command through the “bw” application
function RunBW($Command, $Label=null, $DecodeJSON=true): string|object|array
{
	//$Label is set to $Command if not given
	if($Label===null)
		$Label=$Command;

	//Run the command and check the results
	global $BW_SESSION;
	exec(sprintf('BW_SESSION=%s bw %s --pretty 2>&1', $BW_SESSION, $Command), $Output, $ResultCode);
	if($ResultCode===127)
		return 'bw is not installed';
	else if($ResultCode===1)
		return "bw “${Label}” threw an error: \n".implode("\n", $Output);
	else if($ResultCode!==0)
		return "bw “${Label}” returned an invalid status code [$ResultCode]: \n".implode("\n", $Output);
	else if($Output[0]==='mac failed.')
		return 'Invalid session ID';
	else if(!$DecodeJSON)
		return [implode("\n", $Output)];
	else if(($JsonRet=json_decode(implode("\n", $Output)))===null)
		return "bw “${Label}” returned non-json result";

	//Return the json object
	return $JsonRet;
}

function RunMe($argv): string
{
	//Make sure “bw” is installed and given session is valid
	if(is_string($Ret=CheckBWStatus()))
		return $Ret;

	//Pull the known folders
	if(is_string($FolderIDs=PullKnownFolders()))
		return $FolderIDs;

	//Get the file handle and the column indexes
	if(is_string($Ret=GetFileAndColumns()))
		return $Ret;
	[$f, $ColNums]=$Ret;
	unset($argv);

	//Process the rows and ask the user if they want to continue
	[$Counts, $FolderNames, $Items, $RowNum]=ProcessRows($f, $ColNums, $FolderIDs);
	if(is_string($Ret=ConfirmContinue($Counts, $RowNum)))
		return $Ret;
	unset($Counts, $RowNum, $f, $ColNums);

	//Get folder IDs and create parent folders for children
	if(is_string($Ret=GetFolders($FolderNames, $FolderIDs)))
		return $Ret;
	unset($FolderNames);

	//Pull the known items
	$FoldersByID=array_flip($FolderIDs);
	if(is_string($CurItems=PullKnownItems($FoldersByID)))
		return $CurItems;

	//Store all items
	$FolderIDs['']=null; //Change empty folder to id=null for json insertions
	$NumErrors=StoreItems($Items, $CurItems, $FoldersByID);

	//Return completion information
	return "\nCompleted ".($NumErrors ? "with $NumErrors error(s)" : 'successfully');
}
6 annoyances in GoLang
I still love it though

Go has been around for quite a while now and has had a lot of time to grow and mature, and it is really quite a lovely language. Watching it develop since its inception has been a delight. Its native concurrency is best in the industry, it is designed for things to be super-clean and standardized, and it compiles really fast. The support for it in IntelliJ/GoLand is top-notch and makes it really a blast to work in.


There are of course a lot of design decisions in the language that could be debated, like how setting a variable has to be a top level statement that cannot be embedded in other statements, how there is no “real” class based system, and how their specific name casing conventions are actually built into the language and enforced. However, I understand the reason for all of these decisions and feel that the designers of the language were justified in the decisions they made for the reasons they give.


There are at least 6 problems I’ve run into a lot recently though that I feel could be better. I would love to jump into the source of the language and see about making these fixes myself, but the GoLang team has a history of completely ignoring outsiders, and it would be highly unlikely that they would accept anything I submitted, especially since a number of these things would require very long and complicated proposals to even start looking at. Go may be an open source language, but it is not open development.


These first 3 would be additions to the language that would not break anything, which is in line with their promise of never breaking anything.

1) Interfaces with constraint elements cannot be used as variable types. Example:
type intish interface{ int64 | uint64 }
func adder[T intish](val T) T { return val + 1 } //This is allowed
var val intish //Error: cannot use type intish outside a type constraint: interface contains type constraints

IntelliJ inspection Error: Interface includes constraint elements '...', can only be used in type parameters

It wouldn’t be a big change for the compiler and runtime to be able to enforce these types of constraints when typecasting, and I really wish the language had this. This is the type of thing TypeScript does beautifully, since it is actually part of its native design and functioning.
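For comparison, a rough TypeScript analogue (the type and variable names here are made up) is perfectly legal:

type Intish = bigint | number; //A union usable both as a constraint and as a variable type
let val: Intish = 5;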


2) Member functions (methods with receivers) cannot have generics
type foobar int
func (*foobar) bar[T any](val T) {} //Method cannot have type parameters

3) []generic cannot be typecast to []any

There is no real reason that an array containing interfaces shouldn’t be convertible via typecast to another array of interface types, as long as they are compatible.

type testAny any
startVar := make([]testAny, 3)
endVar := []any(startVar) //cannot convert startVar (variable of type []testAny) to type []any
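The workaround is a manual copy loop that boxes each element into the new interface type, something like:

//Allocate a new []any and copy element by element
endVar := make([]any, len(startVar))
for i, v := range startVar {
	endVar[i] = v
}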


This next one is an unfortunate consequence of the language design and would not be easily fixable.


4) Import cycle not allowed

This means there cannot be circular dependencies. If package A imports package B, then package B cannot import package A. This often wasn’t a problem with languages like C, since headers were independent of the implementations, but because packages in Go do not have headers, and their package types cannot be used as method receivers in other packages, this one is just not possible. And it can be quite limiting.



This next one is a bug in either the language or the documentation, and I consider it to be a security problem, but the Go team does not seem to care about it. See https://github.com/golang/go/issues/65201


[Edit on 2024-04-10] Oh wow! They fixed it!


5) sql.RawBytes modifies the current buffers instead of setting new buffers like the documentation says.

All the documentation says about RawBytes is “RawBytes is a byte slice that holds a reference to memory owned by the database itself. After a Rows.Scan into a RawBytes, the slice is only valid until the next call to Rows.Next, Rows.Scan, or Rows.Close.”


If the value you are reading is a []byte/string then reading into a RawBytes works as expected. However, if it is anything else, like an int, it reads into the buffer that RawBytes already holds. This can lead to buffer injections and other really nasty bugs. For example:

db.Exec(`CREATE TEMPORARY TABLE goTest (i int NOT NULL, str varchar(10)) ENGINE=MEMORY`)
db.Exec(`INSERT INTO goTest VALUES (?, ?)`, 6, "foobar")
var scanIn sql.RawBytes
rows, _ := db.Query(`SELECT str FROM goTest WHERE i=?`, 6)
rows.Next()
rows.Scan(&scanIn)
//Do something with scanIn
rows.Close()
rows, _ = db.Query(`SELECT i FROM goTest WHERE i=?`, 6)
rows.Next()
rows.Scan(&scanIn) //This corrupts the internal sql driver buffer since it reads an int into the pointer we received earlier


This final one has been an annoyance of mine since day 1 of working with Go, and it still annoys the hell out of me. There could be ways to fix it without breaking things, but I cannot think of any truly elegant solutions.


6) Short variable declaration can’t handle setting both new and already existing variables
This is the most common pattern you’ll see in Go by far:
var data string
if _data, err := someFunc(); err != nil {
    fmt.Println(err)
} else {
    data = _data
}

You can either do it this way, or also declare err in the outer scope, thereby polluting the scope with unneeded variables. There is no way to set both the temporary error variable and also set the outer data variable in 1 line.
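For reference, here is what the outer-scope variant looks like; err lives on after the if even though nothing else uses it:

var data string
var err error
if data, err = someFunc(); err != nil {
	fmt.Println(err)
}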


One non-breaking hack would be to add a symbol before variables you wanted to set in the outer scope. Example:

var data string
if +data, err := someFunc(); err != nil {
    fmt.Println(err)
}
The math behind the RSA encryption algorithm

I’ve always thought that the RSA and Diffie–Hellman public key encryption algorithm systems are beautiful in their complex simplicity. While there are countless articles out there explaining how to implement them, I have never really found one that I think describes the math behind them in a simple way, so I thought I’d take a crack at it.

Both algorithms are derived from 3 math axioms:
  1. This is called Modular exponentiation (hereby referred to as modexp). In the following, x is a prime number and p is an integer less than x.
    1. p^(x  ) mod x = p (e.g. 12^(17  ) mod 17 = 12)
    2. p^(x-1) mod x = 1 (e.g. 12^(17-1) mod 17 = 1 )
  2. A further derivation from the above formulas shows that we can combine primes and they work in the same manner. In the following, x and y are prime numbers and p is an integer less than x*y.
    1. p^((x-1)*(y-1)  ) mod (x*y) = 1 (e.g. 12^((13-1)*(17-1)  ) mod (13*17) = 1 )
      Note: This formula is not used in RSA, but it helps demonstrate how the formulas from part 1 become formula 2b.
      Due to how modexp works with primes, values of p that are multiples of x or y do not work with 2a.
    2. p^((x-1)*(y-1)+1) mod (x*y) = p (e.g. 12^((13-1)*(17-1)+1) mod (13*17) = 12)
  3. The final axiom is how modexp can be split apart the same way as in algebra, where (x^a)^b === x^(a*b). For any integers p, x, y, and m:
    (p^(x*y) mod m) === ((p^x mod m)^y mod m)

With these 3 axioms we have everything we need to explain how RSA works. To execute an RSA exchange, encrypted by Bob and decrypted by Alice, the following things are needed.

The variables:
  • Prime numbers 1 and 2 (Prime1, Prime2): Held and used by Alice. Alice will use these to derive the variables PubKey, PrivKey, and Modulo. In our examples we use small numbers, but in reality, very large primes will be used, generally of at least 256 bit size.
  • Public key (PubKey): Held by Alice and Bob, used by Bob. Alice sends this to Bob so he can encrypt data to her. Bob uses it as an exponent in a modexp.
  • Private key (PrivKey): Held and used by Alice. Alice uses this to decrypt what Bob sends her. Alice uses it as an exponent in a modexp.
  • Modulo (Modulo): Held and used by both Bob and Alice. Alice sends this to Bob. They both use it as the modulo in a modexp.
  • Payload data (Payload): The data Bob starts with and turns into EncryptedPayload. Alice derives Payload back from EncryptedPayload.

Now, let’s start with axiom 2b:
Payload^((Prime1-1)*(Prime2-1)+1) mod (Prime1*Prime2) = Payload

Let’s change this up so the exponent is just 2 multiplications so we can use axiom 3 on it. We need to find 2 integers to become PubKey and PrivKey such that:
PubKey*PrivKey=(Prime1-1)*(Prime2-1)+1
(More generally, any exponent of the form k*(Prime1-1)*(Prime2-1)+1 works, since axiom 2a can be applied k times before the final multiplication by Payload. This matters because (Prime1-1)*(Prime2-1)+1 itself is sometimes prime and cannot be split into two keys.)

And Modulo is Prime1*Prime2.
So we now have:
Payload^(PubKey*PrivKey) mod Modulo = Payload

Now, using axiom 3, we can turn it into this:
(Payload^PubKey mod Modulo)^PrivKey mod Modulo = Payload

Now, we can split this up into:
Bob calculates and sends to Alice: Payload^PubKey mod Modulo=EncryptedPayload
Alice uses the received EncryptedPayload and performs: EncryptedPayload^PrivKey mod Modulo = Payload

And the process is complete!
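To see it work end to end, here is a minimal PHP sketch using the example primes 13 and 17. One wrinkle: (13-1)*(17-1)+1 = 193 is prime and cannot be split into two keys, so the sketch instead uses PubKey*PrivKey = 2*(13-1)*(17-1)+1 = 385, which still satisfies axiom 2b per the note above.

<?php
//Minimal RSA round trip with the toy primes from this post
//Square-and-multiply modexp keeps intermediate values small
function modexp($base, $exp, $mod) {
	$result = 1;
	$base %= $mod;
	for (; $exp > 0; $exp >>= 1) {
		if ($exp & 1)
			$result = ($result * $base) % $mod;
		$base = ($base * $base) % $mod;
	}
	return $result;
}

$Modulo = 13 * 17;          //221
$PubKey = 5; $PrivKey = 77; //5*77 = 385 = 2*(13-1)*(17-1)+1
$Payload = 12;

$EncryptedPayload = modexp($Payload, $PubKey, $Modulo);  //Bob computes 207 and sends it
print modexp($EncryptedPayload, $PrivKey, $Modulo)."\n"; //Alice recovers 12
?>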


However, there is 1 caveat that I didn’t cover which makes the encryption as we currently have it weak. The calculation of PubKey and PrivKey from Prime1 and Prime2 needs to follow some rather specific complex rules to make the keys strong. Without this, an attacker may be able to figure out Prime1 and Prime2 from the Modulo and PubKey, and could then easily derive PrivKey from them. I generally see the PubKey as 65537, or another power of 2 plus 1.


Fixing VeraCrypt EFI Boot

I recently decided to swap around my hard drives to different SATA slots so my most used hard drives were on the fastest ports. Unfortunately, when I did this, my computer stopped booting to Windows. I never did figure out why my bootable EFI partitions only showed up randomly in BIOS depending on which hard drives were plugged in, but I found a configuration the computer liked and I was able to see the Microsoft Boot EFI partition and EFI boots on my USB keys.


The next step was to get the computer actually booting to something I could run commands on. When I try to boot directly to the EFI shell, the resolution is always screwed up and I can only see the top half of what should be visible, so I can’t actually see the command line I am typing into. This actually happens with everything I directly boot to that uses console text. The way around this for me is to boot to the BIOS setup, and from there tell it to boot immediately to the EFI option of my choice when exiting the BIOS. From there, the proper resolution is used and everything is visible.


Next, in the EFI shell, you can run map to see all of the available possible mounts. This should automatically run when the EFI shell starts anyways, so you should already have that information. Any detected EFI partition on any bootable device should be given a mapping of “fs#” where # is a number. In my case, it was fs0. So to mount that, I ran mount fs0 x. “x” could be whatever you want; it doesn’t really matter. It’s analogous to a drive letter in Windows, and you can make it any string (within reason, I believe anything alphanumeric should be fine). So next you would run x: to switch to that drive. From there, you can run cd EFI\Microsoft\Boot and then bootmgfw.efi to boot to Windows.


Since I use VeraCrypt system encryption, I had to go to “EFI\VeraCrypt” and run DcsBoot.efi to finally boot into Windows through VeraCrypt.
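So, condensed, the whole EFI shell session looked something like this (fs0 and the mount name “x” are specific to my machine):

map
mount fs0 x
x:
cd EFI\VeraCrypt
DcsBoot.efi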


Finally, to get the Windows Boot manager to start with VeraCrypt, run in the Windows command prompt bcdedit /set '{bootmgr}' path \EFI\VeraCrypt\DcsBoot.efi.

Reading Mailchimp batch request results

It’s a bit of a pain reading results from batch requests to Mailchimp. Here is a quick and dirty bash script to get and pretty print the JSON output. It could be cleaned up a little, including combining some of the commands, but meh.


#Example variables
BATCHID=abc1234567;
APIKEY=abcdefg-us11@us11.api.mailchimp.com;
APIURL=us11.api.mailchimp.com;

#Request the batch information from Mailchimp
curl --request GET --url "https://dummy:$APIKEY@$APIURL/3.0/batches/$BATCHID" 2> /dev/null | \

#Get the URL to the response
grep -oP '"response_body_url":"https:.*?"' | \
grep -oP 'https:[^"]*' | \

#Get the response
xargs wget -O - 2> /dev/null | \

#The response is a .tar.gz file with a single file in it. So get the contents of this file
tar -xzvO 2> /dev/null | \

#Pretty print the json of the full return and the “response” objects within
php -r '$Response=json_decode(file_get_contents("php://stdin"), true); foreach($Response as &$R) $R["response"]=json_decode($R["response"], true); print json_encode($Response, JSON_PRETTY_PRINT);'
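As one possible cleanup, if jq is available, the two greps that extract the response URL could be collapsed into a single call:

curl --request GET --url "https://dummy:$APIKEY@$APIURL/3.0/batches/$BATCHID" 2> /dev/null | \
jq -r '.response_body_url'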
Babyface Pro Volume Modification via Mousewheel

Part of my workstation’s audio setup uses the RME Babyface Pro. Until the most recent update of their software, the built-in Windows master volume for the device was ignored. So while this script isn’t as important as before, I still find it very useful. The following is an AutoHotkey script which modifies the master volume in the TotalMix FX window via the mousewheel (when alt+ctrl is held down). This expects the TotalMix FX window to be sized as small as it can be, and to have a channel selected for the control room’s Main Out. It should look like this:

TotalMix FX Sized For Volume Modification

The script is as follows:
;Function to create lparam/wparam for SendMessage
CalculatePARAM(w1, w2)
{
	IfLess, w1, 0
		w1 := 65535 + w1 + 1
	IfLess, w2, 0
		w2 := 65535 + w2 + 1

	return (w2<<16 | w1)
}

;Send a mouse wheel action to a window
SendMouseWheel(WindowHWND, Steps, XPos, YPos)
{
	;Constants
	WM_MOUSEWHEEL := 0x20A
	WheelStepAmount := 120

	;Calculate and execute the message
	WinGetPos, ScreenX, ScreenY,,, ahk_id %WindowHWND%
	wparam := CalculatePARAM(0, Steps*WheelStepAmount)
	lparam := CalculatePARAM(XPos+ScreenX, YPos+ScreenY)
	SendMessage, %WM_MOUSEWHEEL%, %wparam%, %lparam%,, ahk_id %WindowHWND%
}

^!WheelUp::
ControlGet, ControlHWND, Hwnd,,AfxFrameOrView100s1,RME TotalMix
if ControlHWND
	SendMouseWheel(ControlHWND, 1, 36, 428)
return

^!WheelDown::
ControlGet, ControlHWND, Hwnd,,AfxFrameOrView100s1,RME TotalMix
if ControlHWND
	SendMouseWheel(ControlHWND, -1, 36, 428)
return
MD5Sum List Script
#This script takes a newline delimited file list from STDIN for md5 hashing
#This script requires the `md5sum`, `pv`, `paste`, `bc`, and `numfmt` commands

#The output of the md5s are stored in the file specified by the first parameter
#The format for each md5 hash to the output file is "$FileName\t$Hash\n"

#File sizes are always output in megabytes with 3 decimal places
#While calculating the hashes the script keeps the user informed of the progress of both the current file and all the files as follows:
#1) Before file starts: "Hashing: $FileName ($FileSize MiB)\n"
#2) During transfer: The progress of the hash of the current file ran through `pv`
#3) During transfer: The progress of the hashing of all the files, ran through `pv`
#4) After transfer: "Finished $TotalProgressPercent% ($ProcessedBytes/$TotalBytes MiB)\n\n"

#Get $Outfile from the first argument and the $FileList from STDIN (newline delimited)
OutFile="$1";
FileList=`cat /dev/stdin`

#Format a byte count in MegaBytes with comma grouping and 3 decimal places
MbFmtNoExt ()
{
	echo "scale=3; $1/1024/1024" | bc | echo -n `xargs numfmt --grouping`
}

#Add " MiB" to the end of MbFmtNoExt
MbFmt ()
{
	echo `MbFmtNoExt $1`" MiB"
}

#Calculate and output the total size of the file list
echo -n "Calculating total size: "
TotalSize=`echo "$FileList" | xargs -d"\n" stat --printf="%s\n" | paste -s -d+ | bc`
MbFmt $TotalSize
echo #Add an extra newline

#Create a fifo to keep track of the total complete
TotalDoneFifo=$(mktemp)
TotalDoneBG=0
rm "$TotalDoneFifo"
mkfifo "$TotalDoneFifo"
cat > "$TotalDoneFifo" & #Do not close read of fifo
Cleanup() {
	rm "$TotalDoneFifo"
	kill $TotalDoneBG
	exit 0
}
trap Cleanup SIGTERM SIGINT

#Start the TOTAL line
tail -f "$TotalDoneFifo" | pv -s $TotalSize -F  "%b %t %p %e" > /dev/null &
TotalDoneBG=$!

#Run over the list (newline delimited)
CalculatedBytes=0
IFS=$'\n'
for FileName in `echo "$FileList"`
do
	#Output the file size and name to STDOUT
	FileSize=`stat --printf="%s" "$FileName"`
	echo "Hashing: $FileName ("`MbFmt $FileSize`")"

	#Output the filename to $OutFile
	echo -n $FileName$'\t' >> $OutFile

	#Run the md5 calculation with `pv` progress
	#Output the hash to $OutFile after the FileName and a tab
	cat "$FileName" | pv -s $FileSize -c | tee -a "$TotalDoneFifo" | md5sum | awk '{print $1}' >> $OutFile

	#Output the current progress for the entire file list
	#Format: "Finished $TotalProgressPercent% ($ProcessedBytes/$TotalBytes MiB)\n\n"
	CalculatedBytes=$(($CalculatedBytes+$FileSize))
	echo -n "Finished "
	printf "%.3f" `echo "scale=4; $CalculatedBytes*100/$TotalSize" | bc`
	echo "% ("`MbFmtNoExt $CalculatedBytes`"/"`MbFmt $TotalSize`$')\n'
done

Cleanup
Opening IntelliJ via the Symfony ide setting
Nasty Escaping Problems

I wanted a simple setup in Symfony where the programmer could define their IDE in the parameters file. Sounds simple, right? Just add something like ide_url: 'phpstorm' to parameters.yml->parameters and ide: '%ide_url%' to config.yml->framework. And it worked great; however, my problem was much more convoluted.

I am actually running the Symfony server on another machine and am accessing the files via NFS on Windows. So, it would try to open PHPStorm with the incorrect path. Symfony suggests the solution to this is writing your own custom URL handler with %f and %l to fill in the filename and line, and using some weird formatting to do string replaces. So I wrote in 'idea://%%f:%%l&/PROJECT_PATH_ON_SERVER/>DRIVE_LETTER:/PATH_ON_WINDOWS/' (note the doubled percent signs for escaping) directly in the config.yml and that worked, kind of. The URL was perfect, but IntelliJ does not seem to register the idea:// protocol handler like PHPStorm theoretically does (according to some online threads) with phpstorm://. So I had to write my own solution.

This answer on stackoverflow has the answer on how to register a protocol handler in Windows. But the problem now was that the first parameter passed to IntelliJ started with the idea:// which broke the command-line file-open. So I ended up writing a script to fix this, which is at the bottom.

OK, so we’re almost there; I just had to paste the string I came up with back into the parameters.yml, right? I wish. While this was now working properly in a Symfony error page, a new problem arose. The Symfony bin/console debug:config framework command was failing with You have requested a non-existent parameter "f:". The darn thing was reading the unescaped string as 'idea://%f:%l&...' and it thought %f:% was supposed to be a variable. Sigh.

So the final part was to double escape the strings with 4 percent signs: 'idea://%%%%f:%%%%l&...'. Except now the URL on the error pages gave me idea://%THE_PATH:%THE_LINE_NUMBER. It was adding an extra percent sign before both values. This was simple to resolve in the script I wrote, so I was finally able to open scripts directly from the error page. Yay.



So here is the final set of data that has to be added to make this work:
Registry:
	HKCR/idea/(default) = URL:idea Protocol
	HKCR/idea/URL Protocol = ""
	HKCR/idea/shell/open/command = "PATH_TO_PHP" -f "PATH_TO_SCRIPT" "%1" "%2" "%3" "%4" "%5" "%6" "%7" "%8" "%9"
parameters.yml:
	parameters:
		ide_url: 'idea://%%%%f:%%%%l&/PROJECT_PATH_ON_SERVER/>DRIVE_LETTER:/PATH_ON_WINDOWS/'
config.yml:
	framework:
		ide: '%ide_url%'
PHP_SCRIPT_FILE:
<?php
function DoOutput($S)
{
	//You might want to do something like output the error to a file or do an alert here
	print $S;
}

if(!isset($argv[1]))
	return DoOutput('File not given');
if(!preg_match('~^idea://(?:%25|%)?([a-z]:[/\\\\][^:]+):%?(\d+)/?$~i', $argv[1], $MatchData))
	return DoOutput('Invalid format: '.$argv[1]);

$FilePath=$MatchData[1];
if(!file_exists($FilePath))
	return DoOutput('Cannot find file: '.$FilePath);

$String='"C:\Program Files\JetBrains\IntelliJ IDEA 2018.1.6\bin\idea64.exe" --line '.$MatchData[2].' '.escapeshellarg($FilePath);
DoOutput($String);
shell_exec($String);
?>
Download all of an author’s fictionpress stories

I was surprised by my failure to find a script online to download all of an author’s stories from FictionPress or FanFiction.Net, so I threw together the one below.

If you go to an author’s page in a browser (only tested in Chrome) it should have all of their stories, and you can run the following script in the console (F12) to grab them all. Their save name format is STORY_NAME_LINK_FORMAT - CHAPTER_NUMBER.html. It works as follows:

  1. Gathers all of the names, chapter 1 links, and chapter counts for each story.
  2. Converts this information into a list of links it needs to download. The links are formed by using the chapter 1 link, and just replacing the chapter number.
  3. It then downloads all of the links to your current browser’s download folder.

Do note that Chrome should prompt you to answer “This site is attempting to download multiple files”. So of course, say yes. The script is also designed to detect problems, which would happen if FictionPress changes their HTML formatting.

//Gather the story information
const Stories=[];
$('.mystories .stitle').each((Index, El) =>
	Stories[Index]={Link:$(El).attr('href'), Name:$(El).text()}
);
$('.mystories .xgray').each((Index, El) =>
	Stories[Index].NumChapters=/ - Chapters: (\d+) - /.exec($(El).text())[1]
);

//Get links to all stories
const LinkStart=document.location.protocol+'//'+document.location.host;
const AllLinks=[];
$.each(Stories, (_, Story) => {
	if(typeof(Story.NumChapters)!=='string' || !/^\d+$/.test(Story.NumChapters))
		return console.log('Bad number of chapters for: '+Story.Name);
	const StoryParts=/^\/s\/(\d+)\/1\/(.*)$/.exec(Story.Link);
	if(!StoryParts)
		return console.log('Bad link format for stories: '+Story.Name);
	for(let i=1; i<=Story.NumChapters; i++)
		AllLinks.push([LinkStart+'/s/'+StoryParts[1]+'/'+i+'/'+StoryParts[2], StoryParts[2]+' - '+i+'.html']);
});

//Download all the links
$.each(AllLinks, (_, LinkInfo) =>
	$('a').attr('download', LinkInfo[1]).attr('href', LinkInfo[0])[0].click()
);

Ping Connectivity Monitor

The following is a simple bash script to ping a different domain once a second and log the output. By default, it pings #.castledragmire.com, where # is an incrementing number starting from 0.

The script is written for Cygwin (See the PING_COMMAND variable at the top) but is very easily adaptable to Linux.

The log output is: EPOCH_TIMESTAMP DOMAIN PING_OUTPUT


#This uses Windows' native ping since the Cygwin ping is sorely lacking in options
#"-n 1"=Only run once, "-w 3000"=Timeout after 3 seconds
#The grep strings are also directly tailored to Windows' native ping
PING_COMMAND=$(
	echo 'C:/Windows/System32/PING.EXE -n 1 -w 3000 $DOMAIN |';
	echo 'grep -iP "^(Request timed out|Reply from|Ping request could not find)"';
)

i=0 #The subdomain counter
STARTTIME=`date +%s.%N` #This holds the timestamp of the end of the previous loop

#Infinite loop
while true
do
	#Get the domain to run. This requires a domain that has a wildcard as a primary subdomain
	DOMAIN="$i.castledragmire.com"

	#Output the time, domain name, and ping output
	echo `date +%s` "$DOMAIN" $(eval $PING_COMMAND)

	#If less than a second has passed, sleep up to 1 second
	ENDTIME=`date +%s.%N`
	SLEEPTIME=$(echo "1 - ($ENDTIME - $STARTTIME)" | bc)
	STARTTIME=$ENDTIME
	if [ $(echo "$SLEEPTIME>0" | bc) -eq 1 ]; then
		sleep $SLEEPTIME
		STARTTIME=$(echo "$STARTTIME + $SLEEPTIME" | bc)
	fi

	#Increment the subdomain counter
	let i+=1
done
Better Regular Expression Lists
Regular expressions have been one of my favorite programming tools since I first discovered them. They are wonderfully robust and things can usually be done with them in many ways. For example, here are multiple ways to match an IPv4 address:
  • ^\d\d?\d?\.\d\d?\d?\.\d\d?\d?\.\d\d?\d?$
  • ^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$
  • ^(\d{1,3}\.){3}\d{1,3}$
  • ^([0-9]{1,3}\.){3}[0-9]{1,3}$

One of my major annoyances though has always been lists. I have always done them like ^(REGEX,)*REGEX$.

For example, I would do a list of IP addresses like this: ^((\d{1,3}\.){3}\d{1,3},)*(\d{1,3}\.){3}\d{1,3}$.

I recently realized however that a list can much more elegantly be done as follows: ^(REGEX(,|$))+(?<!,)$. I would describe this as working by:

  • ^: Start of the statement (test string)
  • (REGEX(,|$))+: A list of items separated by either a comma or EOS (end of statement). If we keep this regular expression as not-multi-line (the default), then the EOS can only happen at the end of the statement.
  • (?<!,): This is a look-behind assertion saying that the last character before the EOS cannot be a comma. If we didn’t have this, the list could look like this, with a comma at the end: “ITEM,ITEM,ITEM,”.
  • $: The end of the statement

So the new version of the IP address list would look like this ^((\d{1,3}\.){3}\d{1,3}(,|$))+(?<!,)$ instead of this ^((\d{1,3}\.){3}\d{1,3},)*(\d{1,3}\.){3}\d{1,3}$.


Also, since an IP address is just a list of numbers separated by periods, it could also look like this: ^(\d{1,3}(\.|$)){4}(?<!\.)$.
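A quick way to sanity-check the list pattern is PHP’s preg_match (any PCRE-compatible engine behaves the same):

<?php
//Sanity-checking the list pattern
$ListRegex = '/^((\d{1,3}\.){3}\d{1,3}(,|$))+(?<!,)$/';
var_dump(preg_match($ListRegex, '1.2.3.4,5.6.7.8'));  //int(1): valid list
var_dump(preg_match($ListRegex, '1.2.3.4,5.6.7.8,')); //int(0): trailing comma rejected by the look-behind
?>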

Weird compiler problem

I wanted to write about a really weird problem I recently had while debugging in C++ (technically, it’s all C). Unfortunately, I was doing this in kernel debugging mode, which made life a bit harder, but it would have happened the same in userland.

I had an .hpp file (we’ll call it process_internal.hpp) that was originally an internal file just to be included from a .cpp file (we’ll call it process.cpp), so it contained global variables as symbols. I ended up needing to include this process_internal.hpp file elsewhere (for testing, we’ll call it test.cpp). Because of this, the same symbol was included in multiple files, so the separate .o builds were not properly interacting. I ended up using “#ifdef”s to only include the parts I needed in the test.cpp file, and doing “extern” defines of the global variables for it. It looked something like the following:

enum { FT_Inbound, FT_Outbound };
typedef struct FilteringLayer {
	int FilterTypeNum, OriginalID;
	const char *Name;
} FilteringLayer;
const int FT_NumTypes=2;

#ifdef _PROCESS_INTERNAL
	FilteringLayer FilterTypes[FT_NumTypes]={
		{FT_Inbound,  5, "Inbound"},
		{FT_Outbound, 8, "Outbound"},
	};
#else
	extern "C" FilteringLayer *FilterTypes;
#endif

So I was accessing this variable in test.cpp and getting a really weird problem. The code looked something like this:

struct foo { int a, b; };
foo Stuff[]={...};
void FunctionBar()
{
	for(int i=0;i<FT_NumTypes;i++)
		Stuff[FilterTypes[i].OriginalID].b=1;
}

This was causing an access exception, which blue screened my debug VM. I tried running the exact same statements in the Visual Studio debugger, and things were working just as they were supposed to! So I decided to go to the assembly level. It looked something like this (I included descriptions):

L#  Code                              Description                   Combined description
    for(int i=0;i<FT_NumTypes;i++)
1   mov qword ptr [rsp+58h],0         int i=0
2   jmp MODULENAME!FunctionBar+0xef   JUMP TO #LINE@6
3   mov rax,qword ptr [rsp+58h]       RAX=i
4   inc rax                           RAX++                         i++
5   mov qword ptr [rsp+58h],rax       i=RAX
6   cmp qword ptr [rsp+58h],02h       CMP=(i-FT_NumTypes)
7   jae MODULENAME!FunctionBar+0x11e  IF(CMP>=0) GOTO #LINE@15      if(i>=FT_NumTypes) GOTO #LINE@15
    Stuff[FilterTypes[i].OriginalID].b=1;
8   imul rax,qword ptr [rsp+58h],10h  RAX=i*sizeof(FilteringLayer)
9   mov rcx,[MODULENAME!FilterTypes]  RCX=*(void**)&FilterTypes
10  movzx eax,word ptr [rcx+rax+4]    RAX=*(UINT16*)(RCX+RAX+4)     RAX=((FilteringLayer*)&FilterTypes)[i].OriginalID
11  imul rax,rax,30h                  RAX*=sizeof(foo)
12  lea rcx,[MODULENAME!Stuff]        RCX=(void*)&Stuff
13  mov dword ptr [rcx+rax+04h],1     *(UINT32*)(RCX+RAX+0x4)=1     Stuff[RAX].b=1
14  jmp MODULENAME!FunctionBar+0xe2   GOTO #LINE@3
15  ...

I noticed that line #9 was putting 0x0000000C`00000000 into RCX instead of &FilterTypes. I knew the instruction should have been an “lea” instead of a “mov” to fix this. My first thought was a compiler bug, but as many programming mantras say, that is very, very rarely the case. If you want to guess what the problem is, now is the time; I’ve given you all the information (and more) needed to make the guess.



The answer: extern "C" FilteringLayer *FilterTypes; should have been extern "C" FilteringLayer FilterTypes[];. Oops! Declared as a pointer, the generated code loaded the first 8 bytes stored at the FilterTypes symbol (the start of the array data) and used them as the pointer’s value, instead of using the symbol’s address itself. The debugger was getting it right because it had the extra information of the real definition of the FilterTypes variable.

MySQL: Update multiple rows with different values
There are 3 different methods for updating multiple rows at once in MySQL with different values:
  1. INSERT: INSERT with ON DUPLICATE KEY UPDATE
    			INSERT INTO FooBar (ID, foo)
    			VALUES (1, 5), (2, 8), (3, 2)
    			ON DUPLICATE KEY UPDATE foo=VALUES(foo);
    		
  2. TRANSACTION: Where you do an update for each record within a transaction (InnoDB or other DBs with transactions)
    			START TRANSACTION;
    			UPDATE FooBar SET foo=5 WHERE ID=1;
    			UPDATE FooBar SET foo=8 WHERE ID=2;
    			UPDATE FooBar SET foo=2 WHERE ID=3;
    			COMMIT;
    		
  3. CASE: In which you use a CASE/WHEN for each different record within an UPDATE
    			UPDATE FooBar SET foo=CASE ID
    				WHEN 1 THEN 5
    				WHEN 2 THEN 8
    				WHEN 3 THEN 2
    			END
    			WHERE ID IN (1,2,3);
    		

I feel knowing the speeds of the 3 different methods is important.

All of the following numbers apply to InnoDB.


I just tested this, and the INSERT method was 6.7x faster for me than the TRANSACTION method. I tried on a set of both 3,000 and 30,000 rows and got the same results.


The TRANSACTION method still has to run each individual query, which takes time, though it seems to batch the writes in memory while executing. The TRANSACTION method is also pretty expensive in both replication and query logs.


Even worse, the CASE method was 41.1x slower than the INSERT method w/ 30,000 records (6.1x slower than TRANSACTION). And 75x slower in MyISAM. The INSERT and CASE methods broke even at ~1,000 records. Even at 100 records, the CASE method is BARELY faster than INSERT.


So in general, I feel the INSERT method is both the best and the easiest to use. The queries are smaller and easier to read, and the whole update happens in a single query. This applies to both InnoDB and MyISAM.


Bonus stuff:

Using the INSERT method, there can be a problem in which NON-NULL fields with no default (in other words, required fields) are not being updated. You will get an error like “Field 'fieldname' doesn't have a default value”. The solution is to temporarily turn off STRICT_TRANS_TABLES and STRICT_ALL_TABLES in the SQL mode: SET SESSION sql_mode=REPLACE(REPLACE(@@SESSION.sql_mode,"STRICT_TRANS_TABLES",""),"STRICT_ALL_TABLES",""). Make sure to save the sql_mode first if you plan on reverting it.
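
For example, a minimal sketch of the full sequence, reusing the RunSQLQuery() stub from the benchmark code below (save the mode, strip the strict flags, run the batch update, then revert):

<?
//Hypothetical helper sequence; RunSQLQuery() just executes a query on an open connection
RunSQLQuery('SET @OldSqlMode=@@SESSION.sql_mode'); //Save the current mode
RunSQLQuery('SET SESSION sql_mode=REPLACE(REPLACE(@@SESSION.sql_mode,"STRICT_TRANS_TABLES",""),"STRICT_ALL_TABLES","")');
RunSQLQuery('INSERT INTO FooBar (ID, foo) VALUES (1, 5), (2, 8), (3, 2) ON DUPLICATE KEY UPDATE foo=VALUES(foo)');
RunSQLQuery('SET SESSION sql_mode=@OldSqlMode'); //Revert to the saved mode
?>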


As for other comments I’ve seen that say the auto_increment goes up using the INSERT method, I tested that too and it seems to not be the case.


Code to run the tests is as follows: (It also outputs .SQL files to remove PHP interpreter overhead)
<?
//Variables
$NumRows=30000;

//These 2 functions need to be filled in
function InitSQL()
{

}
function RunSQLQuery($Q)
{

}

//Run the 3 tests
InitSQL();
for($i=0;$i<3;$i++)
    RunTest($i, $NumRows);

function RunTest($TestNum, $NumRows)
{
    $TheQueries=Array();
    $DoQuery=function($Query) use (&$TheQueries)
    {
        RunSQLQuery($Query);
        $TheQueries[]=$Query;
    };

    $TableName='Test';
    $DoQuery('DROP TABLE IF EXISTS '.$TableName);
    $DoQuery('CREATE TABLE '.$TableName.' (i1 int NOT NULL AUTO_INCREMENT, i2 int NOT NULL, primary key (i1)) ENGINE=InnoDB');
    $DoQuery('INSERT INTO '.$TableName.' (i2) VALUES ('.implode('), (', range(2, $NumRows+1)).')');

    if($TestNum==0)
    {
        $TestName='Transaction';
        $Start=microtime(true);
        $DoQuery('START TRANSACTION');
        for($i=1;$i<=$NumRows;$i++)
            $DoQuery('UPDATE '.$TableName.' SET i2='.(($i+5)*1000).' WHERE i1='.$i);
        $DoQuery('COMMIT');
    }

    if($TestNum==1)
    {
        $TestName='Insert';
        $Query=Array();
        for($i=1;$i<=$NumRows;$i++)
            $Query[]=sprintf("(%d,%d)", $i, (($i+5)*1000));
        $Start=microtime(true);
        $DoQuery('INSERT INTO '.$TableName.' VALUES '.implode(', ', $Query).' ON DUPLICATE KEY UPDATE i2=VALUES(i2)');
    }

    if($TestNum==2)
    {
        $TestName='Case';
        $Query=Array();
        for($i=1;$i<=$NumRows;$i++)
            $Query[]=sprintf('WHEN %d THEN %d', $i, (($i+5)*1000));
        $Start=microtime(true);
        $DoQuery("UPDATE $TableName SET i2=CASE i1\n".implode("\n", $Query)."\nEND\nWHERE i1 IN (".implode(',', range(1, $NumRows)).')');
    }

    print "$TestName: ".(microtime(true)-$Start)."<br>\n";

    file_put_contents("./$TestName.sql", implode(";\n", $TheQueries).';');
}
Monitoring PHP calls

I recently had a Linux client that was, for whatever odd reason, making infinite recursive HTTP calls to a single script, which was making the server process count skyrocket. I decided to use the same module as I did in my Painless migration from PHP MySQL to MySQLi post, which is to say, overriding base functions for fun and profit using the PHP runkit extension. I did this so I could gather, for debugging, logs of when and where the calls causing this were being made.


The below code overrides all functions listed on the line that says “List of functions to intercept” [Line 9]. It works by first renaming these built-in functions to “OVERRIDE_$FuncName” [Line 12], and replacing them with a call to “GlobalRunFunc()” [Line 13], which receives the original function name and argument list. The GlobalRunFunc():

  1. Checks to see if it is interested in logging the call
    • In the case of this example, it will log the call if [Line 20]:
      • Line 21: curl_setopt is called with the CURLOPT_URL parameter (enum=10002)
      • Line 22: curl_init is called with a first parameter, which would be a URL
      • Line 23: file_get_contents or fopen is called and is not an absolute path
        (Wordpress calls everything absolutely. Normally I would have only checked for http[s] calls).
    • If it does want to log the call, it stores it in a global array (which holds all the calls we will want to log).
      The logged data includes [Line 25]:
      • The function name
      • The function parameters
      • 2 functions back of backtrace (which can often get quite large when stored in the log file)
  2. It then calls the original function, with parameters intact, and passes through the return [Line 27].

The “GlobalShutdown()” [Line 30] is then called when the script is closing [Line 38] and saves all the logs, if any exist, to “$GlobalLogDir/$DATETIME.srl”.

I have it using “serialize()” to encode the log data [Line 25], as opposed to “json_encode()” or “print_r()” calls, as the latter were getting too large for the logs. You may want to have it use one of these other encoding functions for easier log perusal, if running out of space is not a concern.
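
To read one of these logs back later, a sketch like the following works (the filename here is hypothetical); note that splitting on newlines assumes none of the serialized arguments contain a raw newline:

<?
//Dump each logged call from one of the .srl files
$LogFile='./LOG_DIRECTORY_NAME/2015-01-01_12:00:00.000.srl';
foreach(explode("\n", file_get_contents($LogFile)) as $Line)
    print_r(unserialize($Line));
?>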

<?
//The log data to save is stored here
global $GlobalLogArr, $GlobalLogDir;
$GlobalLogArr=Array();
$GlobalLogDir='./LOG_DIRECTORY_NAME';

//Override the functions here to instead have them call to GlobalRunFunc, which will in turn call the original functions
foreach(Array(
        'fopen', 'file_get_contents', 'curl_init', 'curl_setopt', //List of functions to intercept
) as $FuncName)
{
        runkit_function_rename($FuncName, "OVERRIDE_$FuncName");
        runkit_function_add($FuncName, '', "return GlobalRunFunc('$FuncName', func_get_args());");
}

//This logs the intercepted call if it matches one of the conditions below, then runs the original function
function GlobalRunFunc($FuncName, $Args)
{
        global $GlobalLogArr;
        if(
                ($FuncName=='curl_setopt' && $Args[1]==10002) || //CURLOPT enumeration can be found at https://curl.haxx.se/mail/archive-2004-07/0100.html
                ($FuncName=='curl_init' && isset($Args[0])) ||
                (($FuncName=='file_get_contents' || $FuncName=='fopen') && $Args[0][0]!='/')
        )
                $GlobalLogArr[]=serialize(Array('FuncName'=>$FuncName, 'Args'=>$Args, 'Trace'=>array_slice(debug_backtrace(), 1, 2)));

        return call_user_func_array("OVERRIDE_$FuncName", $Args);
}

function GlobalShutdown()
{
        global $GlobalLogArr, $GlobalLogDir;
        $Time=microtime(true);
        if(count($GlobalLogArr))
                file_put_contents($GlobalLogDir.'/'.date('Y-m-d_H:i:s.'.substr($Time-floor($Time), 2, 3), floor($Time)).'.srl', implode("\n", $GlobalLogArr));

}
register_shutdown_function('GlobalShutdown');
?>
PHP String Concatenation - Stringbuilder results

I wrote the code at the end of this post to test the different forms of string concatenation and they really are all almost exactly equal in both memory and time footprints.


The two primary methods I used are concatenating strings onto each other, and filling an array with strings and then imploding them. I did 500 string additions with a 1MB string in PHP 5.6 (so the result is a 500MB string). At every iteration of the test, all memory and time footprints were very close (at ~$IterationNumber*1MB). The runtimes of the two tests were 50.398 seconds and 50.843 seconds respectively, which is most likely within the margin of error.

Garbage collection of strings that are no longer referenced seems to be pretty immediate, even without ever leaving the scope. Since the strings are mutable, no extra memory is really required after the fact.
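
A quick way to observe this (a sketch; exact numbers vary by PHP version and platform):

<?
//Allocate a ~10MB string, then drop the last reference without leaving the scope
$Str=str_repeat('x', 10*1024*1024);
$Before=memory_get_usage();
$Str=''; //The old buffer is freed immediately once its refcount hits 0
$Freed=$Before-memory_get_usage();
print "{$Freed}B freed immediately\n";
?>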


HOWEVER, the following tests showed that there is a difference in peak memory usage WHILE the strings are being concatenated.


$OneMB=str_repeat('x', 1024*1024);
$Final=$OneMB.$OneMB.$OneMB.$OneMB.$OneMB;
print memory_get_peak_usage();
Result=10,806,800 bytes (~10MB w/o the initial PHP memory footprint)

$OneMB=str_repeat('x', 1024*1024);
$Final=implode('', Array($OneMB, $OneMB, $OneMB, $OneMB, $OneMB));
print memory_get_peak_usage();
Result=6,613,320 bytes (~6MB w/o the initial PHP memory footprint)

So there is in fact a difference that could be significant in very very large string concatenations memory-wise (I have run into such examples when creating very large data sets or SQL queries).

But even this fact is disputable depending upon the data. For example, concatenating 1 character at a time onto a string to reach 50 million bytes (so 50 million iterations) took a maximum of 50,322,512 bytes (~48MB) in 5.97 seconds, while the array method ended up using 7,337,107,176 bytes (~6.8GB) to create the array in 12.1 seconds, and then took an extra 4.32 seconds to combine the strings from the array.
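
For reference, a minimal sketch of that single-character test (the array version needs ~7GB of memory, so swap it in with care):

<?
//Build a 50-million-byte string one character at a time and report the peak
$N=50*1000*1000;
$Str='';
for($i=0;$i<$N;$i++)
    $Str.='x';
print 'Concat peak: '.memory_get_peak_usage()."B\n";
//For the array version, fill $Arr[]='x' in the loop instead, then $Str=implode('', $Arr);
?>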


Anywho... the below is the benchmark code I mentioned at the beginning which shows the methods are pretty much equal. It outputs a pretty HTML table.

<?
//Please note, for the recursion test to go beyond 256, xdebug.max_nesting_level needs to be raised.
//You also may need to update your memory_limit depending on the number of iterations

//Output the start memory
print 'Start: '.memory_get_usage()."B<br><br>Below test results are in MB<br>";

//Our 1MB string
global $OneMB, $NumIterations;
$OneMB=str_repeat('x', 1024*1024);
$NumIterations=500;

//Run the tests
$ConcatTest=RunTest('ConcatTest');
$ImplodeTest=RunTest('ImplodeTest');
$RecurseTest=RunTest('RecurseTest');

//Output the results in a table
OutputResults(
  Array('ConcatTest', 'ImplodeTest', 'RecurseTest'),
  Array($ConcatTest, $ImplodeTest, $RecurseTest)
);

//Start a test run by initializing the array that will hold the results and manipulating those results after the test is complete
function RunTest($TestName)
{
  $CurrentTestNums=Array();
  $TestStartMem=memory_get_usage();
  $StartTime=microtime(true);
  RunTestReal($TestName, $CurrentTestNums, $StrLen);
  $CurrentTestNums[]=memory_get_usage();

  //Subtract $TestStartMem from all other numbers
  foreach($CurrentTestNums as &$Num)
    $Num-=$TestStartMem;
  unset($Num);

  $CurrentTestNums[]=$StrLen;
  $CurrentTestNums[]=microtime(true)-$StartTime;

  return $CurrentTestNums;
}

//Initialize the test and store the memory allocated at the end of the test, with the result
function RunTestReal($TestName, &$CurrentTestNums, &$StrLen)
{
  $R=$TestName($CurrentTestNums);
  $CurrentTestNums[]=memory_get_usage();
  $StrLen=strlen($R);
}

//Concatenate 1MB string over and over onto a single string
function ConcatTest(&$CurrentTestNums)
{
  global $OneMB, $NumIterations;
  $Result='';
  for($i=0;$i<$NumIterations;$i++)
  {
    $Result.=$OneMB;
    $CurrentTestNums[]=memory_get_usage();
  }
  return $Result;
}

//Create an array of 1MB strings and then join w/ an implode
function ImplodeTest(&$CurrentTestNums)
{
  global $OneMB, $NumIterations;
  $Result=Array();
  for($i=0;$i<$NumIterations;$i++)
  {
    $Result[]=$OneMB;
    $CurrentTestNums[]=memory_get_usage();
  }
  return implode('', $Result);
}

//Recursively add strings onto each other
function RecurseTest(&$CurrentTestNums, $TestNum=0)
{
  Global $OneMB, $NumIterations;
  if($TestNum==$NumIterations)
    return '';

  $NewStr=RecurseTest($CurrentTestNums, $TestNum+1).$OneMB;
  $CurrentTestNums[]=memory_get_usage();
  return $NewStr;
}

//Output the results in a table
function OutputResults($TestNames, $TestResults)
{
  global $NumIterations;
  print '<table border=1 cellspacing=0 cellpadding=2><tr><th>Test Name</th><th>'.implode('</th><th>', $TestNames).'</th></tr>';
  $FinalNames=Array('Final Result', 'Clean');
  for($i=0;$i<$NumIterations+2;$i++)
  {
    $TestName=($i<$NumIterations ? $i : $FinalNames[$i-$NumIterations]);
    print "<tr><th>$TestName</th>";
    foreach($TestResults as $TR)
      printf('<td>%07.4f</td>', $TR[$i]/1024/1024);
    print '</tr>';
  }

  //Other result numbers
  print '<tr><th>Final String Size</th>';
  foreach($TestResults as $TR)
    printf('<td>%d</td>', $TR[$NumIterations+2]);
  print '</tr><tr><th>Runtime</th>';
    foreach($TestResults as $TR)
      printf('<td>%s</td>', $TR[$NumIterations+3]);
  print '</tr></table>';
}
?>
Deep object compare for javascript
function DeepObjectCompare(O1, O2)
{
	try {
		DOC_Val(O1, O2, ['O1->O2', O1, O2]);
		return DOC_Val(O2, O1, ['O2->O1', O1, O2]);
	} catch(e) {
		console.log(e.Chain);
		throw(e);
	}
}
function DOC_Error(Reason, Chain, Val1, Val2)
{
	this.Reason=Reason;
	this.Chain=Chain;
	this.Val1=Val1;
	this.Val2=Val2;
}

function DOC_Val(Val1, Val2, Chain)
{
	function DoThrow(Reason, NewChain) { throw(new DOC_Error(Reason, NewChain!==undefined ? NewChain : Chain, Val1, Val2)); }

	if(typeof(Val1)!==typeof(Val2))
		return DoThrow('Type Mismatch');
	if(Val1===null || Val1===undefined)
		return Val1!==Val2 ? DoThrow('Null/undefined mismatch') : true;
	if(Val1.constructor!==Val2.constructor)
		return DoThrow('Constructor mismatch');
	switch(typeof(Val1))
	{
		case 'object':
			for(var m in Val1)
			{
				if(!Val1.hasOwnProperty(m))
					continue;
				var CurChain=Chain.concat([m]);
				if(!Val2.hasOwnProperty(m))
					return DoThrow('Val2 missing property', CurChain);
				DOC_Val(Val1[m], Val2[m], CurChain);
			}
			return true;
		case 'number':
			if(Number.isNaN(Val1))
				return !Number.isNaN(Val2) ? DoThrow('NaN mismatch') : true;
		case 'string':
		case 'boolean':
			return Val1!==Val2 ? DoThrow('Value mismatch') : true;
		case 'function':
			if(Val1.prototype!==Val2.prototype)
				return DoThrow('Prototype mismatch');
			if(Val1!==Val2)
				return DoThrow('Function mismatch');
			return true;
		default:
			return DoThrow('Val1 is unknown type');
	}
}
Painless migration from PHP MySQL to MySQLi

The PHP MySQL extension is being deprecated in favor of the MySQLi extension in PHP 5.5, and removed as of PHP 7.0. MySQLi was first referenced in PHP v5.0.0 beta 4 on 2004-02-12, with the first stable release in PHP 5.0.0 on 2004-07-13[1]. Before that, the PHP MySQL extension was by far the most popular way of interacting with MySQL on PHP, and still was for a very long time after. This website was opened only 2 years after the first stable release!


With the deprecation, problems from some websites I help host have popped up, many of these sites being very, very old. I needed a quick and dirty solution to monkey-patch these websites to use MySQLi without rewriting all their code. The obvious answer is to overwrite the functions with wrappers for MySQLi. The generally known way of doing this is with the Advanced PHP Debugger (APD). However, using this extension has a lot of requirements that are not appropriate for a production web server. Fortunately, another extension I recently learned of offers the renaming functionality: runkit. It was a super simple install for me.

  1. From the command line, run “pecl install runkit”
  2. Add “extension=runkit.so” and “runkit.internal_override=On” to the php.ini

Besides the ability to override these functions with wrappers, I also needed a way to make sure this file was always loaded before all other PHP files. The simple solution for that is adding “auto_prepend_file=/PATH/TO/FILE” to the “.user.ini” in the user’s root web directory.
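
Putting those pieces together, the relevant configuration lines look something like this (the wrapper path is a placeholder):

; php.ini - load runkit and allow built-in functions to be overridden
extension=runkit.so
runkit.internal_override=On

; .user.ini in the user's root web directory - load the wrapper before all other PHP files
auto_prepend_file=/PATH/TO/FILE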

The code for this script is as follows. It only contains a limited set of the MySQL functions, including some very esoteric ones that the web site used. This is not a foolproof script, but it gets the job done.


//Override the MySQL functions
foreach(Array(
    'connect', 'error', 'fetch_array', 'fetch_row', 'insert_id', 'num_fields', 'num_rows',
    'query', 'select_db', 'field_len', 'field_name', 'field_type', 'list_dbs', 'list_fields',
    'list_tables', 'tablename'
) as $FuncName)
    runkit_function_redefine("mysql_$FuncName", '',
        'return call_user_func_array("mysql_'.$FuncName.'_OVERRIDE", func_get_args());');

//If a connection is not explicitely passed to a mysql_ function, use the last created connection
global $SQLLink; //The remembered SQL Link
function GetConn($PassedConn)
{
    if(isset($PassedConn))
        return $PassedConn;
    global $SQLLink;
    return $SQLLink;
}

//Override functions
function mysql_connect_OVERRIDE($Host, $Username, $Password) {
    global $SQLLink;
    return $SQLLink=mysqli_connect($Host, $Username, $Password);
}
function mysql_error_OVERRIDE($SQLConn=NULL) {
    return mysqli_error(GetConn($SQLConn));
}
function mysql_fetch_array_OVERRIDE($Result, $ResultType=MYSQL_BOTH) {
    return mysqli_fetch_array($Result, $ResultType);
}
function mysql_fetch_row_OVERRIDE($Result) {
    return mysqli_fetch_row($Result);
}
function mysql_insert_id_OVERRIDE($SQLConn=NULL) {
    return mysqli_insert_id(GetConn($SQLConn));
}
function mysql_num_fields_OVERRIDE($Result) {
    return mysqli_num_fields($Result);
}
function mysql_num_rows_OVERRIDE($Result) {
    return mysqli_num_rows($Result);
}
function mysql_query_OVERRIDE($Query, $SQLConn=NULL) {
    return mysqli_query(GetConn($SQLConn), $Query);
}
function mysql_select_db_OVERRIDE($DBName, $SQLConn=NULL) {
    return mysqli_select_db(GetConn($SQLConn), $DBName);
}
function mysql_field_len_OVERRIDE($Result, $Offset) {
    $Fields=$Result->fetch_fields();
    return $Fields[$Offset]->length;
}
function mysql_field_name_OVERRIDE($Result, $Offset) {
    $Fields=$Result->fetch_fields();
    return $Fields[$Offset]->name;
}
function mysql_field_type_OVERRIDE($Result, $Offset) {
    $Fields=$Result->fetch_fields();
    return $Fields[$Offset]->type;
}
function mysql_list_dbs_OVERRIDE($SQLConn=NULL) {
    $Result=mysql_query('SHOW DATABASES', GetConn($SQLConn));
    $Tables=Array();
    while($Row=mysqli_fetch_assoc($Result))
        $Tables[]=$Row['Database'];
    return $Tables;
}
function mysql_list_fields_OVERRIDE($DBName, $TableName, $SQLConn=NULL) {
    $SQLConn=GetConn($SQLConn);
    $CurDB=mysql_fetch_array(mysql_query('SELECT Database()', $SQLConn));
    $CurDB=$CurDB[0];
    mysql_select_db($DBName, $SQLConn);
    $Result=mysql_query("SHOW COLUMNS FROM $TableName", $SQLConn);
    mysql_select_db($CurDB, $SQLConn);
    if(!$Result) {
        print 'Could not run query: '.mysql_error($SQLConn);
        return Array();
    }
    $Fields=Array();
    while($Row=mysqli_fetch_assoc($Result))
        $Fields[]=$Row['Field'];
    return $Fields;
}
function mysql_list_tables_OVERRIDE($DBName, $SQLConn=NULL) {
    $SQLConn=GetConn($SQLConn);
    $CurDB=mysql_fetch_array(mysql_query('SELECT Database()', $SQLConn));
    $CurDB=$CurDB[0];
    mysql_select_db($DBName, $SQLConn);
    $Result=mysql_query("SHOW TABLES", $SQLConn);
    mysql_select_db($CurDB, $SQLConn);
    if(!$Result) {
        print 'Could not run query: '.mysql_error($SQLConn);
        return Array();
    }
    $Tables=Array();
    while($Row=mysql_fetch_row($Result))
        $Tables[]=$Row[0];
    return $Tables;
}
function mysql_tablename_OVERRIDE($Result) {
    $Fields=$Result->fetch_fields();
    return $Fields[0]->table;
}

And here is some test code to confirm functionality:
global $MyConn, $TEST_Table;
$TEST_Server='localhost';
$TEST_UserName='...';
$TEST_Password='...';
$TEST_DB='...';
$TEST_Table='...';
function GetResult() {
    global $MyConn, $TEST_Table;
    return mysql_query('SELECT * FROM '.$TEST_Table.' LIMIT 1', $MyConn);
}
var_dump($MyConn=mysql_connect($TEST_Server, $TEST_UserName, $TEST_Password));
//Set $MyConn to NULL here if you want to test global $SQLLink functionality
var_dump(mysql_select_db($TEST_DB, $MyConn));
var_dump(mysql_query('SELECT * FROM INVALIDTABLE LIMIT 1', $MyConn));
var_dump(mysql_error($MyConn));
var_dump($Result=GetResult());
var_dump(mysql_fetch_array($Result));
$Result=GetResult(); var_dump(mysql_fetch_row($Result));
$Result=GetResult(); var_dump(mysql_num_fields($Result));
var_dump(mysql_num_rows($Result));
var_dump(mysql_field_len($Result, 0));
var_dump(mysql_field_name($Result, 0));
var_dump(mysql_field_type($Result, 0));
var_dump(mysql_tablename($Result));
var_dump(mysql_list_dbs($MyConn));
var_dump(mysql_list_fields($TEST_DB, $TEST_Table, $MyConn));
var_dump(mysql_list_tables($TEST_DB, $MyConn));
mysql_query('CREATE TEMPORARY TABLE mysqltest (i int auto_increment, primary key (i))', $MyConn);
mysql_query('INSERT INTO mysqltest VALUES ()', $MyConn);
mysql_query('INSERT INTO mysqltest VALUES ()', $MyConn);
var_dump(mysql_insert_id($MyConn));
mysql_query('DROP TEMPORARY TABLE mysqltest', $MyConn);
Backing up just the user settings in cPanel

One of the companies I work for recently moved one of our cPanel servers to a new colocation, still running cPanel. We decided to use a new backup solution called r1soft, which so far has been working spectacularly. I’d love to use it for my personal computers, except the licenses, which are geared towards enterprise business, are way too costly.

However, since r1soft only backs up files (incrementally, at the block level, yay) you can’t use it to restore a cPanel account. It can only restore things like the user’s home directory and SQL databases. Because of this, when we needed to restore an entire account today and found out there is no easy/quick way to do it, we were up a creek. The obvious future solution for this would be to use cPanel’s backup (or legacy backup) systems, but unfortunately, you can’t easily set them to not back up the user’s databases and home directory, which can be very large, and are already taken care of by r1soft. I ended up adding the following script, run nightly via cron, to back up user account settings.

It saves all the user settings under the backup path in their own directory, uncompressed, and named cpmove-USERNAME. It is best to do it this way so r1soft’s incremental backups don’t have much extra work if anything changes. Make sure to change line 3 in the following script to the path where you want backups to occur.

#!/bin/bash
#Create and move to backup directory
BACKUPDIR=/backup/userbackup
mkdir -p $BACKUPDIR #Make sure the directory exists
cd $BACKUPDIR

#Remove old backups
rm -rf cpmove-*

#Loop over accounts
for USER in `/usr/sbin/whmapi1 listaccts | grep -oP '(?<=user: )\w+$' | sort -u`; do
  #Backup the account
  /scripts/pkgacct --nocompress --skipbwdata --skiphomedir --skiplogs --skipmysql --skipmailman $USER ./

  #Extract from and remove the tar container file
  tar -xvf cpmove-$USER.tar
  rm -f cpmove-$USER.tar

  #Save MySQL user settings
  mysqldump --compact -fnt -w "User LIKE '$USER""_%'" mysql user db tables_priv columns_priv procs_priv proxies_priv \
  | perl -pe "s~('|NULL)\),\('~\1),\n('~ig" \
  > cpmove-$USER/mysql-users.sql
done;

This script skips a few backup items that need to be noted. Mailman, logs, homedir, and bandwidth data should all be easy 1:1 copy over restores from r1soft. I excluded them because those can take up a lot of room, which we want r1soft to handle. The same goes for MySQL, except that your MySQL users are not backed up to your account, which is why I added the final section.

Do note, for the final section, the line starting with “| perl” is optional. It is there to separate the insert rows into their own lines. A very minor warning though; it would also pick up cases where the last field in MySQL’s user table ends in “NULL),​(”. This would only happen if someone is trying to be malicious and knew about this script, and even then, it couldn’t harm anything.

Bonus note: To restore a MySQL database which does not use a shared-file (like InnoDB does by default), you could actually stop the MySQL server, copy over the binary database files, and start the server back up.

Blacklisting DNS Server on Amazon EC2

Amazon EC2 is a great resource for cheap virtual servers to do simple things, like DNS or (low bandwidth) VPNs. I had the need this morning to set up a DNS server for a company which needed to blacklist a list of domains. The simplest way to do this is by editing all the computers’ hosts files, but that method leaves a lot to be desired, namely blocking entire domains (as opposed to single subdomains) and deploying changes. Centralizing in a single place makes the job immediate and, in the end, faster.

The following are the steps I used to set this up on an EC2 server. All command line instructions are followed by a single command you can run to execute the step. There is a full script below, at the end of the post, containing all steps from when you first login to SSH ("Login to root") to the end.


I am not going to go into the details of setting up an EC2 instance, as that information can be found elsewhere. I will also be skipping over some of the more obvious steps. Just create a default EC2 instance with the “Amazon Linux AMI”, and I will list all the changes that need to be made beyond that.

  • Creating the instance
    • For the first year, for the instance type, you might as well use a t2.micro, as it is free. After that, a t2.nano (which is a new lower tier), currently at $56.94/year ($0.0065/hour), should be fine.
    • After you select your instance type, click “Review and Launch” to launch the instance with all of the defaults.
    • After the confirmation screen, it will ask you to create a key pair. You can see other tutorials about this and how it enables you to log into your instance.
  • Edit the security group
    • Next, you need to edit the security group for your instance to allow incoming connections.
    • Go to “Instances” under the “Instances” group on the left menu, and click your instance.
    • In the bottom of the window, in the “Descriptions” tab, click the link next to “Security Groups”, which will bring you to the proper group in the security groups tab.
    • Right click it and “Edit inbound Rules”.
    • Make sure it has the following rules with Source=Anywhere: ALL ICMP [For pinging], SSH, HTTP, DNS (UDP), DNS (TCP)
  • Assign a permanent IP to your instance
    • To do this, click the “Elastic IPs” under “Network & Security” in the left menu.
    • Click “Allocate New Address”.
    • After creating it, right click the new address, then “Associate Address”, and assign it to your new instance.
  • You should probably set this IP up as an A record somewhere. I will refer to this IP as dns.yourdomain.com from now on.
  • Login to root
    • SSH into your instance as the ec2-user via “ssh ec2-user@dns.yourdomain.com”. If on Windows, you could also use PuTTY.
    • Sudo into root via “sudo su”.
  • Allow root login
    • At this point, I recommend setting it up so you can directly root into the server. Warning: some people consider this a security risk.
    • Copy your key pair(s) to the root user via “cat /home/ec2-user/.ssh/authorized_keys > /root/.ssh/authorized_keys”
    • Set SSHD to permit root logins by changing the PermitRootLogin variable to “yes” in /etc/ssh/sshd_config. A quick command to do this is “perl -pi -e 's/^\s*#?\s*PermitRootLogin.*$/PermitRootLogin yes/igm' /etc/ssh/sshd_config”, and then reload the SSHD config with “service sshd reload”. Make sure to attempt to directly log into SSH as root before exiting your current session to make sure you haven’t locked yourself out.
  • Install apache (the web server), bind/named (the DNS server), and PHP (a scripting language)
    • yum -y install bind httpd php
  • Start and set services to run at boot
    • service httpd start; service named start; chkconfig httpd on; chkconfig named on;
  • Set the DNS server to be usable by other computers
    • Edit /etc/named.conf and change the 2 following lines to have the value “any”: “listen-on port 53” and “allow-query”
    • perl -pi -e 's/^(\s*(?:listen-on port 53|allow-query)\s*{).*$/$1 any; };/igm' /etc/named.conf; service named reload;
  • Point the DNS server to the blacklist files
    • This is done by adding “include "/var/named/blacklisted.conf";” to /etc/named.conf
    • echo -ne '\ninclude "/var/named/blacklisted.conf";' >> /etc/named.conf
  • Create the blacklist domain list file
    • touch /var/named/blacklisted.conf
  • Create the blacklist zone file
    • Put the following into /var/named/blacklisted.db. Make sure to change dns.yourdomain.com to your domain (or otherwise, “localhost”), and 1.1.1.1 to dns.yourdomain.com’s (your server’s) IP address. Make sure to keep all periods intact.
      $TTL 14400
      @       IN SOA dns.yourdomain.com. dns.yourdomain.com ( 2003052800  86400  300  604800  3600 )
      @       IN      NS   dns.yourdomain.com.
      @       IN      A    1.1.1.1
      *       IN      A    1.1.1.1
    • The first 2 lines tell the server the domains belong to it. The 3rd line sets the base blacklisted domain to your server’s IP. The 4th line sets all subdomains of the blacklisted domain to your server’s IP.
    • This can be done via (Update the first line with your values)
      YOURDOMAIN="dns.yourdomain.com"; YOURIP="1.1.1.1";
      echo -ne "\$TTL 14400\n@       IN SOA $YOURDOMAIN. $YOURDOMAIN ( 2003052800  86400  300  604800  3600 )\n@       IN      NS   $YOURDOMAIN.\n@       IN      A    $YOURIP\n*       IN      A    $YOURIP" > /var/named/blacklisted.db;
  • Fix the permissions on the blacklist files
    • chgrp named /var/named/blacklisted.*; chmod 660 /var/named/blacklisted.*;
  • Set the server’s domain resolution name servers
    • The server always needs to look at itself before other DNS servers. To do this, comment out everything in /etc/resolv.conf and add to it “nameserver localhost”. This is not the best solution. I’ll find something better later.
    • perl -pi -e 's/^(?!;)/;/gm' /etc/resolv.conf; echo -ne '\nnameserver localhost' >> /etc/resolv.conf
  • Run a test
    • At this point, it’s a good idea to make sure the DNS server is working as intended. So first, we’ll add an example domain to the DNS server.
    • Add the following to /var/named/blacklisted.conf and restart named to get the server going with example.com: “zone "example.com" { type master; file "blacklisted.db"; };”
    • echo 'zone "example.com" { type master; file "blacklisted.db"; };' >> /var/named/blacklisted.conf; service named reload;
    • Ping “test.example.com” and make sure its IP is your server’s IP
    • Set your computer’s DNS to your server’s IP in your computer’s network settings, ping “test.example.com” from your computer, and make sure the returned IP is your server’s IP. If it works, you can restore your computer’s DNS settings.
  • Have the server return a message when a blacklisted domain is accessed
    • Add your message to /var/www/html
    • echo 'Domain is blocked' > /var/www/html/index.html
    • Set all URL paths to show the message by adding the following to the /var/www/html/.htaccess file
      RewriteEngine on
      RewriteCond %{REQUEST_URI} !index.html
      RewriteCond %{REQUEST_URI} !AddRules/
      RewriteRule ^(.*)$ /index.html [L]
    • echo -ne 'RewriteEngine on\nRewriteCond %{REQUEST_URI} !index.html\nRewriteCond %{REQUEST_URI} !AddRules/\nRewriteRule ^(.*)$ /index.html [L]' > /var/www/html/.htaccess
    • Turn on AllowOverride in the /etc/httpd/conf/httpd.conf for the document directory (/var/www/html/) via “perl -0777 -pi -e 's~(<Directory "/var/www/html">.*?\n\s*AllowOverride).*?\n~$1 All~s' /etc/httpd/conf/httpd.conf”
    • Start the server via “service httpd graceful”
  • Create a script that allows apache to refresh the name server’s settings
    • Create a script at /var/www/html/AddRules/restart_named with “/sbin/service named reload” and set it to executable
    • mkdir /var/www/html/AddRules; echo '/sbin/service named reload' > /var/www/html/AddRules/restart_named; chmod 755 /var/www/html/AddRules/restart_named
    • Allow the user to run the script as root by adding to /etc/sudoers “apache ALL=(root) NOPASSWD: /var/www/html/AddRules/restart_named” and “Defaults!/var/www/html/AddRules/restart_named !requiretty”
    • echo -e 'apache ALL=(root) NOPASSWD:/var/www/html/AddRules/restart_named\nDefaults!/var/www/html/AddRules/restart_named !requiretty' >> /etc/sudoers
  • Create a script that allows the user to add, remove, and list the blacklisted domains
    • Add the following to /var/www/html/AddRules/index.php (one line command not given. You can use “nano” to create it)
      <?php
      //Get old domains
      $BlockedFile='/var/named/blacklisted.conf';
      $CurrentZones=Array();
      foreach(explode("\n", file_get_contents($BlockedFile)) as $Line)
              if(preg_match('/^zone "([\w\._-]+)"/', $Line, $Results))
                      $CurrentZones[]=$Results[1];
      
      //List domains
      if(isset($_REQUEST['List']))
              return print implode('<br>', $CurrentZones);
      
      //Get new domains
      if(!isset($_REQUEST['Domains']))
              return print 'Missing Domains';
      $Domains=$_REQUEST['Domains'];
      if(!preg_match('/^[\w\._-]+(,[\w\._-]+)*$/uD', $Domains))
              return print 'Invalid domains string';
      $Domains=explode(',', $Domains);
      
      //Remove domains
      if(isset($_REQUEST['Remove']))
      {
              $CurrentZones=array_flip($CurrentZones);
              foreach($Domains as $Domain)
                      unset($CurrentZones[$Domain]);
              $FinalDomainList=array_keys($CurrentZones);
      }
      else //Combine domains
              $FinalDomainList=array_unique(array_merge($Domains, $CurrentZones));
      
      //Output to the file
      $FinalDomainData=Array();
      foreach($FinalDomainList as $Domain)
              $FinalDomainData[]=
                      "zone \"$Domain\" { type master; file \"blacklisted.db\"; };";
      file_put_contents($BlockedFile, implode("\n", $FinalDomainData));
      
      //Reload named
      print `sudo /var/www/html/AddRules/restart_named`;
      ?>
    • Add the “apache” user to the “named” group so the script can update the list of domains in /var/named/blacklisted.conf via “usermod -a -G named apache; service httpd graceful;”
  • Run the domain update script
    • To add a domain (separate by commas): http://dns.yourdomain.com/AddRules/?Domains=domain1.com,domain2.com
    • To remove a domain (add “Remove&” after the “?”): http://dns.yourdomain.com/AddRules/?Remove&Domains=domain1.com,domain2.com
    • To list the domains: http://dns.yourdomain.com/AddRules/?List
  • Password protect the domain update script
    • Add to AddRules/.htaccess the following
      AuthType Basic
      AuthName "Admins Only"
      AuthUserFile "/var/www/html/AddRules/.htpasswd"
      require valid-user
    • echo -ne 'AuthType Basic\nAuthName "Admins Only"\nAuthUserFile "/var/www/html/AddRules/.htpasswd"\nrequire valid-user' > /var/www/html/AddRules/.htaccess
    • Warning: Putting the password file in an http accessible directory is a security risk. I just did this for sake of organization.
    • Create the user+password via “htpasswd -bc /var/www/html/AddRules/.htpasswd USERNAME” and then entering the password


[Edit on 2016-01-30 @ noon]

To permanently set “localhost” as the resolver DNS, add “DNS1=localhost” to “/etc/sysconfig/network-scripts/ifcfg-eth0”. I have not yet confirmed this edit.

Security Issue

Soon after setting up this DNS server, it started getting hit by a DNS amplification attack. As the server is being used as a client’s DNS server, turning off recursion is not an option. The best solution is to limit the people who can query the name server via an access list (usually a specific subnet), but that would very often not be an option either. The solution I currently have in place, which I have not actually verified works, is to add a forced-forward rule which only makes external requests to the name server provided by Amazon. To do this, get the name server’s IP from /etc/resolv.conf (it should be commented out from an earlier step). Then add the following to your named.conf in the “options” section.

	forwarders {
		DNS_SERVER_IP;
	};
	forward only;

After I added this rule, external DNS requests stopped going through completely. To fix this, I turned “dnssec-validation” to “no” in the named.conf. Don’t forget to restart the service once you have made your changes.

[End of edit]

Full serverside script
Make sure to run this as root (login as root or sudo it)

Download the script here. Make sure to chmod and sudo it when running: “chmod +x dnsblacklist_install.sh; sudo ./dnsblacklist_install.sh;”

#User defined variables
VARIABLES_SET=0; #Set this to 1 to allow the script to run
YOUR_DOMAIN="localhost";
YOUR_IP="1.1.1.1";
BLOCKED_ERROR_MESSAGE="Domain is blocked";
ADDRULES_USERNAME="YourUserName";
ADDRULES_PASSWORD="YourPassword";

#Confirm script is ready to run
if [ $VARIABLES_SET != 1 ]; then
    echo 'Variables need to be set in the script';
    exit 1;
fi
if [ `whoami` != 'root' ]; then
    echo 'Must be root to run script. When running the script, add "sudo" before it to' \
        'run as root';
    exit 1;
fi

#Allow root login
cat /home/ec2-user/.ssh/authorized_keys > /root/.ssh/authorized_keys;
perl -pi -e 's/^\s*#?\s*PermitRootLogin.*$/PermitRootLogin yes/igm' /etc/ssh/sshd_config;
service sshd reload;

#Install services
yum -y install bind httpd php;
chkconfig httpd on;
chkconfig named on;
service httpd start;
service named start;

#Set the DNS server to be usable by other computers
perl -pi -e 's/^(\s*(?:listen-on port 53|allow-query)\s*{).*$/$1 any; };/igm' \
    /etc/named.conf;
service named reload;

#Create/link the blacklist files
echo -ne '\ninclude "/var/named/blacklisted.conf";' >> /etc/named.conf;
touch /var/named/blacklisted.conf;

#Create the blacklist zone file
echo -ne "\$TTL 14400
@       IN SOA $YOUR_DOMAIN. $YOUR_DOMAIN ( 2003052800  86400  300  604800  3600 )
@       IN      NS   $YOUR_DOMAIN.
@       IN      A    $YOUR_IP
*       IN      A    $YOUR_IP" > /var/named/blacklisted.db;

#Fix the permissions on the blacklist files
chgrp named /var/named/blacklisted.*;
chmod 660 /var/named/blacklisted.*;

#Set the server’s domain resolution name servers
perl -pi -e 's/^(?!;)/;/gm' /etc/resolv.conf;
echo -ne '\nnameserver localhost' >> /etc/resolv.conf;

#Run a test
echo 'zone "example.com" { type master; file "blacklisted.db"; };' >> \
    /var/named/blacklisted.conf;
service named reload;
FOUND_IP=`dig -t A example.com | grep -ioP "^example\.com\..*?"'in\s+a\s+[\d\.:]+' | \
     grep -oP '[\d\.:]+$'`;
if [ "$YOUR_IP" == "$FOUND_IP" ]
then
  echo 'Success: Example domain matches your given IP' > /dev/stderr;
else
  echo 'Warning: Example domain does not match your given IP' > /dev/stderr;
fi

#Have the server return a message when a blacklisted domain is accessed
echo "$BLOCKED_ERROR_MESSAGE" > /var/www/html/index.html;
perl -0777 -pi -e 's~(<Directory "/var/www/html">.*?\n\s*AllowOverride).*?\n~$1 All~s' \
     /etc/httpd/conf/httpd.conf;
echo -n 'RewriteEngine on
RewriteCond %{REQUEST_URI} !index.html
RewriteCond %{REQUEST_URI} !AddRules/
RewriteRule ^(.*)$ /index.html [L]' > /var/www/html/.htaccess;
service httpd graceful;

#Create a script that allows apache to refresh the name server’s settings
mkdir /var/www/html/AddRules;
echo '/sbin/service named reload' > /var/www/html/AddRules/restart_named;
chmod 755 /var/www/html/AddRules/restart_named;

echo 'apache ALL=(root) NOPASSWD:/var/www/html/AddRules/restart_named
Defaults!/var/www/html/AddRules/restart_named !requiretty' >> /etc/sudoers;

#Create a script that allows the user to add, remove, and list the blacklisted domains
echo -n $'<?php
//Get old domains
$BlockedFile=\'/var/named/blacklisted.conf\';
$CurrentZones=Array();
foreach(explode("\\n", file_get_contents($BlockedFile)) as $Line)
        if(preg_match(\'/^zone "([\\w\\._-]+)"/\', $Line, $Results))
                $CurrentZones[]=$Results[1];

//List domains
if(isset($_REQUEST[\'List\']))
        return print implode(\'<br>\', $CurrentZones);

//Get new domains
if(!isset($_REQUEST[\'Domains\']))
        return print \'Missing Domains\';
$Domains=$_REQUEST[\'Domains\'];
if(!preg_match(\'/^[\\w\\._-]+(,[\\w\\._-]+)*$/uD\', $Domains))
        return print \'Invalid domains string\';
$Domains=explode(\',\', $Domains);

//Remove domains
if(isset($_REQUEST[\'Remove\']))
{
        $CurrentZones=array_flip($CurrentZones);
        foreach($Domains as $Domain)
                unset($CurrentZones[$Domain]);
        $FinalDomainList=array_keys($CurrentZones);
}
else //Combine domains
        $FinalDomainList=array_unique(array_merge($Domains, $CurrentZones));

//Output to the file
$FinalDomainData=Array();
foreach($FinalDomainList as $Domain)
    $FinalDomainData[]="zone \\"$Domain\\" { type master; file \\"blacklisted.db\\"; };";
file_put_contents($BlockedFile, implode("\\n", $FinalDomainData));

//Reload named
print `sudo /var/www/html/AddRules/restart_named`;
?>' > /var/www/html/AddRules/index.php;

usermod -a -G named apache;
service httpd graceful;

#Password protect the domain update script
echo -n 'AuthType Basic
AuthName "Admins Only"
AuthUserFile "/var/www/html/AddRules/.htpasswd"
require valid-user' > /var/www/html/AddRules/.htaccess;

htpasswd -bc /var/www/html/AddRules/.htpasswd "$ADDRULES_USERNAME" "$ADDRULES_PASSWORD";

echo 'Script complete';
Syncing Amazon EC2 Instances

In continuation of yesterday’s post, in which I showed how to create Amazon AMIs to keep your newly created EC2 instances up to date, today I will cover syncing already-live instances from the master to slaves. All of the below takes place on the master instance, and assumes all other instances are part of the slave group. You may have to use extra filters on the below “aws” command to only pull IPs from a certain group of instances.

Here is a simple bash script (hereby referred to as “Propagate.sh”) which syncs /var/www/html/ to all of your slave instances. It uses the “aws” command line interface provided by Amazon, which comes default with the Amazon Linux starter AMI.

#The first command line of the script contains the master’s IP, so it does not sync with itself.
export LocalIP=Your_Master_IP_Here;

#Get the IPs of all slave instances
export NewIPs=`aws ec2 describe-instances | grep '"PrivateIpAddress"' | perl -pe 's/(^.*?: "|",?\s*?$)//gm' | sort -u | grep -v $LocalIP`

#Loop over all slave instances
for i in $NewIPs; do
        echo "Syncing to: $i";
        #Run an rsync from the master to the slave
        rsync -aP -e 'ssh -o StrictHostKeyChecking=no' /var/www/html/ root@$i:/var/www/html/;
done

You may also want to add “-o UserKnownHostsFile=/dev/null” to the SSH command (directly after “-o StrictHostKeyChecking=no”), as a second EC2 instance may end up having the same IP as a previously terminated instance. Another solution to that problem is syncing the “/etc/ssh/ssh_host_rsa_key*” from the master when an instance initializes, so all instances keep the same SSH fingerprint.


To let other people manually execute this script, you can create a PHP file with the following in it. (Change /var/www/ in all below examples to where you place your Propagate.sh)

<? print nl2br(htmlentities(shell_exec('sudo /var/www/Propagate.sh 2>&1'))); ?>

If your Propagate.sh needs to be run as root, which it may if your PHP environment is not run as the user root (usually “apache”), then you need to make sure it CAN run as root without intervention. To do this, add the following to the /etc/sudoers file
apache  ALL=(ALL)       NOPASSWD: /usr/bin/whoami, /var/www/Propagate.sh
Change the user from “apache” to the user which PHP runs as (when running through apache).
I included “whoami” as a valid sudoer application for testing purposes.
Also, in the sudoers file, if “Defaults requiretty” is turned on, you will need to comment it/turn it off.

While I did not mention it in yesterday's post, I thought I should at least mention it here. There are other ways to keep file systems in sync with each other. This is just a good use case for when you want to keep all instances as separate independent entities. Another solution to many of the previously mentioned problems is using Amazon's new EFS, which is currently still in preview mode.

Custom Initializations for Amazon AMIs

I was recently hired to move a client's site from our primary server in Houston to the Amazon cloud, as it was about to take a big hit in traffic. The normal setup for this kind of job is pretty straightforward. Move the database over to RDS, set up an AMI of an EC2 instance, a load balancer, and EC2 auto scaling. However, there were a couple of problems I needed to solve this time around for the instances launched via the auto scaler that I had not really needed to address before. These include syncing the SSH settings and current codebase from the primary instance, as opposed to recreating AMIs every time there is a change. So, long story short, here are the problems and solutions that need to be in place before the AMI image is created.


This all assumes you are running as root. Most of these commands should work on any Linux distribution that Amazon has default AMIs for, but some of these may only work in the Amazon and CentOS AMIs.


Pre-setup:
  • Your first instance that you are creating the AMI from should be a permanent instance. This is important for 2 reasons.
    1. When changing configurations for the auto scaler, if and when your instances are terminated and recreated, this instance will always be available on the load balancer, so there is no downtime.
    2. This instance can act as a central repository for other instances to sync from.
    So make sure this instance has an elastic IP assigned to it. From here on out, we will refer to this instance as PrimaryInstance (you can set this physically in the host file, or change it in all scripts to however you want to refer to your elastic IP [most likely through a DNS domain]).
  • Create your ssh private key for the instances: (For all prompts, use default settings)
    ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
  • Make sure your current ssh authorized_keys contains your new ssh private key:
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Make sure your ssh known_hosts includes your primary instance, so all future ssh calls to it automatically accept it as a known host:
    ssh PrimaryInstance -o StrictHostKeyChecking=no
    You do not have to finish the login process. This just makes sure our primary instance will be recognized by other instances.
  • Turn on PermitRootLogin in /etc/ssh/sshd_config and reload the sshd config: “service sshd reload”
    I just recommend this because it makes life way, way easier. The scripts below assume that you did this.

Create a custom init file that runs on boot to take care of all the commands that need to be run.
#Create the script and make sure the full path (+all other root environment variables) are set when it is ran
echo '#!/bin/bash -l' > /etc/rc.d/init.d/custom_init

#Set the script as executable
chmod +x /etc/rc.d/init.d/custom_init

#Executes it as one of the last scripts on run level 3 (Multi-user mode with networking)
ln -s ../init.d/custom_init /etc/rc.d/rc3.d/S99custom_init
All of the below commands in this post will go into this script.

Allow login via password authentication:
perl -i -pe 's/^PasswordAuthentication.*$/PasswordAuthentication yes/mg' /etc/ssh/sshd_config
service sshd reload
Notes:
You may not want to do this. It was just required by my client in this case.
This is required in the startup script because Amazon likes to mess with the sshd_config (and authorized_keys) in new instances it boots.

Sync SSH settings from the PrimaryInstance:
#Remove the known_hosts file, in case something on the PrimaryInstance has changed that would block ssh commands.
rm -f ~/.ssh/known_hosts

#Sync the SSH settings from the PrimaryInstance
rsync -e 'ssh -o StrictHostKeyChecking=no' -a root@PrimaryInstance:~/.ssh/ ~/.ssh/

Sync required files from the PrimaryInstance. In this case, the default web root folder:
rsync -at root@PrimaryInstance:/var/www/html/ /var/www/html/

That's it for the things that need to be configured/added to the instance. From there, create your AMI and launch config, and create/modify your launch group and load balancer.


Also, as a very important note about your load balancer, make sure if you are mirroring its IP on another domain to use a CNAME record, and not the IP in an A record, as the load balancer IP is subject to change.

Lets Encrypt HTTPS Certificates

After a little over a year of waiting, Let’s Encrypt has finally opened its doors to the public! Let’s Encrypt is a free https certificate authority, with the goal of getting the entire web off of http (unencrypted) and on to https. I consider this a very important undertaking, as encryption is one of the best ways we can fight illegal government surveillance. The more out there that is encrypted, the harder it will be to spy on people.

I went ahead and got it up and running on 2 servers today, which was a bit of a pain in the butt. It [no longer] supports Python 2.6, and was also very unhappy with my CentOS 6.4 cPanel install. Also, when you first run the letsencrypt-auto executable script as instructed by the site, it opens up your package manager and immediately starts downloading LOTS of packages. I found this to be quite anti-social, especially as I had not yet seen anywhere, or been warned, that it would do this before I started the install, but oh well. It is convenient. The problem in cPanel was that a specific library, libffi, was causing problems during the install.


To fix the Python problem for all of my servers, I had to install Python 2.7 as an alt Python install so it wouldn’t mess with any existing infrastructure using Python 2.6. After that, I also set the current alias of “python” to “python2.7” so the local shell would pick up on the correct version of Python.


As root in a clean directory:
wget https://www.python.org/ftp/python/2.7.8/Python-2.7.8.tgz
tar -xzvf Python-2.7.8.tgz
cd Python-2.7.8
./configure --prefix=/usr/local
make
make altinstall
alias python=python2.7

The cPanel lib problem was caused by libffi already being installed as 3.0.9-1.el5.rf, but yum wanted to install its devel package as version 3.0.5-3.2.el6.x86_64 (an older version). It did not like running conflicting versions. All that was needed to fix the problem was to manually download and install the same devel version as the current live version.

wget http://pkgs.repoforge.org/libffi/libffi-devel-3.0.9-1.el5.rf.x86_64.rpm
rpm -ivh libffi-devel-3.0.9-1.el5.rf.x86_64.rpm

Unfortunately, the apache plugin was also not working, so I had to do a manual install with “certonly” and “--webroot”.


And that was it; letsencrypt was ready to go and start signing my domains! You can check out my current certificate, issued today, that currently has 13 domains tied to it!

AutoHotKey Scripts

In lieu of using my own custom C++ background services to take care of hot key tasks in Windows, I started using AutoHotKey a while back. While it’s not perfect, and it is missing a lot of Win32 API functionality, I am still able to mostly accomplish what I want in it. I was thinking I should add some of the simple scripts I use here.


Center a string within padding characters and output as key-strokes
Example:
  • PadText = ~*
  • Length = 43
  • Text = Example Text
  • Result = ~*~*~*~*~*~*~*~*Example Text~*~*~*~*~*~*~*~
;Get the last values
IniPath=%A_ScriptDir%\AutoHotKey.ini
IniRead,PadText,%IniPath%,CenterString,PadText,-
IniRead,NewLength,%IniPath%,CenterString,NewLength,10
IniRead,TheString,%IniPath%,CenterString,TheString,The String

;Get the input
InputBox,PadText,Center String,Pad Character,,,,,,,,%PadText%
InputBox,NewLength,Center String,New Length,,,,,,,,%NewLength%
InputBox,TheString,Center String,String To Center,,,,,,,,%TheString%

;Cancel on blank pad or invalid number
if StrLen(PadText)==0
{
	MsgBox,Pad text cannot be blank
	return
}
if NewLength is not integer
{
	MsgBox,New length must be an integer
	return
}

;Save the last values
IniWrite,%PadText%,%IniPath%,CenterString,PadText
IniWrite,%NewLength%,%IniPath%,CenterString,NewLength
IniWrite,%TheString%,%IniPath%,CenterString,TheString

;Initial padding
PadStrLen:=StrLen(PadText)
PadLen:=NewLength-StrLen(TheString)
NewString:=""
Loop
{
	if StrLen(NewString)>=Ceil(PadLen/2)
		break
	NewString.=PadText
}

;Truncate initial padding to at least half
NewString:=Substr(NewString, 1, Ceil(PadLen/2))

;Add the string
NewString.=TheString

;Final padding
Loop
{
	if StrLen(NewString)>=NewLength
		break
	NewString.=PadText
}

;Truncate to proper length
NewString:=Substr(NewString, 1, NewLength)

;Output to console
Sleep,100
Send %NewString%
return

Format rich clipboard text to plain text
clipboard = %clipboard%
return

Force window to borderless full screen
Description: This takes the active window, removes all window dressing (titlebar, borders, etc), sets its resolution as 1920x1080, and positions the window at 0x0. In other words, this makes your current window take up the entirety of your primary monitor (assuming it has a resolution of 1920x1080).
WinGetActiveTitle, WinTitle
WinSet, Style, -0xC40000, %WinTitle%
WinMove, %WinTitle%, , 0, 0, 1920, 1080
return

Continually press key on current window
Description: Saves the currently active window (by its title) and focused control object within the window; asks the user for a keypress interval and the key to press; starts to continually press the requested key at the requested interval in the original control (or top level window if an active control is not found); stops via the F11 key.
Note: I had created this to help me get through the LISA intro multiple times.
;Get the current window and control
WinGetActiveTitle, TheTitle
ControlGetFocus FocusedControl, %TheTitle%
if(ErrorLevel)
	FocusedControl=ahk_parent

;Get the pause interval
InputBox,IntervalTime,Starting script with window '%TheTitle%',Enter pause interval in milliseconds. After submitted`, hold down the key to repeat,,,,,,,,200
if(ErrorLevel || IntervalTime=="") ;Cancel action if blank or cancelled
	return
IntervalTime := IntervalTime+0

;Get the key to keep pressing - Unfortunately, there is no other way I can find to get the currently pressed keycode besides polling all 255 of them
Sleep 500 ;Barrier to make sure one of the initialization keys is not grabbed
Loop {
	TestKey := 0
	Loop {
		SetFormat, INTEGER, H
		HexTextKey := TestKey
		SetFormat, INTEGER, D
		VirtKey = % "vk" . SubStr(HexTextKey, 3)
		if(GetKeyState(VirtKey)=1 || TestKey>255)
			break
		TestKey:=TestKey+1
	}
	if(TestKey<=255)
		break
	Sleep 500
}
VirtKey := GetKeyName(VirtKey)

;If a direction key, remap to the actual key
if(TestKey>=0x25 && TestKey<=0x28)
	VirtKey := SubStr(VirtKey, 7)

;Let the user know their key
MsgBox Received key: '%VirtKey%'. You may now let go of the key. Hold F11 to stop the script.

;Continually send the key at the requested interval
KeyDelay:=10
SetKeyDelay, %KeyDelay% ;Interval between up/down keys
IntervalTime-=KeyDelay ;-= takes an expression, so the variable is used without percent signs
Loop {
	;Press the key
	ControlSend, %FocusedControl%, {%VirtKey% Up}{%VirtKey% Down}, %TheTitle%

	;Check for the cancel key
	if(GetKeyState("F11"))
		break

	;Wait the requested interval to press the key again
	Sleep, %IntervalTime%
}

;Let the user know the script has ended
MsgBox Ending script with window '%TheTitle%'
return
Useful Exim Scripts
For fighting spam

In the course of my Linux administrative duties (on a cPanel server), I have created multiple scripts to help us out with Exim, our mail transfer agent. These are mostly used to help us fight spam, and determine who is spamming when it occurs.



This monitors the number of emails in the queue, and sends our admins an email when a limit (1000) is reached. It needs to be run on a schedule (via cron); a crontab sketch is at the end of this post.
#!/bin/bash
export AdminEmailList="ADMIN EMAIL LIST SEPARATED BY COMMAS HERE"
export Num=`/usr/sbin/exim -bpc`
if [ $Num -gt 1000 ]; then
        echo "Too many emails! $Num" | /usr/sbin/sendmail -v "$AdminEmailList"
        #Here might be a good place to delete emails with “undeliverable” strings within them
        #Example (See the 3rd script): exim-delete-messages-with 'A message that you sent could not be delivered'
fi

This deletes any emails in the queue from or to a specified email address (first parameter). When matching on the recipient, the sender must be "<>" (the null sender, used for bounces)
#!/bin/bash
exiqgrep -ir $1 -f '<>' | xargs exim -Mrm
exiqgrep -if $1 | xargs exim -Mrm

This deletes any emails in the queue which contain a given string (first parameter)
#!/bin/bash
if [ "$1" == "" ]
then
  echo 'Cannot delete with empty string'
else
  grep -lir "$1" /var/spool/exim/input/ | sed -e 's/^.*\/\([a-zA-Z0-9-]*\)-[DH]$/\1/g' | xargs /usr/sbin/exim -Mrm
fi

Get a count of emails in the queue per sender (sender email address is supplied by sender and can be faked)
#!/bin/bash
exim -bp | grep -oP '<.*?>' | sort | uniq -c | sort -n

Get a count of emails in the queue per account (running this script can take a little while)
#!/bin/bash
exim -bp | grep -Po '(?<= )[-\w]+(?= <)' | xargs -n1 exim -Mvh | grep -ioP '(?<=auth_sender ).*$' | sort | uniq -c

Bonus: Force all non-specified accounts on Exim to use a certain IP address for sending. It would need to be run on a schedule (via cron).
#!/bin/bash
export IPAddress="YOUR ADDRESS HERE"
/usr/bin/perl -i -pe 's/\*:.*/*: '$IPAddress'/g' /etc/mailips
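
Both of the cron-based scripts above can be wired up with crontab entries. A minimal sketch (added via “crontab -e”), assuming hypothetical script paths under /root/scripts:
#Hypothetical schedule: check the queue size every 15 minutes, reassert the sending IP hourly
*/15 * * * * /root/scripts/check-exim-queue.sh
0 * * * * /root/scripts/force-exim-ip.sh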
Optimization gone bad
Or, the case of the Android app out-of-order calls

On Android, there is a primary thread which runs all UI operations. If a GUI operation is run in a different thread, it just won't work, and may throw an error. If you block this thread with too much processing... well... bad things happen. Due to this design, you have to push all UI operations to this main thread, for example through a Handler attached to the main Looper.

Runnables pushed to this thread are always run in FIFO execution order, which is a useful guarantee for programming.

So I decided to get smart and create the following function to queue asynchronous calls that needed to be run on the primary thread. It takes a Runnable and either runs it immediately, if already on the primary thread, or otherwise adds it to the primary thread’s queue.

//Run a function on the primary thread
public static void RunOnPrimary(Runnable R)
{
    //Start commenting here so that items are always added to the queue, forcing in-order processing
    if(Looper.myLooper()==Looper.getMainLooper())
        R.run();
    else
    //End commenting here
        new Handler(Looper.getMainLooper()).post(R);
}

I was getting weird behaviors, though, in a part of the project where actions pulled in from JavaScript were happening out of order (later calls executing before earlier-queued ones). After the normal one-by-one debugging steps to figure it out, I realized that MAYBE some of the JavaScript calls were, for some bizarre reason, already running on the primary thread. In this case they would run immediately, jumping ahead of the items already waiting in the queue. This turned out to be the case, so I ended up having to comment out the 3 lines after the function’s first comment (the if/R.run/else), and it worked great.

I found it kind of irritating that I had to add actions to the queue even when they could have been run immediately on the current thread, but oh well, I didn’t really have a choice if I wanted to make sure everything always runs in order across the system.

Renaming a series for Plex

I was recently trying to upload a TV series into Plex and was having a bit of a problem with the file naming. While I will leave the show nameless, let’s just say it has a magic dog.

Each of the files (generally) contained 2 episodes and were named S##-E##-E## (Season #, First Episode #, Second Episode #). Plex really didn’t like this, as for multi-episode files, it only supports the naming convention of first episode number THROUGH a second episode number. As an example S02-E05-E09 is considered episodes 5 through 9 of season 2. So I wrote a quick script to fix up the names of the files to consider each file only 1 episode (the first one), and then create a second symlinked file, pointing to the first episode, but named for the second episode.

So, for the above example, we would get 2 files with the exact same original filenames, except with the primary file having “S02E05,E09” in place of the episode number information, and the linked file having “S02E09-Link” in its place.


The following is the bash code for renaming/fixing a single episode file. It needs to be saved into a script file, and requires perl for the regular-expression renaming.


#!/bin/bash

#Get the file path info and updated file name
FilePath=`echo "$1" | perl -pe 's/\/[^\/]*$//g'`
FileName=`echo "$1" | perl -pe 's/^.*\///g'`
UpdatedFileName=`echo "$FileName" | perl -pe 's/\b(S\d\d)-(E\d\d)-(E\d\d)\b/$1$2,$3/g'`

#If the file is not in the proper format, exit prematurely
if [ "$UpdatedFileName" == "$FileName" ]; then
    echo "Proper format not found: $FilePath/$FileName"
    exit 1
fi

#Rename the file
cd "$FilePath"
mv "$FileName" "$UpdatedFileName"

#Create a link to the file with the second episode name
NewLinkName=`echo "$FileName" | perl -pe 's/\b(S\d\d)-(E\d\d)-(E\d\d)\b/$1$3-Link/g'`
ln -s "$UpdatedFileName" "$NewLinkName"

If you save that to a file named “RenameShow.sh”, you would use it like “find /PATH/ -type f -print0 | xargs -0n 1 ./RenameShow.sh”. For Windows, make sure you use Windows symlinks with /H (to make them hard file links, as soft/symbolic link files really just don’t work in Windows); a sketch of this follows.
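
As a sketch of that Windows variant (an assumption on my part, untested): when running under Cygwin, the “ln -s” line at the end of the script could be swapped for a hard link created through cmd’s mklink:
#Hypothetical replacement for the "ln -s" line under Cygwin on Windows
cmd /c mklink /H "$NewLinkName" "$UpdatedFileName" #/H creates a hard file link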

Sending URLs as a file in an HTML form using AJAX
It is common knowledge that you can use the FormData class to send a file via AJAX as follows:
var DataToSend=new FormData();
DataToSend.append(PostVariableName, VariableData); //Send a normal variable
DataToSend.append(PostFileVariableName, FileElement.files[0], PostFileName); //Send a file
var xhr=new XMLHttpRequest();
xhr.open("POST", YOUR_URL, true);
xhr.send(DataToSend);

Something that is much less known, which doesn't have any really good full-process examples online (that I could find), is sending a URL's file as the posted file.
This is doable by downloading the file as a Blob, and then directly passing that blob to the FormData. The 3rd parameter to the FormData.append should be the file name.

The following code demonstrates downloading the file. I did not worry about adding error checking.
function DownloadFile(
    FileURL,     //http://...
    Callback,    //The function to call back when the file download is complete. It receives the file Blob.
    ContentType) //The output Content-Type for the file. Example=image/jpeg
{
    var Req=new XMLHttpRequest();
    Req.responseType='arraybuffer';
    Req.onload=function() {
        Callback(new Blob([this.response], {type:ContentType}));
    };
    Req.open("GET", FileURL, true);
    Req.send();
}

And the following code demonstrates submitting that file
//User Variables
var DownloadURL="https://www.castledragmire.com/layout/PopupBG.png";
var PostURL="https://www.castledragmire.com/ProjectContent/WebScripts/Default_PHP_Variables.php";
var PostFileVariableName="MyFile";
var OutputFileName="Example.jpg";
//End of User Variables

DownloadFile(DownloadURL, function(DownloadedFileBlob) {
    //Get the data to send
    var Data=new FormData();
    Data.append(PostFileVariableName, DownloadedFileBlob, OutputFileName);

    //Function to run on completion
    var CompleteFunction=function(ReturnData) {
        //Add your code in this function to handle the ajax result
        var ReturnText=(ReturnData.responseText ? ReturnData : this).responseText;
        console.log(ReturnText);
    }

    //Normal AJAX example
    var Req=new XMLHttpRequest();
    Req.onload=CompleteFunction; //You can also use "onreadystatechange", which is required for some older browsers
    Req.open("POST", PostURL, true);
    Req.send(Data);

    //jQuery example
    $.ajax({type:'POST', url:PostURL, data:Data, contentType:false, processData:false, cache:false, complete:CompleteFunction});
});

Unfortunately, due to the browser’s same-origin security policy, you can generally only use AJAX to query URLs on the same domain. I use my Cross site scripting solutions and HTTP Forwarders for this. Stackoverflow also has a good thread about it.

Missing phar wrapper

Phar files are PHP’s way of distributing an entire PHP solution in a single package file. I recently had a problem on my Cygwin PHP server that said “Unable to find the wrapper "phar" - did you forget to enable it when you configured PHP?”. I couldn’t find any solution for this online, so I played with it a bit.

The quick and dirty solution I came up with is to include the phar file like any normal PHP file, which sets your current working directory inside of the phar file. After that, you can include files inside the phar and then change your directory back to where you started. Here is the code I used:

if(preg_match('/^(?:win|cygwin)/i', PHP_OS))
{
    $CWD=getcwd();
    require_once('Scripts/PHPExcel.phar');
    require_once('PHPExcel/IOFactory.php');
    chdir($CWD);
}
else
    require_once('phar://Scripts/PHPExcel.phar/PHPExcel/IOFactory.php');
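
As a quick sanity check before resorting to the workaround above, you can ask PHP whether the phar stream wrapper is registered at all (a sketch; requires the PHP CLI):
php -r 'var_export(in_array("phar", stream_get_wrappers()));' #Prints true if the phar:// wrapper is available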
Pulling HTML from Github markdown for external use
Although converting to markdown is a time-consuming pain

So I started getting on the Github bandwagon FINALLY. I figured that while I was going to the trouble of remaking readme files for the projects into github markdown files, I might as well duplicate the compiled HTML for my website.

The below code is a simple PHP script to pull in the converted HTML from Github’s API and then do some more modifications to facilitate directly inserting it into a website.


Usage:
  • The variables that can be updated are all at the top of the file.
  • The script will always output the finished result to the user’s browser, but can also optionally save it to an external file by setting the $SaveFileName variable.
  • Stylesheet:
    • The script automatically includes a specified stylesheet from the $StylesheetLocation variable.
    • The stylesheet I used is from https://gist.github.com/somebox/1082608. I’m not too happy with its coloring scheme, but it’ll do for now.
    • The required modifications that need to be made to the css are to change “body” to “.GHMarkdown”, and then add “.GHMarkdown” before all other rules.
    • This is the one I am currently using for my website, but it also has a few modifications made specifically for my layouts.
  • Modifications
    • In my markdowns, I like to link to internal sections by first creating a bookmark as “<div name="BOOKMARK_NAME">...</div>” and then linking via “[LinkName](#BOOKMARK_NAME)”. While this works on github, the bookmark’s names are actually changed to something like “user-content-BOOKMARK-NAME”, which is not useable outside of github. The first $RegexModifications item therefore updates the bookmarks back to their original name, and turns them into <span>s (which github does not support).
    • The second rule just removes the “aria-hidden” attributes, which my W3C checking scripts throw a warning on.
  • Note that sometimes the script may return an error of “transfer closed with XXX bytes remaining to read”. This means that github denied the request (probably due to too many requests in too short a timespan); with a large input, github may prematurely terminate the connection instead of returning a proper error. If this happens, try sending a tiny input and see if you get back a proper error (the curl sketch after the script is handy for testing this).

<?php
//Variables
$SaveFileName='Output.html'; //Optionally save output to a file. Comment out to not save
$InputFile='Input.md';
$StylesheetLocation='github-markdown.css';
$RegexModifications=Array(
        '/<div name="user-content-(.*?)"(.*?)<\/div>/s'=>'<span id="$1"$2</span>', //Change <div name="user-content-XXX ---TO--- <span id="XXX
        '/ ?aria-hidden="true"/'=>'' //Remove aria-hidden attribute
);

//Set the curl options
$CurlHandle=curl_init(); //Init curl
curl_setopt_array($CurlHandle, Array(
        CURLOPT_URL=>           'https://api.github.com/markdown/raw', //Markdown/raw takes and returns plain text input and output
        CURLOPT_FAILONERROR=>   false,
        CURLOPT_FOLLOWLOCATION=>1,
        CURLOPT_RETURNTRANSFER=>1, //Return result as a string
        CURLOPT_TIMEOUT=>       300,
        CURLOPT_POST=>          1,
        CURLOPT_POSTFIELDS=>    file_get_contents($InputFile), //Pull in the requested file
        CURLOPT_HTTPHEADER=>    Array('Content-type: text/plain'), //Github expects the given data to be plaintext
        CURLOPT_SSL_VERIFYPEER=>0, //In case there are problems with the PHP ssl chain (often the case in Windows), ignore the error
        CURLOPT_USERAGENT=>     'Curl/PHP' //Github now requires a useragent to process the request
));

//Pull in the html converted markdown from Github
$Return=curl_exec($CurlHandle);
if(curl_errno($CurlHandle)) //Check for error
        $Return=curl_error($CurlHandle);
curl_close($CurlHandle);

//Make regex modifications
$Return=preg_replace(array_keys($RegexModifications), array_values($RegexModifications), $Return);

//Generate the final HTML. It will also be output here if not saving to a file
header('Content-Type: text/html; charset=utf-8');
if(isset($SaveFileName)) //If saving to a file, buffer output
        ob_start();
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Markdown pull</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link href="<?=$StylesheetLocation?>" rel=stylesheet type="text/css">
</head><body><div class=GHMarkdown>
<?=$Return?>
</div></body></html>
<?php
//Save to a file if requested
if(isset($SaveFileName))
        file_put_contents($SaveFileName, ob_get_flush()); //Actual output happens here too when saving to a file
?>
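
For testing the API outside of PHP (handy for the truncation issue noted above), the following is a rough curl equivalent of the request the script makes:
echo 'Hello **world**' | curl -s -A 'Curl/PHP' -H 'Content-Type: text/plain' --data-binary @- https://api.github.com/markdown/raw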
Installing PSLib for PHP
I was setting up a brand new server yesterday running CentOS 6.4 and needed to install PSLib (PostScript) for PHP.
The initial setup commands all worked as expected:
cd /src #A good directory to install stuff in. You may need to create it first

#Install intltool (required for pslib)
yum install intltool

#Install pslib
wget http://sourceforge.net/projects/pslib/files/latest/download?source=files
tar -xzvf pslib-*.tar.gz
cd pslib-*
./configure
make
make install
cd ..

#Install pslib wrapper for php
pecl download ps
tar -xzvf ps-*.tgz
cd ps-*
phpize
./configure
make

At this point, the make failed with
/src/ps-1.3.6/ps.c:58: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘ps_functions’

After a little bit of browsing the code and a really lucky first guess, I found that changing the following fixed the problem:
In ps.c change: “function_entry ps_functions[] = {” to “zend_function_entry ps_functions[] = {”

Then to finish the install, just run:
make
make install
and you’re done!
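
One step the above leaves implicit is enabling the extension in php.ini. A minimal sketch, assuming php.ini lives at /etc/php.ini (the path varies by build):
echo 'extension=ps.so' >> /etc/php.ini #Assumed php.ini location
php -m | grep -i '^ps$' #Verify the module now loads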
Using Twitter API v1.1

Twitter recently turned off their v1.0 API which broke a multitude of applications, many of which are still broken today. I had the need to immediately update at least one website to the new Twitter v1.1 API, but I could not find a simple bare bones example on the internet. Pretty much all the code/libraries out there I could find for the new API version were hundreds to thousands of lines long, or didn’t work >.<; . So anywho, here is the simple PHP code I put together to make API calls.

This code requires certain consumer/authentication keys and secrets. You can find how to generate them elsewhere online.

//OAuth Request parameters
$ConsumerKey='FILL ME IN';
$ConsumerSecret='FILL ME IN';
$AccessToken='FILL ME IN';
$AccessTokenSecret='FILL ME IN';

function EncodeParam($input) { return strtr(rawurlencode($input), Array('+'=>' ', '%7E'=>'~')); }
function SendTwitterRequest($RequestURL, $Params=Array())
{
	//Compile the OAuth parameters
	global $ConsumerKey, $ConsumerSecret, $AccessToken, $AccessTokenSecret;
	$Params=array_merge(
		$Params,
		Array('oauth_version'=>'1.0', 'oauth_nonce'=>mt_rand(), 'oauth_timestamp'=>time(), 'oauth_consumer_key'=>$ConsumerKey, 'oauth_signature_method'=>'HMAC-SHA1'),
		isset($AccessToken) ? Array('oauth_token'=>$AccessToken) : Array()
	);
	uksort($Params, 'strcmp'); //Must be sorted to determine the signature
	foreach($Params as $Key => &$Val) //Create the url encoded parameter list
		$Val=EncodeParam($Key).'='.EncodeParam($Val);
	$Params=implode('&', $Params); //Combine the parameter list
	$Params.='&oauth_signature='.EncodeParam(base64_encode(hash_hmac('sha1', 'GET&'.EncodeParam($RequestURL).'&'.EncodeParam($Params), EncodeParam($ConsumerSecret).'&'.EncodeParam($AccessTokenSecret), TRUE)));

	//Do the OAuth request
	$CurlObj=curl_init();
	foreach(Array(CURLOPT_URL=>"$RequestURL?$Params", CURLOPT_SSL_VERIFYHOST=>0, CURLOPT_SSL_VERIFYPEER=>0, CURLOPT_RETURNTRANSFER=>1) as $Key => $CurlVal)
		curl_setopt($CurlObj, $Key, $CurlVal);
	$Result=curl_exec($CurlObj);
	curl_close($CurlObj);
	return $Result;
}

If you don’t have an AccessToken and AccessSecret yet, you can get them through the following code:
$OAuthResult=SendTwitterRequest('https://twitter.com/oauth/request_token');
parse_str($OAuthResult, $OauthRet);
if(!isset($OauthRet['oauth_token']))
	throw new Exception("OAuth error: $OAuthResult");
$AccessToken=$OauthRet['oauth_token'];
$AccessSecret=$OauthRet['oauth_token_secret'];

Here is an example to pull the last 4 tweets from a user:
$UserName='TheUserName';
$Result=json_decode(SendTwitterRequest('https://api.twitter.com/1.1/statuses/user_timeline.json', Array('screen_name'=>$UserName, 'count'=>4)));
if(isset($Result->{'errors'}))
	throw new Exception($Result->{'errors'}[0]->{'message'});
$Tweets=Array();
foreach($Result as $Tweet)
	$Tweets[]=$Tweet->text;

print implode('<br>', $Tweets);
JavaScript Cookies Functions

Just throwing these up here for reference. Simple JavaScript scripts to get and set cookies. Not particularly foolproof, robust, or fully featured.

function SetCookie(Name, Value, SecondsToExpire) { document.cookie=Name+"="+escape(Value)+"; expires="+new Date(new Date().getTime()+SecondsToExpire*1000).toUTCString(); }
function GetCookie(Name)
{
	var Match=document.cookie.match(new RegExp('(?:^|; ?)'+Name+'=(.*?)(?:;|$)'));
	return Match ? unescape(Match[1]) : null;
}
Converting Videos to HTML5 Formats

The following is a Linux script I threw together to convert an mpeg4 (.mp4, .mpv, .mpeg4, .mpg) file into Ogg Theora (.ogg, .ogv) and Flash video (.flv), maintaining the same bit rate. It also doesn’t hurt to have a .webm format for some older devices. This assumes you already have the appropriate packages installed, including ffmpeg and ffmpeg2theora. The script also wasn’t made to be foolproof, and might not be able to read the bitrate on some videos. The script takes a list of mpeg4 files to convert.

for filename in "$@"
do
  extension="${filename##*.}"
  filename="${filename%.*}"

  export BITRATE=`ffmpeg -i "$filename.$extension" 2>&1 | grep -oP 'Video.*\d+ kb/s' | grep -oP '\d+ kb/s' | grep -oP '\d+'` #Quote paths so filenames with spaces work
  ffmpeg2theora -V $BITRATE -o "$filename.ogv" "$filename.$extension"
  ffmpeg -i "$filename.$extension" -ar 44100 -b ${BITRATE}k -f flv "$filename.flv" #Add an optional "-threads #" to make this faster
done
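
Usage is just a matter of passing the files to convert; a sketch, assuming the loop above is saved as the hypothetical “ConvertVideos.sh”:
chmod +x ConvertVideos.sh
./ConvertVideos.sh Videos/*.mp4 AnotherVideo.mpg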
Font size hack for Mac Web Browsers

Mac renders fonts differently than Windows, which is a problem when trying to make cross-system compatible web pages. I've read in many places that using "em" instead of pixels fixes this problem, but sometimes using pixel-based measurements is required for a project.

I found when testing a recent project's web page on Apple's OSX that no matter what browser I used, the fonts were always 2px bigger than on Windows, which threw off my layouts. So I used the simple JavaScript solution below (jQuery required). Note that this assumes all font sizes are in pixels. It might not hurt to add a check for that if you mix font size types.

$(document).ready(function() {
	if(/Macintosh/.test(navigator.userAgent))
		$.each(document.styleSheets, function(Indx, SS) {
			var rules=SS.cssRules || SS.rules;
			for(var i=0;i<rules.length;i++)
				if(rules[i].style && rules[i].style.fontSize!='')
					rules[i].style.fontSize=(parseInt(rules[i].style.fontSize, 10)-2)+'px';
		});
});
Encoding & decoding HTML in JavaScript with jQuery

Here are a few functions I’ve been finding a lot of use for lately. They are basically the JavaScript equivalent for PHP’s htmlentities and html_entity_decode. These functions are useful for inserting HTML dynamically, and getting values of contentEditable fields. These functions do replace line breaks appropriately, and HTML2Text removes a trailing line break.


var TextTransformer=$('<div></div>');
function Text2HTML(T) { return TextTransformer.text(T).html().replace(/\r?\n/g, '<br>'); }
function HTML2Text(T) { return TextTransformer.html(ReplaceBreaks(T, "\x01br\x01")).text().replace(/\x01br\x01/g, "\n").replace(/\n$/, ''); }
function ReplaceBreaks(TheHTML, ReplaceText) { return TheHTML.replace(/<\s*br\s*\/?\s*>/g, ReplaceText || ' - '); }
Transferring an Excel Spreadsheet to MySQL [Again]
Data manipulation primer

Sigh, I just realized after writing this post that I had already covered this topic... oh well, this version has some new information the other one is missing.


I find people very often asking me to move data from an Excel spreadsheet to a MySQL database, so I thought I’d write up the procedure I follow when doing so. This assumes no multi-line cells or tabs in the excel spreadsheet data.

  1. You need a good text editor with regular expression support. I highly recommend EditPad Pro (a free version is available too), and will be assuming you are using it for the steps below.
  2. Make sure all data in the Excel spreadsheet is formatted for SQL insertion, for example:
    To convert a date “mm/dd/yyyy” to SQL:
    1. Copy the entire row to your text editor
    2. Run the following regular expression replace:
      Find Text              Replace Text
      ^(\d+)/(\d+)/(\d+)$    $3-$1-$2
    3. Copy the text back to the spreadsheet row
  3. Copy all the data into the text editor, and run the following regular expressions:
    Find Text    Replace Text    Explanation
    \\           \\\\            Escape backslashes
    '            \\'             Escape single quotation marks
    \t           ','             Change separators so that all values are encased as strings
    ^            ('              Line prefix to insert a row and stringify the first value
    $            '),             Line suffix to insert a row and stringify the last value
  4. Change the very last character on the last line from a comma to a semicolon to end the query
  5. Add the following to the top of the file:
    SET NAMES 'utf8' COLLATE 'utf8_general_ci';
    SET CHARACTER SET 'utf8';
    TRUNCATE TABLE TABLE_NAME;
    INSERT INTO TABLE_NAME (Field1, Field2, ...) VALUES
  6. Make sure the file is saved as UTF8: Menu -> Convert -> Text Encoding -> (Encode the data with another character set ...) AND (Unicode, UTF-8)
  7. Make sure the file is saved with Unix line breaks: Menu -> Convert -> To Unix (LF Only)
  8. Save the file and run the following in your MySQL command line prompt to import it (a shell one-liner alternative is shown after this list):
    \u DATABASE_NAME
    \. FILE_NAME

There are of course easier solutions, but they can often be buggy, and I figured this is a good primer on regular expressions and simple data manipulation :-)
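
As for the shell one-liner alternative to step 8 mentioned above (a sketch, not part of the original procedure):
mysql --default-character-set=utf8 -u USER -p DATABASE_NAME < FILE_NAME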

Cygwin SIGINT fix for golang

Cygwin has had a long-time problem that, depending on your configuration, may leave you unable to send a SIGINT (interrupt signal via Ctrl+C) to native Windows command-line executables. As a matter of fact, trying to do so may completely freeze up the console, requiring a process kill of the actual console, bash, and the executable you ran. This problem can crop up for many reasons, including the version of Cygwin you are running and your terminal emulator. I specifically installed mintty as my default Cygwin console a long time ago to get rid of this problem (among many other features it had), and now even it has this problem.

While my normal solution is to steer clear of native Windows command-line executables in Cygwin, this is not always an option. Golang was also causing me this problem every time I ran a network server, which was especially problematic as I would ALSO have to manually kill the server process, or it would continue to hold the network port so another test of the code could not use it. An example piece of code is as follows:

package main
import ( "net/http"; "fmt" )
func main() {
	var HR HandleRequest
	if err := http.ListenAndServe("127.0.0.1:81", HR); err!=nil {
		fmt.Println("Error starting server") }
}

//Handle a server request
type HandleRequest struct{}
func (HR HandleRequest) ServeHTTP(w http.ResponseWriter, req *http.Request) {
	fmt.Printf("Received connection from: %s\n", req.RemoteAddr)
}
---
go run example.go

The first solution I found to this problem, which is by far the best solution, was to build the executable and then run it, instead of just running it straight from go.
go build example.go && ./example.exe
However, as of this post, it seems to no longer work! The last time I tested it and confirmed it was working was about 3 months ago, so who knows what has changed since then.

The second solution is to just build in some method of killing the process that uses “os.Exit”. For example, the following will exit if the user types “exit”
func ListenForExitCommand() {
	for s:=""; s!="exit"; { //Listen for the user to type exit
		if _, err:=fmt.Scanln(&s); err!=nil {
			if err.Error()!="unexpected newline" {
				fmt.Println(err) }
		} else if s=="flush" || s=="exit" {
			//Clean up everything here
		}
	}
	fmt.Println("Exit received, closing process")
	os.Exit(1)
}
and then add the following at the top of the main function:
go ListenForExitCommand() //Listen for "exit" command
Thread synchronization in C#
Building the wheel that should have already existed

I have been working heavily in C# CE (Compact Edition) v2.0 for the last 2 years for clients, and one of the very many things that I was never really happy with (in at least that version of the language, though it looks like it might plague all versions of C#) is the available thread synchronization tools. I’ve come to love the lock/wait/notify model (in Java it’s synchronized/wait/notify and in Perl it’s lock/cond_wait/cond_signal), but I have found nothing as intuitive and safe to use in C#. To alleviate this, I went ahead and wrote my own ThreadLockAndWait class that achieves this functionality.


This works the same as the POSIX lock, unlock, cond_wait, cond_signal, and cond_timedwait functions, except:
  • Lock is not required before CondSignal (it does its own inner lock and unlock)
  • If ReacquireLockAfterWait is false, CondWait will not relock after being signaled and will just continue immediately
  • Only 1 thread can be CondWaiting at a time (if one is CondWaiting and has been signaled but has not yet reacquired the lock, it is OK for another to start CondWaiting)

public class ThreadLockAndWait
{
	private Mutex TheLock=new Mutex(), CondWaitLock=new Mutex(); //CondWaitLock makes sure 1 thread stops waiting before the next one starts waiting
	private ManualResetEvent WaitTimer=new ManualResetEvent(false);
	private string OwnersThreadName=null;
	private int OwnerLockCount=0;

	public void Lock()
	{
		TheLock.WaitOne();
		if(OwnerLockCount++==0)
			OwnersThreadName=Thread.CurrentThread.Name;
	}
	public void UnLock()
	{
		TheLock.WaitOne();
		if(OwnerLockCount==0)
		{
			TheLock.ReleaseMutex();
			throw new Exception("Cannot unlock if not locked");
		}
		TheLock.ReleaseMutex();
		if(--OwnerLockCount==0)
			OwnersThreadName=null;
		TheLock.ReleaseMutex();
	}
	public void CondWait() { RealCondWait(-1, true); }
	public void CondWait(bool ReacquireLockAfterWait) { RealCondWait(-1, ReacquireLockAfterWait); }
	public void CondTimedWait(int TimeToWait) { RealCondWait(Math.Max(0, TimeToWait), true); }
	public void CondTimedWait(int TimeToWait, bool ReacquireLockAfterWait) { RealCondWait(Math.Max(0, TimeToWait), ReacquireLockAfterWait); }
	private void RealCondWait(int TimeToWait, bool ReacquireLockAfterWait)
	{
		//Prepare to wait
		TheLock.WaitOne();
		if(OwnerLockCount==0)
		{
			TheLock.ReleaseMutex();
			throw new Exception("Cannot wait if not locked");
		}
		CondWaitLock.WaitOne(); //Release this wait before the next one starts
		WaitTimer.Reset();
		TheLock.ReleaseMutex();

		//Release all locks
		int PreviousLockCount=OwnerLockCount;
		OwnersThreadName=null;
		OwnerLockCount=0;
		if(PreviousLockCount!=1)
			System.Diagnostics.Debug.Print("Warning, mutex has multiple locks from thread!");
		for(int i=0;i<PreviousLockCount;i++)
			TheLock.ReleaseMutex();

		//Wait
		if(TimeToWait>0)
			WaitTimer.WaitOne(TimeToWait, false);
		else if(TimeToWait!=0)
			WaitTimer.WaitOne();
		CondWaitLock.ReleaseMutex();

		//Reacquire lock
		if(!ReacquireLockAfterWait)
			return;
		for(int i=0;i<PreviousLockCount;i++)
			TheLock.WaitOne();
		OwnerLockCount=PreviousLockCount;
		OwnersThreadName=Thread.CurrentThread.Name;
	}

	public void CondSignal()
	{
		TheLock.WaitOne();
		WaitTimer.Set();
		TheLock.ReleaseMutex();
	}
}
Selectively skipping data in a cPanel backup
Using a hammer instead of a scalpel

I was having problems on one of our production Linux cPanel servers where our backup drive could not hold all the data from our primary drive for both our daily and weekly backups. An easy hack to fix this is to mount any subfolders you wish to exclude (generally very large ones) as a read-only temp file system in the appropriate backup folder. With this method, you can selectively exclude individual directories from one or more of the daily/weekly/monthly backup folders.

The only downside to this method is that the pkgacct logs (pkgacct is called by cpbackup) will contain read-only file system errors for each file that cannot be copied.


So, to have cPanel discard an individual directory during the backup, you need to do the following:
First, make sure the backup directory to exclude is created and empty by running:
rm -rf PATH;
mkdir -p PATH;
NOTE: BE CAREFUL WITH “rm -rf”, IT IS A DANGEROUS COMMAND

To manually mount the directory, run:
mount tmpfs PATH -t tmpfs -o defaults,ro
To permanently mount the directory (mount on boot), edit /etc/fstab and add the following line:
tmpfs PATH tmpfs defaults,ro 0 0
If you do the permanent fix, don’t forget to run “mount PATH” to have it mount it to the live system, since fstab will not mount all its listed file systems until the next boot.

An example of a PATH might be: /backup/cpbackup/weekly/dakusan/public_html/uploads
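
Putting the steps together with that example path (these are the same commands as above, so the usual “rm -rf” warning applies):
EXCLUDE=/backup/cpbackup/weekly/dakusan/public_html/uploads
rm -rf "$EXCLUDE" #DANGEROUS: confirm the variable holds what you expect first
mkdir -p "$EXCLUDE"
mount tmpfs "$EXCLUDE" -t tmpfs -o defaults,ro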

cPanel also recently added (experimental) hard linking for backups, which really helps out with space concerns and makes this hack a bit less necessary.

Optionally encrypted TCP class for Google's Go
Yet another new language to play with

I wanted to play around with Google's Go language a little, so I ended up deciding to make a simple class that helps create a TCP connection between a server and client that is either encrypted via TLS or not, depending upon a flag. Having the ability to not encrypt a connection is useful for debugging and testing purposes, especially if other people need to create clients to connect to your server.


The example server.go file listens on port 16001 and for every set of data it receives, it sends the reversed string back to the client. (Note there are limitations to the string lengths in the examples due to buffer and packet payload length restrictions).


The example client.go file connects to the server (given via the 1st command line parameter), optionally encrypts the connection (depending upon the 2nd command line parameter), and sends the rest of the parameters to the server as strings.


The encryptedtcp.go class has the following exported functions:
  • StartServer: Goes into a connection accepting loop. Whenever a connection is accepted, it checks the data stream for either the "ENCR" or "PTXT" flags, which control whether a TLS connection is created or not. The passed "clientHandler" function is called once the connection is completed.
  • StartClient: Connects to a server, passes either the "ENCR" or "PTXT" flag as noted above, and returns the finished connection.

Connections are returned as "ReadWriteClose" interfaces. Creating the pem and key certificate files is done via openssl. You can just google for examples.
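
For completeness, here is a sketch of one way to generate those files with openssl (self-signed pairs valid for a year; the subject is a placeholder):
openssl req -x509 -newkey rsa:2048 -nodes -days 365 -subj '/CN=localhost' -keyout server.key -out server.pem
openssl req -x509 -newkey rsa:2048 -nodes -days 365 -subj '/CN=localhost' -keyout client.key -out client.pem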


server.go:
package main
import ( "./encryptedtcp"; "fmt"; "log" )

func main() {
	if err := encryptedtcp.StartServer("server.pem", "server.key", "0.0.0.0:16001", handleClient); err != nil {
		log.Printf("%q\n", err) }
}

func handleClient(conn encryptedtcp.ReadWriteClose) {
	buf := make([]byte, 512)
	for {
		//Read data
		n, err := conn.Read(buf)
		if err != nil {
			log.Printf("Error Reading: %q\n", err); break }
		fmt.Printf("Received: %q\n", string(buf[:n]))

		//Reverse data
		for i, m := 0, n/2; i<m; i++ { //Iterate over half the list
			buf[i], buf[n-i-1] = buf[n-i-1], buf[i] } //Swap first and half of list 1 char at a time

		//Echo back reversed data
		n, err = conn.Write(buf[:n])
		if err != nil {
			log.Printf("Error Writing: %q\n", err); break }
		fmt.Printf("Sent: %q\n", string(buf[:n]))
	}
}

client.go:
package main
import ( "./encryptedtcp"; "fmt"; "log"; "os" )

func main() {
	//Confirm parameters, and if invalid, print the help
	if len(os.Args) < 4 || (os.Args[2] != "y" && os.Args[2] != "n") {
		log.Print("First Parameter: ip address to connect to\nSecond Parameter: y = encrypted, n = unencrypted\nAdditional Parameters (at least 1 required): messages to send\n"); return }

	//Initialize the connection
	conn, err := encryptedtcp.StartClient("client.pem", "client.key", os.Args[1]+":16001", os.Args[2]=="y" )
	if err != nil {
		log.Printf("%q\n", err); return }
	defer conn.Close()

	//Process all parameters past the first
	buf := make([]byte, 512)
	for _, msg := range os.Args[3:] {
		//Send the parameter
		if(len(msg)==0) {
			continue }
		n, err := conn.Write([]byte(msg))
		if err != nil {
			log.Printf("Error Writing: %q\n", err); break }
		fmt.Printf("Sent: %q\n", msg[:n])

		//Receive the reply
		n, err = conn.Read(buf)
		if err != nil {
			log.Printf("Error Reading: %q\n", err); break }
		fmt.Printf("Received: %q\n", string(buf[:n]))
	}
}

encryptedtcp/encryptedtcp.go:
//A simple TCP client/server that can be encrypted (via tls) or not, depending on a flag passed from the client

package encryptedtcp

import ( "crypto/rand"; "crypto/tls"; "net"; "log" )

//Goes into a loop to accept clients. Returns a string on error
func StartServer(certFile, keyFile, listenOn string, clientHandler func(ReadWriteClose)) (error) {
	//Configure the certificate information
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return MyError{"Cannot Load Keys", err} }
	conf := tls.Config{Certificates:[]tls.Certificate{cert}, ClientAuth:tls.RequireAnyClientCert, Rand:rand.Reader}

	//Create the listener
	listener, err := net.Listen("tcp", listenOn)
	if err != nil {
		return MyError{"Cannot Listen", err} }
	defer listener.Close()

	//Listen and dispatch clients
	for {
		conn, err := listener.Accept()
		if err != nil {
			return MyError{"Cannot Accept Client", err} }
		go startHandleClient(conn, &conf, clientHandler)
	}

	//No error to return - This state is unreachable in the current library
	return nil
}

//Return the io stream for the connected client
func startHandleClient(conn net.Conn, conf* tls.Config, clientHandler func(ReadWriteClose)) {
	defer conn.Close()

	//Confirm encrypted connection flag (ENCR = yes, PTXT = no)
	isEncrypted := make([]byte, 4)
	amountRead, err := conn.Read(isEncrypted)
	if err != nil {
		log.Printf("Cannot get Encrypted Flag: %q\n", err); return }
	if amountRead != 4 {
		log.Printf("Cannot get Encrypted Flag: %q\n", "Invalid flag length"); return }
	if string(isEncrypted) == "PTXT" { //If plain text, just pass the net.Conn object to the client handler
		clientHandler(conn); return
	} else if string(isEncrypted) != "ENCR" { //If not a valid flag value
		log.Printf("Invalid flag value: %q\n", isEncrypted); return }

	//Initialize the tls session
	tlsconn := tls.Server(conn, conf)
	defer tlsconn.Close()
	if err := tlsconn.Handshake(); err != nil {
		log.Printf("TLS handshake failed: %q\n", err); return }

	//Pass the tls.Conn object to the client handler
	clientHandler(tlsconn)
}

//Start a client connection
func StartClient(certFile, keyFile, connectTo string, isEncrypted bool) (ReadWriteClose, error) {
	//Configure the certificate information
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, MyError{"Cannot Load Keys", err} }
	conf := tls.Config{Certificates:[]tls.Certificate{cert}, InsecureSkipVerify:true}

	//Connect to the server
	tcpconn, err := net.Dial("tcp", connectTo)
	if err != nil {
		return nil, MyError{"Cannot Connect", err} }

	//Handle unencrypted connections
	if !isEncrypted {
		tcpconn.Write([]byte("PTXT"))
		return tcpconn, nil //Return the base tcp connection
	}

	//Initialize encrypted connections
	tcpconn.Write([]byte("ENCR"))
	conn := tls.Client(tcpconn, &conf)
	conn.Handshake()

	//Confirm handshake was successful
	state := conn.ConnectionState()
	if !state.HandshakeComplete || !state.NegotiatedProtocolIsMutual {
		conn.Close()
		if !state.HandshakeComplete {
			return nil, MyError{"Handshake did not complete successfully", nil}
		} else {
			return nil, MyError{"Negotiated Protocol Is Not Mutual", nil} }
	}

	//Return the tls connection
	return conn, nil
}

//Error handling
type MyError struct {
	Context string
	TheError error
}
func (e MyError) Error() string {
	if e.TheError == nil { //Guard against a nil wrapped error (e.g. the handshake failures above)
		return e.Context }
	return e.Context+": "+e.TheError.Error() }

//Interface for socket objects (read, write, close)
type ReadWriteClose interface {
	Read(b []byte) (n int, err error)
	Write(b []byte) (n int, err error)
	Close() error
}
Automatically resuming rsync
The old network file copy problem

Rsync is a spectacular command-line utility for file copying and/or syncing operations. It has a multitude of switches to help optimize and handle the requirements of any file copy operation, whether local or over a network. However, sometimes networks are less than stable and stalls can happen during an rsync (or scp). This is quite the nuisance when doing very large (i.e. multi-GB) transfers. To solve this, the following script can be used to automatically resume a stalled rsync.


export Result=1;
while [ $Result -ne 0 ]; do
  echo "STARTING ($Result) @" `date`;
  rsync -Pza --timeout=10 COPY_FROM COPY_TO_USER@COPY_TO_HOST:COPY_TO_LOCATION;
  Result=$?;
  sleep 1;
done

  • The -P switch is highly suggested as it activates:
    • --partial: This keeps a file even if it doesn’t finish transferring so it can be resumed when rsync is restarted. This is especially important if you have very large files.
    • --progress: This shows you the progress of the current file copy operation.
  • The -z switch turns on gzip compression during the file transfer, so may only be useful depending on the circumstances.
  • The -a switch stands for “archive” and is generally a good idea to use. It includes the switches:
    • -r: Recurse into folders
    • -t: Preserves file modification time stamp. This is highly recommended for this, and incremental backups, as rsync, by default, skips files whose file sizes and modification times match.
    • -l and -D: Preserve file type (i.e. symlinks)
    • -p: Preserve file (chmod) permissions
    • -g and -o: Preserve file owners.
  • The --timeout is the crux of the script, in that if an I/O timeout of 10 seconds occurs, the rsync exits prematurely so it can be restarted.

For more useful switches and information, see the rsync man page.



Script with comments:
export Result=1; #This will hold the result of the rsync. Set to 1 so the first loop check will fail.
while [ $Result -ne 0 ]; do #Loop until rsync result is successful
  echo "STARTING ($Result) @" `date`; #Inform the user of the time an rsync is starting and the last rsync failure code
  rsync -Pza --timeout=10 COPY_FROM COPY_TO_USER@COPY_TO_HOST:COPY_TO_LOCATION; #See rest of post for switch information
  Result=$?; #Store the result of the rsync
  sleep 1; #This is an optional 1 second timeout between attempts
done
Setting the time zone through a numeric offset
They never make it easy

I had the need today to be able to set the current time zone for an application in multiple computer languages by the hourly offset from GMT/UTC, which turned out to be a lot harder than I expected. It seems most time zone related functions, at least in Linux, expect you to use full location strings to set the current time zone offset (e.g. America/Chicago).


After a lot of research and experimenting, I came up with the following results. All of these are confirmed working in Linux, and most or all of them should work in Windows too.

Language    Format Note              Format for GMT+5    Format for GMT-5
C           Negate                   GMT-5               GMT5
Perl        Negate                   GMT-5               GMT5
SQL         Requires Sign            +5:00               -5:00
PHP         Negate, Requires Sign    Etc/GMT-5           Etc/GMT+5

And here are examples of using this in each language. The “TimeZone” string variable should be a 1-2 digit integer with an optional preceding negative sign:
Language Example
C
#include <stdio.h> //snprintf
#include <stdlib.h> //setenv, atoi
#include <time.h> //tzset

...

char Buffer[10];
snprintf(Buffer, 10, "GMT%i", -atoi(TimeZone));
setenv("TZ", Buffer, 1);
tzset();
Perl
use POSIX qw/tzset/;
$ENV{TZ}='GMT'.(-$TimeZone);
tzset;
SQL [Query string created via Perl]
$Query='SET time_zone="'.($TimeZone>=0 ? '+' : '').$TimeZone.':00"';
PHP
date_default_timezone_set('Etc/GMT'.($TimeZone<=0 ? '+' : '').(-$TimeZone));
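
The same negation rule applies to the POSIX TZ environment variable in a shell, which is a quick way to sanity-check the table above:
TZ=GMT-5 date #Prints the time at GMT+5
TZ=GMT5 date #Prints the time at GMT-5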
SymLink Fix for Combining Android Project Versions
Yay again at NTFS symlinking :-)

Since I found out that NTFS now has semi-working native symlinks, I have updated the symlinking script used in the Combining an Android Project's Versions post. This script creates relative symlinks now through Perl instead of absolute hard links through Bash. It is as follows:

#!/usr/bin/perl
#Run this file to install links to shared files into all branches
use warnings;
use strict;

#Configuration
my $SharedDirectoryName="Shared";
my $NonProjectDirectories="^\\.(|/\\.git|/$SharedDirectoryName)\$"; #Non Project directories (., .git, $SharedDirectoryName)
my $IsWindows=(index(lc(`uname`), 'cygwin')!=-1);

#Create a symlink
sub MakeLink
{
	my ($LinkTarget, $LinkName, $IsWindows, $IsDirectory)=@_;

	#Create the target directory if it does not exist
	my $LinkDirectory=$LinkName;
	$LinkDirectory =~ s/\/[^\/]+$//;
	if(!-e $LinkDirectory) {
		print "Creating directory: $LinkDirectory\n";
		`mkdir -p "$LinkDirectory"`;
	}
	
	#If the link already exists, issue a warning
	if(-l $LinkName) {
		print "Link already exists: $LinkName\n";
		return;
	}

	#Create the relative symlink
	my $RelativePathFromLinkToTarget=('../' x ($LinkName =~ tr/\///)).$LinkTarget; #Determine the relative path between the link and its target
	my $Command;
	if(!$IsWindows) { #Create the Linux command
		$Command="ln -s \"$RelativePathFromLinkToTarget\" \"$LinkName\"";
	}
	else #Create the Windows command
	{
		#Replace /s in path with \s
		$RelativePathFromLinkToTarget =~ s/\//\\/g;
		$LinkName =~ s/\//\\/g;
		
		$Command='cmd /c mklink'.($IsDirectory ? ' /d' : '')." \"$LinkName\" \"$RelativePathFromLinkToTarget\"";
	}

	print "$Command\n";
	`$Command`;
}

#Find required information from file searches
my @LocalBranches=grep(!/$NonProjectDirectories/, `find -maxdepth 1 -type d`); #Find version folders by ignoring Non Project directories
my @Files=split(/\n?^$SharedDirectoryName\//m, substr(`find $SharedDirectoryName -type f`, 0, -1)); shift @Files; #Find shared files

#Propagate shared files into different versions
foreach my $LocalBranch (@LocalBranches) {
	$LocalBranch=substr($LocalBranch, 2, -1); #Remove ./ and new line separator
	foreach my $File (@Files) {
		MakeLink("$SharedDirectoryName/$File", "$LocalBranch/$File", $IsWindows, 0);
	}
}
Automatic disconnect protection for SSH terminals
A simple solution for a simple problem

I got tired a long time ago of losing what I was currently working on in SSH sessions when they were lost due to network connectivity issues. To combat this I have been using screen when running sessions that I absolutely cannot lose, but the problem still persisted in other sessions, or when I forgot to run it. The easy solution was to add screen to one of my bash init scripts (~/.bashrc [or ~/.bash_profile]) as follows:

alias autoscreen="screen -x -RR && exit"
if [[ "$TERM" == cygwin* || "$TERM" == xterm* ]]; then autoscreen; fi
This automatically makes the screen command run on bash user initialization, always connecting to the same session.

Edit on 2012-12-17 @ 7:00pm:
The last iteration was:
if [ $TERM == "xterm" ]; then screen -R pts-0.`hostname` && exit; fi
  • The main screen command is now an “alias” to help out with some bash parsing problems.
  • The resume parameters are now “-x -RR”, which first attempts to multiplex a session, and if that fails, creates a session. With multiplexing turned on, everyone uses the same screen session and can interact with each other, and you don’t have to worry about accidentally connecting to the wrong screen session or spawning new ones. The only problem is sometimes you may accidentally step on other users’ toes :-)
  • The special screen session name was removed so it always starts with the default name (easier to manually interact with)
  • I added detection of multiple term names (cygwin and xterm), and added a wildcard at the end of each since there is often suffixes to these names. More term names can easily be added using this syntax.

Edit on 2010-12-30 @ 3:50am: I changed != "screen" to == "xterm" because otherwise scp and some other non-term programs were failing. You may have to use something other than “xterm” for your systems.


Edit on 2010-1-24 @ 2:00pm: I added the exit; so the terminal automatically exits when the screen session closes.

OpenVPN Authentication and Gateway Configuration
Securing oneself is a never ending battle

For a number of years now, when on insecure network connections, I have been routing my computer’s Internet traffic through secure tunnels and VPNs, but I’ve been interested in trying out different types of VPN software lately so I can more easily help secure friends who ask for it. This mainly means ease of installation and enabling, which partly requires no extra software for them to install.

Unfortunately, Windows 7 and Android (and probably most other software) only support PPTP and L2TP/IPSEC out of the box. While these protocols are good for what they do, everything I have read says OpenVPN is superior to them as a protocol. I was very frustrated to find out how little support OpenVPN actually has as an industry standard today, which is to say, you have to use third-party clients and it is rarely, if ever, included by default in OSes. The OpenVPN client and server aren’t exactly the easiest to set up for novices, either.


So on to the real point of this post. The sample client and server configurations for OpenVPN were set up just how I needed them except they did not include two important options for me: User authentication and full client Internet forwarding/tunneling/gateway routing. Here is how to enable both.


Routing all client traffic (including web-traffic) through the VPN:
  • Add the following options to the server configuration file:
    • push "redirect-gateway def1" #Tells the client to use the server as its default gateway
    • push "dhcp-option DNS 10.8.0.1" #Tells the client to use the server as its DNS Server (DNS Server's IP address dependent on configuration)
  • Run the following commands in bash:
    • iptables -t nat -A POSTROUTING -s 10.8.0.0/24 -o eth0 -j MASQUERADE #This command assumes that the VPN subnet is 10.8.0.0/24 (taken from the server directive in the OpenVPN server configuration) and that the local ethernet interface is eth0.
    • echo '1' > /proc/sys/net/ipv4/ip_forward #Enable IP Forwarding (This one is not mentioned in the OpenVPN howto)

Adding user authentication (Using alternative authentication methods)

To set up username/password authentication on the server, an authorization script is needed that receives the username/password and returns whether the login information was successful (0) or failed (1). The steps to set up this process are as follows:

  • Add the following options to the server configuration file:
    • auth-user-pass-verify verify.php via-env #The third argument (method) specifies whether to send the username and password through either a temporary file (via-file) or environment variables (via-env)
    • script-security 3 system #Allows OpenVPN to run user scripts and executables and send password authentication information through environment variables. While "system" is deprecated, I had to use it or external commands like ifconfig and route were failing with "failed: could not execute external program"
  • Add the following options to the client configuration file:
    • auth-user-pass #Request user credentials to log in
  • The final step is to create the verify.php (see auth-user-pass-verify configuration above) script which returns whether it was successful, and also outputs its success to stdout, which is added to the OpenVPN log file.
    #!/usr/bin/php -q
    <?
    //Configuration
    $ValidUserFile='users.txt'; //This file must be in htpasswd SHA1 format (htpasswd -s)
    $Method='via-env'; //via-file or via-env (see auth-user-pass-verify configuration above for more information)
    
    //Get the login info
    if($Method=='via-file') //via-file method
    {
    	$LoginInfoFile=trim(file_get_contents('php://stdin')); //Get the file that contains the passed login info from stdin
    	$LoginInfo=file_get_contents($LoginInfoFile); //Get the passed login info
    	file_put_contents($LoginInfoFile, str_repeat('x', strlen($LoginInfo))); //Shred the login info file
    	$LoginInfo=explode("\n", $LoginInfo); //Split into [Username, Password]
    	$UserName=$LoginInfo[0];
    	$Password=$LoginInfo[1];
    }
    else //via-env method
    {
    	$UserName=$_ENV['username'];
    	$Password=$_ENV['password'];
    }
    
    //Test the login info against the valid user file
    $UserLine="$UserName:{SHA}".base64_encode(sha1($Password, TRUE)); //Compile what the user line should look like
    foreach(file($ValidUserFile, FILE_IGNORE_NEW_LINES) as $Line) //Attempt to match against each line in the file
    	if($UserLine==$Line) //If credentials match, return success
    	{
    		print "Logged in: $UserName\n";
    		exit(0);
    	}
    
    //Return failure
    print "NOT Logged in: $UserName\n";
    exit(1);
    ?>
Ping URL
Cause Python is quick to program in and can make executables

The following is a Python script that automatically pings a requested web address at a given interval. It was made as a quick favor for a friend. Here is the downloadable source code and Windows binary.

The Configuration Options File (PingURL.cfg) contains 2 lines:
  1. The URL to ping (The default pings the GetIP script)
  2. The millisecond interval between pings (Default=600000=10 minutes)

#!python
from sys import stderr
from urllib import urlopen
from time import localtime, strftime, sleep

#Main function
def Main():
    #Open the settings file
    SettingFileName='PingURL.cfg';
    Settings=[] #Blank out settings file variable in case it can't be opened
    try:
        Settings=open(SettingFileName, 'r').readlines()
    except IOError as (errno, strerror):
        stderr.write('Cannot open {0} configuration file: I/O error({1}): {2}\n'.format(SettingFileName, errno, strerror))
        return

    #Confirm valid settings were passed
    if(len(Settings)<2):
        stderr.write('Not enough settings found in settings file\n')
        return
    try:
        IntervalTime=int(Settings[1])
    except:
        stderr.write('Invalid interval time\n')
        return
        
    #Ping the URL indefinitely
    while(True):
        try:
            URLText=urlopen(Settings[0]).read()
        except:
            URLText='READ FAILED'
        print 'URL Pinged At {0}: {1}'.format(strftime('%Y-%m-%d %H:%M:%S', localtime()), URLText)
        sleep(IntervalTime/1000)

#Run the program
Main()
Auto initializing SSH Remote Tunnel
SSH is so incredibly useful

I often find SSH tunnels absolutely indispensable in my line of work for multiple reasons including secure proxies (tunneling) over insecure connections and connecting computers and programs together over difficult network setups involving NATs and firewalls.

One such example of this I ran into recently is that I have a server machine (hereby called “the client”) that I wanted to make sure I have access to no matter where it is. For this I created an auto initializing SSH remote port tunnel to a server with a static IP address (hereby called “the proxy server”), which attempts to keep itself open when there are problems.

The first step of this was to create the following bash script on the client that utilizes the OpenSSH’s client to connect to an OpenSSH server on the proxy server for tunneling:

#!/bin/bash
for ((;;)) #Infinite loop to keep the tunnel open
do
	ssh USERNAME@PROXYSERVER -o ExitOnForwardFailure=yes -o TCPKeepAlive=yes -o ServerAliveCountMax=2 -o ServerAliveInterval=10 -N -R PROXYPORT:localhost:22 &>> TUNNEL_LOG #Create the SSH tunnel
	echo "Restarting: " `date` >> TUNNEL_LOG #Write to the log file "TUNNEL_LOG" whenever a restart is attempted
	sleep 1 #Wait 1 second in between connection attempts
done
The parts of the command that create the SSH tunnel are as follows:
Part of Command Description
ssh The OpenSSH client application
USERNAME@PROXYSERVER The proxy server and username on said server to connect to
-o ExitOnForwardFailure=yes Automatically terminate the SSH session if the remote port forward fails
-o TCPKeepAlive=yes
-o ServerAliveCountMax=2
-o ServerAliveInterval=10
Make sure the SSH connection is working, and if not, reinitialize it. The connection fails if server keepalive packets that are sent every 10 seconds are not received twice in a row, or if TCP protocol keepalive fails
-N “Do not execute a remote command. This is useful for just forwarding ports” (This means no interactive shell is run)
-R PROXYPORT:localhost:22 Establish a port of PROXYPORT on the proxy server that sends all data to port 22 (ssh) on the client (localhost)
&>> TUNNEL_LOG Send all output from both stdout and stderr to the log file “TUNNEL_LOG”

For this to work, you will need to set up public key authentication between the client and utilized user on the proxy server. To do this, you have to run “ssh-keygen” on the client. When it has finished, you must copy the contents of your public key file (most likely at “~/.ssh/id_rsa.pub”) to the “~/.ssh/authorized_keys” file on the user account you are logging into on the proxy server. You will also have to log into this account at least once from the client so the proxy server’s information is in the client’s “known_hosts” file.
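
A sketch of that key setup from the client’s side (ssh-copy-id does the authorized_keys appending for you, where available):
ssh-keygen -t rsa #Accept the defaults; an empty passphrase allows unattended reconnects
ssh-copy-id USERNAME@PROXYSERVER #Or manually append ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the proxy server
ssh USERNAME@PROXYSERVER true #Log in once so the proxy server lands in known_hosts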

For security reasons you may want the user on the proxy server to have a nologin shell set since it is only being used for tunneling, and the above mentioned public key will allow access without a password.

For network access reasons, it might also be worth it to have the proxy port and the ssh port on the proxy server set to commonly accessible ports on all network setups (that a firewall wouldn’t block). Or you may NOT want to have it on common ports for other security reasons :-).

If you want the proxy port on the proxy server accessible from other computers (not only the proxy server), you will have to turn on “GatewayPorts” (set to “yes”) in the proxy server’s sshd config, most likely located at “/etc/ssh/sshd_config”.
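
That is, something like the following line in the proxy server’s sshd config (sshd needs to be restarted or reloaded for it to take effect):

GatewayPorts yes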


The next step is to create a daemon that calls this script. The normal method for this in Linux would be to use inittab. This can be a bit cumbersome though with Linux distributions that use upstart, like Ubuntu, so I just cheated and created the following script to initialize a daemon through rc.d:

#!/bin/bash
echo -e '#!/bin/bash\nfor ((;;))\ndo\n  ssh USERNAME@PROXYSERVER -o TCPKeepAlive=yes -o ExitOnForwardFailure=yes -o ServerAliveCountMax=2 -o ServerAliveInterval=10 -N -R PROXYPORT:localhost:22 &>> TUNNEL_LOG\n  echo "Restarting: " `date` >> TUNNEL_LOG\n  sleep 1\ndone' > TUNNEL_SCRIPT_PATH #This creates the above script
echo -e '#!/bin/bash\ncd TUNNEL_SCRIPT_DIRECTORY\n./TUNNEL_SCRIPT_EXECUTABLE &' > /etc/init.d/TUNNEL_SCRIPT_SERVICE_NAME #This creates the init.d daemon script. I have the script set the working path before running the executable so the log file stays in the same directory
chmod u+x TUNNEL_SCRIPT_PATH /etc/init.d/TUNNEL_SCRIPT_SERVICE_NAME #Set all scripts as executable
update-rc.d TUNNEL_SCRIPT_SERVICE_NAME defaults #Set the run levels at which the script runs
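
Once everything is running, the client can be reached from the proxy server through the forwarded port; for example (CLIENTUSER here is a stand-in for a user account on the client):

ssh -p PROXYPORT CLIENTUSER@localhost #Run on the proxy server; this lands on the client's sshd through the tunnel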
Symlinks in a Windows programming environment
Windows will get it right one day

I have been having some problems regarding symlinks (symbolic links) for a project I’ve been working on recently, which requires work in at least 5 very different operating systems (and about a dozen programming languages). Not many of the programs I need symlink support from handle them properly, because support wasn’t added to NTFS until Windows Vista, and it still has some problems.

It is really great that Windows Vista and Windows 7 now support native symlinks so they can be utilized by programs out of the box. For example, one such instance where I need this a lot is directory relinking in Apache. While Apache’s mod_alias can duplicate the functionality of symlinks for many needs, creating special cases for this one piece of software when distributing a code repository is just not practical, and having proper symlinks natively followed without the program knowing they aren’t the actual file/directory is really the best solution so everything works without special cases.

The way to create NTFS symlinks in Windows Vista+ is through the “mklink” command, which is unfortunately implemented directly in the Windows command shell, and not as a separate executable, so it is not accessible to Cygwin. Further, Cygwin has taken the stance of only supporting the reading of NTFS symlinks, and not creating them, because they can only be created by administrators, and they require specifying whether the link’s target is a directory or a file. Cygwin itself has had its own style of symlinks in Windows for a long time, but these are not compatible with any program run outside of the Cygwin environment.
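
For reference, the mklink syntax looks like the following when run from an elevated Windows command prompt (the link and target names are placeholders):

:: Create a file symlink
mklink LinkName TargetFile
:: Create a directory symlink
mklink /D LinkDirName TargetDir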

Now, my real problem started occurring when trying to use these NTFS symlinks with GIT. While GIT natively supports symlinks, TortoiseGIT doesn’t really support them at all, and throws errors when they are encountered. This is still a big problem that I am going to have to think about :-\. Fortunately, when working with GIT in Cygwin they still work, with caveats. As previously mentioned, only reading the NTFS symlinks in Cygwin work, so when you fetch/pull from a repository and it creates Cygwin style symlinks, Windows still does not read them properly. The following is a script I wrote to change the Cygwin style symlinks into NTFS style symlinks. It can be run from the root folder of the GIT project.

#!/bin/bash
IFS=$'\n' #Spaces do not count as new delimiters

function makewinlink
{
	LINK=$1
	OPTIONS=$2 #Pass /D for a directory link
	TARGET=`find $LINK -maxdepth 0 -printf %l` #The path the symlink points to
	LASTMODTIME=`find $LINK -maxdepth 0 -printf "%t"` #Remember the link's modification time
	LINKDIR=`find $LINK -maxdepth 0 -printf %h` #The directory containing the link
	TARGET=`echo $LINKDIR/$TARGET` #Prepend the link's directory to the target path
	rm -f $LINK #Remove the Cygwin style symlink
	cmd /c mklink $OPTIONS "$(cygpath -wa $LINK)" "$(cygpath -wa $TARGET)" #Create an NTFS style symlink in its place
	touch -h -d $LASTMODTIME $LINK #Restore the link's original modification time
}

#Relink all directories
FILES=`find -type l -print0 | xargs -0 -i find -L {} -type d -maxdepth 0`
for f in $FILES
do
	makewinlink $f /D
done

#Relink all files
FILES=`find -type l -print0 | xargs -0 -i find -L {} -type f -maxdepth 0`
for f in $FILES
do
	makewinlink $f
done

Make sure when committing symlinks in a GIT repository in Windows to use Cygwin with Cygwin style symlinks instead of TortoiseGIT. Also, as previously mentioned, after running this script, TortoiseGIT will show these symlinks as modified :-\. If this is a problem, you can always reverse the process in Cygwin by changing the “cmd /c mklink $OPTIONS” line to a “ln -s” in the above script (note that “target” and “symlink’s name” need to be switched) along with a few other changes.


[EDIT ON 2011-01-03 @ 6:30am] See here for a better example of symlinking in Windows that uses relative paths. [/EDIT]
Something I feel JavaScript really got right
Language design is a PITA though... so bleh

One thing I always really miss when working in dynamic languages other than JavaScript is the ability to access known (non-dynamic) members of an associative array/object/hash (called a hash from here on out) through just a single dot. This matches C’s syntax for accessing struct members, as opposed to being forced into array syntax, which is harder to read IMO in languages like PHP and Perl. For example...


Creating a hash in:
  • JavaScript: var Hash={foo:1, bar:2};
  • Perl: my %Hash=(foo=>1, bar=>2);
  • PHP: $Hash=Array('foo'=>1, 'bar'=>2);

Accessing a hash’s member in:
  • JavaScript: Hash.foo or Hash['foo']
  • Perl: $Hash{foo}
  • PHP: $Hash['foo']

The reason this is preferable to me is it can make code like the following
Zones[Info['Zone']]['DateInfo'][Info['Date']]['UniqueEnters']+=Info['Count'];
much more readable by turning it into the following
Zones[Info.Zone].DateInfo[Info.Date].UniqueEnters+=Info.Count;
CSharp error failure
Why does Microsoft always have to make everything so hard?

I was running into a rather nasty .NET crash today in C#, in a rather large project I have been continuing development on for a handheld device that runs Windows CE6. When I was calling a callback function pointer (called a Delegate in .NET land) from a module, I was getting a TypeLoadException error with no further information. I started out making the incorrect assumption that I was doing something wrong with Delegates, as C# is not exactly my primary language ;-). The symptoms pointed to the delegate call being the problem, because the program crashed during the delegate call itself: the code reached the call, but never made it into the callback function. After doing the normal debugging-thing, I found out the program crashed in the same manner every time the specific callback function was called and before it started executing, even if it was called in a normal fashion from the same class.

After further poking around, I realized that there was one line of code that, if included in any function, would cause the program to fail when that function was called. Basically, resources were somehow missing from the compilation, and there were no warnings anywhere telling me this. If I tried to access said resource normally, I got an easily traceable MissingManifestResourceException error. However, the weird situation was happening because the missing resource was being accessed from a static member in another class. So here is some example code that was causing the problem:

public class ClassA
{
	public void PlaySuccess()
	{
		//Execution DOES NOT reach here
		Sound.Play(Sound.Success);
	}
}

public class Sound
{
	public static byte[] Success=MyResource.Success; //This resource is somehow missing from the executable
	public static byte[] Failure=MyResource.Failure;
	public static void Play(byte[] TheSound) { sndPlaySound(TheSound, SND_ASYNC|SND_MEMORY); }
}

//Elsewhere (e.g. in an event handler):
ClassA Foo=new ClassA();
//Execution reaches here
Foo.PlaySuccess();
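
A possible defensive rewrite (my sketch, not something from the original project) is to load the resource lazily, which moves the failure out of static/type initialization and into a normal, catchable call:

public class Sound
{
	private static byte[] _Success;
	public static byte[] Success
	{
		get
		{
			if(_Success==null)
				_Success=MyResource.Success; //A missing resource now throws a traceable MissingManifestResourceException here
			return _Success;
		}
	}
}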

Oh well, at least it wasn’t an array overrun, those are fun to track down :-).

Visual Studio IDE Tab Order
Microsoft fails at usability

I’ve been really annoyed for a while by the unintuitive IDE tab ordering in Visual Studio 2005+. When you type [shift+]alt+tab, you don’t get the next/previous tab in the list as would be the OBVIOUS way to do it (which probably all other IDEs do right). No, it switches between tabs in an arbitrary hidden order related to the last access order of the tabs.

Searching the internet for a solution to this was pretty fruitless, so I tried to tackle the problem myself. I dug through all the possible structures I could find in the Visual Studio IDE macro explorer, and was unfortunately unable to find where the tab order was kept in a window pane (if it is even accessible to the user). I thought I had the solution at one point, but realized it also just switches tabs in the order they were originally opened :-(. This is the VB macro code I came up with to do at least that, which uses “DTE.ActiveWindow.Document.Collection” for the tab-open order.

	 Public Sub TabDirection(ByVal Direction As Integer)
		  'Find the index of the current tab
		  Dim i As Integer
		  Dim Index As Integer
		  Dim Count As Integer
		  Count = DTE.ActiveWindow.Document.Collection.Count
		  For i = 1 To Count
				If DTE.ActiveWindow.Document.Collection.Item(i).ActiveWindow.Equals(DTE.ActiveWindow) Then Index = i
		  Next i

		  'Determine the new index
		  Index = Index + Direction
		  If Index > Count Then Index = 1
		  If Index = 0 Then Index = Count

		  DTE.ActiveWindow.Document.Collection.Item(Index).Activate() 'Activate the next tab in the proper direction
	 End Sub

	 Public Sub TabForward()
		  TabDirection(1)
	 End Sub

	 Public Sub TabBackward()
		  TabDirection(-1)
	 End Sub
Second Life Research
More old research I never got around to releasing

Back in May of 2007 one of my friends got me onto Second Life, the first and only MMORPG I’ve touched since my Ragnarok days. While Second Life had a strong pull for me due to its similarities to The MetaVerse in Snow Crash, my favorite book, I was of course more drawn to playing with the engine and seeing what I could do with it.

I felt no real need to delve into the code or packet level of the client as it was open source, so I stayed mostly on the scripting level side of things in the world. IIRC I did find at least a dozen major security holes, but I unfortunately cannot seem to find logs of my research :-(.

I do however remember at least 2 of the security holes I found:
  • While an avatar could not pass through solid walls normally, if an object was visible that allowed “sitting” beyond the walls, the user could issue the sit command on that object which transported the avatar past the barriers.
  • While there were optional restrictions on areas pertaining to if/where an object could be placed, once an object was placed somewhere, it could be “pushed” to almost any other location no matter the restrictions. When an object was pushed into another area beyond where it was placed, it was still inventoried as being in the originally placed location, but could interact with the world at the location it was actually at. Objects could even pass through solid barriers if the proper push velocities were given. The only way at the time to combat this was to have whole private islands as blocking anonymous objects. This security hole opened up multiple other security holes including:
    • If a user “sat” on the object, they could get to anywhere the object could.
    • These objects could be used to interact with the immediate world around them, including repeating private conversations in a private area.

I had also at the time planned on writing an application that allowed hijacking and reuploading any encountered texture or construct, which was trivial due to the open nature of the system. I never did get around to it for two reasons. First, I got distracted by other projects, and second, because it could have seriously destabilized the Second Life economy, which was built around selling said textures and constructs. I actually liked what Second Life was trying to accomplish and had no wish of making Linden Lab’s life harder or ruining the experiment in open economy.


I was however able to find a few pieces of my research and scripts that I figured I could post here. First, I do not recall what I did to find this, but the entire list of pre-defined “Last Names” was accessible, and IIRC the proprietary last names could be used for character creation if you knew how to access them (not 100% sure if this latter hack was available). Here was the list as of when I acquired it in 2007. I had the list separated into two columns, and I think they were “open” names and “proprietary” names. Each name is followed by its identifier.

Open Names
Congrejo(339), Spitteler(957), Boucher(1716), Kohime(2315), Korobase(2363), Bingyi(3983), Hyun(3994), Qunhua(4003), Yiyuan(4010), Nikolaidis(4032), Bikcin(4040), Laryukov(4112), Bamaisin(4127), Choche(4136), Ultsch(4140), Coage(4164), Cioc(4173), Barthelmess(4212), Koenkamp(4322), Daviau(4340), Menges(4345), Beaumont(4390), Lubitsch(4392), Taurog(4408), Negulesco(4418), Beresford(4466), Babenco(4468), Catteneo(4483), Dagostino(4509), Ihnen(4511), Basevi(4517), Gausman(4530), Heron(4533), Fegte(4535), Huldschinsky(4539), Juran(4543), Furse(4548), Heckroth(4550), Perfferle(4552), Reifsnider(4553), Hotaling(4559), DeCuir(4560), Carfagno(4561), Mielziner(4573), Bechir(4592), Zehetbauer(4615), Roelofs(4624), Hienrichs(4647), Rau(4654), Oppewall(4657), Bonetto(4659), Forwzy(4677), Repine(4680), Fimicoloud(4685), Bleac(4687), Anatine(4688), Gynoid(4745), Recreant(4748), Hapmouche(4749), Ceawlin(4758), Balut(4760), Peccable(4768), Barzane(4778), Eilde(4783), Whitfield(4806), Carter(4807), Vuckovic(4808), Rehula(4809), Docherty(4810), Riederer(4811), McMahon(4812), Messmer(4813), Allen(4814), Harrop(4815), Lilliehook(4816), Asbrink(4817), Laval(4818), Dyrssen(4819), Runo(4820), Uggla(4822), Mayo(4823), Handrick(4824), Grut(4825), Szondi(4826), Mannonen(4827), Korhonen(4828), Beck(4829), Nagy(4830), Nemeth(4831), Torok(4832), Mokeev(4833), Lednev(4834), Balczo(4835), Starostin(4836), Masala(4837), Rasmuson(4838), Martinek(4839), Mizser(4840), Zenovka(4841), Dovgal(4842), Capalini(4843), Kuhn(4845), Platthy(4846), Uriza(4847), Cortes(4848), Nishi(4849), Rang(4850), Schridde(4851), Dinzeo(4852), Winkler(4853), Broome(4854), Coakes(4855), Fargis(4856), Beerbaum(4857), Pessoa(4858), Mathy(4859), Robbiani(4860), Raymaker(4861), Voom(4862), Kappler(4863), Katscher(4864), Villota(4865), Etchegaray(4866), Waydelich(4867), Johin(4868), Blachere(4869), Despres(4871), Sautereau(4872), Miles(4873), Lytton(4874), Biedermann(4875), Noel(4876), Pennell(4877), Cazalet(4878), Sands(4879), Tatham(4880), Aabye(4881), Soderstrom(4882), Straaf(4883), Collas(4884), Roffo(4885), Sicling(4886), Flanagan(4887), Seiling(4888), Upshaw(4889), Rodenberger(4890), Habercom(4891), Kungler(4892), Theas(4893), Fride(4894), Hirons(4895), Shepherd(4896), Humphreys(4897), Mills(4898), Ireton(4899), Meriman(4900), Philbin(4901), Kidd(4902), Swindlehurst(4903), Lowey(4904), Foden(4905), Greggan(4906), Tammas(4907), Slade(4908), Munro(4909), Ebbage(4910), Homewood(4911), Chaffe(4912), Woodget(4913), Edman(4914), Fredriksson(4915), Larsson(4916), Gustafson(4917), Hynes(4918), Canning(4919), Loon(4920), Bekkers(4921), Ducatillon(4923), Maertens(4924), Piek(4925), Pintens(4926), Jansma(4927), Sewell(4928), Wuyts(4929), Hoorenbeek(4930), Broek(4931), Jacobus(4932), Streeter(4933), Babii(4934), Yifu(4935), Carlberg(4936), Palen(4937), Lane(4938), Bracken(4939), Bailey(4940), Morigi(4941), Hax(4942), Oyen(4943), Takacs(4944), Saenz(4945), Lundquist(4946), Tripsa(4947), Zabelin(4948), McMillan(4950), Rosca(4951), Zapedzki(4952), Falta(4953), Wiefel(4954), Ferraris(4955), Klaar(4956), Kamachi(4957), Schumann(4958), Milev(4959), Paine(4960), Staheli(4961), Decosta(4962), Schnyder(4963), Umarov(4964), Pinion(4965), Yoshikawa(4966), Mertel(4967), Iuga(4968), Vollmar(4969), Dollinger(4970), Hifeng(4971), Oh(4972), Tenk(4973), Snook(4974), Hultcrantz(4975), Barbosa(4976), Heberle(4977), Dagger(4978), Amat(4979), Jie(4980), Qinan(4981), Yalin(4982), Humby(4983), Carnell(4984), Burt(4985), Hird(4986), Lisle(4987), Huet(4988), Ronmark(4989), Sirbu(4990), 
Tomsen(4991), Karas(4992), Enoch(4993), Boa(4994), Stenvaag(4995), Bury(4996), Auer(4997), Etzel(4998), Klees(4999), Emmons(5000), Lusch(5001), Martynov(5002), Rotaru(5003), Ballinger(5004), Forcella(5005), Kohnke(5006), Kurka(5007), Writer(5008), Debevec(5009), Hirvi(5010), Planer(5011), Koba(5012), Helgerud(5013), Papp(5014), Melnik(5015), Hammerer(5016), Guyot(5017), Clary(5018), Ewing(5019), Beattie(5020), Merlin(5021), Halasy(5022), Rossini(5024), Halderman(5025), Watanabe(5026), Bade(5027), Vella(5028), Garrigus(5029), Faulds(5030), Pera(5031), Bing(5032), Singh(5033), Maktoum(5034), Petrov(5035), Panacek(5036), Dryke(5037), Shan(5038), Giha(5039), Graves(5040), Benelli(5041), Jun(5042), Ling(5043), Janus(5044), Gazov(5045), Pfeffer(5046), Lykin(5047), Forder(5048), Dench(5049), Hykova(5050), Gufler(5051), Binder(5052), Shilova(5053), Jewell(5054), Sperber(5055), Meili(5056), Matova(5057), Holmer(5058), Balogh(5059), Rhode(5060), Igaly(5061), Demina(5062)

Proprietary Names
ACS(1353), FairChang(1512), Teacher(2186), Learner(2213), Maestro(2214), Aprendiz(2215), Millionsofus(2746), Playahead(2833), RiversRunRed(2834), SunMicrosystems(2836), Carr(2917), Dell(3167), Reuters(3168), Hollywood(3173), Sheep(3471), YouTopia(3816), Hillburn(3817), Bradford(3820), CiscoSystems(3958), PhilipsDesign(3959), MadeVirtual(4205), DuranDuran(4210), eBay(4665), Vodafone(4666), Xerox(4667), TGDev(4668), Modesto(4669), Sensei(4670), Ideator(4671), Autodesk(4789), MovieTickets(4790), AvaStar(4791), DiorJoaillerie(4793), AOL(4795), Gabriel(4805), Tequila(5064), Loken(5065), Matlin(5066), GeekSquad(5067), Bradesco(5068), CredicardCiti(5069), PontiacGXP(5070), KAIZEN(5071), McCain(5072), Schomer(5074), Showtime(5075), OzIslander(5076), Meltingdots(5077), Allanson(5083), Sunbelter(5084), SaxoBank(5085), Esslinger(5086), Stengel(5087), Lemeur(5088), Tsujimoto(5089), KaizenGames(5090), Uphantis(5091), OurVirtualHolland(5092), McKinseyandCompany(5093), Lempert(5094), Affuso(5095), Gkguest(5096), Eye4You(5097), OShea(5098), Citibank(5099), Citicard(5100), Citigroup(5101), Citi(5102), Credicard(5103), Diners(5104), Citifinancial(5105), CitiBusiness(5106), BnT(5107), Yensid(5108), Helnwein(5111), Grindstaff(5112), Shirk(5113), SolidWorks(5114), Storm(5115), CarterFinancial(5116), Parkinson(5117), Lear(5118), FiatBrasil(5119), RossiResidencial(5120), Brooklintolive(5121), Calmund(5123), Briegel(5124), Herde(5125), Pfetzing(5126), Triebel(5127), Roemer(5128), Reacher(5129), Thomas(5130), Fraser(5131), Gabaldon(5132), NBA(5133), Accubee(5134), Brindle(5135), Searer(5136), Ukrop(5137), Ponticelli(5138), Belcastro(5139), Glin(5140), Rice(5141), DavidStern(5142), Totti(5144), onrez(5145), DeAnda(5146), Grandi(5147), Pianist(5148), osMoz(5149), PaulGee(5150)

The second piece I was able to find was a script I used to alert me via email whenever one of my friends signed on. I have unfortunately not tested this script before posting it, as I no longer have Second Life installed or wish to waste the time testing it, but here it is nonetheless. ^_^;

//Users to watch
list DetectPersons=[ //List of UIDs of users to watch. (Real UIDs redacted)
    "fdf1fbff-f19f-ffff-ffff-ffffffffffff", //Person 1
    "f0fffaff-f61f-ffff-ffff-ffffffffffff" //Person 2
];

//Other Global Variables
integer NumUsers;
integer UsersParsed=0;
list UserNames;
list Status;
list Requests; //Outstanding dataserver request handles, one per user

default
{
    state_entry()
    {
        NumUsers=llGetListLength(DetectPersons); //Number of users to watch

        //Get User Names
        integer i;
        for(i=0;i<NumUsers;i++)
        {
            UserNames+=[""];
            Status+=["0"];
            Requests+=[llRequestAgentData(llList2Key(DetectPersons, i), DATA_NAME)]; //Remember each request handle so the dataserver event can match replies to users
        }
    }

    dataserver(key requested, string data)
    {
        //Find User Position ("requested" is the request handle returned by llRequestAgentData, not the user's key)
        integer i=llListFindList(Requests, [requested]);
        if(i!=-1)
            UserNames=llListReplaceList(UserNames, [data], i, i); //LSL list functions return a new list, so the result must be assigned back

        if(++UsersParsed==NumUsers)
            state Running;
    }
}

state Running
{
    state_entry()
    {
        llOwnerSay(llDumpList2String(UserNames, ", "));
        llOwnerSay(llDumpList2String(Status, ", "));
        llSetTimerEvent(30);
    }

    timer()
    {
        //Poll the online status of every watched user
        integer i;
        Requests=[];
        for(i=0;i<NumUsers;i++)
            Requests+=[llRequestAgentData(llList2Key(DetectPersons, i), DATA_ONLINE)];
    }

    dataserver(key requested, string data)
    {
        integer i=llListFindList(Requests, [requested]);
        if(i==-1 || data==llList2String(Status, i))
            return; //Unknown request, or online status unchanged
        Status=llListReplaceList(Status, [data], i, i);
        if(data=="0")
            return; //The user signed off
        string Message="The user you are watching '"+llList2String(UserNames, i)+"' signed on at "+llGetTimestamp();
        llEmail(EMAIL_ADDRESS, "User Signed on", Message); //EMAIL_ADDRESS left as a placeholder, as in the original
        llOwnerSay(Message);
    }
}

Of course all this research was from 2007 and I have no idea what is capable now. I do really hope though that they at least updated the client’s interface because it was incredibly clunky. Also, Second Life has always been a neat experiment, and I hope it still is and continues to keep doing well :-).

UTF8 BOM
When a good idea is still considered too much by some

While UTF-8 has almost universally been accepted as the de-facto standard for Unicode character encoding in most non-Windows systems (mmmmmm Plan 9 ^_^), the BOM (Byte Order Marker) still has large adoption problems. While I have been allowing my text editors to add the UTF8 BOM to the beginning of all my text files for years, I have finally decided to rescind this practice for compatibility reasons.

While the UTF8 BOM is useful so that editors know for sure what the character encoding of a file is, and don’t have to guess, it is not really supported, for various reasons, in Unixland. Having to code solutions around this was becoming cumbersome. Programs like vi and pico/nano seem to ignore a file’s character encoding anyways and adopt the character encoding of the current terminal session.

The main culprit with which I was running into this problem a lot was PHP. The funny thing about it was that I had a solution for it working properly in Linux, but not Windows :-).

Web browsers do not expect to receive the BOM marker at the beginning of files, and if they encounter it, may have serious problems. For example, in a certain browser (*cough*IE*cough*) having a BOM on a file will cause the browser to not properly read the DOCTYPE, which can cause all sorts of nasty compatibility issues.

Something in my LAMP setup on my cPanel systems was removing the initial BOM at the beginning of outputted PHP contents, but through some preliminary research I could not find out why this was not occurring in Windows. However, both systems were receiving multiple BOMs at the beginning of the output due to PHP’s include/require functions not stripping the BOM from those included files. My solution to this was a simple overload of these include functions as follows (only required when called from any directly opened [non-included] PHP file):

<?php
/*Safe include/require functions that make sure UTF8 BOM is not output
Use like: eval(safe_INCLUDETYPE($INCLUDE_FILE_NAME));
where INCLUDETYPE is one of the following: include, require, include_once, require_once
An eval statement is used to maintain current scope
*/

//The different include type functions
function safe_include($FileName)	{ return real_safe_include($FileName, 'include'); }
function safe_require($FileName)	{ return real_safe_include($FileName, 'require'); }
function safe_include_once($FileName)	{ return real_safe_include($FileName, 'include_once'); }
function safe_require_once($FileName)	{ return real_safe_include($FileName, 'require_once'); }

//Start the processing and return the eval statement
function real_safe_include($FileName, $IncludeType)
{
	ob_start();
	return "$IncludeType('".strtr($FileName, Array("\\"=>"\\\\", "'", "\\'"))."'); safe_output_handler();";
}

//Do the actual processing and return the include data
function safe_output_handler()
{
	$Output=ob_get_clean();
	while(substr($Output, 0, 3)=="\xEF\xBB\xBF") //Remove all instances of the UTF8 BOM at the beginning of the output
		$Output=substr($Output, 3);
	print $Output;
}
?>
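
For example, usage at the top of a directly opened PHP file would look something like this (“header.php” is just an illustrative name):

<?php
eval(safe_include('header.php')); //Includes header.php in the current scope with any leading BOMs stripped from its output
?>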

I would have liked to use PHP’s output_handler ini setting to catch even the root file’s BOM and not require the include function overloads, but, as php.net puts it, “Only built-in functions can be used with this directive. For user defined functions, use ob_start().”.

As a bonus, the following bash command can be used to find all PHP files in the current directory tree with a UTF8 BOM:

grep -rlP "^\xef\xbb\xbf" . | grep -iP "\.php\$"

[Edit on 2015-11-27]
Better UTF8 BOM file find code (Cygwin compatible):
 find . -name '*.php' -print0 | xargs -0 -n1000 grep -l $'^\xef\xbb\xbf'
And to remove the BOMs (Cygwin compatible):
find . -name '*.php' -print0 | xargs -0 -n1000 grep -l $'^\xef\xbb\xbf' | xargs -i perl -i.bak -pe 'BEGIN{ @d=@ARGV } s/^\xef\xbb\xbf//; END{ unlink map "$_$^I", @d }' "{}"
Simpler remove BOMs (not Cygwin/Windows compatible):
find . -name '*.php' -print0 | xargs -0 -n1000 grep -l $'^\xef\xbb\xbf' | xargs -i perl -i -pe 's/^\xef\xbb\xbf//' "{}"
Combining an Android Project's Versions
Or: “Realtime Project Version Syncing”
As noted in a previous post:

Seeing as there are a number of applications on the market that have both a “Free” and “Full” version, you’d think this would be an easy thing to accomplish. Unfortunately, the marketplace uses an application’s package name as its unique identifier, so both versions have to have a different package name, which is again, a bit of a nuisance.

One method of remedying this is just having a recursive string replace through all the files [...]


I spent a bit of time coming up with a solution for this a few months ago for my latest project, [TEMPORARILY DOWN**]. This solution uses a shared package with a shared code base and resources that the different project versions pull from.


**The project that is a great example of how this process works should be uploaded very soon. At that time this message will disappear and appropriate links will be added. You’ll know this has happened when I upload my next project.


The steps for this setup are as follows: (The source for [TEMPORARILY DOWN**] can be used as an example)
  • First some definitions that will be used below*:
    • ProjectName: The base name of the project (e.g. “MyAndroidProject”)
    • VersionName: The name of separate versions (e.g. “Free” and “Full”)
    • SharedName: The name of the shared section of the code (e.g. “Shared”)
    • BasePackageName: The base name of the package group (e.g. “com.example.MyAndroidProject”)
    • BasePackageDirectory: The base path of the package group (e.g. “com/example/MyAndroidProject”)
    *Please note these definitions are used in code sections below.
  • Create the directory structure:
    • A base directory of ProjectName (e.g. “~/MyAndroidProject”)
    • A subdirectory under the base directory named SharedName for the shared files (e.g. “~/MyAndroidProject/Shared”). It will hold any shared files in appropriate Android standard directories (e.g. “res”, “src”).
    • Subdirectories under the base directory named VersionName for each version’s project (e.g. “~/MyAndroidProject/Free”). Each of these will hold its own complete project including the AndroidManifest.
  • Creating the shared resources: There’s nothing special about shared resources (probably in “SharedName/res”), except I suggest noting at the top of the files that they are shared, for reference sake.
  • Creating the shared code:
    • Shared code goes in “SharedName/src/BasePackageDirectory/SharedName” (e.g. “~/MyAndroidProject/Shared/src/com/example/MyAndroidProject/Shared”).
    • As noted for shared resources, you might want to note at the top of the files that they are shared.
    • Set the package name for shared code to “BasePackageName.SharedName” (e.g. “package com.example.MyAndroidProject.Shared;”).
    • Shared code should never directly refer back to a version’s package (non shared) code except through reflection.
    • Resource IDs are still accessible in this shared package through the “R” class, but when accessed, the function or class that does the accessing needs to be preceded with “@SuppressWarnings({ "static-access" })”. The “R” variable also has a “Version” member that can be used to alter shared code flow depending on the version being used. This will be explained in more detail later.
    • *BONUS*
      If shared code needs to access information from a static member in a non-shared class, reflection can be used, for example:
      Class.forName("BasePackageName."+R.Version.name()+".NonSharedClassName").getDeclaredField("StaticMemberName").get(null)
      A static function can be called in a similar manner through reflection:
      Class.forName("BasePackageName."+R.Version.name()+".NonSharedClassName").getMethod("StaticMethodName", new Class[0]).invoke(null);
  • *BONUS* Global Shared Class: I also suggest having a shared class that holds global variables that allows easy interaction from the shared to non shared classes, and holds static information that both use, with members including:
    • A reference to a Context (or Activity) from the main program
    • The BasePackageName (needed for reflection, and other stuff like preference storage)
    • Other useful interfaces like logging
  • Creating the non-shared code:
    • Create a separate project for each version in its corresponding subdirectory listed in the third step of the “Create the directory structure” section above.
    • Make the package name for a version as “BasePackageName.VersionName”.
    • When referencing shared classes from an Android manifest, make sure to use their full package name; for example, a shared activity would look like “<activity android:name="BasePackageName.SharedName.ActivityName">”
    • Import the shared package into all non-shared class files: “import BasePackageName.SharedName.*;”
  • Linking the shared code into each project:
    • The shared code now needs to get integrated into each project. To do this, all the shared files need to be symbolically (or hard) linked back into their corresponding directories for each version.
    • First, make sure each project directory also contains the same subdirectories as those found in the shared directory.
    • The script I have written for this, which needs to be put in the base directory, is as follows: [EDIT ON 2011-01-03 @ 6:28am] See here for a better copy of the script. [/EDIT]
      #!/bin/bash
      
      #Run this file to install links to shared files into all branches
      
      LN="ln" #Use hard links, which work well in Windows and require less logic to calculate linking
      
      #if [ `uname | grep -vi 'cygwin'` ]; then #If not Windows (NOT YET SUPPORTED)
      #	LN="ln -s" #Use symbolic links, which take some additional logic that is not yet programmed
      #fi
      
      LocalBranches=`find -maxdepth 1 -type d | grep -iPv "^\.(/SharedName|)$"` #Find version names, ignoring "." ".." and the shared directory
      
      #Propagate shared files into different versions
      cd Shared
      for i in $LocalBranches; do
      	find -type f -print0 | xargs -0 -i rm -f ../$i/{} #Clear out old links from version just in case the link has been undone
      	if [ "$1" != "clean" ]; then
      		find -type f -print0 | xargs -0 -i $LN {} ../$i/{} #Link shared files into directories
      	fi
      done
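      #Usage sketch (the file name is my own choice, not from the original post): if the above is
      #saved as "LinkShared.sh" in the base directory, then:
      #  ./LinkShared.sh        #(Re)link all shared files into every version
      #  ./LinkShared.sh clean  #Remove the links without recreating them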
      				
  • Tying the resources together:
    • The resources IDs in the “R” class might need to be accessible by the shared classes. To do this, an “R” class needs to be added into the shared namespace that duplicates the information. Unfortunately, the Android generated resources ID class “R” marks everything as “final” so class duplication requires a bit more work than would be preferable.
    • Each different version needs its own “R” class, put into the directory “~/MyAndroidProject/VersionName/src/BasePackageDirectory/SharedName” that the shared code reads from.
    • This class will also contain version information so the shared classes can know which version it is currently interacting with.
    • The code is as follows:
      package BasePackageName.SharedName;
      
      import BasePackageName.VersionName.R.*;
      
      public final class R //Mimic the resources from the (non-shared) parent project
      {
      	//There may be more resource ID groups than these that need to be shared
      	final static attr attr=null;
      	final static drawable drawable=null;
      	final static id id=null;
      	final static layout layout=null;
      	final static string string=null;
      	final static xml xml=null;
      	final static array array=null;
      	
      	public enum Versions { Version1, Version2 }; //List versions here
      	
      	public final static Versions Version=Versions.CurrentVersion; //Set this to the version the containing project represents
      }
      			
    • Whenever this “shared” “R” class is accessed, the function or class that does the accessing needs to be preceded with “@SuppressWarnings({ "static-access" })”. This is due to the hack required to reproduce the “final” class information from the original “R” class in a shared class.
  • Working with shared projects and code in Eclipse:
    • When modifying shared code in Eclipse for any of the different versions, the other versions’ projects need to be refreshed to pick up the new code. I tried many different methods and processes to fix this different-package-name problem, but the method described here is still much quicker and easier than any of the others.
Android Stuff
Yet another platform/library to learn. It never ends.

Having recently finished my first Android project (and hopefully not last), I decided to supply some notes I took about the process.


While I am going to try and keep impressions to a minimum on the rest of this post, and keep it to tangible notes, I must first comment that trying to find out things for the Android platform was often like pulling teeth. While its typical Java reference online documentation is all there with all the classes and cross-linking, that is about all it is, very dry and virtually useless beyond a base reference. The comments on variable parameters (and many other sections) in the reference are often coarse and not descriptive at all, for example, one parameter named mask has the basic description as “this is a mask”. Some functions don’t even have descriptions at all.

Perhaps I am getting too complacent as a programmer and getting used to excellent documentation like for Python or GTK (I’ve even grown to love Microsoft documentation after having used it for long enough!). After all, most required information is just a Google away, and being a programmer is often just about finding the proper magical incantations to hook into a certain library. Unfortunately, however, even web searches were often yielding less than fruitful results when dealing with Android, as the platform is relatively new.



  • Some useful tasks and some problems:
    • Using the virtual (soft) keyboard without a TextView:
      • Showing the virtual keyboard:
        ((InputMethodManager)getSystemService(INPUT_METHOD_SERVICE)).toggleSoftInput(InputMethodManager.SHOW_FORCED, InputMethodManager.HIDE_IMPLICIT_ONLY);
      • Hiding the virtual keyboard:
        ((InputMethodManager)getSystemService(INPUT_METHOD_SERVICE)).hideSoftInputFromWindow(getWindow().getDecorView().getApplicationWindowToken(), 0);
        Note: “getWindow().getDecorView()” can also be replaced by a View on your screen
      • Getting the keyboard input: Add the following function to the Activity that opened the keyboard:
        @Override public boolean onKeyDown(int keyCode, KeyEvent msg)
        Note: This will not work if you’re not using default keyboard input (like if it’s set to enter Japanese or Chinese characters).
    • Determining the physical dimensions of the screen:

      This should be a trivial task: use the DisplayMetrics (getWindowManager().getDefaultDisplay()) interface to get the DPI, and combine it with the screen dimensions from getWindowManager().getDefaultDisplay().getWidth() (and .getHeight()). However, it doesn’t always work as it should.

      The best method to get the DPI would be to use “DisplayMetrics.xdpi” and “DisplayMetrics.ydpi”, but unfortunately, these are misreported by at least the Motorola Droid. I’ve found “DisplayMetrics.density”*160 to be pretty accurate, but if true accuracy is needed, a calibration screen might be required.
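
      A minimal sketch of that density-based approximation (my code, not from the original notes):

      DisplayMetrics DM=new DisplayMetrics();
      getWindowManager().getDefaultDisplay().getMetrics(DM); //Fill in the current display's metrics
      float ApproxDPI=DM.density*160; //Usually more trustworthy than DM.xdpi/DM.ydpi on misreporting devices
      float WidthInches=getWindowManager().getDefaultDisplay().getWidth()/ApproxDPI;
      float HeightInches=getWindowManager().getDefaultDisplay().getHeight()/ApproxDPI;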

    • Inform user of touch events: Many Android widgets (Views) change their visual state (highlight) when the user presses down on them to let the user know something is going to happen if the user lifts their finger while still on the widget. Unfortunately, there seems to be no text widget or layout view that does this automatic highlighting by itself (ListViews do in groups). The following is some example code to produce this effect.
      import android.app.Activity;
      import android.os.Bundle;
      import android.view.MotionEvent;
      import android.view.View;
      import android.view.View.OnTouchListener;
      
      public class CLASSNAME extends Activity
      {
      	@Override public void onCreate(Bundle savedInstanceState)
      	{
      		View HighlightView=findViewById(R.id.THE_VIEWS_ID);
      		HighlightView.setOnTouchListener(HighlightState);
      	}	
      	
      	private OnTouchListener HighlightState = new OnTouchListener() { public boolean onTouch(View v, MotionEvent event)
      	{
      		if(event.getAction()==MotionEvent.ACTION_DOWN)
      			v.setBackgroundColor(0xFF0000FF); //Set background color to blue
      		else if(event.getAction()==MotionEvent.ACTION_CANCEL || event.getAction()==MotionEvent.ACTION_UP)
      			v.setBackgroundResource(0); //No background color
      			
      		return false;
      	} };
      }
    • Retrieving the names and IDs of all resources in a resource group:
      import java.lang.reflect.Field;
      
      Field[] FieldList=R.drawable.class.getDeclaredFields();
      String[] Names=new String[FieldList.length];
      int[] IDs=new int[FieldList.length];
      for(int i=0;i<FieldList.length;i++)
      	IDs[i]=getResources().getIdentifier(Names[i]=FieldList[i].getName(), "drawable", getClass().getPackage().getName());
    • Setting a color matrix on an image: If you have 2 ImageViews that display the same resource image, and either has a color matrix set on it, they will both share one of the color matrices. If this occurs, copy the image resource, or use a separate image resource. For kicks, here is an example of setting an inverse color matrix on an image.
      ((ImageView)findViewById(R.id.IMAGE_ID)).setColorFilter(new ColorMatrixColorFilter(new float[] {-1,0,0,0,255, 0,-1,0,0,255, 0,0,-1,0,255, 0,0,0,1,0}));
    • Setting to full screen:
      requestWindowFeature(Window.FEATURE_NO_TITLE); //This must be called before "setContentView", and hides the title bar
      getWindow().setFlags(FULLSCREEN ? WindowManager.LayoutParams.FLAG_FULLSCREEN : 0, WindowManager.LayoutParams.FLAG_FULLSCREEN); //Turns on/off the status bar
    • Starting another local activity: Instead of using Intent(String action) for Context.StartActivity, as the Reference explains, it is much easier to use Intent(Context packageContext, Class<?> cls) like the following: (called from inside an Activity)
      startActivity(new Intent(this, OTHER_ACTIVITY_NAME.class));
    • Creating a timed event that updates the UI: A function running through java.util.Timer cannot interact with the GUI. One solution to make a timer is with the android.os.Handler interface.
      import android.os.Handler;
      
      public class ExampleActivity extends Activity
      {
      	final int InitialDelay=1000, RepeatDelay=1000; //Delays in milliseconds (example values)
      	Handler TimedHandler=new Handler();
      	
      	public void ExampleFunction()
      	{
      		TimedHandler.postDelayed(new Runnable() { public void run() {
      			//Do GUI stuff...
      			TimedHandler.postDelayed(this, RepeatDelay);
      		} }, InitialDelay);
      	}
      }

      Another solution is to post to a Handler from the Timer function.
  • When dealing with putting on the market place:
    • Getting an account to put applications on the Android Market cost $25.
    • Screenshots shown on the Android Market description page are somewhat buggy, and seemingly randomly either stretch properly or crop. Viewing the full sized screenshots does seem to work properly.
    • Seeing as there are a number of applications on the market that have both a “Free” and “Full” version, you’d think this would be an easy thing to accomplish. Unfortunately, the marketplace uses an application’s package name as its unique identifier, so both versions have to have a different package name, which is again, a bit of a nuisance.

      One method of remedying this is just having a recursive string replace through all the files to change the package names. However, if using Eclipse, it’s quicker (and saves reopening the project) to update the string first in the manifest, and then rename the package under the “src” folder by pressing F2 (rename) on it while it is selected.

      Also, unfortunately, if you do this, when a person upgrades from the lite to the full version, preferences are not automatically transferred :-\.

    • The publisher’s market place page is very sparse and leaves a lot to be desired. It also seems to update only once every 24 hours or so (not sure of exact times).
    • If an application is put up, it WILL get downloads immediately. For example, I put up an application with a description of “This is a test, do not download this” for doing security tests that I took down within like 10 minutes. It already had 2 comments/ratings on it within that time ~.~; .
    • Google Checkout: Fees. When a copy of your application is purchased, the user has 24 hours to return it. The money is not deposited into your bank account until after this time (when it’s not a weekend). If you want to give your application to someone for free, they need to purchase it through the market, and then you can cancel the purchase transaction before the 24 hours are up. Unfortunately, this has to be done every time they want to update the application. It also seems you cannot buy your own applications, as the purchase server throws an error.
  • Application Protection:

    You can download any Android application by default from your phone to your computer, modify them, and reinstall them back to any phone. An example use for this would be to crack a shareware application where just a single byte probably needs to be changed to make it a full version.

    The applications themselves are in an .apk file (which is just a .zip file), and the source code (classes) are encoded as a “Dalvik Executable” file within it (classes.dex), which as I understand it, is optimized Java bytecode. So, AFAIK right now, there is no way to decompile the .dex file back to the original source, like you can with normal Java. However, the Android emulator, part of the Android SDK, includes a tool called dexdump, which allows you to decompile it to bytecode.

    Once you have the bytecode, you can use that as reference to modify the compiled .dex file however you want, which is pretty much synonymous with assembly editing. Once that is done, the signature and checksum of the .dex file must be recalculated (Java source by Timothy Strazzere), and then the apk file must be resigned, and then it’s good to go!
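
    For reference, the resigning step can be done with the standard Java jarsigner tool; a sketch, where the keystore and alias names are placeholders of my own:

    jarsigner -keystore my-release-key.keystore ModifiedApp.apk myalias #Signs the apk in place with the key stored under "myalias"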

    The marketplace also has an option to turn on Copy Protection. When this is turned on for an application, the user cannot backup or access the applications package file. I would assume however with a rooted phone you could still grab it from “/data/app-private”, and the rest of the process should be the same. I have not tested this as rooting Android 2.1 is much more of a pain in the butt, ATM, than I want to deal with.

Realtime StdOut pass through to Web Browser
Tying it all together

I had the need to pass a program’s [standard] output to a web browser in real time. The best solution for this is to use a combination of programs made in different languages. The following are all of these individual components to accomplish this task.

Please note the C components are only compatible with gcc and bash (cygwin required for Windows), as MSVC and Windows command prompt are missing vital functionality for this to work.




The first component is a server made in C that receives stdin (as a pipe, or typed by the user after line breaks) and sends that data out to a connected client (buffering the output until the client connects).

PassThruServer source, PassThruServer compiled Windows executable.


Compilation notes:
  • This compiles as C99 under gcc:
    gcc PassThruServer.c -o PassThruServer
  • Define “WINDOWS” when compiling in Windows (pass “-DWINDOWS”)

Source Code:
#include <stdio.h>
#include <stdlib.h> //malloc, free
#include <unistd.h> //close, write
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <signal.h>
#ifdef WINDOWS
	#include <io.h> //_setmode
#endif

//The server socket and options
int ServerSocket=0;
const int PortNumber=1234; //The port number to listen in on

//If an error occurs, exit cleanly
int error(char *msg)
{
	//Close the socket if it is still open
	if(ServerSocket)
		close(ServerSocket);
	ServerSocket=0;

	//Output the error message, and return the exit status
	fprintf(stderr, "%s\n", msg);
	return 1;
}

//Termination signals
void TerminationSignal(int sig)
{
	error("SIGNAL causing end of process");
	_exit(sig);
}

int main(int argc, char *argv[])
{
	//Listen for termination signals
	signal(SIGINT, TerminationSignal);
	signal(SIGTERM, TerminationSignal);
	signal(SIGHUP, SIG_IGN); //We want the server to continue running if the environment is closed, so SIGHUP is ignored -- This doesn't work in Windows
	
	//Create the server
	struct sockaddr_in ServerAddr={AF_INET, htons(PortNumber), INADDR_ANY, 0}; //Address/port to listen on
	if((ServerSocket=socket(AF_INET, SOCK_STREAM, 0))<0) //Attempt to create the socket
		return error("ERROR on 'socket' call");
	if(bind(ServerSocket, (struct sockaddr*)&ServerAddr, sizeof(ServerAddr))<0) //Bind the socket to the requested address/port
		return error("ERROR on 'bind' call");
	if(listen(ServerSocket,5)<0) //Attempt to listen on the requested address/port
		return error("ERROR on 'listen' call");

	//Accept a connection from a client
	struct sockaddr_in ClientAddr;
	socklen_t ClientAddrLen=sizeof(ClientAddr); //accept() expects a socklen_t, not an int
	int ClientSocket=accept(ServerSocket, (struct sockaddr*)&ClientAddr, &ClientAddrLen);
	if(ClientSocket<0) 
		return error("ERROR on 'accept' call");

	//Prepare to receive info from STDIN
		//Create the buffer
		const int BufferSize=1024*10;
		char *Buffer=malloc(BufferSize); //Allocate a 10k buffer
		//STDIN only needs to be set to binary mode in windows
		const int STDINno=fileno(stdin);
		#ifdef WINDOWS
			_setmode(STDINno, _O_BINARY);
		#endif
		//Prepare for blocked listening (select function)
		fcntl(STDINno, F_SETFL, fcntl(STDINno, F_GETFL, 0)|O_NONBLOCK); //Set STDIN as non-blocking
		fd_set WaitForSTDIN;

	//Receive information from STDIN, and pass directly to the client
	int RetVal=0;
	while(1)
	{
		//Get the next block of data from STDIN
		FD_ZERO(&WaitForSTDIN); //select() modifies the set, so it must be re-armed before every call
		FD_SET(STDINno, &WaitForSTDIN);
		select(STDINno+1, &WaitForSTDIN, NULL, NULL, NULL); //Wait for data
		size_t AmountRead=fread(Buffer, 1, BufferSize, stdin); //Read the data
		if(feof(stdin) || AmountRead==0) //If input is closed, process is complete
			break;
		
		//Send the data to the client
		if(write(ClientSocket,Buffer,AmountRead)<0) //If error in network connection occurred
		{
			RetVal=error("ERROR on 'write' call");
			break;
		}
	}
	
	//Cleanup
	if(ServerSocket)
		close(ServerSocket);
	free(Buffer);
	
	return RetVal;
}



The next component is a Flash applet as the client to receive data. Flash is needed as it can keep a socket open for realtime communication. The applet receives the data and then passes it through to JavaScript for final processing.

Compiled Flash Client Applet


ActionScript 3.0 Code (This goes in frame 1)
import flash.external.ExternalInterface;
import flash.events.Event;
ExternalInterface.addCallback("OpenSocket", OpenSocket);

function OpenSocket(IP:String, Port:Number):void
{
	SendInfoToJS("Trying to connect");
	var TheSocket:Socket = new Socket();
	TheSocket.addEventListener(Event.CONNECT, function(Success) { SendInfoToJS(Success ? "Connected!" : "Could not connect"); });
	TheSocket.addEventListener(Event.CLOSE, function() { SendInfoToJS("Connection Closed"); });
	TheSocket.addEventListener(IOErrorEvent.IO_ERROR, function() {SendInfoToJS("Could not connect");});
	TheSocket.addEventListener(ProgressEvent.SOCKET_DATA, function(event:ProgressEvent):void { ExternalInterface.call("GetPacket", TheSocket.readUTFBytes(TheSocket.bytesAvailable)); });
	TheSocket.connect(IP, Port);
}
function SendInfoToJS(str:String) { ExternalInterface.call("GetInfoFromFlash", str); }
stop();

Flash sockets can also be implemented in ActionScript 1.0 Code (I did not include hooking up ActionScript 1.0 with JavaScript in this example. “GetPacket” and “SendInfoToJS” need to be implemented separately. “IP” and “Port” need to also be received separately).
var NewSock=new XMLSocket();
NewSock.onData=function(msg) { GetPacket(msg); }
NewSock.onConnect=function(Success) { SendInfoToJS(Success ? "Connected!" : "Could not connect"); }
SendInfoToJS(NewSock.connect(IP, Port) ? "Trying to Connect" : "Could not start connecting");



JavaScript can then receive (and send) information from (and to) the Flash applet through the following functions.

  • FLASH.OpenSocket(String IP, Number Port): Call this from JavaScript to open a connection to a server. Note the IP MIGHT have to be the domain the script is running on for security errors to not be thrown.
  • JAVASCRIPT.GetInfoFromFlash(String): This is called from Flash whenever connection information is updated. I have it giving arbitrary strings ATM.
  • JAVASCRIPT.GetPacket(String): This is called from Flash whenever data is received through the connection.

This example allows the user to input the IP to connect to that is streaming the output. Connection information is shown in the “ConnectionInfo” DOM object. Received data packets are appended to the document in separate DOM objects.

JavaScript+HTML Source


Source Code: (See JavaScript+HTML Source file for all code)
var isIE=navigator.appName.indexOf("Microsoft")!=-1;
function getFlashMovie(movieName) { return (isIE ? window[movieName] : document[movieName]);  }
function $(s) { return document.getElementById(s); }

function Connect()
{
	getFlashMovie("client").OpenSocket($('IP').value, 1234);
}

function GetInfoFromFlash(Str)
{
	$('ConnectionInfo').firstChild.data=Str;
}

function GetPacket(Str)
{
	var NewDiv=document.createElement('DIV');
	NewDiv.appendChild(document.createTextNode(Str));
	$('Info').appendChild(NewDiv);
}



Next is an example application that outputs to stdout. It is important that it flushes stdout after every output or the communication may not be real time.

inc source, inc compiled Windows executable.


inc counts from 0 to one less than a number (parameter #1 [default=50]) after a certain millisecond interval (parameter #2 [default=500]).

[Bash] Example:
./inc 10 #Counts from 0-9 every half a second

Source Code:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h> //usleep

int main(int argc, char *argv[])
{
	int NumLoops=(argc>1 ? atoi(argv[1]) : 50); //Number of loops to run from passed argument 1. Default is 50 if not specified.
	int LoopWait=(argc>2 ? atoi(argv[2]) : 500); //Number of milliseconds to wait in between each loop from passed argument 2. Default is 500ms if not specified.
	LoopWait*=1000; //Convert to microseconds for usleep

	//Output an incremented number after each interval
	int i=0;
	while(i<NumLoops)
	{
		printf("%u\n", i++);
		fflush(stdout); //Force stdout flush
		usleep(LoopWait); //Wait for the specified interval
	};
	
	return 0;
}



This final component is needed so the Flash applet can connect to a server. Unfortunately, new versions of Flash (at least version 10, might have been before that though) started requiring policies for socket connections >:-(. I don’t think this is a problem if you compile your applet to target an older version of Flash with the ActionScript v1.0 code.

This Perl script creates a server on port 843 to respond to Flash policy requests, telling any Flash applet from any domain to allow connections to go through to any port on the computer (IP). It requires Perl, and root privileges on Linux to bind to a port <1024 (su to root or run with sudo).

Flash Socket Policy Server (Rename extension to .pl)


Source Code:
#!/usr/bin/perl
use warnings;
use strict;

#Listen for kill signals
$SIG{'QUIT'}=$SIG{'INT'}=$SIG{__DIE__} = sub
{
	close Server;
	print "Socket Policy Server Ended: $_[0]\n";
	exit;
};

#Start the server:
use Socket;
use IO::Handle;
my $FlashPolicyPort=843;
socket(Server, PF_INET, SOCK_STREAM, getprotobyname('tcp')) or die "'socket' call: $!"; #Open the socket
setsockopt(Server, SOL_SOCKET, SO_REUSEADDR, 1) or die "'setsockopt' call: $!"; #Allow reusing of port/address if in TIME_WAIT state
bind(Server, sockaddr_in($FlashPolicyPort,INADDR_ANY)) or die "'bind' call: $!"; #Listen on port $FlashPolicyPort for connections from any INET adapter
listen(Server,SOMAXCONN) or die "'listen' call: $!"; #Start listening for connections
Server->autoflush(1); #Do not buffer output

#Infinite loop that accepts connections
$/ = "\0"; #Reset terminator from new line to null char
while(my $paddr=accept(Client,Server))
{
	Client->autoflush(1); #Do not buffer IO
	if(<Client> =~ /.*policy\-file.*/i) { #If client requests policy file...
		print Client '<cross-domain-policy><allow-access-from domain="*" to-ports="*" /></cross-domain-policy>'.$/; #Output policy info: Allow any flash applets from any domain to connect
	}
	close Client; #Close the client
}
This could very easily be converted to another better [less resource intensive] language too.


How to tie all of this together
  1. Start the servers
    • In your [bash] command shell, execute the following
      Server/FlashSocketPolicy.pl & #Run the Flash Policy Server as a daemon. Don't forget sudo in Linux
      ./inc | ./PassThruServer #Pipe inc out to the PassThruServer
    • Note that this will immediately start the PassThruServer receiving information from “inc”, so if you don’t get the client up in time, it may already be done counting and send you all the info at once (25 seconds).
    • The PassThruServer will not end until one of the following conditions has been met:
      • The client has connected and the piped process is completed
      • The client has connected and disconnected and the disconnect has been detected (when a packet send failed)
      • It is manually killed through a signal
    • The Flash Policy Server daemon should probably just be left on indefinitely in the background (it only needs to be run once).
  2. To run the client, open client.html through a web server [i.e. Apache’s httpd] in your web browser. Don’t open the local file straight through your file system, it needs to be run through a web server for Flash to work correctly.
  3. Click “connect” (assuming you are running the PassThruServer already on localhost [the same computer]). You can click “connect” again every time a new PassThruServer is run.
Live Streaming SHOUTcast through Flash
The one time I decide to look online before trying it out myself

A client of mine wanted their website to have an applet that played streaming music from a SHOUTcast server. The easy solution would have been to just embed a Windows Media Player applet into the page, but that would only work for IE.

I thoroughly searched the web and was unable to find a Flash applet (or other solution) that already did this (and actually worked). Most of the information I was finding was people having problems getting this kind of thing working in Flash with no answer provided. After giving up on finding a resolution online, I decided to load up Flash and see what I could find from some tinkering.

Quite frankly, I’m shocked people were having so many problems with this. I started an ActionScript 2.0 project and put in the following code, and it worked right away in Flash CS3 (v9.0) with no problem:

var URL="http://example.shoutcast.castledragmire.com:1234/" //The URL to the SHOUTcast server
var MySound:Sound=new Sound(this);
MySound.loadSound(URL,true);

Unfortunately, once I exported the Flash applet and loaded it up in my browsers, it was no longer working. After a few minutes of poking around, I had a hunch that the SHOUTcast host might be sending different data depending on the [Browser’s] User Agent. I changed Firefox’s User Agent to “Flash” through a Firefox add-on (User Agent Switcher), and it worked :-D.

Once again, unfortunately, this was not a viable solution because I couldn’t have every user who visited the client’s web page change their browser User Agent string :-). The quickest solution to the problem at this point was to just create a passthrough script that grabbed the live stream on their server and passed it to the client. The following is the PHP script I used for this:

$streamname='example.shoutcast.castledragmire.com'; //The SHOUTcast server's host
$port      ='1234'; //The SHOUTcast server's port
$path      ='/'; //The stream's path on the server

header('Content-type: audio/mpeg'); //Tell the browser to expect an MPEG audio stream
$sock=fsockopen($streamname,$port) or exit('Could not connect to the stream'); //Open a raw socket to the SHOUTcast server
fputs($sock, "GET $path HTTP/1.0\r\n"); //Request the stream...
fputs($sock, "Host: $streamname\r\n");
fputs($sock, "User-Agent: WinampMPEG/2.8\r\n"); //...identifying ourselves as Winamp so the server sends the raw stream
fputs($sock, "Accept: */*\r\n");
fputs($sock, "Connection: close\r\n\r\n");
fpassthru($sock); //Pipe the stream straight through to the client
fclose($sock);

The final two steps to get this working were:
  1. Setting the Flash Applet’s URL variable to the PHP file
  2. Turning off PHP output buffering for the file. Depending on the server setup, this can only be done through Apache or the php.ini. This is very important; if buffering is left on, the data will never get sent to the user.
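
Depending on the setup, it may also be possible to end PHP’s output buffering at runtime, right before the fpassthru call; a minimal sketch (an idea to try, not something from the original setup):

while(ob_get_level()>0) //End and flush every active output buffer so data streams to the client immediately
	ob_end_flush();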

The only problem with this method is that it taxes the server that is passing the data through, especially since it uses PHP... This kind of thing could very easily be done in C though (as a matter of fact, I will be writing a post on something very close to that very soon).

JavaScript problems when crossing windows Part 2
IE being a pain in the butt like usual

To continue the subject in my last post, these next cross-window bugs also derive from objects not being recognized properly when being passed between windows in JavaScript.

I needed the ability to dynamically run functions in the secondary window from the primary window, where the parameters are taken from an array. Since a “function” from a secondary window is not seen as a function object from the primary window in IE, the apply member was not working.

I have included a fix for this below in the “RunFunctionInRemoteWindow” function, which is just a wrapper function in the second window that calls the apply function. This function manually copies the array through a for loop, instead of using slice, because in IE7 (but not IE8), the passed arrays were not seen as valid JSObjects, so the slice method (which is a standard method used for copying arrays by value) was not working.


LocalWindow.html [run this one]
<html><body>
<input type=button onclick="RunTest();" value='Click me when the second window has opened to run the test'>
<script type="text/javascript">

//Spawn the second window
var NewWindow=window.open('RemoteWindow.html');

//Run the test
function RunTest()
{
	LocalAlertString('This is an alert generated from the local window');
	NewWindow.RemoteAlertString('This is an alert generated from the remote window');
	alert('The local window alert function is of type function: '+(LocalAlertString instanceof Function));
	alert('The remote window alert function is of type function: '+(NewWindow.RemoteAlertString instanceof Function));
	LocalAlertString.apply(window, ['This is an alert generated from the local window through the APPLY member']);

	try {
	NewWindow.RemoteAlertString.apply(NewWindow.window, ['This is an alert generated from the remote window through the APPLY member. This will not work in IE because the remote window\'s function is not actually a function.']);
	} catch(e) { alert('The REMOTE APPLY failed: '+e.message); }

	NewWindow.RunFunctionInRemoteWindow('RemoteAlertString', ['This is an alert generated from the remote window through the FIXED APPLY function.']);
}

//Generate an alert in the local window
function LocalAlertString(TheString)
{
	alert('Local String: '+TheString);
}

</script></body></html>

RemoteWindow.html [do not run this one, it is opened as a popup from LocalWindow.html]
<html><body><script type="text/javascript">
//Generate an alert in the remote window
function RemoteAlertString(TheString)
{
	alert('Remote String: '+TheString);
}

//Call functions in this window remotely through the "apply" member
function RunFunctionInRemoteWindow(FunctionName, Parameters)
{
	//Manually copy the passed Parameters since "Parameters" may not be a valid JSObject anymore (this could be detected and array.slice used if it is still valid)
	var ParametersCopy=[];
	for(var i=0;i<Parameters.length;i++)
		ParametersCopy[i]=Parameters[i];
	
	window[FunctionName].apply(window, ParametersCopy);
}
</script></body></html>
JavaScript problems when crossing tabs or windows
Too tired to think of a subtitle today

I was doing some research around April of 2009 on JavaScript interaction between web browser windows. I was doing this because web browsers are starting to split off each tab/window into separate processes/threads (Firefox is lagging in this), which can lead to some useful new implementations in the browser world, including multithreading. I wanted to explore the interaction between these windows to make sure there were no caveats that might creep up if I decided to take advantage of this.

The first one I found was that each browser window has its own instance of all of the base object classes, so prototypes do not carry over, and instanceof will not work as expected.

For example, if in WindowOne, you add a prototype to the Array class called IsArray, it is only accessible by arrays created in WindowOne. If you pass an array created in WindowOne into a second window, the prototype is still available on that one array (IIRC this was not true of some of the browsers at the time, but I tested again today, and it worked for IE8, Firefox3, and Google Chrome). Also, since the base object classes in WindowOne and other windows are not the same, an object created in WindowOne and passed to another window will return false in an instanceof Object operation in that other window.


Here is some example code to help show what I’m talking about.

LocalWindow.html [run this one]
<html><body>
<input type=button onclick="RunTest();" value='Click me when the second window has opened to run the test'>
<script type="text/javascript">
Array.prototype.IsArray=true;
var NewWindow=window.open('RemoteWindow.html'); //Spawn the second window
function RunTest() { NewWindow.RunTest({}, [], new ExampleObject()); }; //Send the test data to remote window
function ExampleObject() { } //An example class
</script></body></html>

RemoteWindow.html [do not run this one, it is opened as a popup from LocalWindow.html]
<html><body><script type="text/javascript">
function RunTest(AnObject, AnArray, AnExampleObject)
{
   var MyTests=[
      'AnObject instanceof Object',
      'AnObject.IsArray',                               //Object.prototype does not have this (Array.prototype does)
      'AnArray instanceof Object',
      'AnArray instanceof Array',
      'AnArray.IsArray',                                //This was added to the Array.prototype in the parent window
      'AnArray instanceof opener.Array',                //This example does not work in IE7 because opener.x cannot be properly accessed
      'AnExampleObject instanceof opener.ExampleObject',//This example does not work in IE7 because opener.x cannot be properly accessed
      'AnExampleObject instanceof ExampleObject'        //This test should error because "ExampleObject" does not exist in this window
   ];
   
   for(var i=0;i<MyTests.length;i++) //This runs each test like the following: alert("TEST: "+(TEST));
      try {
         eval('alert("'+MyTests[i]+': "+('+MyTests[i]+'));');
      } catch(e) {
         alert('Error on test "'+MyTests[i]+'": '+(e.hasOwnProperty('message') ? e.message : e.toString()));
      }
}
</script></body></html>
Cross Domain AJAX Requests
Bypassing the pesky browser security model

Since I just released my AJAX Library, I thought I’d post a useful script that uses it. The function CrossDomainGetURL below uses the AJAX Library to make requests across domains in Firefox. It takes one more parameter (not in order) than the AJAX Library's GetURL function, which is an array of domains to pull cookies from for the AJAX request.


function GetCookiesFromURL(Domains) //Return all the cookies for Domains specified in the Domains array
{
	var cookieManager = Components.classes["@mozilla.org/cookiemanager;1"].getService(Components.interfaces.nsICookieManager); //Requires privileges, which are granted in CrossDomainGetURL
	var iter=cookieManager.enumerator, CookieList=[], cookie; //The object used to find all cookies, the final list of cookies, and a temporary object
	while(iter.hasMoreElements()) //Loop through all cookies
		if(((cookie=iter.getNext()) instanceof Components.interfaces.nsICookie) && Domains.indexOf(cookie.host)!=-1) //If a cookie whose host matches one of our domains
			CookieList.push(cookie.name+'='+cookie.value); //Add it to our final list
	return CookieList.join("; "); //Return the cookie list for the specified domains
}

function CrossDomainGetURL(URL, Data, CookieDomains, ExtraOptions) //See AJAX Library GetURL function. CookieDomains is an array specifying what domains cookies are pulled from for the AJAX call. 
{
	//Access universal privileges in Firefox (Required to get cookies for other domains, and to use AJAX with other domains). This functionality is lost as soon as this function loses scope.
	try { netscape.security.PrivilegeManager.enablePrivilege("UniversalXPConnect"); }
	catch(e) { return alert('Cannot access browser privileges'); }

	if(CookieDomains instanceof Array) //If an array of domains is passed to get cookies from...
	{	
		ExtraOptions=((ExtraOptions instanceof Object) ? ExtraOptions : {}); //Make sure extra options is an object
		ExtraOptions.AdditionalHeaders=((ExtraOptions.AdditionalHeaders instanceof Object) ? ExtraOptions.AdditionalHeaders : {}); //Make sure extra options has an additional headers object
		ExtraOptions.AdditionalHeaders.Cookie=GetCookiesFromURL(CookieDomains); //Get cookies for the domains
	}
	
	return GetURL(URL, Data, ExtraOptions); //Do the AJAX Call
}
Directory Difference in Web Browser
Another quick and dirty solution

Since my FileSync Project is still a long way from being where I want it to be and is in a state that makes it annoying to use, I decided to throw together a script that essentially emulates the primary functions of it for what I need. I find it a bit annoying that I’ve never been able to find another good project that does exactly what I want for quick syncing of files over networks :-\. I used to think rsync would be a good solution for it but it’s very... quirky and unstable in certain ways. Not as flexible as I would like. Alas.

Anywho, this PHP script takes 2 file lists and gives you back their differences in a directory tree view. Each file has a data string after it used to tell if the files are different. The data string can be anything you want, but will usually probably be a timestamp or data checksum. The tree view lets you hide files/directories depending on whether each item is only on one side, different, or the same. An example is as follows (List1, List2):

Dir1.txt files	Data String (Timestamps)
Same.txt	9999-99-99 99:99:99
Diff.txt	9999-99-99 99:99:99
LeftOnly.txt	9999-99-99 99:99:99
PartialDir/Same.txt	9999-99-99 99:99:99
PartialDir/Diff.txt	9999-99-99 99:99:99
PartialDir/LeftOnly.txt	9999-99-99 99:99:99
SameDir/Same.txt	9999-99-99 99:99:99
LeftOnlyDir/Left.txt	9999-99-99 99:99:99
LeftOnlyDir/Left2.txt	9999-99-99 99:99:99
DiffDir/Diff.txt	9999-99-99 99:99:99

Dir2.txt files	Data String (Timestamps)
Same.txt	9999-99-99 99:99:99
Diff.txt	1111-11-11 11:11:11
RightOnly.txt	9999-99-99 99:99:99
PartialDir/Same.txt	9999-99-99 99:99:99
PartialDir/Diff.txt	1111-11-11 11:11:11
PartialDir/RightOnly.txt	9999-99-99 99:99:99
SameDir/Same.txt	9999-99-99 99:99:99
RightOnlyDir/Right.txt	9999-99-99 99:99:99
DiffDir/Diff.txt	1111-11-11 11:11:11


Example Output:
Legend:
  • Differences:
    • No differences detected
    • File is different, or directory contains differences
    • File is only found on left side, when this happens to a directory, it is only counted as 1 difference
    • File is only found on right side, when this happens to a directory, it is only counted as 1 difference
  • Type:
    • This is a directory with sub items. After the directory name, it lists as info the number of differences over the total number of sub items
    • This is a file. If both sides contain the file but are different, it lists as info the different strings
  • Info about the file/directory is listed in between parentheses.
Options:
Left Side: ./Dir1.txt
Right Side: ./Dir2.txt
Total Differences: 9
Total Items: 13
  • DiffDir (1/1)
    • Diff.txt (9999-99-99 99:99:99 :: 1111-11-11 11:11:11)
  • LeftOnlyDir (1/2)
    • Left.txt
    • Left2.txt
  • PartialDir (3/4)
    • Diff.txt (9999-99-99 99:99:99 :: 1111-11-11 11:11:11)
    • LeftOnly.txt
    • RightOnly.txt
    • Same.txt
  • RightOnlyDir (1/1)
    • Right.txt
  • SameDir (0/1)
    • Same.txt
  • Diff.txt (9999-99-99 99:99:99 :: 1111-11-11 11:11:11)
  • LeftOnly.txt
  • RightOnly.txt
  • Same.txt

Some example bash commands used to create file lists:
  • Output all files and their timestamps to “Dir1.txt”: find -type f -printf '%P\t%T+\n' > Dir1.txt
  • Output all files and their md5sums to “Dir1.txt”: find -type f -print0 | xargs -0 md5sum | perl -pe 's/^(.*?) (.*)$/$2\t$1/g' > Dir1.txt
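
The diff script itself isn’t included here, but the core comparison is simple; a minimal sketch of the idea in PHP (not the actual script; it assumes tab-delimited lists like the commands above produce):

//Load a tab-delimited file list into a FilePath=>DataString array
function LoadFileList($FileName)
{
	$List=Array();
	foreach(file($FileName, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $Line)
	{
		$Parts=explode("\t", $Line, 2); //Split "FilePath<TAB>DataString"
		$List[$Parts[0]]=(isset($Parts[1]) ? $Parts[1] : '');
	}
	return $List;
}

$Left =LoadFileList('Dir1.txt');
$Right=LoadFileList('Dir2.txt');
foreach($Left as $Path => $Data) //Check the left side against the right side
	if(!isset($Right[$Path]))
		print "Left only: $Path<br>";
	else if($Right[$Path]!==$Data)
		print "Different: $Path ($Data :: {$Right[$Path]})<br>";
foreach(array_diff_key($Right, $Left) as $Path => $Data) //Anything remaining is right side only
	print "Right only: $Path<br>";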
Alamo Draft House Schedule List
Simple information reorganization example

After discovering the Alamo Draft House’s coolness a few months ago, I’ve been trying to watch what they’re playing to make sure I catch anything I might want to see on the big screen. Unfortunately, it is not easy to get a good quick idea of all the movies playing from their calendar because it shows movies per day with showtimes, making the list repetitive and crowded with extra information.

I decided to throw together a real quick PHP script that would parse their data so I could organize it however I wanted. The final result can be viewed here. The code is as follows:

//The list of calendar pages in format TheaterName=>URL
$PagesToGrab=Array(
	'Ritz'=>'http://www.originalalamo.com/Calendar.aspx?l=2',
	'Village'=>'http://www.originalalamo.com/Calendar.aspx?l=3',
	'South Lamar'=>'http://www.originalalamo.com/Calendar.aspx?l=4'
);

foreach($PagesToGrab as $Name => $URL) //Grab the movies for each theater
{
	print "<b>$Name</b><br>"; //Output the theater name
	$TheHTML=file_get_contents($URL); //Grab the HTML
	$ShowList=Array(); //This will contain the final list of shows and what days they are on
	
	preg_match_all('/<td class="day">.*?<\/td>/', $TheHTML, $DayMatches); //Extract all table cells containing a day's info
	foreach($DayMatches[0] as $DayInfo) //Loop over each day's info
	{
		//Determine the day of month
		preg_match('/<a class=\"daynumber\" title=".*?, (.*?),/', $DayInfo, $DayOfMonth);
		$DayOfMonth=$DayOfMonth[1];
		
		//Determine all the shows for the day
		preg_match_all('/<span class="show"><a href=".*?">(.*?)<\/a>/', $DayInfo, $AllShows);
		foreach($AllShows[1] as $Show)
		{
			$Show=preg_replace('/^\s+|\s+$/', '', $Show); //Remove start and end of line whitespace
			if(!isset($ShowList[$Show])) //If show has not yet been added to overall list, add it
				$ShowList[$Show]=Array();
			$ShowList[$Show][]=$DayOfMonth; //Add this day as a time for the show
		}
	}
	
	//Output the shows and their days
	print '<table>';
	foreach($ShowList as $ShowName => $Days)
		print "<tr><td>$ShowName</td><td>".implode(', ', $Days).'</td></tr>';
	print '</table><br><br>';
}	
<? PageFooter(); ?>
</body></html>
PHP file inclusion weirdness
PHP has its own quirks too
In PHP, you cannot include files in parent directories “../” from a file that has already been included from another file in a different directory. This has been a nuisance for a long time.

Here is a test case: (Files followed by their code)

/test1.php (This is the file that is called directly by the browser/apache)
<?
//This all works fine
print 'test1_start';
require('foo/bar/test2.php');
print 'test1_end';
?>

/foo/bar/test2.php
<?
print 'test2_start';
require('blah/test3.php'); //This works fine because the include file is in a subdirectory, not a parent directory, of test2.php
require('../test4.php'); //This does not work (an error is thrown by PHP) because it is in a parent directory relative to test2.php, which was already included from the parent file (test1.php) in another directory (/). To fix this, use 'foo/test4.php'
print 'test2_end';
?>

/foo/bar/blah/test3.php
<? print 'test3'; ?>

/foo/test4.php (This file is not reached by this example without the fixes mentioned in either the comment in test2.php, or below)
<? print 'test4'; ?>

The obvious method to fix this whole conundrum is to always set all includes relative to one root path, and then make sure that path is always used with the set_include_path function if your parent file is not in the root directory. For example:
set_include_path(get_include_path().':/PATH_TO_ROOT_DIRECTORY');

Another method would be to write a require/include wrapper that calculates paths from the current directory whenever a parent path “../” is used. Here is an example of this:
function _require($IncludeFile, $CurrentFile)
{
	$CurrentPath=preg_replace('/[^\/]+$/', '', $CurrentFile); //Remove the filename to get the current path
	require(realpath("$CurrentPath$IncludeFile"));
}

This method is called with 2 parameters: the relative path from the current include file to the file you want to include, and __FILE__
For example, line 4 of “/foo/bar/test2.php” above would now be:
_require('../test4.php', __FILE__);

The first line of the _require function could also be removed by using the __DIR__ constant (instead of __FILE__) which was added in PHP 5.3.0.
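
A minimal sketch of that variant (assuming PHP 5.3.0+; it is the same wrapper with the path-stripping line removed):

function _require($IncludeFile, $CurrentDir) //Call with the relative path and __DIR__
{
	require(realpath("$CurrentDir/$IncludeFile"));
}
//Line 4 of “/foo/bar/test2.php” above would then be:
_require('../test4.php', __DIR__);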
Google Search Removed Functionality
Has Google been removing search options to make things run faster?

I’ve been meaning to get searching working on my site for what seems like forever, and I decided to finally get around to getting some manner of search working via the temporary “use Google” solution. Unfortunately, it seems Google no longer does boolean searches completely properly as advertised. I am sure Google Search still supports boolean logic (as opposed to the assumed “and” between each word) because the Advanced Search, linked to from their front page, still has it, and it returns a few of the results it should.


As an example:
If I wanted to search the Projects and Updates sections of my sites for either the keywords fractal or font I would use the following search:
(site:www.castledragmire.com/Projects OR site:www.castledragmire.com/Updates) AND (Fractal OR Font)

This currently only returns 3 results, when it should return 11 different results, enumerated by using the 4 separate searches (with return results):
  1. site:www.castledragmire.com/Projects Fractal
  2. site:www.castledragmire.com/Projects Font
  3. site:www.castledragmire.com/Updates Fractal
  4. site:www.castledragmire.com/Updates Font

A simple example of this through the Google Advanced Search page is as follows:

Fractal OR Font site:www.castledragmire.com/Projects [Advanced Search]

Which only returns 3 results (following) instead of the 6 (see above) that it should:

Because of this, I need to go ahead and get real searching up via MySQL (or possibly another solution), as originally planned, sooner than later, since Google will not work as a temporary solution for what I want.


I wrote up a paper on what could be done through Google Search over 5 years ago as a job request [to be posted soon], which I believe is very informative. I’m sure it’s a little outdated, but it shows how much Google can [could] do for you.

More Browser Bugs
I really hate web browser scripting due to the multitude of interoperability problems

I’ve been incredibly busy lately, especially with work, but I finally have some time for personal stuff like posting again, yay. I’m currently stuck at the airport, and am leaving at 7AM this morning on vacation for 10 days on a tour of the west coast (Los Angeles, Disney World, Hollywood, Las Vegas, Grand Canyon, etc). The main reason for this get away is I’ll be meeting up with a good friend and his fiancée for their vacation and will be attending his wedding in Las Vegas ^_^.

I am currently on one of those open network connections at the airport that you have to pay to use, tunneled through one of my SSH servers, so I can bypass their pay service and get online for free to post this :-). Hey, it’s their own fault for not securing it properly lol. I periodically kept getting dropped connections due to a weak signal, so I had to get back up and walk around, using my iPod to detect signal strengths, until I found an area with a stronger signal. The thing is proving to be very useful ^_^. Anywho, on to the content of the post.


I’ve recently run into a number of new bugs [new to me at least] in both IE (version 7) and Firefox (version 3) that I have not encountered before and, as usual, have to program around to accomplish my tasks. I thought I’d discuss 3 of these bugs.

  • Relative (non absolute) base URL paths do not work in either Firefox or Internet Explorer.

    Setting a base path for a website is often a necessity for websites that have web pages in subdirectories below the website’s root directory. The reason for this is that the page-common layout of a web page usually refers to all images and content with relative paths. This is done for multiple reasons including:

    • Ease of moving the site between addresses like for test stages, or if the site is served from multiple domain names.
    • It’s easier to read source code URLs this way
    • It makes the HTML files smaller; though this isn’t a problem for most users these days because internet connection speeds are much faster.

    An example of W3C valid code that produces this error is as follows:
    <head><base href="/MySite/">
    
    The code, unfortunately, has to be an absolute URL like the following for current versions of IE and Firefox.
    <head><base href="http://domain.com/MySite/">
    

    One simple method to solve this problem is to use JavaScript to set an absolute base URL. Unfortunately, this then requires web browsers to have JavaScript enabled to work :-\. For this reason, this is really a quick fix for internal use that shouldn’t be put into production use unless JavaScript is required anyways.

    The following code will set a base of “http://domain.com/MySite/” for “http://domain.com/MySite/Posts/Post1.html”.

    <head>
    	<script type="text/javascript">
    		function GetBase() //Get the directory above the current path’s URL
    		{
    			return document.location.protocol+	//The protocol ("http:" or "https:")
    				'//'+				//End the protocol section with a //
    				document.location.hostname+	//The host (domain)
    				document.location.pathname.replace(/(\/[^/]*){2}$/,'')+ //This moves up 1 directory from the current path. To move up more directories, set the "2" in this line to NumberOfDirectoriesToMoveUp+1
    				'/';				//Add a '/' to set the end of the path as a directory
    		}
    		document.write('<base href="'+GetBase()+'">'); //Write a BASE object to set the current web page’s base URL to GetBase()
    	</script>
    </head>
    

    A simpler solution is to just have your parsing language (PHP for example) detect the server you are running on and set the proper base URL accordingly. This method assumes you know all the possible places/addresses your website will run on.

    <head><base href="<?=($_SERVER['HTTP_HOST']=='domain.com' ? 'http://domain.com/MySite/' : 'http://domain2.com/')?>"></head>
    
  • Reserved keywords in IE cannot be used as object members
    Example (JavaScript):
    var MyObject={};
    MyObject.return=function() { return true; }
    
    Solution: Such members must be accessed through string subscripts
    var MyObject={};
    MyObject['return']=function() { return true; }
    
    This also occurs for other reserved keywords like “debugger” and “for”.
  • IE’s window does not have the “hasOwnProperty” member function like ALL OTHER OBJECTS

    This is a major nuisance because trying to find out if a variable exists and is not a prototype in the global scope is an important function. *sighs*

    The fix for this is using “window.VARIABLE!==undefined”, though this won’t tell you if the variable is actually instanced or [again] if it is part of the prototype; only if it is defined.


One more JavaScript engine difference between IE and Firefox is that in IE you can’t end a hash with an empty member. For example, the following works in Firefox, but not IE:

var b={a:1, b:2, c:3, d:4, };

This shouldn’t really be done anyways, so it’s not really a problem IMO. I ran across this when converting some bad Perl code (generated by YACC) which coincidentally allows this.


It’s really hard making everything compatible across all web browser platforms when they all contain so many nuances and bugs :-\.

Telnet Workaround
I hate not having root ^_^;

I recently had to do some work on a system where I was not allowed SSH/telnet access. Trying to do work strictly across ftp can take hours, especially when you have thousands of files to transfer, so I came up with a quick solution in PHP for simple command line access.

<form method=post action="exec.php">
	<table style="height:100%;width:100%">
		<tr><td height="10%">
			<textarea name=MyAction style="height:100%;width:100%"><?=(isset($_REQUEST['MyAction']) ? $_REQUEST['MyAction'] : '')?></textarea>
		</td></tr><tr><td height="90%">
			<textarea name=Output style="height:100%;width:100%"><?
if(isset($_REQUEST['MyAction']))
{
	$MyAction=preg_split("/\\r?\\n/", $_REQUEST['MyAction']);
	foreach($MyAction as $Action)
	{
		exec($Action, $MyOutput);
		print htmlentities(implode("\n", $MyOutput), ENT_QUOTES, 'ISO8859-1')."\n-----------------------\n";
	}
}
			?></textarea>
		</td></tr><tr><td height=1>
			<input type=submit>
		</td></tr>
	</table>
</form>

This code allows you to enter commands on separate lines in the top box, and after the form is submitted, the output of each command is entered into the bottom box separated by dashed lines.

Note that between each command the environment is reset, so commands like "cd" which change the current directory are not usable :-(. You must also change the line 'action="exec.php"' to reflect the name you give the file.
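
One partial workaround (a sketch; it relies on exec handing each line to a shell that understands “&&”) is to chain dependent commands together on one line, since each line is run as its own exec call. For example, entering the following as a single line in the top box (the path is just an example):

cd /var/www && ls -la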


A more suitable solution would be possible through AJAX and a program that redirected console output from a persistent session, but this was just meant as quick fix :-).

Managing Firefox History
Software likes hiding sensitive information and keeping it persistent :-(

Since version 3 of Firefox, the browser has moved over from using flat files for keeping track of browsing history (history.dat) and bookmarks (bookmarks.html) to using SQLite databases (places.sqlite). This changeover was required because the old flat file formats were badly implemented, clunky, and not able to handle the new demands of the location bar and browser history. Using a SQL database was the perfect solution for the complexity brought in with the new location bar and its dynamic searching of previous URLs, as SQL is easy to implement, is mostly compatible across multiple SQL implementations (removing dependency on a single product), and is powerful for cross-referencing lookups. As a matter of fact, most of the data Firefox keeps now is stored in SQLite databases.

SQLite was also a good choice for the SQL solution because it can be implemented minimally straight into a product without needing a large install and a lot of bloat. While I like SQLite for this purpose and its ease of implementation, it lacks a lot of base SQL functionality that would be nice, like TABLE JOINS inside of DELETE statements, among many other language abilities. I wouldn’t suggest using it for large database driven products that require high optimization, which I believe it can’t handle. It’s meant as a simpler SQL implementation.


Anyways, I was very happy to see that when you delete URLs from the history in the newest version of Firefox, it actually deletes them out of the database as opposed to just hiding them, like it used to. The history manager actually seems to do its job quite well now, but I noticed one big problem. After attempting to delete all the URLs from a specific site out of the Firefox history manager, I noticed there were still some entries from that site in the SQLite database, which is a privacy problem.

After some digging, I realized that there are “hidden” entries inside of the history manager. A hidden entry is created when a URL is loaded in a frame or IFrame that you do not directly navigate to. These entries cannot be viewed through the history manager, and because of this, cannot be easily deleted outside of the history database without wiping the whole history.

At this point, I decided to go ahead and look at all the table structures for the history manager and figure out how they interact. Hidden entries are marked in places.sqlite::moz_places.hidden with the value “1”. According to a Firefox wiki, “A hidden URL is one that the user did not specifically navigate to. These are commonly embedded pages, i-frames, RSS bookmarks and javascript calls.” So after figuring all of this out, I came up with some SQL commands to delete all hidden entries, which don’t really do anything inside the database anyways. Do note that Firefox has to be closed to work on the database so it is not locked.

sqlite3 places.sqlite
DELETE FROM moz_annos WHERE place_id IN (SELECT ID FROM moz_places WHERE hidden=1);
DELETE FROM moz_inputhistory WHERE place_id IN (SELECT ID FROM moz_places WHERE hidden=1);
DELETE FROM moz_historyvisits WHERE place_id IN (SELECT ID FROM moz_places WHERE hidden=1);
DELETE FROM moz_places WHERE hidden=1;
.exit

This could all be done in 1 SQL statement in MySQL, but again, SQLite is not as robust :-\. There is also a “Favorite’s Icon” table in the database that might keep an icon stored as long as a hidden entry for the domain still exists, but I didn’t really look into it.
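
For the curious, a sketch of what that single statement might look like in MySQL’s dialect (hypothetical, since these tables actually live in SQLite):

DELETE moz_annos, moz_inputhistory, moz_historyvisits, moz_places
FROM moz_places
LEFT JOIN moz_annos ON moz_annos.place_id=moz_places.id
LEFT JOIN moz_inputhistory ON moz_inputhistory.place_id=moz_places.id
LEFT JOIN moz_historyvisits ON moz_historyvisits.place_id=moz_places.id
WHERE moz_places.hidden=1;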

Perl Magic
The language of obfuscation

I’ve been delving into the Perl language more lately for a job, and have found out some interesting things about it. Perl itself is a bit shrouded in mysticism, with it often being said that it runs on “magic”. The original Perl engine, written by Larry Wall, has never been duplicated due to its incredible complexity and hacked together nature.


One funny little thing I noticed is that an arrow “=>” and comma “,” are completely synonymous in the language. For example, this is how you SHOULD declare a hash and an array, because it just looks better and follows proper coding standards:
@MyArray=('a',1,'b',2); #An array with values a,1,b,2
%MyHash=(a=>1, b=>2); #A hash with keys a,b that contain the values 1,2
but you can actually declare the exact same array and hash objects like this
@MyArray=('a'=>1=>'b'=>2); #An array with values a,1,b,2
%MyHash=(a,1,b,2); #A hash with keys a,b that contain the values 1,2

It’s also easy to find the length of a non referenced array in Perl as follows:
print $#MyArray; #Index of the last element, so add 1 to get length
or
$ArrayLength=@MyArray;
print $ArrayLength;

There are two ways to do it with a referenced array:
$MyRefArray=[1,2,3];
print scalar @$MyRefArray;
print $#$MyRefArray; #Index of the last element, so add 1 to get length
Moral of the story: there are many ways to do things in Perl.

After now having delved a bit more into how Perl works, I still like PHP better as a strictly quick scripting language. Oh well.

JavaScript Prototyping Headaches
A spiffy language feature leading to a problem

JavaScript is a neat little scripting language and does the job it is intended for very well. The prototype system is very useful too, but has one major drawback. First, however, a very quick primer on how objects are made in JavaScript and what prototyping is.


An object is made in JavaScript by calling a named function with the keyword “new”.
function FooBar(ExampleArgument)
{
	this.Member1=ExampleArgument;
	this.AnotherMember='Blah';
}
var MyObject=new FooBar(5);
This code creates a FooBar object in the variable MyObject with 2 members: Member1=5, and AnotherMember='Blah'.

Prototyping adds members to all objects of a certain type, without having to add the member to it manually. This also allows you to change the value of a member of all objects of a single type at once. For example (all examples are continued from above examples):
FooBar.prototype.NewMember=7;
var SecondObject=new FooBar();
Now both MyObject and SecondObject have a member NewMember with value 7, which can be changed easily for both objects like this:
FooBar.prototype.NewMember=9;

The way to detect if an object has a member is to use the in operator, and then to determine if the member is prototyped, the hasOwnProperty function is used (it returns false for prototype-only members). For example:

'NewMember' in MyObject;			//Returns true
MyObject.hasOwnProperty('NewMember');		//Returns false

'Member1' in MyObject;				//Returns true
MyObject.hasOwnProperty('Member1');		//Returns true

'UnknownMember' in MyObject;			//Returns false
MyObject.hasOwnProperty('UnknownMember');	//Returns false

Now, the problem starts coming into play when using foreach loops.
for(var i in MyObject)
	console.log( i + '=' + MyObject[i].toString() ); //console.log is a function provided by FireBug for FireFox, and Google Chrome
This would output:
Member1=5
AnotherMember=Blah
NewMember=9

So if you wanted to do something on all members of an object and skip the prototype members, you would have to add a line of code to each foreach loop as follows:
for(var i in MyObject)
	if(MyObject.hasOwnProperty(i))
		console.log(i+'='+MyObject[i].toString());
This would output:
Member1=5
AnotherMember=Blah

This isn’t too bad if you are using prototyping yourself on your objects, but sometimes you might make objects that you wouldn’t expect to have prototypes. For good coding practice, you should really do the prototype check in every foreach loop, because you can never assume that someone else will not add a prototype to an object type, even if your object type is private. This is especially true because all objects inherit from the actual Object object, including its prototypes. So if someone does the following, which is considered very bad practice, every foreach loop will pick up the added member on all objects.

Object.prototype.GlobalMember=10;

You might ask “Why anyone would do this?”, but it could be useful for an instance like this...
Object.prototype.indexOf=function(Value)
{
	for(var i in this)
		if(this.hasOwnProperty(i) && this[i]===Value)
			return i;
	return undefined;
}
This function will search for the first member that contains the given value and return the member’s name.

It would be really nice if “for(x in y)” only returned non-prototype members and there was another type of foreach loop like “for(x inall y)” that also returned prototype members :-\.


This is especially important for Array objects. Arrays are like any other object but they come naturally with the JavaScript language. For Arrays, it is most appropriate to use
for(var i=0;i<ArrayObject.length;i++)
instead of
for(var i in ArrayObject)
loops. Also, in my own code, I often add the following because the “indexOf” function for Arrays is not available in IE, as it is not a W3C standard. It is in Firefox though... but I’m not sure that is a good thing, as it is not a standard.
//Array.indexOf prototype
if(Array.prototype.indexOf==undefined)
{
	function ArrayIndexOf(SearchIndex)
	{
		for(var i=0;i<this.length;i++)
			if(this[i]==SearchIndex)
				return i;
		return -1;
	}
	Array.prototype.indexOf=ArrayIndexOf;
}

I’m not going to go into how JavaScript stores the prototypes or how to find out all prototype members of an object, as that is a bit beyond what I wanted to talk about in this post, and it’s pretty self explanatory if you think about it.

Erasing Website Cookies
A quick useful code snippet because it takes way too long to do this through normal browser means
This erases all cookies on the current domain (in the “ / ” path)

JavaScript:
function ClearCookies() //Clear all the cookies on the current website
{
	var MyCookies=document.cookie; //Remember the original cookie string since it will be changing soon
	var StartAt=0; //The current string pointer in MyCookies
	do //Loop through all cookies
	{
		var CookieName=MyCookies.substring(StartAt, MyCookies.indexOf('=', StartAt)).replace(/^ /,''); //Get the next cookie name in the list, and strip off leading white space
		document.cookie=CookieName+"=;expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/"; //Erase the cookie
		StartAt=MyCookies.indexOf(';', StartAt)+1; //Move the string pointer to the end of the current cookie
	} while(StartAt!=0)
}

I went a little further with the script after finishing this to add a bit of a visual aspect.
The following adds a textarea box which displays the current cookies for the site, and also displays the cookie names when they are erased.
<input type=button value="Clear Cookies" onclick="ClearCookies()">
<input type=button value="View Cookies" onclick="ViewCookies()">
<textarea id=CookieBox style="width:100%;height:100%"></textarea>
<script type="text/javascript">
function ViewCookies() //Output the current cookies in the textbox
{
	document.getElementById('CookieBox').value=document.cookie.replace(/;/g,';\n\n');
}

function ClearCookies() //Clear all the cookies on the current website
{
	var CookieNames=[]; //Remember the cookie names as we erase them for later output
	var MyCookies=document.cookie; //Remember the original cookie string since it will be changing soon
	var StartAt=0; //The current string pointer in MyCookies
	do //Loop through all cookies
	{
		var CookieName=MyCookies.substring(StartAt, MyCookies.indexOf('=', StartAt)).replace(/^ /,''); //Get the next cookie name in the list, and strip off leading white space
		CookieNames.push(CookieName); //Remember the cookie name
		document.cookie=CookieName+"=;expires=Thu, 01-Jan-1970 00:00:01 GMT; path=/"; //Erase the cookie
		StartAt=MyCookies.indexOf(';', StartAt)+1; //Move the string pointer to the end of the current cookie
	} while(StartAt!=0)
	document.getElementById('CookieBox').value='Clearing: '+CookieNames.join("\nClearing: "); //Output the erased cookie names
}
</script>

Live Example:
Client Side Security Fallacies
Never rely solely on information you receive from untrusted sources

One of the most laughable aspects of client/server* systems is client side based security access restrictions. What I mean by this is when credentials and actions are not checked and restricted on the server side of the equation, only on the client side, which can ALWAYS be bypassed.


To briefly explain why it is basically insane to trust a client computer: ANY multimedia, software, data, etc that has touched a person’s computer is essentially now their property. Once something has been on or through a person’s computer, the user can make copies, modify it, and do whatever the heck they want with it. This is how the digital world works. There are ways to help stop copying and modification, like hashes and encryption, but most of the ways in which these are implemented nowadays are quite fallible. There may be, for example, safeguards in place to only allow a user to use a piece of software on one certain computer or for a certain amount of time (DRM [Digital Rights Management]), but these methods are ALWAYS bypassable. The only true security comes from not letting information which people aren’t supposed to have access to cross through their computer, and from keeping track of all verifiable factual information on secure servers.

A long time ago at an IGDA [International Game Developers Association] meeting (I only ever went to the one, unfortunately :-\), I learned an interesting truth from the lecturer that hadn’t occurred to me before: companies that make games and other software [usually] know it will sooner or later be pirated/cracked**. The true intention of software DRM is to make the software hard enough to crack that the crackers give up, and to make cracking take long enough that people hopefully stop waiting for a free copy and go ahead and buy it. The companies know that by the time a piece of software is cracked (if it takes as long as they hope), the majority of the remaining holdouts usually wouldn’t have bought it anyways. Now that I’m done with the basic explanation of client side insecurities, back to the real reason for this post.


While it is actually proper to program safeguards into client side software, you can never rely on it for true security. Security measures should always be duplicated in both client and server software. There are two reasons off the top of my head for implementing security access restrictions into the client side of software. The first is to help remove strain on servers. There is no point in asking a server if something is valid when the client can immediately confirm that it isn’t. The second reason is for speed. It’s MUCH quicker if a client can detect a problem and instantly inform the user than having to wait for a server to answer; though this delay is usually imperceptible to the user, it can really add up.

So I thought I’d give a couple of examples of this to help you understand more where I’m coming from. This is a very big problem in the software industry. I find exploitable instances of this kind of thing on a very regular basis. However, I generally don’t take advantage of such holes, and try to inform the companies/programmers if they’ll listen. The term for this is white hat hacking, as opposed to black hat.


First, a very basic example. Let’s say you have a folder on your website “/PersonalPictures” that you wanted to restrict access to with a password. The proper way to do it would be to restrict access to the whole folder and all files in it on the server side, requiring a password be sent to the server to view the contents of each file. This is normally done through Apache httpd (the most utilized web server software) with an “.htaccess” file and the mod_auth (authentication) module. The improper way to do it would be a page that forwarded to the “hidden” section with a JavaScript script like the following.

if(prompt('Please enter the password')=='SecretPassword')
	document.location.href='/PersonalPictures';

The problem with this code is twofold (besides the fact it pops up a request window :-) ). First, the password is exposed in plain text to the user. Fortunately, passwords are usually not as easy to find as this, but I have found passwords in web pages and Flash code before with some digging (yes, Flash files (and Java!) are 100% decompilable to their original source code, sans comments). The second problem is that once the person goes to the URL “/PersonalPictures”, they can get back there and to all files inside it without the password, and also give it freely to others (no need to mention the fact that the URL is written in plain text here, as it’s the same as with the password). This specific problem with JavaScript was much more prevalent in the old days when people ran their web pages through free hosting sites like Geocities (now owned and operated by Yahoo) which didn’t allow for proper password protection.
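
For reference, the proper server side approach mentioned above would look something like the following .htaccess sketch (the paths are hypothetical; this assumes Apache with mod_auth and a password file created by the htpasswd utility):

AuthType Basic
AuthName "Personal Pictures"
AuthUserFile /path/to/.htpasswd
Require valid-user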

This kind of problem is still around on the web, though it morphed with the times into a new form. Many server side scripts I have found across the Internet assume their client side web pages can take care of security and ignore the necessary checks in the server scripts. For example, very recently I was on a website that only allowed me to add a few items to a list. The way it was done is that there was a form with a textbox that you submitted every time you wanted to add an entry to the list. After submitting, the page was reloaded with the updated list. After you added the maximum allowed number of items to the list, when the page refreshed, the form to add more was gone. This is incredibly easy to bypass however. The normal way to do this would be to just send the modified packets directly to the server with whatever information you want in it. The easier method would be to make your own form submission page and just submit to the proper URL all you want. The Firebug extension for Firefox however makes this kind of thing INCREDIBLY easy. All that needs to be done is to add an attribute to the form to send the requests to a new window “<form action=... method=... target=_blank>”, so the form is never erased/overwritten and you can keep sending requests all you want. Using Firebug, you can also edit the values of hidden input boxes for this kind of thing.
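
As an illustration of how trivial that bypass is, this is all a homemade submission page needs (a hypothetical sketch; the action URL and field name would be copied from the real page’s source):

<form action="http://example.com/AddListItem.php" method="post" target="_blank">
	<input type="text" name="NewItem">
	<input type="submit" value="Add another item (no client side limit here)">
</form>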

AJAX (Asynchronous JavaScript and XML - A tool used in web programming to send and receive data from a server without having to refresh a page) has often been lampooned as insecure for this kind of reason. In reality, the medium itself is not insecure at all; it’s just how people use it.


As a matter of fact, the majority of my best and most fun Ragnarok hacking was done with these methods. I just monitored the packets that came in and out of the system, reverse engineered how they were all structured, then made modifications and resent them myself to see what I could do. With this, I was able to do things like (These should be most of the exploits; listed in descending order of usefulness & severity):

  • Duplicate items
  • Crash the server (It was never fixed AFAIK, but I stopped playing 5+ years ago. I just put that it was fixed on my site so people wouldn’t look for it ^_^; )
  • Warp to any map from any warp location (warp locations are only supposed to link to 1 other map)
  • Spoof your name during chats (so you could pretend someone else was saying something - Ender’s game, anyone? ^_^)
  • Use certain skills of other classes (I have pictures up of my swordsman using merchant skills to run a selling shop)
  • Add skills points to an item on your skill tree that is not yet available (and use it immediately)
  • Warp back to save point without dying
  • Talk to NPCs on a map from any location on that map, and sometimes from other maps (great for selling items when in a dungeon)
  • Attack with weapons much quicker than was supposed to be allowed
  • Use certain skills on creatures from any location on a map no matter how far they are
  • Equip any item in any spot (so you could equip body armor on your head slot and get much more free armor defense points)
  • Run commands on your party/guild and in chat rooms as if you were the leader/admin
  • Roll back a character’s stats to when you logged on that session (part of the dupe hack)
  • Bypass text repetition, length, and curse filters
  • Find out user account names

The original list is here; it should contain most of what I found. I took it down very soon after putting it up (replacement here) because I didn’t want to explicitly screw the game over with people finding out about these hacks (I had a lot of bad encounters with the company that ran the game, they refused to acknowledge or fix existing bugs when I reported them). There were so many things the server didn’t check just because the client wasn’t allowed to do them naturally.


Here are some very old news stories I saved up for when I wrote about this subject:


Just because you don’t give someone a way to do something doesn’t mean they won’t find a way.



*A server is a computer you connect to and a client is the connecting computer. So all you people connecting to this website are clients connecting to my web server.
**“Cracked” usually means to make a piece of software usable when it is not supposed to be, bypassing the DRM
Microsoft IIS Bug
Bad Programming: Only using file extensions as an indicator

According to a Microsoft KB article titled “Virtual directory names with executable extensions are not used correctly”, using a virtual folder ending in an executable extension (like .com, .exe, .dll, or .sh) under the web server for IIS [Microsoft’s Internet information services server suite] makes the contents inside the folder unviewable. This behavior itself is kind of silly, as you’d assume a web server would always check to see if something was a file or folder first.

Unfortunately, this doesn’t apply to just virtual folders, but all folders under an IIS web server, as I found out a few years ago when I backed up a site that I knew would be taken down very soon (ironically, because the company [SysInternals] was being taken over by Microsoft) and mirrored it on my Home Server, which runs IIS.

The solution I used was to add a character (in my case an underscore “_”) to the end of all the directory names ending in “.com” and then doing a global regular expression replace through all files in the mirror to replace any occurrences of these directories.


Search For: “(DOMAIN1|DOMAIN2|DOMAIN3)([\\/])”
Replace With: “$1_$2”

I still plan on getting up some site mirrors of places that no longer exist and such for the miscellaneous section one of these days...

Custom Fonts in Web Browsers
Solutions for a strict medium

A very important part of the design world is fonts, but they are an unfortunately annoying part of web browser land. There are very few fonts that come by default with OSs, and even fewer default ones that match each other across all OSs, so your website won’t look the same across all platforms unless you use the right combinations. It’s pretty much guaranteed that if you want anything even remotely special in terms of a font somewhere on your website, you will be out of luck matching it across all platforms.

The commonplace solution for this is, of course, creating images for whenever you need special fonts displayed. While this is the most elegant solution, it is only appropriate for special circumstances, and not normal site content, as image file sizes can get ridiculous, and you lose plain text advantages like searchability and search engine recognition. Another solution is to request the user to download the font, like here. While this is a valid solution, the vast majority of users would not download the font because, mostly, they don’t care enough, and secondly, people generally know not to go download unfamiliar files on the internet when they don’t have to, for security reasons.

This has actually been a problem for me recently as I realized some of the default fonts I use for my site, which have always come with Windows, do not have default equivalents that come with most Linux distributions, as I had assumed. That’s a topic for a different day though.


A customer recently requested the ability to dynamically display some text in a certain font, so I told him there are 2 solutions. The first would be to use JavaScript to load translucent PNG images; the second would be to embed a Flash applet, as Flash can store font files internally for use. Here are instructions and examples of both:


JavaScript + PNG Translucency (alpha blending) Method
There are 2 ways to create the PNG translucency in Photoshop; one easier but less effective way that doesn’t maintain quality, and a slightly more complex path with better results.
  • To start off for both paths, a screenshot (ALT+PRINT SCREEN to take only the current window) will need to be taken of the font rendered in black against a white background. This can be done in your favorite word processor as long as it properly renders with translucency, or (for Windows) by just going to the font file in “c:\windows\fonts” and opening it, which uses “fontview.exe”.
  • After you have the screenshot, open a new file in Photoshop (File > New OR CTRL+N) and paste the screenshot into a new layer (Edit > Paste OR CTRL+V)
  • Delete the background layer, which requires that the Layers window be open (Window > Layers OR F7 to toggle its display). Right click the text portion “Background” of the background layer, and choose “Delete Layer”.
  • Select the region that contains your font’s alphabet (M for selection tool) and crop it (Image > Crop).
  • You might want to zoom in at this point for easier viewing (CTRL++ for in, CTRL+- for out).
  • The easy way from there:
    • Deselect the area (Select > Deselect OR CTRL+D).
    • Select the Magic Wand tool (W), set Tolerance to 0, check Anti-Aliased, and uncheck Contiguous
    • Select a pure white pixel and then delete the selection (DELETE)
    • You now have a translucent image that you can save and use, but the translucency isn’t that of the original font, as that is not how the magic wand tool works.
    Example using “Aeolus True Type Font” (Set against a green background via HTML for example sake)
    Translucent Aeolus True Type Font Easy Method
  • The better way:
    • Add a mask to your current layer (Layer > Add Layer Mask > Reveal All)
    • Go to the channels window (Window > Channels to toggle its display, it should be in the same window as Layers, in a separate tab) and select either the red, green, or blue layer. It doesn’t matter which as they should all hold the exact same values (grayscale [white-black colors] have the same red, green, and blue values), so red channel (CTRL+1) is fine.
    • Copy the channel (CTRL+C) (the entire workspace should still be selected after the crop)
    • Select the mask channel (CTRL+\), and you also need to make it visible (toggle the little eyeball icon besides it)
    • Paste into the mask channel (CTRL+V), invert it (Image > Adjustments > Invert OR CTRL+I), and then make it invisible again (untoggle little eyeball icon besides it)
    • Reselect the RGB contents (CTRL+~) and flood fill it with black [or your color of choice]: Paint Bucket Tool (G), 255 tolerance, no antialias
    • You now have a translucent image of the font that you can save and use that has the original font quality. You can test it by adding a white layer below it.
    Example using “Aeolus True Type Font” (Set against a green background via HTML for example sake)
    Translucent Aeolus True Type Font Good Method
From there the image file can be split up into individual images called “a.png”, “b.png”, etc, and a simple JavaScript string could be used to convert a string to display the picture text like “'MyString'.replace(/(.)/g, '<img src="$1.png">')”.
Example (this is produced by JavaScript):

Internet Explorer 6 also has the added problem of not allowing translucent images, so a hack is needed for this. Basically, an element (like a blank image) needs to have its filter style set like the following (JavaScript DirectX hack...)
style.filter="progid:DXImageTransform.Microsoft.AlphaImageLoader(src='IMAGELOCATION', sizingMethod='scale')";


Flash Method
While this method is much quicker to complete and easier to pull off than the previous method, it is also more prone to problems and browser incompatibility. Flash and JavaScript never got along well enough in my book. Anywho, here’s the process. (Source file here)
  • In a new Flash document (v5.0+), create a text box with the following properties:
    • Type: “Dynamic Text”
    • var: MyText
    • Font: YOURFONTCHOICE
    • Embed (button): Select the set of characters the dynamic text box might display. The less glyphs you select, the smaller the output file will be. I included all alpha-numeric+punctuation in the below example (24.3KB).
  • That’s all you need for the Flash file, so all that’s left now is the JavaScript. The following function will set the text for you inside the movie. Also, you should set the embed (for normal browsers) and object (for IE) tags as different “id”s. The wmode is an important parameter here too, in that it makes the background invisible and the Flash applet more a part of the web page (not a “separate window”).
    <object width="300" height="40" id="CustomFontIE" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000">
    	<param name="movie" value="OtherContent/CustomFonts/CustomFont.swf">
    	<param wmode="transparent">
    	<embed src="OtherContent/CustomFonts/CustomFont.swf" wmode="transparent" width="300" height="40" id="CustomFont" type="application/x-shockwave-flash">
    </object>
    <script type="text/javascript">
    	var IsIE=(navigator.appName.indexOf('Microsoft')!=-1);
    	function SetFlashText(NewText) { document.getElementById('CustomFont'+(IsIE ? 'IE' : '')).SetVariable('MyText', NewText); }
    </script>
    		
Example: (set against a green background via HTML for example’s sake)
Enter text here:
Flash applet:



Comparing Log Files
Slow news day...

So for reasons I’m not going to go into, today I had to compare some log files. I was tempted to write the code in C, just because I miss it so much these days x.x;, but laziness won out, especially as there weren’t that many log files and they weren’t that large, so I wrote it in PHP.

Nothing else to this post except the working code, which took me about 5 minutes to type out... The function goes through one directory and all of its subdirectories and checks all files against the same path in a second directory. If the file doesn’t exist in the second directory, or its contents don’t match the first file up to the first file’s length, a message is output.


//Start the log run against 2 root directories
TestLogs("/DIR1", "/DIR2");

function TestLogs($RootDir1, $RootDir2, $CurDir="")
{
	//Iterate through the first directory
	$Dir1=opendir("$RootDir1$CurDir");
	$SubDirs=Array(); //Holds subdirectories
	while($File=readdir($Dir1))
		if($File=="." || $File=="..") //Skip . and ..
			continue;
		else if(is_dir("$RootDir1$CurDir/$File")) //Do not try to compare directory entries
			$SubDirs[]=$File; //Remember subdirectories
		else if(!file_exists("$RootDir2$CurDir/$File"))
			print "File '$CurDir/$File' does not exist in second directory.<br>";
		else if(file_get_contents("$RootDir1$CurDir/$File")!=substr(file_get_contents("$RootDir2$CurDir/$File"),0,filesize("$RootDir1$CurDir/$File"))) //Both files exist, so compare them - if first file does not equal second file up to the same length, output error
			print "'$CurDir/$File' does not match.<br>";
	
	//Recurse into subdirectories after the current directory's files are done so directory listings do not get split up
	foreach($SubDirs as $NewDir)
		TestLogs($RootDir1, $RootDir2, "$CurDir/$NewDir");
}
Regular Expression Examples
Finding multiple domain’s name servers

Today I thought I’d give a demonstration on the use of regular expressions [reference page here]. Regular expressions are basically a simplified scripting language for finding and replacing complex text strings, and they are implemented in much of today’s software that involves heavy text editing. They are a fabulously handy tool for computer users and are especially useful for programmers. I believe regexps originally gained their notoriety through the Perl programming language. I also recently heard it is now definite that the next version of C++ (C++0x) will have native library support for regular expressions, yay!
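As a quick taste of what a regular expression replacement looks like in code, here’s a trivial PHP sketch using preg_replace (PHP’s regex find-and-replace function); the name is made up:

<?
//Swap "Lastname, Firstname" into "Firstname Lastname" - \w+ matches 1 or more word characters, $1/$2 reference the captured groups
print preg_replace('/(\w+), (\w+)/', '$2 $1', 'Public, John'); //Outputs "John Public"
?>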

Since I posted yesterday on DNS stuff, and have the examples from it handy, I figured I’d use those :-).


Let’s say you had a group of .com domains and wanted to find out their name servers (I’ve had to do this when switching to new name servers to make sure all the domains we did not control at the registrar level had their name servers set to the new ones). For this example, we will use the following domains “castledragmire.com”, “riaboy.com”, “NonExistantDomainA.com”, and “dakusan.com”.

  • First, we’d need to have the list of the domains, for this example, one domain per line is used.
    castledragmire.com
    riaboy.com
    NonExistantDomainA.com
    dakusan.com
  • Next, we need to turn them into a bash (Linux) script to grab all the information we need.
    Replace: “^(.*)$
    With: “echo '!?$1?!'; host -t ns $1 a.gtld-servers.net | grep ' name server ';”
    Sample output: (The !? ?! stuff are markers for easier viewing and parsing)
    echo '!?castledragmire.com?!'; host -t ns castledragmire.com a.gtld-servers.net | grep ' name server ';
    echo '!?riaboy.com?!'; host -t ns riaboy.com a.gtld-servers.net | grep ' name server ';
    echo '!?NonExistantDomainA.com?!'; host -t ns NonExistantDomainA.com a.gtld-servers.net | grep ' name server ';
    echo '!?dakusan.com?!'; host -t ns dakusan.com a.gtld-servers.net | grep ' name server ';
  • Next, we run the script, and it would output the following:
    !?castledragmire.com?!
    castledragmire.com name server ns3.deltaarc.com.
    castledragmire.com name server ns4.deltaarc.com.
    !?riaboy.com?!
    riaboy.com name server ns3.deltaarc.com.
    riaboy.com name server ns4.deltaarc.com.
    !?NonExistantDomainA.com?!
    !?dakusan.com?!
    dakusan.com name server ns3.deltaarc.com.
    dakusan.com name server ns4.deltaarc.com.
  • Next, we would keep running the following regular expression until no more replacements are found.
    This would combine all domains with multiple name servers onto one line with name servers separated by spaces.
    Replace: “(.*?) name server (.*)\n\1 name server (.*)
    With: “$1 name server $2 $3
    It would output the following:
    !?castledragmire.com?!
    castledragmire.com name server ns3.deltaarc.com. ns4.deltaarc.com.
    !?riaboy.com?!
    riaboy.com name server ns3.deltaarc.com. ns4.deltaarc.com.
    !?NonExistantDomainA.com?!
    !?dakusan.com?!
    dakusan.com name server ns3.deltaarc.com. ns4.deltaarc.com.
  • The final regular expression turns the output into a single line per domain, followed by its name servers. Domains that did not give us any name servers keep their untouched marker lines, making them easy to spot.
    Replace: “!\?(.*?)\?!\n\1 name server (.*)
    With: “#$1 \t $2
    Which would output the final following data:
    #castledragmire.com ns3.deltaarc.com. ns4.deltaarc.com.
    #riaboy.com ns3.deltaarc.com. ns4.deltaarc.com.
    !?NonExistantDomainA.com?!
    #dakusan.com ns3.deltaarc.com. ns4.deltaarc.com.
    This data could be pasted directly into Excel, which would put the domains in the first column and the name servers in the second. (If you’d rather script the whole pipeline than run the replacements in a text editor, see the sketch below.)
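Here is a rough PHP sketch of those same three passes. It assumes shell_exec is allowed and the host command is installed, so treat it as an outline rather than a polished tool:

<?
$Domains=Array('castledragmire.com', 'riaboy.com', 'NonExistantDomainA.com', 'dakusan.com');

//Pass 1: output a marker line for each domain followed by its name server lines (same as the bash script above)
$Output='';
foreach($Domains as $Domain)
	$Output.="!?$Domain?!\n".shell_exec("host -t ns $Domain a.gtld-servers.net | grep ' name server '");

//Pass 2: keep combining consecutive name server lines for the same domain until no more replacements are made
do
	$Output=preg_replace('/(.*?) name server (.*)\n\1 name server (.*)/', '$1 name server $2 $3', $Output, -1, $Count);
while($Count);

//Pass 3: collapse each marker + name server pair onto one tab-separated line; domains without name servers keep their markers
print preg_replace('/!\?(.*?)\?!\n\1 name server (.*)/', "#$1\t$2", $Output);
?>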
Diagnosing DNS Problems
Digging until you find the root

Yesterday I wrote a bit about the DNS system being rather fussy, so I thought today I’d go a bit more into how DNS works, and some good tools for problem solving in this area.


First, some technical background on the subject is required.
  • A network is simply a group of computers hooked together to communicate with each other. In the old days, all networking was done through physical wires (called the medium), but nowadays much of it is done through wireless connections. Wired networking is still required for the fastest communications, and is especially important for major backbones (the super highly utilized lines that connect networks together across the world).
  • A LAN is a local network of all computers connected together in one physical location, whether it be a single room, a building, or a city. Technically, a LAN doesn’t have to be localized in one area, but it is preferred, and we will just assume it is so for argument’s sake :-).
  • A WAN is a Wide (Area) Network that connects multiple LANs together. This is what the Internet is.
  • The way one computer finds another computer on a network is through its IP Address [hereby referred to as IPs in this post only]. There are other protocols, but this (TCP/IP) is by far the most widely utilized and is the true backbone of the Internet. IPs are like a house’s address (123 Fake Street, Theoretical City, Made Up Country). To explain it in a very simplified manner (this isn’t even remotely accurate, as networking is a complicated topic, but this is a good generalization), IPs have 4 sections of numbers ranging from 0-255 (1 byte). For example, 67.45.32.28 is a (version 4) IP. Each number in that address is a broader location, so the “28” is like a street address, “32” is the street, “45” is the city, and “67” is the country. When you send a packet from your computer, it goes to your local (street) router, which then passes it to the city router, and so on, until it reaches its destination. If you are in the same city as the final destination of the packet, then it wouldn’t have to go to the country level.
  • The final important part of networking (for this post) is the Domain Name System (DNS) itself. A domain is a label for an IP address, like calling “1600 Pennsylvania Avenue” “The White House”. As an example, “www.castledragmire.com” just maps to my web server at “209.85.115.128” (this is the current IP; it will change if the site is ever moved to a new server). A quick way to check such a mapping yourself is sketched below.
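Incidentally, you can check such a domain-to-IP mapping yourself from PHP with the built-in gethostbyname function (a trivial sketch):

<?
print gethostbyname('www.castledragmire.com'); //Prints the IP the domain currently resolves to
?>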

Next is a brief lesson on how DNS itself works:
  • The root DNS servers (a.root-servers.net through m.root-servers.net) point to the servers that hold top-level-domain information (.com, .org, .net, .jp, etc)
    Examples of these servers are as follows:
    au	ns1.audns.net.au
    biz	E.GTLD.biz
    ca	CA04.CIRA.ca
    cn	A.DNS.cn
    com & net	A.GTLD-SERVERS.NET
    de	Z.NIC.de
    eu	U.NIC.eu
    info	B9.INFO.AFILIAS-NST.ORG
    org	TLD1.ULTRADNS.NET
    tv	C5.NSTLD.COM
  • Next, these top-level-domain name servers (like A.GTLD-SERVERS.NET through M.GTLD-SERVERS.NET for .com) hold two main pieces of information for ALL domains under their jurisdiction:
    • The registrar where the domain was registered
    • The name server(s) that are responsible for the domain
    Only registrars can talk to these root servers, so you have to go through the registrar to change the name server information.
  • The final, lowest rung in the DNS hierarchy is name servers. Name servers hold all the actual addressing information for a domain and can be run by anyone. The 2 most important (or maybe relevant is a better word...) types of DNS records are listed below (a quick sketch for querying them follows this list):
    • A: There should be many of these, each pointing a domain or subdomain (castledragmire.com, www.castledragmire.com, info.castledragmire.com, ...) to a specific IP address (version 4)
    • SOA: Start of Authority - There is only one of these records per domain, and it specifies authoritative information including the primary name server, the domain administrator’s email, the domain serial number, and several timeout values relating to refreshing domain information.
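As mentioned above, here’s a minimal PHP sketch for pulling those two record types programmatically, using the built-in dns_get_record function (PHP 5+); the domain is just an example:

<?
//Fetch the A and SOA records for a domain through the OS's resolver
print_r(dns_get_record('castledragmire.com', DNS_A));   //Each entry includes an 'ip' field
print_r(dns_get_record('castledragmire.com', DNS_SOA)); //Includes 'mname' (primary name server), 'rname' (admin email), 'serial', and the refresh/retry/expire values
?>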

Now that we have all the basics down, on to the actual reason for this post. It’s really a nuisance trying to explain to people why their domain isn’t working, or is pointing to the wrong place. So here’s why it happens!

Back in the old days, it often took days for DNS propagation to happen after you made changes at your registrar or elsewhere, but fortunately, that delay is mostly a thing of the past. The reason for the old delays was that ISPs and/or routers cached domain lookups and only refreshed them according to the metrics in the SOA record mentioned above, as they were supposed to. This was done for network speed reasons, as I believe older OSs might not have cached domains (wild speculation), and ISPs didn’t want to look up the address for a domain every time it was requested. Now, though, I rarely see caching on any level except at the local computer; not only does the OS cache, but even some programs cache domains, like FireFox.

So the answer for when a person is getting the wrong address for a domain, and you know it is set correctly, is usually to just reboot. Clearing the DNS cache works too (for the OS level), but explaining how to do that is harder than saying “just reboot” ^_^;.

To clear the DNS cache in XP, enter the following into your “run” menu or at the command prompt: “ipconfig /flushdns”. This does not ALWAYS work, but it usually does.


If your domain is still resolving to the wrong address when you ping it after your DNS cache is cleared, the next step is to see what name servers are being used for the information. You can do a whois on your domain to get the information directly from the registrar who controls the domain, but be careful where you do this, as you never know what people are doing with the information. For a quick and secure whois, you can use “whois” from your Linux command line, which I have patched through to a web script here. This script gives both normal and extended information, FYI.

Whois just tells you the name servers that you SHOULD be contacting; it doesn’t mean those are the ones you are actually asking, as the root DNS servers may not have updated their information yet. This is where our command line programs come into play.

In XP, you can use “nslookup -query=hinfo DOMAINNAME” and “nslookup -query=soa DOMAINNAME” to get a domain’s name servers, and then “nslookup NAMESERVER DOMAINNAME” to get the IP a name server points to. For example: (important information in the following examples is bolded and in white)

C:\>nslookup -query=hinfo castledragmire.com
Server:  dns-redirect-lb-01.texas.rr.com
Address:  24.93.41.127

castledragmire.com
        primary name server = ns3.deltaarc.com
        responsible mail addr = admins.deltaarc.net
        serial  = 2007022713
        refresh = 14400 (4 hours)
        retry   = 7200 (2 hours)
        expire  = 3600000 (41 days 16 hours)
        default TTL = 86400 (1 day)

C:\>nslookup -query=soa castledragmire.com
Server:  dns-redirect-lb-01.texas.rr.com
Address:  24.93.41.127

Non-authoritative answer:
castledragmire.com
        primary name server = ns3.deltaarc.com
        responsible mail addr = admins.deltaarc.net
        serial  = 2007022713
        refresh = 14400 (4 hours)
        retry   = 7200 (2 hours)
        expire  = 3600000 (41 days 16 hours)
        default TTL = 86400 (1 day)

castledragmire.com      nameserver = ns4.deltaarc.com
castledragmire.com      nameserver = ns3.deltaarc.com
ns3.deltaarc.com        internet address = 216.127.92.71

C:\>nslookup ns3.deltaarc.com castledragmire.com
Server:  ev1s-209-85-115-128.theplanet.com
Address:  209.85.115.128

Name:    ns3.deltaarc.com
Address:  216.127.92.71

Nslookup is also available in Linux, but Linux has a better tool for this, as nslookup itself doesn’t always seem to give the correct answers for some reason, so I recommend you use dig if you have it or Linux available to you. With dig, we just start at the root name servers and work our way down to the SOA name server to get the real information on where the domain is resolving to and why.

root@www [~]# dig @a.root-servers.net castledragmire.com

; <<>> DiG 9.2.4 <<>> @a.root-servers.net castledragmire.com
; (2 servers found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5587
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 14

;; QUESTION SECTION:
;castledragmire.com.            IN      A

;; AUTHORITY SECTION:
com.                    172800  IN      NS      H.GTLD-SERVERS.NET.
com.                    172800  IN      NS      I.GTLD-SERVERS.NET.
com.                    172800  IN      NS      J.GTLD-SERVERS.NET.
com.                    172800  IN      NS      K.GTLD-SERVERS.NET.
com.                    172800  IN      NS      L.GTLD-SERVERS.NET.
com.                    172800  IN      NS      M.GTLD-SERVERS.NET.
com.                    172800  IN      NS      A.GTLD-SERVERS.NET.
com.                    172800  IN      NS      B.GTLD-SERVERS.NET.
com.                    172800  IN      NS      C.GTLD-SERVERS.NET.
com.                    172800  IN      NS      D.GTLD-SERVERS.NET.
com.                    172800  IN      NS      E.GTLD-SERVERS.NET.
com.                    172800  IN      NS      F.GTLD-SERVERS.NET.
com.                    172800  IN      NS      G.GTLD-SERVERS.NET.

;; ADDITIONAL SECTION:
A.GTLD-SERVERS.NET.     172800  IN      A       192.5.6.30
A.GTLD-SERVERS.NET.     172800  IN      AAAA    2001:503:a83e::2:30
B.GTLD-SERVERS.NET.     172800  IN      A       192.33.14.30
B.GTLD-SERVERS.NET.     172800  IN      AAAA    2001:503:231d::2:30
C.GTLD-SERVERS.NET.     172800  IN      A       192.26.92.30
D.GTLD-SERVERS.NET.     172800  IN      A       192.31.80.30
E.GTLD-SERVERS.NET.     172800  IN      A       192.12.94.30
F.GTLD-SERVERS.NET.     172800  IN      A       192.35.51.30
G.GTLD-SERVERS.NET.     172800  IN      A       192.42.93.30
H.GTLD-SERVERS.NET.     172800  IN      A       192.54.112.30
I.GTLD-SERVERS.NET.     172800  IN      A       192.43.172.30
J.GTLD-SERVERS.NET.     172800  IN      A       192.48.79.30
K.GTLD-SERVERS.NET.     172800  IN      A       192.52.178.30
L.GTLD-SERVERS.NET.     172800  IN      A       192.41.162.30

;; Query time: 240 msec
;; SERVER: 198.41.0.4#53(198.41.0.4)
;; WHEN: Sat Aug 23 04:15:28 2008
;; MSG SIZE  rcvd: 508

root@www [~]# dig @a.gtld-servers.net castledragmire.com

; <<>> DiG 9.2.4 <<>> @a.gtld-servers.net castledragmire.com
; (2 servers found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 35586
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 2, ADDITIONAL: 2

;; QUESTION SECTION:
;castledragmire.com.            IN      A

;; AUTHORITY SECTION:
castledragmire.com.     172800  IN      NS      ns3.deltaarc.com.
castledragmire.com.     172800  IN      NS      ns4.deltaarc.com.

;; ADDITIONAL SECTION:
ns3.deltaarc.com.       172800  IN      A       216.127.92.71
ns4.deltaarc.com.       172800  IN      A       209.85.115.181

;; Query time: 58 msec
;; SERVER: 192.5.6.30#53(192.5.6.30)
;; WHEN: Sat Aug 23 04:15:42 2008
;; MSG SIZE  rcvd: 113

root@www [~]# dig @ns3.deltaarc.com castledragmire.com

; <<>> DiG 9.2.4 <<>> @ns3.deltaarc.com castledragmire.com
; (1 server found)
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26198
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0

;; QUESTION SECTION:
;castledragmire.com.            IN      A

;; ANSWER SECTION:
castledragmire.com.     14400   IN      A       209.85.115.128

;; AUTHORITY SECTION:
castledragmire.com.     14400   IN      NS      ns4.deltaarc.com.
castledragmire.com.     14400   IN      NS      ns3.deltaarc.com.

;; Query time: 1 msec
;; SERVER: 216.127.92.71#53(216.127.92.71)
;; WHEN: Sat Aug 23 04:15:52 2008
;; MSG SIZE  rcvd: 97

Linux also has the “host” command, but I prefer and recommend “dig”.


And that’s how you diagnose DNS problems! :-). For reference, two common DNS configuration problems are not having your SOA and NS records properly set for the domain on your name server.


I also went ahead and added dig to the “Useful Bash commands and scripts” post.

Language Optimization Techniques
A few tricks up the programmer’s sleeve

I’m gonna cheat today since it is really late, as I spent a good amount of time organizing the 3D Engines update which pushed me a bit behind, and I’m also exhausted. Instead of writing some more content, I’m just linking to the “Utilized Optimization Techniques” section of the 3D Engines project, which I put up today.

It describes 4 programming speed optimization tricks: local variable assignment, precalculating index lookups, pointer traversing/addition, and loop unrolling. This project post also goes into some differences between the languages used [Flash, C++, and Java], especially when dealing with speed.
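To at least give a taste of the first two tricks here, below is a quick PHP analog I threw together (the project itself uses Flash/C++/Java, and the $Obj grid structure is made up for illustration):

<?
//A made-up example object: a Width x Height grid of values stored in a flat array
$Obj=new stdClass();
$Obj->Width=3; $Obj->Height=2; $Obj->Data=Array(1,2,3,4,5,6);

function SumGrid($Obj)
{
	$Data=$Obj->Data; $Width=$Obj->Width; $Height=$Obj->Height; //Local variable assignment: member lookups are cached once instead of repeated every iteration
	$Sum=0;
	for($y=0;$y<$Height;$y++)
	{
		$RowStart=$y*$Width; //Precalculated index lookup: the row offset is computed once per row instead of once per cell
		for($x=0;$x<$Width;$x++)
			$Sum+=$Data[$RowStart+$x];
	}
	return $Sum;
}
print SumGrid($Obj); //Outputs 21
?>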

Data Format Conversion
Moving from Point A to Point B

I am often asked to transfer data sets into MySQL databases, or other formats. In this case, I’ll use converting a Microsoft Excel file (without line breaks in the fields) to MySQL as an example. While there are many programs out there that do this kind of thing, this method doesn’t take too long and is a good example use of regular expressions.


First, select all the data in Excel (ctrl+a) and copy (ctrl+c) it to a text editor with regular expression support. I recommend EditPad Pro as a very versatile and powerful text editor.

Next, we need to turn each row into the format “('FIELD1','FIELD2','FIELD3',...),”. Four regular expressions are needed to format the data:

Search	Replace	Explanation
'	\\'	Escape single quotes
\t	','	Separate fields and quote as strings
^	('	Start of row
$	'),	End of row
From there, there are only 2 more steps to complete the query.
  • Add the start of the query: “INSERT INTO TABLENAME VALUES”
  • End the query by changing the last row's comma “,” at the very end of the line to a semi-colon “;”.

For example:
a	b	c
d	e	f
g	h	i
would be converted to
INSERT INTO MyTable VALUES
('a','b','c'),
('d','e','f'),
('g','h','i');

Sometimes queries get too long and you will need to split them up; to do so, just repeat the “2 more steps to complete the query” from above on each chunk.
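If you do this conversion often, the four replacements and the wrapping steps can also be scripted in one go. Here’s a hedged PHP sketch of the same process (the table name and sample input are placeholders):

<?
//Convert tab-delimited text (as pasted from Excel, no line breaks within fields) into a MySQL INSERT query
function TabTextToInsert($Text, $TableName)
{
	$Text=str_replace("'", "\\'", $Text);     //Escape single quotes
	$Text=str_replace("\t", "','", $Text);    //Separate fields and quote as strings
	$Text=preg_replace('/^/m', "('", $Text);  //Start of row
	$Text=preg_replace('/$/m', "'),", $Text); //End of row
	return "INSERT INTO $TableName VALUES\n".preg_replace('/,$/', ';', $Text); //Add the query start and change the final comma to a semi-colon
}
print TabTextToInsert("a\tb\tc\nd\te\tf\ng\th\ti", 'MyTable');
?>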


After doing one of these conversions recently, I was also asked to make the data searchable, so I made a very simple PHP script for this.

This script lets you search through all the fields and lists all matches. The fields are listed on the 2nd line in an array as "SQL_FieldName"=>"Viewable Name". If the “Viewable Name” contains a pound sign “#”, the field is matched exactly; otherwise, only part of the search string needs to be found.

<?
$Fields=Array('ClientNumber'=>'Client #', 'FirstName'=>'First Name', 'LastName'=>'Last Name', ...); //Field list
print '<form method=post action=index.php><table>'; //Form action needs to point to the current file
foreach($Fields as $Name => $Value) //Output search text boxes
	print "<tr><td>$Value</td><td><input name=\"$Name\" style='width:200px;' value=\"".
		(isset($_POST[$Name]) ? htmlentities($_POST[$Name], ENT_QUOTES) : '').'"></td></tr>';//Text boxes w/ POSTed values,if set
print '</table><input type=submit value=Search></form>';

if(!isset($_POST[key($Fields)])) //If search data has not been POSTed, stop here
	return;
	
$SearchArray=Array('1=1'); //Search parameters are stored here. 1=1 is passed in case no POSTed search parameters are ...
                           //... given, so there is at least 1 WHERE parameter; it is optimized out by the MySQL preprocessor anyways.
foreach($Fields as $Name => $Value) //Check each POSTed search parameter
	if(trim($_POST[$Name])!='') //If the POSTed search parameter is empty, do not use it as a search parameter
	{
		$V=mysql_escape_string($_POST[$Name]); //Prepare for SQL insertion
		$SearchArray[]=$Name.(strpos($Value, '#')===FALSE ? " LIKE '%$V%'" : "='$V'"); //Pound sign in the Viewable Name=exact ...
			//... value, otherwise, just a partial match
	}
//Get data from MySQL
mysql_connect('SQL_HOST', 'SQL_USERNAME', 'SQL_PASSWORD');
mysql_select_db('SQL_DATABASE');
$q=mysql_query('SELECT * FROM TABLENAME WHERE '.implode(' AND ', $SearchArray));

//Output retrieved data
$i=0;
while($d=mysql_fetch_assoc($q)) //Iterate through found rows
{
	if(!($i++)) //If this is the first row found, output header
	{
		print '<table border=1 cellpadding=0 cellspacing=0><tr><td>Num</td>'; //Start table and output first column header (row #)
		foreach($Fields as $Name => $Value) //Output the rest of the column headers (Viewable Names)
			print "<td>$Value</td>";
		print '</tr>'; //Finish header row
	}
	print '<tr bgcolor='.($i&1 ? 'white' : 'gray')."><td>$i</td>"; //Start the data field's row. Row's colors are alternating white and gray.
	foreach($Fields as $Name => $Value) //Output row data
		print '<td>'.$d[$Name].'</td>';
	print '</tr>'; //End data row
}

print ($i==0 ? 'No records found.' : '</table>'); //If no records are found, output an error message, otherwise, end the data table
?>
C Jump Tables
The unfortunate reality of different feature sets in different language implementations

I was thinking earlier today how it would be neat for C/C++ to be able to get the address of a jump-to label to be used in jump tables, specifically, for an emulator. A number of seconds after I did a Google query, I found out it is possible in gcc (the open source native Linux compiler) through the “label value operator” “&&”. I am crushed that MSVC doesn’t have native support for such a concept :-(.

The reason it would be great for an emulator is for emulating the CPU, in which, usually, the first byte of each CPU instruction’s opcode [see ASM] determines what the instruction is supposed to do. An example to explain the usefulness of a jump table is as follows:

void DoOpcode(int OpcodeNumber, ...)
{
	void *Opcodes[]={&&ADD, &&SUB, &&JUMP, &&MUL}; //assuming ADD=opcode 0 and so forth
	goto *Opcodes[OpcodeNumber];
  	ADD:
		//...
	SUB:
		//...
	JUMP:
		//...
	MUL:
		//...
}

Of course, this could still be done with virtual functions, function pointers, or a switch statement, but those are all theoretically much slower. Having the opcodes in separate functions would also remove the possibility of sharing local variables between them.

Then again, theoretically, it wouldn’t be too bad to use the _fastcall calling convention with function pointers, and modern compilers SHOULD translate a switch into a jump table in an instance like this, but modern compilers are so obfuscated you never know what they are really doing.

It would probably be best to try and code such an instance so that all 3 methods (function pointers, switch statement, jump table) could be utilized through compiler definitions, and then profile for whichever method is fastest and supported.

//Define the switch for which type of opcode picker we want
#define UseSwitchStatement
//#define UseJumpTable
//#define UseFunctionPointers

//Defines for how each opcode picker acts
#if defined(UseSwitchStatement)
	#define OPCODE(o) case OP_##o:
#elif defined(UseJumpTable)
	#define OPCODE(o) o:
	#define GET_OPCODE(o) &&o
#elif defined(UseFunctionPointers)
	#define OPCODE(o) void Opcode_##o()
	#define GET_OPCODE(o) (void*)&Opcode_##o
	//The above GET_OPCODE is actually a problem since the opcode functions aren't listed until after their ...
	//address is requested, but there are a couple of ways around that I'm not going to worry about going into here.
#endif

enum {OP_ADD=0, OP_SUB}; //assuming ADD=opcode 0 and so forth
void DoOpcode(int OpcodeNumber, ...)
{
	#ifndef UseSwitchStatement //If using JumpTable or FunctionPointers we need an array of the opcode jump locations
		void *Opcodes[]={GET_OPCODE(ADD), GET_OPCODE(SUB)}; //assuming ADD=opcode 0 and so forth
	#endif
	#if defined(UseSwitchStatement)
		switch(OpcodeNumber) { //Normal switch statement
	#elif defined(UseJumpTable)
		goto *Opcodes[OpcodeNumber]; //Jump to the proper label
	#elif defined(UseFunctionPointers)
		((void(*)(void))Opcodes[OpcodeNumber])(); //Call the proper function through its pointer
		} //End the current function
	#endif

	//For testing under "UseFunctionPointers" (see GET_OPCODE comment under "defined(UseFunctionPointers)")
	//put the following OPCODE sections directly above this "DoOpcode" function
	OPCODE(ADD)
	{
		//...
	}
	OPCODE(SUB)
	{
		//...
	}

	#ifdef UseSwitchStatement //End the switch statement
	}
	#endif

#ifndef UseFunctionPointers //End the function
}
#endif

After some tinkering, I discovered through assembly insertion that it is possible to retrieve the offset of a label in MSVC, so with some more tinkering, it could be utilized, though it might be a bit messy.
void ExamplePointerRetrieval()
{
	void *LabelPointer;
	TheLabel:
	_asm mov LabelPointer, offset TheLabel
}
Outputting directory contents in PHP
Rebuilding the wheel
A friend just asked me to write a PHP function to list all the contents of a directory and its sub-directories.
Nothing special here... just a simple example piece of code and boredom...
function ListContents($DirName)
{
	print '<ul>';
	$dir=opendir($DirName);
	while($file=readdir($dir))
		if($file!='.' && $file!='..')
		{
			$FilePath="$DirName/$file";
			$IsDir=is_dir($FilePath);
			print "<li>$file [".($IsDir ? 'D' : number_format(filesize($FilePath), 0, '.', ',')).']';
			if($IsDir)
				ListContents($FilePath);
			print '</li>';
		}
	closedir($dir);
	print '</ul>';
}
It wouldn’t be a bad idea to turn off PHP’s output buffering and turn on implicit flush when running something like this on larger directories, so the results arrive as they are generated.
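For reference, here’s a minimal sketch of those two settings (run it before calling the function):
<?
while(ob_get_level()) //Turn off PHP's output buffering by ending all active buffers
	ob_end_flush();
ob_implicit_flush(true); //Turn on implicit flush so each print goes straight to the browser
?>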
Example output for “ListContents('c:\\temp');”:
  • A.BMP [230]
  • Dir1 [D]
    • codeblocks-1.0rc2_mingw.exe [13,597,181]
    • Dir1a [D]
      • DEBUGUI.C [25,546]
  • Dir2 [D]
    • Dir3 [D]
      • HW.C [12,009]
      • INIFILE.C [9,436]
    • NTDETECT.COM [47,564]


I decided to make it a little nicer afterwards by bolding the directories, adding their total size, and changing sizes to a human readable format. This function is a lot more memory intensive because it holds data in strings instead of immediately outputting.
function HumanReadableSize($Size)
{
	$MetricSizes=Array('Bytes', 'KB', 'MB', 'GB', 'TB');
	for($SizeOn=0;$Size>=1024 && $SizeOn<count($MetricSizes)-1;$SizeOn++) //Loops until Size is < a binary thousand (1,024) or we have run out of listed Metric Sizes
		$Size/=1024;
	return preg_replace('/\\.?0+$/', '', number_format($Size, 2, '.', ',')).' '.$MetricSizes[$SizeOn]; //Forces to a maximum of 2 decimal places, adds comma at thousands place, appends metric size
}

function ListContents2($DirName, &$RetSize)
{
	$Output='<ul>';
	$dir=opendir($DirName);
	$TotalSize=0;
	while($file=readdir($dir))
		if($file!='.' && $file!='..')
		{
			$FilePath="$DirName/$file";
			if(is_dir($FilePath)) //Is directory
			{
				$DirContents=ListContents2($FilePath, $DirSize);
				$Output.="<li><b>$file</b> [".HumanReadableSize($DirSize)."]$DirContents</li>";
				$TotalSize+=$DirSize;
			}
			else //Is file
			{
				$FileSize=filesize($FilePath);
				$Output.="<li>$file [".HumanReadableSize($FileSize).']</li>';
				$TotalSize+=$FileSize;
			}
		}
	closedir($dir);
	$RetSize=$TotalSize;
	$Output.='</ul>';
	return $Output;
}

Example output for “print ListContents2('c:\\temp', $Dummy);”:
  • A.BMP [230 Bytes]
  • Dir1 [12.99 MB]
    • codeblocks-1.0rc2_mingw.exe [12.97 MB]
    • Dir1a [24.95 KB]
      • DEBUGUI.C [24.95 KB]
  • Dir2 [0 Bytes]
    • Dir3 [20.94 KB]
      • HW.C [11.73 KB]
      • INIFILE.C [9.21 KB]
    • NTDETECT.COM [46.45 KB]


The memory problem can be rectified through a little extra IO by calculating the size of a directory before its contents are listed, thereby not needing to keep everything in a string.
function CalcDirSize($DirName)
{
	$dir=opendir($DirName);
	$TotalSize=0;
	while($file=readdir($dir))
		if($file!='.' && $file!='..')
			$TotalSize+=(is_dir($FilePath="$DirName/$file") ? CalcDirSize($FilePath) :  filesize($FilePath));
	closedir($dir);
	return $TotalSize;
}

function ListContents3($DirName)
{
	print '<ul>';
	$dir=opendir($DirName);
	while($file=readdir($dir))
		if($file!='.' && $file!='..')
		{
			$FilePath="$DirName/$file";
			$IsDir=is_dir($FilePath);
			$FileSize=($IsDir ? CalcDirSize($FilePath) : filesize($FilePath));
			print '<li>'.($IsDir ? '<b>' : '').$file.($IsDir ? '</b>' : '').' ['.HumanReadableSize($FileSize).']';
			if($IsDir) //Is directory, so list its contents recursively
				ListContents3($FilePath);
			print '</li>';
		}
	closedir($dir);
	print '</ul>';
}

Example output for “ListContents3('c:\\temp');”:
  • A.BMP [230 Bytes]
  • Dir1 [12.99 MB]
    • codeblocks-1.0rc2_mingw.exe [12.97 MB]
    • Dir1a [24.95 KB]
      • DEBUGUI.C [24.95 KB]
  • Dir2 [0 Bytes]
    • Dir3 [20.94 KB]
      • HW.C [11.73 KB]
      • INIFILE.C [9.21 KB]
    • NTDETECT.COM [46.45 KB]


Of course, after all this, my friend took the original advice I gave him before writing any of this code, which was that using bash commands might get him to his original goal much more easily.
Useful Bash commands and scripts
Unix is so great
First, to find out more about any bash command, use
man COMMAND

Now, a primer on the three most useful bash commands: (IMO)
find:
Find will search through a directory and its subdirectories for objects (files, directories, links, etc) satisfying its parameters.
Parameters are written like a math query, with parentheses for order of operations (make sure to escape them with a “\”!), -a for boolean “and”, -o for boolean “or”, and ! for “not”. If neither -a nor -o is specified, -a is assumed.
For example, to find all files that contain “conf” but do not contain “.bak” as the extension, OR are greater than 5MB:
find -type f \( \( -name "*conf*" ! -name "*.bak" \) -o -size +5120k \)
Some useful parameters include:
  • -maxdepth & -mindepth: only look through certain levels of subdirectories
  • -name: name of the object (-iname for case insensitive)
  • -regex: name of object matches regular expression
  • -size: size of object
  • -type: type of object (block special, character special, directory, named pipe, regular file, symbolic link, socket, etc)
  • -user & -group: object is owned by user/group
  • -exec: exec a command on found objects
  • -print0: output each object separated by a null terminator (great so other programs don’t get confused by whitespace characters)
  • -printf: output specified information on each found object (see man page)

For numeric parameters, use:
+n	for greater than n
-n	for less than n
n	for exactly n

For a complete reference, see your find’s man page.

xargs:
xargs passes piped arguments to another command as trailing arguments.
For example, to list information on all files in a directory greater than 1MB: (Note this will not work with paths with spaces in them; use “find -print0” and “xargs -0” to fix this)
find -size +1024k | xargs ls -l
Some useful parameters include:
  • -0: piped arguments are separated by null terminators
  • -n: max arguments passed to each command
  • -i: replaces “{}” with the piped argument(s)

So, for example, if you had 2 mirrored directories and wanted to sync their modification timestamps:
cd /ORIGINAL_DIRECTORY
find -print0 | xargs -0 -i touch -m -r "{}" "/MIRROR_DIRECTORY/{}"

For a complete reference, see your xargs’s man page.

grep:
GREP is used to search through data for plain text, regular expression, or other pattern matches. You can use it to search through both pipes and files.
For example, to get your number of CPUs and their speeds:
cat /proc/cpuinfo | grep MHz
Some useful parameters include:
  • -E: use extended regular expressions
  • -P: use perl regular expressions
  • -l: output files with at least one match (-L for no matches)
  • -o: show only the matching part of the line
  • -r: recursively search through directories
  • -v: invert to only output non-matching lines
  • -Z: separates matches with null terminators

So, for example, to list all files under your current directory that contain “foo1”, “foo2”, or “bar”, you would use:
grep -rlE "foo(1|2)|bar"

For a complete reference, see your grep’s man page.

And now some useful commands and scripts:
List size of subdirectories:
du --max-depth=1
The --max-depth parameter specifies how many sub levels to list.
-h can be added for more human readable sizes.

List number of files in each subdirectory*:
#!/bin/bash
export IFS=$'\n' #Forces only newlines to be considered argument separators
for dir in `find -type d -maxdepth 1`
do
	a=`find $dir -type f | wc -l`;
	if [ $a != "0" ]
	then
		echo $dir $a
	fi
done

and to sort those results
SCRIPTNAME | sort -n -k2

List number of different file extensions in current directory and subdirectories:
find -type f | grep -Eo "\.[^\.]+$" | sort | uniq -c | sort -nr

Replace text in file(s):
perl -i -pe 's/search1/replace1/g; s/search2/replace2/g' FILENAMES
If you want to make pre-edit backups, include an extension after “-i”, like “-i.orig”

Perform operations in directories with too many files to pass as arguments: (in this example, remove all files from a directory 100 at a time instead of using “rm -f *”)
find -type f | xargs -n100 rm -f

Force kill all processes with a given name:
killall -9 NAME

Transfer MySQL databases between servers: (works in Windows too)
mysqldump -u LOCAL_USER_NAME -p LOCAL_DATABASE | mysql -u REMOTE_USER_NAME -p -D REMOTE_DATABASE -h REMOTE_SERVER_ADDRESS
“-p” specifies a password is needed

Some lesser known commands that are useful:
screen: This opens up a virtual console session that can be disconnected and reconnected from without stopping the session. This is great when connecting to a console through SSH, so you don’t lose your progress if disconnected.
htop: An updated version of top, which is a process information viewer.
iotop: A process I/O (input/output - hard drive access) information viewer. Requires Python ≥ 2.5 and I/O accounting support compiled into the Linux kernel.
dig: Domain information retrieval. See the “Diagnosing DNS Problems” post for more information.

More to come later...

*Anything starting with “#!/bin/bash” is intended to be put into a script.
GreaseMonkey, FireBug, and JavaScripting
Keeping up with the webmasters
A few days ago I threw together a script for a friend in GreaseMonkey (a FireFox extension) that removes the side banner from Demonoid. It was as follows (JavaScript):
var O1=document.getElementById('navtower').parentNode;
O1.parentNode.removeChild(O1);

This simple snippet is a useful example of what’s involved in a lot of webpage operations. Most web page scripting just involves finding objects and then manipulating them and their parent objects. There are two common ways to get a reference to an object on a web page: one is document.getElementById, and the other is through form objects in the DOM.
With the first, getElementById, you can get any object by passing its id tag. For example:
<div id=example>
<script language=JavaScript>
	var MyObject=document.getElementById('example');
</script>

This function is used so often that many frameworks abbreviate it with a wrapper function:
function GE(Name) { return document.getElementById(Name); }
I know of at least one framework that actually names the function as just a dollar sign “$”.

The second way is through the name tag on objects, which both the form and any of its form elements require. Only form elements (like input, textarea, and select) can be accessed this way.
<body>
	<form name=MyForm>
		<input type=text name=ExampleText value=Example>
	</form>
	<script language=JavaScript>
		document.MyForm.ExampleText.value='New Example'; //Must use format document.FormName.ObjectName
	</script>
</body>

This is the very basis of all JavaScript/web page (client side only) programming. The rest is just learning all the types of objects with their functions and properties.

So, anyways, yesterday Demonoid changed their page so the script no longer worked. All that needed to be done was to change 'navtower' to 'smn', because they renamed the object (and made it an IFrame). This kind of information is very easy to find and edit using a very nice and useful FireFox extension called FireBug. I have been using it for a while to develop web pages and do editing (for both designing and JavaScript coding), and I highly recommend it.
FireBug in Action