Getting photos for use with Open Photo Roster

Originally posted on Jul 4, 2017.

Internally we have a service that was set up for getting user ID photos.  You point a web browser at a URL with the staff/student number of a particular user at the end of the address and you are presented with their photo.

I developed a very simple shell script to look up who was in Blackboard and get their photos.  This will be very particular to our environment but others may find it of interest or of use.  I have changed names of servers/locations.  I have no formal education in scripting; I just learn by googling, so there may be some bad practice within these scripts.

Script 1 - Get all the photos.


First of all I needed to get the photos.

Here is the script with explanations.

First the usual stuff I put at the beginning of a script:
#!/bin/sh 
echo "$0 starting at `/bin/date`" 
echo ""
Next I need to find out who the users are.  Since we get a full set of user data for our flatfile SIS user integration I can use that.  The script looks through the list and removes any users with the status of Other, then removes any users who are unavailable.  This gives us a full set of staff and students.  Using awk I then create a new file with the username and the staff/student number separated by a pipe.  I'll need both of those soon.
# This script is to get all photos for all users in our user feed 
# Make list of usernames to ids 
cat /usr/local/data_integration/users/blackboard.out |grep -v "|Other|"|grep -v "|Y|N|Y|N|Y" |awk -F"|" {'print $3 "|" $4'} \
 |sort -u > /usr/local/data_integration/photo-get/username-ids.txt
So I now have in the file "username-ids.txt" a list of all relevant usernames and id numbers, which looks like:

abc123|123456
def456|654321

I need both of those values because our photo service can only get photos by id number, but our primary key for users in Blackboard is the username.
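
The filtering step can be tried safely on a few made-up rows before pointing it at the real feed.  This is only a sketch: the sample rows, their field layout and the temporary file names are assumptions for illustration.  Only the two exclusion patterns and the use of fields 3 and 4 come from the script above.

```shell
#!/bin/sh
# Hypothetical sample feed: the real blackboard.out layout is not shown in
# this post, so these rows are invented. Fields 3 and 4 are username and id.
WORKDIR=$(mktemp -d)
cat > "$WORKDIR/blackboard.out" <<'EOF'
x|Staff|abc123|123456|Y|Y|Y|Y|Y
x|Student|def456|654321|Y|Y|Y|Y|Y
x|Other|ghi789|111111|Y|Y|Y|Y|Y
x|Student|jkl012|222222|Y|N|Y|N|Y
EOF

# One awk pass instead of cat | grep | grep | awk.
# index() is a plain substring test, like grep -v with each pattern.
awk -F'|' '
    index($0, "|Other|")    { next }   # drop users with the status Other
    index($0, "|Y|N|Y|N|Y") { next }   # drop unavailable users
    { print $3 "|" $4 }
' "$WORKDIR/blackboard.out" | sort -u > "$WORKDIR/username-ids.txt"

cat "$WORKDIR/username-ids.txt"
```

With these sample rows only abc123 and def456 survive the filter, which matches the two-line example shown above.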

Next I create a list of URLs to get the photos of the users
# Make list of URLs we need to wget 
cat /usr/local/data_integration/photo-get/username-ids.txt | awk -F"|" {'print "https://api.photo-studio:8082/Photo/"$2'}\
 |sort -u > /usr/local/data_integration/photo-get/photo-list.txt
This makes a list of web addresses in "photo-list.txt", one per line, using the id number (the second field of "username-ids.txt") to build the URL for each user's photo.  Next we use wget to get all the photos listed in "photo-list.txt" and save the resulting jpg files into the relevant Blackboard directory, which is configured in the Open Photo Roster building block.  wget waits about 5 seconds between requests (with some randomisation).  It doesn't verify the security certificate because it's internally generated and only available within our firewall.
# Use wget to get the photos in photo-list.txt 
wget --adjust-extension --wait=5 --random-wait --no-check-certificate \
 -i /usr/local/data_integration/photo-get/photo-list.txt \
 -P /usr/local/blackboard/content/PHOTOS/
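
An alternative worth noting, which is not what the script above does: fetching one user at a time with wget -O would let each photo be saved straight to username.jpg.  The sketch below only prints the commands it would run.  The DRY_RUN switch, the function name and the sample rows are made up for illustration; the URL and directory are the placeholder values used in this post.

```shell
#!/bin/sh
# Sketch only: fetch each photo straight to <username>.jpg with wget -O.
# DRY_RUN, fetch_photos and the sample rows are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the wget commands
PHOTO_DIR=/usr/local/blackboard/content/PHOTOS
BASE_URL=https://api.photo-studio:8082/Photo

fetch_photos() {
    # Reads username|id lines on stdin.
    while IFS='|' read -r username id; do
        [ -n "$username" ] && [ -n "$id" ] || continue   # skip malformed lines
        if [ "$DRY_RUN" = 1 ]; then
            echo "wget --no-check-certificate -O $PHOTO_DIR/$username.jpg $BASE_URL/$id"
        else
            wget --no-check-certificate -O "$PHOTO_DIR/$username.jpg" "$BASE_URL/$id"
            sleep 5                                      # stay gentle on the photo service
        fi
    done
}

PLAN=$(printf '%s\n' 'abc123|123456' 'def456|654321' | fetch_photos)
echo "$PLAN"
```

One thing to watch with this approach: as far as I know wget -O creates the output file before the download succeeds, so a failed request can leave an empty username.jpg behind that would need cleaning up.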
So now we have the photos, but they are named by id number, not username.  So we make a script to do the renaming, using the info in the username-ids.txt file produced earlier.  Once the script has been made we make it executable and then run it.
# Make script to rename photos 
cat /usr/local/data_integration/photo-get/username-ids.txt \
 |awk -F"|" {'print "mv /usr/local/blackboard/content/PHOTOS/"$2 " /usr/local/blackboard/content/PHOTOS/"$1".jpg" " 2> /dev/null"'} \
 |sort -u > /usr/local/data_integration/photo-get/rename-images.sh  
# Make the script executable 
chmod +x /usr/local/data_integration/photo-get/rename-images.sh  
# Run the script
/usr/local/data_integration/photo-get/rename-images.sh
echo "$0 finished at `/bin/date`" 
echo ""
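
The renaming could also be done without generating a second script, by reading username-ids.txt in a loop.  This is a sketch under assumptions rather than the method used above: the temporary directory and the sample files stand in for the real PHOTOS folder and list.

```shell
#!/bin/sh
# Sketch: rename id-numbered files to username.jpg without a generated script.
# The temporary directory and sample files stand in for the real PHOTOS folder.
PHOTOS=$(mktemp -d)
LIST=$(mktemp)
printf '%s\n' 'abc123|123456' 'def456|654321' 'ghi789|999999' > "$LIST"
: > "$PHOTOS/123456"      # pretend wget saved these two files
: > "$PHOTOS/654321"      # (nothing for 999999: that photo is missing)

while IFS='|' read -r username id; do
    # Only rename when the downloaded file exists, so nothing has to be
    # silenced with 2> /dev/null.
    if [ -f "$PHOTOS/$id" ]; then
        mv "$PHOTOS/$id" "$PHOTOS/$username.jpg"
    fi
done < "$LIST"

ls "$PHOTOS"
```

Quoting the variables also means an unexpected character in a username cannot break the mv command, which is harder to guarantee when the commands are written out to a file first.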
So just using the data in our user integration flatfile we have downloaded the photographs and named them appropriately.  It took about three days to download all the photographs we needed.  The next step is to keep those photos up to date.

Script 2 - Check photos we have, identify photos that are missing and get them

Having got a set of the photos we need a way to get new photos for users who have joined or re-joined the institution.  This next script runs every day.  It checks what photos we have, compares against our total user list, identifies the users for whom we don't have photos and then tries to get their photos.

The script starts in the usual way. I save the start time so I can compare it later with the finish time.
#!/bin/sh  
echo "$0 starting at `/bin/date`" 
STARTED=`/bin/date` 
echo ""  
Then I do the same as above, getting a list of user names and id numbers from the user SIS flatfile.
# Make list of usernames to ids 
echo "Making list of all users at `/bin/date`" 
echo "" 
cat /usr/local/data_integration/users/blackboard.out |grep -v "|Other|"|grep -v "|Y|N|Y|N|Y" |awk -F"|" {'print $3 "|" $4'} \
 |sort -u > /usr/local/data_integration/photo-get/username-ids.txt   
Next I get a list of all the photos we have already from the file system.  Using sed I can remove the .jpg so that I get just the usernames.  This is then saved to a text file called "photos-we-have.txt".  Using sed again I add a pipe character to the end of the username so that I can use grep to compare more easily.  I suppose I could have removed the jpg and swapped it with a pipe in one step, but because I was building the script and debugging as I went I wanted to have each stage of the script easy to separate.

# Make list of photos we have by username 
# Produce a file with a list of usernames 
# We use sed to remove the .jpg from the end  
echo "Producing list of Photos we have at `/bin/date`" 
echo ""   
cd /usr/local/blackboard/content/PHOTOS/ && ls -1 *jpg | sed 's/\.[^./]*$//' \
 >/usr/local/data_integration/photo-get/photos-we-have.txt
# Add a pipe to the end of the user name to make it easy to compare with the list of all users at the next stage 
sed 's/$/|/' -i /usr/local/data_integration/photo-get/photos-we-have.txt
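
For comparison, here is a sketch of the one-step version mentioned above, using shell parameter expansion instead of ls and two sed passes.  The temporary directory and sample file names are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: build the "username|" list in one step with parameter expansion.
# The temporary directory and files stand in for the real PHOTOS folder.
PHOTOS=$(mktemp -d)
OUT=$(mktemp)
: > "$PHOTOS/abc123.jpg"
: > "$PHOTOS/def456.jpg"
: > "$PHOTOS/654321"          # not yet renamed, so it must be ignored

for f in "$PHOTOS"/*.jpg; do
    [ -e "$f" ] || continue   # the glob stays literal when no .jpg exists
    name=${f##*/}             # strip the directory part
    printf '%s|\n' "${name%.jpg}"
done > "$OUT"

cat "$OUT"
```

A side benefit, as I understand it, is that the for loop expands the glob inside the shell, so a very large photo folder should not run into the "argument list too long" limit that ls with a wildcard can hit.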

So now I have a list of photos we have, next I need to find out what photos are missing.  I do this by comparing the user names from "photos-we-have.txt" with the full set of user names and id numbers in the username-ids.txt file.  Normal grep took forever; after googling a bit I found out about "fast grep", fgrep (equivalent to grep -F), which treats each pattern as a plain fixed string rather than a regular expression.  That is all I need for the purposes of this exercise.  I output the result into a text file, photos-we-dont-have.txt.

# Make a list of photos we DON'T have 
echo "Making a list of photos we don't have yet at `/bin/date`" 
echo "" 
cat /usr/local/data_integration/photo-get/username-ids.txt \
 |fgrep -v -f /usr/local/data_integration/photo-get/photos-we-have.txt \
 >/usr/local/data_integration/photo-get/photos-we-dont-have.txt
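
One caveat with fixed-string matching is that a pattern such as "abc123|" is found anywhere in a line, so a username that ends with another username would be skipped by mistake.  The sketch below shows the difference against an exact comparison on the first field using awk.  The sample data is invented, and I have not checked whether our real usernames ever overlap like this.

```shell
#!/bin/sh
# Sketch: exact username comparison with awk instead of substring matching.
# Sample data is invented; "zabc123" ends with the username "abc123".
WORK=$(mktemp -d)
printf '%s\n' 'abc123|123456' 'def456|654321' 'zabc123|777777' > "$WORK/username-ids.txt"
printf '%s\n' 'abc123|' > "$WORK/photos-we-have.txt"

# Substring matching: "abc123|" also occurs inside "zabc123|777777",
# so zabc123 is wrongly treated as already having a photo.
grep -F -v -f "$WORK/photos-we-have.txt" "$WORK/username-ids.txt" \
    > "$WORK/missing-substring.txt"

# Exact matching: the first file loads usernames into an array; lines of the
# second file are printed only when field 1 is not in that array.
# (Assumes photos-we-have.txt is not empty, otherwise NR == FNR misfires.)
awk -F'|' 'NR == FNR { have[$1]; next } !($1 in have)' \
    "$WORK/photos-we-have.txt" "$WORK/username-ids.txt" \
    > "$WORK/missing-exact.txt"

echo "substring:"; cat "$WORK/missing-substring.txt"
echo "exact:";     cat "$WORK/missing-exact.txt"
```

With this sample the substring version reports only def456 as missing, while the exact version reports both def456 and zabc123.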

Now that I know whose photos we are missing, it's time to get them.  Because I was comparing against the username-ids.txt file I have both the username and id number in the resulting photos-we-dont-have.txt file, so I can use awk to build the URLs, using the second field in the text file to supply the id number that I need for each user.  This is the same process as above, building a text file and then firing wget at it to retrieve the photos and saving them in the right place for Blackboard.

# Make list of URLs to wget the photos we do not have 
echo "Creating a list of URLs we need to wget at `/bin/date`" 
echo ""  
cat /usr/local/data_integration/photo-get/photos-we-dont-have.txt | awk -F"|" {'print "https://api.photo-studio:8082/Photo/"$2'}\
 |sort -u > /usr/local/data_integration/photo-get/photos-we-need.txt   
# Use wget to get the photos in photos-we-need.txt 
echo "Getting the photos we need at `/bin/date`" 
echo ""  
wget --adjust-extension --wait=1 --no-check-certificate \
 -i /usr/local/data_integration/photo-get/photos-we-need.txt \
 -P /usr/local/blackboard/content/PHOTOS/ 

Now I have the new photos but they are named by id number, so now I do the same as in the first script - I generate a script to rename the photo files from the id number to username.jpg.  Having created the script I make it executable and run it.  This then renames the photos.

# Make script to rename photos-we-need.txt 
echo "Producing the scripts to rename the photos we just got" 
echo "At `/bin/date`" 
echo "" 
cat /usr/local/data_integration/photo-get/photos-we-dont-have.txt \
 |awk -F"|" {'print "mv /usr/local/blackboard/content/PHOTOS/"$2 " /usr/local/blackboard/content/PHOTOS/"$1".jpg" " 2> /dev/null"'}\
 |sort -u > /usr/local/data_integration/photo-get/rename-photos-we-need.sh  
# Make the script executable  
chmod +x /usr/local/data_integration/photo-get/rename-photos-we-need.sh   
# Run the script to rename the photos so Blackboard can use them 
echo "" 
echo "Running the script to rename the photo files so that Blackboard understands them" 
echo "at `/bin/date`" 
echo ""  
/usr/local/data_integration/photo-get/rename-photos-we-need.sh 
I then finish by recording when the script started and finished.  I get an email each day with the script's output so I can keep an eye on whether it is running OK.

echo "Started at $STARTED" 
echo "Finished at `/bin/date`" 
echo ""
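
If the emailed output should show the run time directly rather than two timestamps to compare by eye, the elapsed time can be worked out in the script.  This is a small sketch, assuming a date command that supports +%s (seconds since the epoch), which GNU and BSD date do but which is not guaranteed everywhere.  The variable names are made up and the sleep is just a stand-in for the real work.

```shell
#!/bin/sh
# Sketch: report elapsed time. Assumes "date +%s" is supported on this system.
START_SECS=$(date +%s)

sleep 1     # stand-in for the real work of the script

END_SECS=$(date +%s)
ELAPSED=$((END_SECS - START_SECS))
printf 'Took %d min %d sec\n' $((ELAPSED / 60)) $((ELAPSED % 60))
```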

So ultimately it's quite simple and I expect after posting this someone will tell me that I could have done this with one line of perl. But it does the job for now and I learned a few things while creating it.
