2012 DLF Fall Forum Panel: Highlighting DuraCloud and Other Cloud Services

Mon, 2012-11-26 19:16 -- carol

Winchester, MA  A panel presentation, "Tales from the Cloud: Experiences and Challenges," at the Digital Library Federation 2012 DLF Fall Forum offered first-hand accounts of institutional experiences in adopting, developing, and working with cloud-based solutions for digital object preservation. Cloud pricing was also of interest to participants as institutions look for ways to take advantage of cloud computing while maximizing institutional investment in preservation and archiving services and infrastructure. Ryan Steans, Texas Digital Library; Bridger Dyson-Smith, University of Tennessee Knoxville; Andrew Woods, DuraSpace; and Declan Fleming, University of California, San Diego, provided insights into institutional priorities, ongoing investigations, and conclusions with regard to using DuraCloud, Chronopolis, OpenStack, and Amazon’s S3 services in a preservation context.

Slides from the presentation are available here: http://duraspace.org/web_seminars

Overview

• Ryan Steans is the Program Coordinator at the Texas Digital Library (TDL), located on the University of Texas at Austin campus. He began investigating cloud computing for the Texas Digital Library in the fall of 2010 and specifically looked at Amazon Web Services (AWS) for backup options. TDL's local issues included immediate but undefined space requirements, limited staff, and administration issues. Because AWS was elastic and easy to add on to, they decided to move to AWS for all services in the spring of 2011.

Today TDL runs 100 separate AWS instances at a cost of $5,500/month. Nothing is kept on University of Texas hardware. This allows for separate service instances and better user management of customization, access, and control. Virtual machines (VMs) are rapidly deployed and accounting is simplified.

Steans said, "Most university IT policies haven't caught up with the cloud." He therefore suggests that potential institutional cloud users "Beg forgiveness instead of asking permission" in moving storage and back-up to the cloud. He has found that costs are lower, members do not experience any difference in services, and staff (2 system administrators) have successfully adjusted to the new workflow.

• Bridger Dyson-Smith, University of Tennessee Libraries Knoxville, explained how his institution has implemented DuraCloud. UT was primarily storing content on local disks backed up to tape that was stored in facilities across campus. They found that it was easy to set up a DuraCloud trial instance and decided to move to a full two-year trial account storing 3TB of special collections information (high resolution TIFFs converted to lossless JPEGs). They are set to expand their usage from 3TB to 15TB for backup storage rather than for access. They maintain a local copy for day-to-day processing and are in the process of establishing priority levels for what is stored in DuraCloud.

Dyson-Smith reports that DuraCloud is "The only service that is available to do exactly what we need and (that could) do something else in the future if we need it to," particularly because they do not have local staff to support other options.

• Andrew Woods, DuraSpace, began by reviewing reasons why institutional users might be interested in the cloud. They include faster deployment, easy geographic distribution, expandable local infrastructure, lack of technical staff to manage local implementations, and red tape associated with using local IT support.

There are downsides. In his experience working with many cloud storage providers, he has found that it is straightforward to get a checksum from storage vendors when an item is deposited. After time has passed, however, it is difficult to generate a checksum for ongoing bit integrity checks. Cost is a risk typically associated with the use of cloud compute and storage, though it can be mitigated to some degree by the ability to easily turn off cloud services. An "apples to apples" comparison when pricing out cloud options is challenging because local infrastructure costs are often unknown or difficult to separate out. Security of content in the cloud and vendor lock-in are also concerns. Woods discussed a "perceived loss of control" of content that may not be different from the use of any local storage infrastructure but that "feels" different to users. Local bandwidth limitations and the size of individual files can make pushing multiple TBs to the cloud a challenge.
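The ongoing bit integrity check Woods describes amounts to recomputing a checksum for stored content and comparing it with the value recorded at deposit. The following is a minimal Python sketch of that idea, not DuraCloud's or any vendor's implementation. The function names `stream_checksum` and `verify_fixity` and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
import io


def stream_checksum(stream, algorithm="sha256", block_size=1024 * 1024):
    """Compute a hex digest by reading the stream in fixed-size blocks.

    Reading in blocks keeps memory use flat even for very large objects.
    """
    digest = hashlib.new(algorithm)
    for block in iter(lambda: stream.read(block_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_fixity(stream, recorded_checksum, algorithm="sha256"):
    """Return True if the content still matches the checksum recorded at deposit."""
    return stream_checksum(stream, algorithm) == recorded_checksum.lower()


if __name__ == "__main__":
    # Hypothetical content standing in for a deposited object.
    deposited = b"special collections master file"
    recorded = stream_checksum(io.BytesIO(deposited))

    # Later check: the content is retrieved again and compared.
    print(verify_fixity(io.BytesIO(deposited), recorded))
```

The difficulty noted above is that this later check requires reading the full content back, or running compute next to the storage, which is where the cost and effort arise.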

HTTP has its limitations and transfers have to be managed. Most cloud providers allow customers to, theoretically, push items of up to 5GB in size. In reality, at 1GB the transfer failure rate climbs, so objects of that size or greater must be broken up for transfer and 'reassembled' on the other end. Woods also pointed out that the server that supports your app may get rebooted and hardware may get upgraded. That can have a strong impact on your services, resulting in what he calls "server volatility".
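Breaking a large object up for transfer and reassembling it on the other end can be sketched as follows. This is an illustrative Python example under stated assumptions, not the mechanism any particular provider or DuraCloud uses: the helper names are hypothetical, the data is held in memory for brevity, and a per-chunk SHA-256 manifest is one possible way to detect a damaged chunk before reassembly.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def build_manifest(chunks: list) -> list:
    """Record one checksum per chunk so the receiver can verify each piece."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def reassemble(chunks: list, manifest: list) -> bytes:
    """Verify every chunk against the manifest, then join them in order."""
    if len(chunks) != len(manifest):
        raise ValueError("chunk count does not match manifest")
    for index, (chunk, expected) in enumerate(zip(chunks, manifest)):
        if hashlib.sha256(chunk).hexdigest() != expected:
            raise ValueError(f"chunk {index} failed verification")
    return b"".join(chunks)


if __name__ == "__main__":
    # Small stand-in payload; a real object would be streamed from disk.
    payload = bytes(range(256)) * 10
    chunks = split_into_chunks(payload, chunk_size=1000)
    manifest = build_manifest(chunks)
    print(reassemble(chunks, manifest) == payload)
```

A failed chunk can then be resent on its own rather than restarting the whole transfer, which is the practical benefit of splitting large objects.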

DuraCloud, as a managed cloud solution, can be used as a service or downloaded as open-source software. It has been designed to mitigate risk, avoid vendor lock-in, and provide ongoing bit integrity checks of content. Billing is consolidated and managed by DuraSpace to make it simple to implement DuraCloud at an institution. DuraCloud users may choose from one of four back-up and preservation plans (http://duracloud.org/pricing) that start at $125.00 each month for the first year.

• Declan Fleming, University of California, San Diego (UCSD), began with 4TB 8 years ago and eventually moved everything to local Samba. When faced with supporting a big data research project (100TB), he began investigating the San Diego Supercomputer Center (SDSC) cloud.

The transition to SDSC OpenStack was relatively simple, with two API choices: Rackspace or RESTian. The RESTian option required no caching layer, which made uploads and downloads faster. 5GB segments work with OpenStack, and performance is the same regardless of file size.

Fleming indicated that overall local and cloud storage costs came out very close when labor costs were included. This cost model did not include other local costs: cooling, power, racking, facilities management, and hardware purchasing decisions and actions. Cloud options look even better when those costs are added to the equation.