Cloudera Introduces Analytic Experiences for Cloudera Data Platform

Cloudera recently announced new enterprise data cloud services on Cloudera Data Platform (CDP): CDP Data Engineering, CDP Operational Database, and CDP Data Visualization. The new services include key capabilities to help data engineers, data analysts, and data scientists collaborate across the entire analytics workflow and work smarter and faster. CDP enterprise data cloud services are purpose-built to help data specialists navigate exponential data growth and siloed data analytics operations across multiple public and private clouds.

Data lifecycle integration enables data engineers, data analysts and data scientists to work on the same data securely and efficiently, no matter where that data may reside or where the analytics run. CDP not only helps to improve individual data specialist productivity, it also helps data teams work better together through its unique hybrid data architecture, which integrates analytic experiences across the data lifecycle and across public and private clouds. Effectively managing and securing data collection, enrichment, analysis, experimentation and analytics visualization is fundamental to navigating the data deluge. The result is that data scientists and engineers can collaborate better and more rapidly deliver data-driven use cases. Following are the new enterprise cloud services announcements:

CDP Data Engineering: a powerful Apache Spark service on Kubernetes that includes key productivity-enhancing capabilities typically not available in basic data engineering services. Preparing data for analysis and production use cases across the data lifecycle is critical for transforming data into business value. CDP Data Engineering is a purpose-built data engineering service that accelerates enterprise data pipelines from collection and enrichment to insight, at scale.
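As a rough illustration of the kind of pipeline such a service runs, here is a minimal PySpark sketch that reads raw events, enriches them, and writes curated output. The paths, schema, and aggregation logic are hypothetical placeholders; the managed service adds the productivity capabilities mentioned above on top of this Spark core.

```python
# Minimal PySpark sketch of a collect -> enrich -> publish pipeline.
# Paths, schema, and business rules below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cdp-de-sketch").getOrCreate()

# Collect: read raw clickstream events (hypothetical location and format)
raw = spark.read.json("s3a://raw-zone/events/")

# Enrich: clean timestamps, keep valid users, derive hourly session aggregates
enriched = (
    raw.withColumn("event_ts", F.to_timestamp("event_time"))
       .filter(F.col("user_id").isNotNull())
       .groupBy("user_id", F.window("event_ts", "1 hour").alias("window"))
       .agg(F.count("*").alias("events"),
            F.countDistinct("page").alias("pages"))
)

# Publish: write curated data for downstream analysts and data scientists
enriched.write.mode("overwrite").parquet("s3a://curated-zone/sessions/")

spark.stop()
```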

CDP Operational Database: a high-performance NoSQL database service that provides scale and performance for business-critical operational applications. It offers evolutionary schema support, which preserves flexibility in application design by allowing changes to underlying data models without requiring changes to the application. In addition, it provides auto-scaling based on the workload utilization of the cluster to optimize infrastructure utilization.
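CDP Operational Database is built on Apache HBase, so evolutionary schema in practice means an application can write new columns into an existing column family without an ALTER step or a redeployment. The sketch below illustrates the idea with the community happybase client against a hypothetical HBase Thrift endpoint; the host, table, and column names are made-up placeholders.

```python
# Sketch of evolutionary schema against an HBase-backed table using the
# community happybase (Thrift) client. All names are hypothetical.
import happybase

conn = happybase.Connection(host="cod-gateway.example.com")  # hypothetical endpoint
table = conn.table("customer_profile")  # column family 'p' created once, up front

# Version 1 of the application writes two columns in family 'p'
table.put(b"cust-001", {b"p:name": b"Acme Corp", b"p:tier": b"gold"})

# Version 2 later adds a 'risk_score' attribute: no schema migration, and
# already-deployed readers that ignore the new column keep working unchanged
table.put(b"cust-001", {b"p:risk_score": b"0.17"})

print(table.row(b"cust-001"))
conn.close()
```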

CDP Data Visualization: simplifies the curation of rich visual dashboards, reports and charts to provide agile analytical insight in the language of business, democratizing access to data and analytics across the organization at scale. It allows technical teams to rapidly share analysis and machine learning models using drag-and-drop custom interactive applications, and it gives business teams and decision makers the data insights to make trusted, well-informed business decisions.

These data cloud services, in combination with CDP, are purpose-built for data specialists. They deliver rapid, real-time business insights with enterprise-grade security and governance, and they will permit Cloudera to continue to be a leader in data science.

Why Deploy an Enterprise Data Warehouse on a Hybrid Cloud Architecture?

Analytics and artificial intelligence (AI) solutions are profoundly transforming how businesses and governments engage with consumers and citizens. Across many industries, high-value transformative use cases in personalized medicine, predictive maintenance, fraud detection, cybersecurity, logistics, customer engagement, geospatial analytics, and more are rapidly emerging.

Deploying and scaling AI across the enterprise is not easy especially as the volume, velocity, and variety of data continue to explode. What’s needed is a well-designed, agile, scalable, high-performance, modern, and cloud-native data and AI platform that allows clients to efficiently traverse the AI space with trust and transparency. An enterprise data warehouse (EDW) is a critical component of this platform.

EDWs are central repositories of integrated data from many sources. They store current and historical data used extensively by organizations for analysis, reporting, and better insights and decision-making. Historically, data warehouse appliances (DWAs) have delivered high query performance and scalability, but are now struggling to transform data into timely, actionable insights with the data explosion.

A hybrid, open, multi-cloud platform allows organizations to take advantage of their data and applications wherever they reside: on-premises and across many clouds. Here are some key pros and cons of deploying EDWs on-premises, on public clouds, or on hybrid clouds (Figure 1):

Figure 1: Comparing Enterprise Data Warehouses: On-Premises, Public and Hybrid Cloud

  • Strategic for the long term: About 80% of enterprise workloads are still on-premises[1] and remain strategic, but the public/hybrid cloud is even more strategic, driving most of the innovation, growth, and investment in analytics.
  • Total long-term costs: On-premises costs are predictable and become more favorable with greater utilization. Public cloud costs are less predictable; they suit short, infrequent, spiky workloads, and consumption-based pricing produces greater accountability across the user population. However, these costs grow steeply at the higher utilization levels typical of most EDWs today. In addition, there are many other hidden costs such as long-term contracts, incremental and supplementary licensing fees, and more.

With hybrid cloud EDWs, customers can prudently optimize costs by using on-premises assets for predictable workloads and offloading spiky workloads to the public cloud. This is very effective for the long term: a smaller on-premises hardware footprint can meet immediate requirements, and incremental needs for resources during peaks can be satisfied by the public cloud (a simple illustrative cost sketch follows this comparison). Key components of the total costs include:

  • Data Transfer/Migration Costs: For on-premises, these are negligible since most of the data for the entire analytics workflow typically resides on-premises. They are significant for public clouds since many analytics workflows require substantial movement of data to and from the public cloud. Enterprises are often limited in their ability to move datasets from the cloud back to their on-premises equipment or to another cloud. Moreover, cloud providers charge fees for transferring data out of their cloud environment, which dramatically increases costs – particularly as datasets continue to grow. Also, migrating on-premises workloads to the public cloud is hard and time-consuming.

In hybrid clouds, there is limited movement of data throughout the analytics workflow to and from the public cloud, and so these costs are low to medium. With consistent cloud-native architectures, migrating workloads from on-premises to public clouds is also relatively easy and less expensive.

  • Capital Costs: Significant capital investment in on-premises IT infrastructure is needed to handle peak loads, which may result in lower, sub-optimal utilization under normal operations. For public clouds, customer capital costs are negligible. For hybrid clouds, some capital investment in IT infrastructure is needed for certain critical analytics workloads to run on-premises, with the rest offloaded to the public cloud. This may result in better utilization and lower capital costs compared to the all on-premises alternative.
  • Upgrade Costs: Significant capital expense for hardware upgrades over time is needed to modernize on-premises IT infrastructure and drive innovation. For public clouds, the customer incurs negligible capital expense for hardware upgrades over time since the provider is responsible for the infrastructure. For hybrid clouds, a modest capital expense for hardware upgrades over time is needed to modernize infrastructure.
  • Operating Costs: Since the customer typically owns and operates on-premises assets, costs are predictable, and high-utilization environments provide better economics than public clouds, which are better for short, spiky workloads. With a hybrid cloud, the customer can prudently minimize costs by largely using on-premises assets for predictable workloads and offloading spiky workloads to the public cloud.
  • Deployment Costs (excluding Integration/Customization): Significant for on-premises since provisioning and deploying resources and analytics workflows take more time and effort. Costs are low on public clouds, where provisioning and deployment are faster because the process is automated. On hybrid clouds, costs are significant since connectivity between on-premises and the public cloud, and maintaining two environments, can add another layer of complexity. However, this can be alleviated with a consistent cloud-native containerized architecture.
  • Management/Maintenance: Moderately hard for on-premises since customers must invest in scarce skills and resources to maintain and operate these environments. Much easier with public clouds since customers can typically use a centralized portal with process automation. For hybrid clouds, it is relatively straightforward for customers to maintain and operate with the right pre-determined operating policies and procedures for workload placement on-premises or on the cloud.
  • Integration/Customization: Easier for on-premises customers to customize and integrate newer solutions with their legacy solutions. This is harder to do on public clouds. On hybrid clouds, it is easier to integrate legacy systems with newer custom solutions from the edge to multiple clouds seamlessly.
  • Business Continuity/Serviceability: On-premises deployments can be tailored to provide higher service level agreements (SLAs). This is harder to do on public clouds, though they can deliver excellent business continuity. Hybrid clouds can provide high SLAs and excellent business continuity even during disasters.
  • Performance/Scalability: EDWs offer excellent performance on-premises with hardware accelerators, faster storage, and proximity to data, but are harder to scale to address new business requirements. Performance for large-scale analytics is lower on public clouds since maintaining data proximity is hard and optimized storage and computing infrastructure are typically not available. Public clouds can easily scale to meet new business requirements for smaller data sizes; however, as data sets continue to grow exponentially beyond a few hundred terabytes, these environments have limited elasticity. Hybrid EDWs offer excellent performance with hardware accelerators, faster storage, and proximity to data either on-premises or on the cloud, and can also easily scale to meet new business requirements.
  • Governance/Compliance: Excellent for on-premises since these operations can be tailored to meet individual enterprise and regulatory requirements. Public clouds have limited ability to tailor these operations for individual customers since they are set broadly by the cloud provider. Hybrid clouds are excellent since these operations can be tailored to meet individual enterprise and regulatory requirements consistently end-to-end.
  • Data Protection/Security: On-premises and hybrid clouds are excellent since sensitive data can be stored and managed for individual customer requirements and protocols. Public clouds are somewhat vulnerable since their infrastructure is shared and many enterprises are reluctant to part with their mission-critical data.
  • Vendor Lock-in: Strong for both on-premises and public clouds, especially with the underlying software infrastructure. Also, data migration to an alternate solution is complex and expensive.

A hybrid multi-cloud environment empowers customers to experiment with and choose the tools, programming languages, algorithms, and infrastructure to build data pipelines, train and make analytics/AI models ready for production in a governed way for the enterprise, and share insights throughout the workflow.
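To make the cost trade-offs in the comparison above concrete, here is a purely illustrative back-of-the-envelope sketch in Python. The unit costs and workload figures are hypothetical placeholders, not benchmark data; the point is only that sizing on-premises capacity for the predictable base and bursting peaks to the public cloud can undercut both an all on-premises estate sized for peak and an all public cloud estate running at high utilization.

```python
# Illustrative back-of-the-envelope EDW cost comparison.
# All unit costs and workload numbers are hypothetical placeholders.

def on_prem_cost(capacity_units, fixed_cost_per_unit=100.0):
    # Capital/operating cost is driven by provisioned peak capacity
    return capacity_units * fixed_cost_per_unit

def public_cloud_cost(used_unit_hours, rate_per_unit_hour=0.25, egress_fee=500.0):
    # Pay per use, plus data-transfer (egress) charges
    return used_unit_hours * rate_per_unit_hour + egress_fee

def hybrid_cost(base_units, burst_unit_hours):
    # Size on-premises for the predictable base, burst peaks to the cloud
    return on_prem_cost(base_units) + public_cloud_cost(burst_unit_hours, egress_fee=100.0)

# Hypothetical month: a steady base load of 80 units, plus peaks that need
# 40 extra units for 100 hours
print("All on-premises (sized for peak):", on_prem_cost(120))
print("All public cloud:", public_cloud_cost(80 * 720 + 40 * 100))
print("Hybrid:", hybrid_cost(80, 40 * 100))
```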

[1] Nagendra Bommadevara, Andrea Del Miglio, and Steve Jansen, “Cloud adoption to accelerate IT modernization”, McKinsey & Company, 2018

Total Value of Ownership (TVO) of IBM Cloud Pak for Data

The speed and scope of the business decision-making process is accelerating because of several emerging technology trends – Cloud, Social, Mobile, the Internet of Things (IoT), Analytics and Artificial Intelligence/Machine Learning (AI/ML). To obtain faster actionable insights from this growing volume and variety of data, many organizations are deploying Analytics solutions across the entire workflow.

For strategic reasons, IT leaders are focused on moving existing workloads to the cloud or building new workloads on the cloud and integrating those with existing workloads. Quite often, the need for data security and privacy makes some organizations hesitant about migrating to the public cloud. The business model for cloud services is evolving to enable more businesses to deploy a hybrid cloud, particularly in the areas of big data and analytics solutions.

IBM Cloud Pak for Data is an integrated data science, data engineering and app building platform built on Red Hat OpenShift – a hybrid cloud foundation that provides all the benefits of cloud computing inside the client’s firewall and a migratory path should the client want to leverage public clouds. IBM Cloud Pak for Data clients can get significant value because of unique capabilities to connect their data (no matter where it is), govern it, find it, and use it for analysis. IBM Cloud Pak for Data also enables users to collaborate from a single, unified interface, and their IT staff doesn’t need to deploy and connect multiple applications manually.

These IBM Cloud Pak for Data differentiators enable quicker deployments, faster time to value, lower risks of failure and higher revenues/profits. They also enhance the productivity of data scientists, data engineers, application developers and analysts; allowing clients to optimize their Total Value of Ownership (TVO), which is Total Benefits – Total Costs.

The comprehensive TVO analysis presented in a recent Cabot Partners paper compares the IBM Cloud Pak for Data solution with a corresponding In-house solution alternative for three configurations – small, medium and large. This cost-benefit analysis framework considers cost/benefit drivers in a 2 by 2 continuum: Direct vs. Derived and Technology vs. Business mapped into four quantified quadrants: Costs, Productivity, Revenues/Profits and Risks.

Compared to using an In-house solution, IBM Cloud Pak for Data can improve the three-year ROI for all three configurations. Likewise, the Payback Period (PP) for the IBM Cloud Pak for Data solution is shorter than the In-house solution; providing clients faster time to value. In fact, these ROI/PP improvements grow with configuration size; offering clients better investment protection as they progress in their Analytics and AI/ML journey and as data volumes and Analytics model complexities continue to grow.
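For readers who want the arithmetic behind the ROI and Payback Period comparisons, the sketch below shows the standard formulas. The cash-flow figures are hypothetical placeholders and are not taken from the Cabot Partners study.

```python
# Standard ROI and Payback Period arithmetic used in TVO comparisons.
# The cash-flow numbers below are hypothetical placeholders.

def roi_percent(total_benefits, total_costs):
    # Return on investment over the analysis period, as a percentage of costs
    return 100.0 * (total_benefits - total_costs) / total_costs

def payback_period_months(upfront_cost, monthly_net_benefit):
    # Months until cumulative net benefits cover the upfront investment
    return upfront_cost / monthly_net_benefit

print(f"3-year ROI: {roi_percent(2_400_000, 1_000_000):.0f}%")
print(f"Payback period: {payback_period_months(540_000, 60_000):.1f} months")
```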

You can access the full report here.

IBM – Building Momentum to Win the Hybrid Cloud Platform War

By Ravi Shankar and Srini Chari, PhD., MBA – Cabot Partners, May 8, 2020.

As the evolving impacts of COVID-19 ripple globally through our communities, the new IBM CEO Arvind Krishna kicked off the virtual IBM Think conference on May 5, 2020 with an apt assertion: “There's no question this pandemic is a powerful force of disruption and an unprecedented tragedy, but it is also a critical turning point.” Krishna added, “History will look back on this as the moment when the digital transformation of business and society suddenly accelerated, and together, we laid the groundwork for the post-COVID world.”

Originally set to be held in San Francisco, the IBM 2020 Think Digital Experience quickly became one of the many tech events held online. The attendance was very large. About 100,000 non-IBM participants registered and over 170,000 unique visitors attended sessions and consumed content. On average, IBM clients and Business Partners joined 6.5 sessions and watched most of those sessions. This was 3 times the number of clients and 2 times the number of Business Partners compared to last year.

Think Digital featured many key announcements and offered virtual attendees an exciting array of speaker sessions, real-time Q&As and technical training highlighting how hybrid cloud and artificial intelligence (AI) are galvanizing digital transformation, and how IBM is building an agile and scalable platform for developers, partners and clients to overcome data and applications migration challenges from the edge to the cloud.

Lift and Shift to the Public Cloud is Inadequate for Most Enterprise Workloads[1]

The simplest enterprise workloads – about 20% of all enterprise workloads – have already been moved to the public cloud and have benefited from greater agility and scalability. However, the remaining 80% of workloads are still on-premises.[1] Contrary to what many public cloud providers proclaim, traditional Lift and Shift (Figure 1) cloud transformation is not always economical and easy, especially for the analytics and AI journey.

Figure 1: Traditional Lift and Shift of On-Premises Data and Applications to the Public Cloud is Inadequate

If public cloud migration were easy for many enterprise applications and data, most businesses would have already migrated most of their workloads to the public cloud and realized the associated benefits (Table 1). On-premises solutions still provide many benefits especially for analytics and AI (Table 1).

Public Cloud

  • High scalability and flexibility for unpredictable workload demands with varying peaks/valleys
  • Rapid software development, test and proof-of-concept pilot environments
  • No capital investments required to deploy and maintain infrastructure
  • Faster provisioning time and reduced requirements on IT expertise as this is managed by the cloud vendor

On-Premises

  • Brings compute to where data resides since it is hard to move existing data lakes into the cloud because of large data volumes
  • Supports analytics at the edge and other distributed environments to make immediate decisions
  • Provides dedicated and secure environments for compliance with stringent regulations and/or unique workload requirements
  • High/custom SLA performance and efficiency
  • Retains the value of investments in existing solutions

Table 1: Benefits of Public Cloud and On-Premises Infrastructures for Analytics/AI

To improve the agility and scalability of the remaining 80% of enterprise applications and data, hybrid clouds make it possible to combine the benefits of a public cloud with those of an on-premises infrastructure.

The Hybrid Multi-Cloud Platform is the New Battleground

Over the last decade, the term “cloud wars” has been used to describe the competition between public cloud providers: AWS, Microsoft Azure, Google Cloud, IBM Cloud and a few others. But with 80% of workloads still on-premises, this is more a rift or a squabble than a war. The real “cloud war” is only beginning for the rapidly growing hybrid cloud platform – particularly for analytics and AI.

The worldwide market for data services for the hybrid cloud is expected to grow at a healthy CAGR of 20.53% from 2016 to 2021[2] as enterprises prioritize a balance of public and private infrastructure. Only 31% of enterprises see public cloud as their top priority, while a combined 45% of enterprises see hybrid cloud as the future state.[2]

Today, large organizations leverage almost five clouds on average. 84% of enterprises have a strategy to use multiple clouds,[3] and 56% of organizations plan to increase their use of containers.[4]

Hybrid cloud platforms that support a multi-cloud architecture will be the winning platform in the future, especially as more data is ingested at the edge with the transition to 5G and stored on-premises or in the cloud.

Lift, Sift and Shift Data and Applications for Swift Connect from the Edge to Multi-cloud

In order to win the impending hybrid cloud war, the following elements must be in place:

  • Lift: Ability to move/process data and applications all the way from the edge to the enterprise or to a multi-cloud environment and to migrate workloads efficiently
  • Sift: Automate the currently tedious, semi-manual, error-prone processes used to cleanse and prepare data, remove bias, prioritize it for analysis, and provide clear traceability (a toy sketch of this step follows this list)
  • Shift: Ability to move compute to where the data resides to minimize data movement costs and improve performance
  • Swift: Multi-directionally execute all the above operations at scale with agility, flexibility and high performance so that data can move between public, private and hybrid clouds, on-premises and edge installations, and can be updated as needed
  • Connect: Seamlessly connect the edge, on-premises and multi-cloud implementations into a cohesive and agile environment with a single dashboard for centralized observation across all platform entities.
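As a toy illustration of the “Sift” step above, the following pandas sketch deduplicates records, fills gaps, and flags out-of-range values for review instead of silently dropping them, which preserves traceability. Column names and thresholds are hypothetical placeholders.

```python
# Toy illustration of the "Sift" step: automated cleansing and preparation.
# Column names and thresholds are hypothetical placeholders.
import pandas as pd

def sift(df: pd.DataFrame) -> pd.DataFrame:
    cleaned = (
        df.drop_duplicates(subset=["record_id"])                               # remove duplicates
          .assign(amount=lambda d: d["amount"].fillna(d["amount"].median()))   # fill gaps
    )
    # Flag rather than silently drop suspicious rows, preserving traceability
    cleaned["needs_review"] = ~cleaned["amount"].between(0, 1_000_000)
    return cleaned

raw = pd.DataFrame({"record_id": [1, 1, 2, 3],
                    "amount": [120.0, 120.0, None, 5_000_000.0]})
print(sift(raw))
```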

Figure 2 depicts this hybrid cloud platform which empowers customers to experiment with and choose the programming languages, tools, algorithms and infrastructure to build data pipelines, train and productionize analytics/AI models in a governed way for the enterprise and share insights throughout the organization from the edge to the cloud.

Figure 2: An Agile Hybrid Cloud Platform for Analytics/AI that Scales from the Edge to Multi-cloud

To win the hybrid cloud war, the platform must scale, be compliant, resilient and agile, and support open standards for interoperability. This gives clients the flexibility to adapt quickly to changing business needs and to choose the best components from multiple providers in the ecosystem.  

The IBM Hybrid Multi-cloud Vision and Key Think 2020 Announcements  

With the acquisition of Red Hat, IBM has laid the foundation to win the hybrid cloud war by enabling clients to avoid the pitfalls of single-vendor reliance. Clients can scale workloads across multiple systems and cloud vendors with increased agility through containers and unify the entire infrastructure from the edge to the data center to the cloud (Figure 3).

Red Hat provides open source technologies to bring a consistent foundation from the edge to on-premises or to any cloud deployment: public, private, hybrid, or multi: 

  • Red Hat OpenShift is a complete container application platform built on Kubernetes – an open source platform that automates Linux container operations and management
  • Red Hat Enterprise Linux and Red Hat OpenShift bring more security to every container and better consistency across environments.
  • Red Hat Cloud Suite combines a container-based development platform, private infrastructure, public cloud interoperability, and a common management framework into a single, easily deployed solution for clients who need a cloud and a container platform.

Figure 3:  IBM Hybrid Multi-cloud Vision and Key New Think 2020 Announcements

For decades, IBM’s core competency has been as a trusted technology provider for enterprise customers running mission critical applications. IBM delivers enterprise systems, software, network and services. Key hybrid cloud offerings include:

  • IBM Cloud Paks (Figure 3) are enterprise-ready, containerized services that give clients an open, faster and more secure way to move core business applications to any cloud. Each of the six IBM Cloud Paks includes containerized IBM middleware and common cloud services for development and management, on top of a common integration layer, and runs wherever Red Hat OpenShift runs.
  • IBM Cloud is built on open standards, with a choice of many cloud models: public, dedicated, private and managed, so clients can run the right workload on the right cloud model without vendor lock-in.
  • IBM Systems deliver reliable, flexible and secure compute, storage and operating systems solutions.
  • IBM Services help organizations by bringing deep industry expertise to accelerate their cloud journeys and modernize their environments.

Reinforcing the strength of the existing portfolio of products and offerings, IBM launched several AI and hybrid cloud offerings, backed by a broad ecosystem of partners, to help enterprises and telecommunications companies speed their transition to edge computing in the 5G era (Figure 3):

  • IBM Cloud Satellite gives the customer the ability to use IBM Cloud services anywhere — on IBM Cloud, on premises or at the edge — delivered as-a-service from a single pane of glass controlled through the public cloud. IBM Cloud Satellite specifically extends the IBM Public Cloud with a generalized IaaS and PaaS environment, including support for cloud native apps and DevOps, while providing access to IBM Public Cloud Services in the location that works best for individual solutions.
  • IBM Watson AIOps uses AI to automate how enterprises self-detect, diagnose and respond to IT anomalies in real time to better predict and shape future outcomes, focus resources on higher-value work and build more responsive and intelligent networks that can stay up and running longer.
  • IBM Edge Application Manager is an autonomous management solution designed to enable AI, analytics and IoT enterprise workloads to be deployed and remotely managed, delivering real-time analysis and insight at scale. The solution enables the management of up to 10,000 edge nodes simultaneously by a single administrator.
  • IBM Telco Network Cloud Manager runs on Red Hat OpenShift, to deliver intelligent automation capabilities to orchestrate virtual and container network functions in minutes. Service providers will have the ability to manage workloads on both Red Hat OpenShift and on the Red Hat OpenStack Platform, which will be critical as telcos increasingly look for ways to modernize their networks for greater agility and efficiency, and to provide new services today and as 5G adoption expands.

With the Hybrid Cloud Strategy in Place, the Focus is on Execution

The new product announcements and initiatives launched during IBM’s 2020 Think Digital event reinforce IBM’s intent – even during this pandemic – to march full steam ahead to execute on its hybrid multi-cloud vision: Any application can run anywhere on any platform, at scale, wherever data resides, with resilience, agility and interoperability across all clouds on an open, secure and governed enterprise-grade environment.

IBM has also issued the Call for Code challenge to address the current pandemic.   This global challenge encourages innovators to create practical, effective, and high-quality applications based on one or more IBM Cloud services (for example, web, mobile, data, analytics, AI, IoT, or weather) that can have an immediate and lasting impact on humanitarian issues. Teams of developers, data scientists, designers, business analysts, subject matter experts and more are challenged to build solutions to mitigate the impact of COVID-19 and climate change.

With this compelling hybrid strategy and continuing focus of collaboratively solving challenging problems, we believe IBM is well-positioned to execute to win the hybrid cloud war because:

  1. A strong technical, business and sales savvy leadership is in place with Arvind Krishna and Jim Whitehurst (President of IBM).
  2. This pragmatic strategy builds on IBM’s traditional strengths of serving enterprise customers on their cloud journey by providing the much-needed technologies and momentum for modernization.  
  3. In addition to IBM Research, the autonomy that Red Hat enjoys will ensure that its entrepreneurial growth culture continues to be a lightning rod for IBM innovation.
  4. With the focus on an open platform, clients and the partner ecosystem will have the assurance to co-create high-value offerings and services to meet future challenges.

Last and perhaps most important, the technology industry is constantly being disrupted, with new billion-dollar businesses emerging rapidly. This century’s first decade witnessed the rise of social media, mobile and cloud computing. As keen observers of IBM both from the inside and outside, we believe this is probably the first time in recent decades that IBM is endowed with a technology and business savvy leadership team that has a track record of rapidly growing large billion-dollar businesses. As the cloud wars rage in the next decade, there will undoubtedly be many disruptions. Perhaps now more than at any time in the recent past, IBM will not only spot these opportunities but also boldly act to galvanize its incredible human resources and its vast ecosystem to build these next-generation hybrid cloud and AI businesses.


[1] Nagendra Bommadevara, Andrea Del Miglio, and Steve Jansen, “Cloud adoption to accelerate IT modernization”, McKinsey & Company, 2018

[2] https://www.ibm.com/downloads/cas/V93QE3QG

[3] RightScale State of the Cloud Report 2019, Flexera

[4] https://www.redhat.com/cms/managed-files/rh-enterprise-open-source-report-detail-f21756-202002-en.pdf

Cabot Partners is a collaborative consultancy and an independent IT analyst firm. We specialize in advising technology companies and their clients on how to build and grow a customer base, how to achieve desired revenue and profitability results, and how to make effective use of emerging technologies including HPC, Cloud Computing, Analytics and Artificial Intelligence/Machine Learning. To find out more, please go to www.cabotpartners.com.

Copyright © 2020. Cabot Partners Group, Inc. All rights reserved. Other companies’ product names, trademarks, or service marks are used herein for identification only and belong to their respective owners. All images and supporting data were obtained from IBM or from public sources. The information and product recommendations made by the Cabot Partners Group are based upon public information and sources and may also include personal opinions of both Cabot Partners Group and others, all of which we believe to be accurate and reliable. However, as market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. The Cabot Partners Group, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your or your client’s use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors which may appear in this blog. This blog was developed with IBM funding. Although the blog may utilize publicly available material from various vendors, including IBM, it does not necessarily reflect the positions of such vendors on the issues addressed in this document.

Highlights from the Strata AI Conference in New York City

Last month (September 25 and 26), I attended the Strata AI Conference in New York City. The Strata AI Conference continues to provide an informative and comprehensive overview of artificial intelligence and its accelerating transition from research to industrialization. Sessions covered a broad spectrum of AI topics, including cutting-edge research, open source tools, regulatory considerations, use cases and best practices for implementation.

Over 5,000 people attended the conference. There were over 125 exhibitors and about 170 breakout sessions covering all aspects of AI, establishing the Strata conference as one of the unique gatherings in the Cloud/AI/ML/DL space.

Considering the number of keynote speeches and breakout sessions, in the interest of space we will highlight only a few of the topics.

The road to an enterprise cloud

Mick Hollison (CMO, Cloudera) discussed the essential elements of an enterprise cloud and how Cloudera and its strategic partner IBM are working together to assist customers in building a true enterprise cloud. He stated: “No one that we work with does more to force the issue around hybrid and multicloud, and to really drive that message home, than our most strategic partner in the world, IBM.” He noted that the IBM + Cloudera strategic partnership reinforces a combined commitment to open source and cloud for Analytics/AI initiatives. It offers clients an industry-leading, enterprise-grade Hadoop distribution plus an ecosystem of integrated products and services – all designed to help organizations industrialize Analytics/AI.

Readers may be interested in the latest Cabot Partners publication that describes the strengths of the IBM and Cloudera alliance – Greater Choice and Value for Advanced Analytics and AI – https://cabotpartners.com/wp-content/uploads/2018/07/IBM-Cloudera-Alliance-September-2019.pdf.

AI Ladder

Below are highlights of an interesting discussion between Rob Thomas, General Manager, IBM Data and Watson AI, and Tim O’Reilly, Chairman of O’Reilly Media.

AI Ladder: Breaking an AI strategy down into pieces – or rungs of a ladder – serves as a guiding principle for organizations to transform their business by providing four key areas to consider: how they collect data, organize data, analyze data, and then ultimately infuse AI into the organization. By using the ladder to AI as a guiding framework, enterprises can build the foundation for a governed, efficient, agile, and future-proof approach to AI.

AI challenges: The challenges companies face can be categorized as follows:

  • Lack of understanding – because of the increasing popularity of AI, organizations assume it will fix any problem, which is not true.
  • Getting a handle on their data – good data is essential for a successful AI implementation, yet organizations suffer from a combination of lack of data, too much data, and bad data.
  • Lack of relevant skills – AI skills are rare and therefore in high demand, and there is a shortage of skilled workers.
  • Trust – as more applications make use of AI, businesses need visibility into the recommendations made by their AI applications. Traceability and explainability are very important.
  • Culture and business model change – both are required to take advantage of the opportunity the new technology provides.

Concluding thoughts: Fear factor – managers who use AI will replace those who haven’t gotten through the hype phase of AI. Now is the time for AI.

Top blunders in Big Data

Michael Stonebraker, computer scientist and Turing Award winner, had an interesting take on the top ten blunders of Big Data (in the interest of space, we combined a few of them). One need not agree with his list; however, it stimulates thinking and good discussion.

  1. Not planning for AI/ML
  2. Not solving your real data science problem – the typical data scientist spends 90% of the time on data discovery and data cleaning
  3. Believing that traditional data integration will be solved by data science
  4. Believing that data warehouses, data lakes and Hadoop/Spark will solve all your data science problems
  5. Succumbing to the Innovator’s Dilemma – you have to reinvent yourself
  6. Not paying for a few rocket scientists
  7. Outsourcing to an external service provider
  8. Not moving everything to the cloud

Conclusion

AI is one of the greatest challenges and opportunities of our time. It will transform entire industries and the way enterprises operate. The pace of innovation continues to accelerate at a phenomenal rate. Events like the Strata AI Conference can provide analytics leaders with valuable insights and an understanding of the future.

Highlights from the O’Reilly AI Conference in NY City

Last month (April 17 and 18), I attended the O’Reilly AI Conference in New York City. The O’Reilly AI Conference continues to provide an informative and comprehensive overview of artificial intelligence and its accelerating transition from research to industrialization. Sessions covered a broad spectrum of AI topics, including cutting-edge research, open source tools, regulatory considerations, use cases and best practices for implementation. Over 2,000 people attended the conference. Some of the important details of the proceedings are shown below.

Computational Propaganda
In his keynote “Computational Propaganda,” Sean Gourley (CEO, Primer) discussed how advances in AI capabilities can be used to manipulate public opinion and threaten democratic institutions.
Sean noted that the financial services industry is being dominated by algorithms because algorithms are faster than humans. Although these algorithms have brought dramatic improvements in market efficiencies, they have also led to unexpected events like the Flash Crash. This incongruity is summarized by Wiener’s Laws, which generally state that automation will routinely tidy up ordinary messes, but will occasionally create an extraordinary mess.
All these advances are leading to the advent of Computational Propaganda, where the ability of machines to generate images and language that we can’t tell from reality makes it increasingly difficult to know what is real and what is fake. To emphasize his point, Sean walked through several real-world examples of increasingly sophisticated applications of social bots by state and nonstate actors, including Russia. He showed a video of President Obama making an incendiary speech which was totally fake yet very realistic. In conclusion, Sean stated his belief that democracies will need to regulate the use of these technologies and will have to walk a fine line between censorship and freedom of speech.

AI Success Stories
Several keynotes highlighted successful, real-world implementations of ML and AI technologies. These presentations demonstrated the maturation of these powerful technologies while also showing that it takes considerable investment and engineering capabilities to make it a success.

INTUIT
Desiree Gosby (VP of Identity and Profile) discussed how Intuit’s TurboTax unit is using advanced text analytics and image recognition to automatically generate tax returns using mobile phone pictures of tax documents.
Her keynote highlighted both the ability of AI to create disruptive new services and the practical challenges and sophistication required to successfully deploy AI into production, which often requires the ability to combine multiple, advanced techniques.
For example, it is difficult for consumers to take high-quality, easy-to-analyze images. Intuit had to deploy a variety of image techniques (edge detection and brightness/focus/contrast detection) and Convolutional Neural Network (CNN) models to provide feedback and get usable images. Next, they had to implement multiple techniques (document classification using CNN models and layout matching using Recurrent Neural Network (RNN) models) to determine the type of tax form and manage the format variability of many forms. Finally, they had to extract the tax information using traditional optical character recognition coupled with natural language processing, entity recognition modeling and conditional random field modeling.
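The multi-technique flow described above can be pictured as a staged pipeline. The sketch below is a structural outline only: each helper function is a hypothetical stand-in, not Intuit’s implementation, for the image-quality checks, CNN/RNN classification, and OCR/NLP/CRF extraction stages.

```python
# Structural sketch of a multi-stage document-capture pipeline like the one
# described above. Every helper is a hypothetical stand-in, not Intuit's code.

def image_quality_ok(image: dict) -> bool:
    # Stand-in for edge/brightness/focus/contrast checks plus a CNN quality gate
    return image.get("sharpness", 0.0) > 0.5

def classify_form(image: dict) -> str:
    # Stand-in for a CNN document classifier plus an RNN layout matcher
    return "W-2"

def extract_fields(image: dict, form_type: str) -> dict:
    # Stand-in for OCR + NLP entity recognition + conditional random fields
    return {"employer": "Acme Corp", "wages": "52,000.00"}

def process_tax_document(image: dict) -> dict:
    if not image_quality_ok(image):
        return {"status": "retake", "hint": "move closer and improve lighting"}
    form_type = classify_form(image)
    return {"status": "ok", "form_type": form_type,
            "fields": extract_fields(image, form_type)}

print(process_tax_document({"sharpness": 0.8}))
```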

NETFLIX
Tony Jebara (Director of Machine Learning) from Netflix discussed how his company uses ML to personalize its service for its 140 million subscribers. ML is deployed to optimize almost all the customer-facing experience, including ranking, page generation, promotion, image selection, search, messaging and marketing.
Tony demonstrated how Netflix uses contextual bandits, instead of batch machine learning, to interleave learning with data collection and to personalize content artwork for each subscriber. A separate ML system selects the content that should interest a subscriber. For each piece of content, Netflix has multiple artwork options that highlight unique elements of the content (genre, themes, actors, age, etc.). Subscriber information, including viewing history and country, is used to choose which artwork to display for a piece of content. As an example, Tony showed how artwork displayed for the Netflix show “Stranger Things” is personalized based on your interest in horror films, sci-fi films or teenage dramas.
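A contextual bandit interleaves learning with serving rather than training on a fixed batch. The toy epsilon-greedy sketch below picks one of several artwork options per impression, given a viewer-taste context, and updates its per-context estimates from simulated plays; it illustrates the general technique only and is not Netflix’s system.

```python
# Toy epsilon-greedy contextual bandit for choosing artwork per impression.
# Contexts, arms, and rewards are simulated; this only illustrates the idea.
import random
from collections import defaultdict

ARTWORK = ["horror_art", "scifi_art", "teen_drama_art"]
counts = defaultdict(int)     # (context, arm) -> impressions served
values = defaultdict(float)   # (context, arm) -> estimated play rate
EPSILON = 0.1

def choose(context: str) -> str:
    if random.random() < EPSILON:                              # explore
        return random.choice(ARTWORK)
    return max(ARTWORK, key=lambda a: values[(context, a)])    # exploit

def update(context: str, arm: str, reward: float) -> None:
    key = (context, arm)
    counts[key] += 1
    values[key] += (reward - values[key]) / counts[key]        # incremental mean

def simulate_reward(context: str, arm: str) -> float:
    # Pretend viewers respond best to artwork matching their taste
    return 1.0 if random.random() < (0.3 if arm.startswith(context) else 0.1) else 0.0

for _ in range(10_000):
    ctx = random.choice(["horror", "scifi", "teen_drama"])
    arm = choose(ctx)
    update(ctx, arm, simulate_reward(ctx, arm))

print({a: round(values[("horror", a)], 2) for a in ARTWORK})
```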

MASTERCARD
During his keynote, Nick Curcuru (VP of Data Analytics and Cyber Security) discussed Mastercard’s growing investment in AI. Mastercard has 2.5 billion customers that process 74 billion transactions per year. As fraud threats have become more sophisticated, Mastercard is turning to AI to protect customer data. Mastercard uses AI to provide real-time credit intelligence using hundreds of data points, to provide passive biometric identification for additional security, and in its anti-money laundering monitoring. As an example, Mastercard has used AI to increase the number of data points used in card charge authorization from 15 to over 150 in five years.

FACEBOOK
Kim Hazelwood (Senior Engineering Manager, AI Infrastructure) presented how Facebook uses AI and ML to create highly customized experiences for its 2.8B users. Facebook deploys a variety of ML models across its services and features. Multilayer Perceptron models are used for search ranking, news feed ranking and advertisement displays. Support Vector Machines and CNNs are used for automatic facial recognition and image tagging, while RNNs are used for language translation, speech recognition and content understanding. All these algorithms are deployed at a massive scale: Facebook makes over 200 trillion predictions and performs over 6 billion language translations per day.

Conclusion
Analytics, Artificial Intelligence (AI) and Machine Learning (ML) are profoundly transforming how businesses and governments engage with consumers and citizens. Across many industries, high-value transformative use cases in personalized medicine, predictive maintenance, fraud detection, cybersecurity and more are rapidly emerging. The pace of innovation continues to accelerate. Events like the O’Reilly AI Conference can provide analytics leaders with valuable insights and an understanding of the future.

Total Value of Ownership (TVO) Assessment of the IBM Private Cloud for Data Solution for Analytics

For strategic reasons, IT leaders are focused on moving existing workloads to the cloud, or building new workloads on the cloud and integrating those with existing workloads. Quite often, the need for data security and privacy makes some organizations hesitant about migrating to the public cloud. The business model for cloud services is evolving to enable more businesses to deploy a hybrid cloud, particularly in the areas of big data and analytics solutions.

IBM Cloud Private (ICP) for Data is an integrated data science, data engineering and app building platform built on top of ICP – a hybrid cloud that provides all the benefits of cloud computing inside the client’s firewall and provides a migratory path should the client want to leverage public clouds. ICP for Data clients can get significant value because of unique capabilities to connect their data (no matter where it is), govern it, find it, and use it for analysis. ICP for Data also enables users to collaborate from a single, unified interface and their IT staff doesn’t need to deploy and connect multiple applications manually.

These ICP for Data differentiators enable quicker deployments, faster time to value, lower risks of failure and higher revenues/profits. They also enhance the productivity of data scientists, data engineers, application developers and analysts; allowing clients to optimize their Total Value of Ownership (TVO), which is Total Benefits – Total Costs.

The comprehensive TVO analysis presented in this paper compares the IBM Private Cloud for Data solution with a corresponding In-house solution alternative for three configurations – small, medium and large. This cost-benefit analysis framework considers cost/benefit drivers in a 2 by 2 continuum: Direct vs. Derived and Technology vs. Business mapped into four quantified quadrants: Costs, Productivity, Revenues/Profits and Risks.

Compared to using an In-house solution, IBM Cloud Private for Data can improve the three-year ROI for all three configurations. Likewise, the Payback Period (PP) for the ICP for Data solution is shorter than the In-house solution; providing clients faster time to value. In fact, these ROI/PP improvements grow with configuration size; offering clients better investment protection as they progress in their Analytics and AI/ML journey and as data volumes and Analytics model complexities continue to grow.

For more details please click here

The changing face of HPC in finance

Carrots and sticks: How new challenges are bringing fresh opportunities for HPC, data analytics, and AI

This Cabot Partners article sponsored by IBM was published at HPCwire at this URL. This article is identical, but graphics are presented in higher resolution.

————-

For decades, banks have relied on high-performance computing (HPC). When it comes to problems too hard to solve deterministically (like predicting market movements), Monte Carlo simulation is the only game in town.

Banks use proprietary pricing models to calculate the future value of various financial instruments. They generate thousands of scenarios (essentially randomized vectors of self-consistent risk factors), compute the value of each instrument across every scenario (often over multiple time steps) and roll-up the portfolio value for each scenario. The result is a probability distribution showing the range of probable outcomes. The more sophisticated the models, and the more scenarios considered, the higher the confidence in the predicted result.

Analysts often focus on value-at-risk (VaR) or left-tail risk, referring to the left side of the curve representing worst-case market scenarios. Risk managers literally take these results to the bank, relying on computer simulation to help grow portfolio values and manage risk.
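The simulation loop described above reduces to a few lines for a toy case. The numpy sketch below values a hypothetical two-asset portfolio across correlated random scenarios and reads off the left-tail (95%) value-at-risk; real pricing models, risk factors, and time steps are far richer than this.

```python
# Minimal Monte Carlo VaR sketch: simulate scenarios, value the portfolio in
# each, and read the left tail. All parameters are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(42)
n_scenarios = 100_000

# Hypothetical two-asset portfolio (current position values in $M)
positions = np.array([60.0, 40.0])

# Hypothetical one-day return model: correlated normal risk factors
mean = np.array([0.0002, 0.0001])
cov = np.array([[0.00010, 0.00004],
                [0.00004, 0.00020]])
returns = rng.multivariate_normal(mean, cov, size=n_scenarios)

# Roll up portfolio P&L for every scenario
pnl = returns @ positions

# 95% VaR is the loss at the 5th percentile of the P&L distribution
var_95 = -np.percentile(pnl, 5)
print(f"95% one-day VaR: ${var_95:.2f}M")
```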

Accuracy and speed provide a competitive edge

In the game of financial risk, both speed and accuracy matter. Banks need to maximize gains while maintaining capital reserves adequate to cover losses in worst-case scenarios. Nobody wants cash sitting idle, so accurately assessing VaR helps banks quantify needed reserves, maximize capital deployed, and boost profits.

Simulation is used for dozens of financial applications including model back-testing, stress-testing, new product development, and developing algorithms for high-frequency trading (HFT). Firms with more capable HPC can run deeper analysis faster, bid more aggressively, and model more scenarios pre-trade to make faster better-informed decisions.

While traditional HPC is often about raw compute capacity (modeling a vehicle collision in software for example), financial simulation demands both capacity and timeliness. Reflecting this need for urgency, vendors have responded with specialized, service-oriented grid software purpose-built for low-latency pricing calculations. State-of-the-art middleware can bring thousands of computing cores to bear on a parallel problem almost instantly with sub-millisecond overhead. As financial products grow in complexity, banks increasingly compete on the agility, capacity, and efficiency of their HPC infrastructure.
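To picture the scatter/gather pattern that such grid middleware industrializes, here is a minimal local sketch using Python’s standard process pool. It fans toy scenario-pricing tasks out to worker processes and rolls up the results; it says nothing about any vendor’s middleware or its sub-millisecond latency characteristics.

```python
# Minimal local sketch of the scatter/gather pattern used to parallelize
# scenario pricing. The pricing function is a toy stand-in.
from concurrent.futures import ProcessPoolExecutor
import random

def price_scenario(seed: int) -> float:
    # Toy stand-in for valuing the whole portfolio under one scenario
    rng = random.Random(seed)
    return sum(rng.gauss(100.0, 5.0) for _ in range(1_000))

def main() -> None:
    scenarios = range(10_000)
    with ProcessPoolExecutor() as pool:                 # scatter across workers
        values = list(pool.map(price_scenario, scenarios, chunksize=100))
    print("mean portfolio value:", sum(values) / len(values))  # gather/roll-up

if __name__ == "__main__":   # guard required when using process pools
    main()
```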

Post-2008 the plot thickens

Much has been written about the financial crisis of 2008, but an obvious consequence for banks has been an increase in regulation. 2008 served to highlight systemic vulnerabilities to credit risk, liquidity risk, and swap markets trading derivative products.

While these risks were already known, the crisis reinforced that in addition to left-tail risk (the probability of market losses), firms also needed to emphasize right-tail risk (or credit risk), when paper gains become so large that counterparties are forced into insolvency (potentially leading to cascading bankruptcies). Faced with unpopular government bailouts, politicians and regulatory bodies unleashed a torrent of regulation including Basel III, Dodd-Frank, CRD IV and CRR, and Solvency II, all aimed at avoiding a repeat of the crisis.

Modeling Counterparty Credit Risk (CCR) is much harder than modeling market risk, so banks were again forced to re-tool, investing in systems to calculate new metrics like Credit Value Adjustments (CVA), a fair-value adjustment to the price of derivatives that takes CCR into account.

The latest round of regulation affecting banks is the Fundamental Review of the Trading Book (FRTB), a regulation published by the Basel Committee on Banking Supervision and expected to go fully into effect by 2022. FRTB will require banks to adopt standardized approaches (SA) and to compute and report on a variety of new metrics, further increasing infrastructure requirements and compliance costs.

As new workloads emerge, banks become software shops

As if banks didn’t have enough on their plate already, competitive pressures are forcing them to invest in new capabilities for reasons unrelated to regulation. Big data environments are used for a variety of applications in investment and retail banking including cultivating loyalty, reducing churn, boosting cyber-security, and making better decisions when extending credit. Banks increasingly resemble software companies, employing hundreds of developers as software becomes ever more critical to competitiveness.

Faced with competition from alt-lenders and online start-ups, financial firms are racing to leverage Artificial Intelligence (AI) to improve service delivery and reduce costs. Today AI is being used in robo-advisors, improved fraud detection, and applications like loan and insurance underwriting. New predictive models rely on machine learning to supplement traditional predictive methods and enable better quality decisions. Applications on the horizon include automating customer service with chatbots, automating sales recommendations, and leveraging AI for deeper analysis of big data sources like news feeds to improve decision quality further.

An abundance of frameworks compounds infrastructure challenges

Banks face both challenges and opportunities. On the one hand, competitive pressures and new regulations are forcing banks to make new investments in systems and software; on the other hand, advances in technologies like AI, cloud computing, and container technology promise to reduce cost, improve agility, and boost competitiveness.

As new capabilities are added, legacy systems don’t go away, so it’s imperative that the high-performance infrastructure supports multiple software frameworks. Banks need to run not only core risk analytics, but also big data (Hadoop and Spark), streaming analytics, and scalable software environments for training and deploying deep learning models. In the age of big data, information increasingly resides in distributed, scaled-out systems including not only HDFS and HBase but also distributed caches, object stores, and NoSQL stores like Cassandra and MongoDB.

Beyond simply considering where to run these applications (on-premises, in public clouds or both) the real challenge is the diversity of frameworks. Adding more compute capacity or additional siloed systems is not the answer. Banks need solutions that will provide flexibility and help them operate scaled-out high-performance environments more efficiently.

A shared infrastructure for high-performance financial workloads

As banks grapple with new regulation and embrace new technologies to deliver services more efficiently, many are seeing this as an opportunity to re-think their infrastructure. Production-proven in the world’s leading investment banks, IBM Spectrum Computing accelerates and simplifies the full range of high-performance financial applications including AI, big data, and risk analytics. Spectrum Computing can help banks seamlessly and efficiently share infrastructure resources on-premises or on their preferred cloud platform with minimal disruption to existing systems.
To learn how you can simplify and consolidate application environments, and build a future-proof, cloud-agnostic IT infrastructure, download IBM’s whitepaper Modernizing your Financial Risk Infrastructure with IBM Spectrum Computing.

Delivering superior throughput for EDA verification workloads

During September of 2018, members of our Cabot Partners team had the opportunity to work with talented engineers and industry experts from HPE, Cadence, Marvell, and Arm. We collaborated to develop a whitepaper and conduct preliminary benchmarks showing how new Cadence tools optimized for multi-core systems benefit from Marvell ThunderX2 Arm-based systems such as the HPE Apollo 70 system.

Arm datacenter systems are fast becoming a force to be reckoned with in HPC. You can read our published HPE whitepaper here.

Perhaps no industry is more competitive than modern electronics manufacturing and chip design. As consumers, we take it for granted that electronic devices continue to get faster, cheaper, and more capable with each generation. From smart watches to industrial controls to electronic heart-rate monitors, electronics manufacturers are challenged to build smarter, more complex devices leveraging system-on-a-chip (SoC) designs for an increasingly connected world. With the number of IoT devices forecast to grow to over 75 billion by 2025, consumers and manufacturers are all about durability, safety, battery life, and security including resistance to malware and hacking attempts.

This level of change is unprecedented. Chip designers are faced with seemingly irreconcilable pressures such as shorter product design cycles, increasing complexity, increased requirements for quality, and continuous pressures on costs.

In this paper, we discuss the challenge of device verification, a key issue in EDA, and explain how new high-performance systems and software promise to improve the economics of chip design, enabling firms to innovate faster and create high-quality products.

Fueling High Performance Computing (HPC) on Clouds with GPUs

Businesses are increasingly investing in HPC to manufacture higher quality products faster, optimize oil and gas exploration, improve patient outcomes, detect fraud and breaches, mitigate financial risks, and more. HPC also helps governments respond faster to emergencies, analyze terrorist threats better and accurately predict the weather – all vital for national security, public safety, and the environment. The economic and social value of HPC is immense.

HPC workloads are also getting larger and spikier, with more interdisciplinary analyses, higher fidelity models, and larger data volumes. Hence, managing and deploying on-premises HPC is getting harder and more expensive, especially as the line between HPC and analytics blurs in every industry. Businesses are also challenged by rapid technology refresh cycles, limited in-house datacenter space, and the skills needed to cost-effectively operate an on-premises HPC environment customized to match performance, security and compliance requirements. So, businesses are increasingly considering cloud computing, and HPC on the cloud is growing at over 4 times the growth rate of HPC overall.

As a pioneer in cloud computing, Amazon Web Services (AWS) continues to innovate and overcome many past issues with using public clouds for HPC. AWS is fueling the rapid migration of HPC to the cloud with some key differentiators such as the NVIDIA GPU-enabled cloud instances for compute and remote visualization and a growing ecosystem of highly-skilled partners.

Likewise, NVIDIA, as the leader in accelerated computing for HPC and Artificial Intelligence/Deep Learning (AI/DL), continues to invest in building a robust ecosystem of software for highly parallel computing. A recent analyst report shows that 70% of the most popular HPC applications, including 15 of the top 15, are accelerated by GPUs. These provide upwards of two orders of magnitude of speedup compared to CPUs. The NVIDIA GPU Cloud (NGC) container registry, available on the AWS Marketplace, provides NVIDIA GPU-optimized containers to simplify deployment of key HPC applications. AWS and NVIDIA are a winning combination for HPC.

This winning combination accelerates large-scale HPC workflows from data ingestion to computing to visualization with flexibility, reliability, and security. Further, it fosters unprecedented collaborative innovation between engineers and scientists, converts capital costs to usage-based operational costs, and keeps pace with technology refresh cycles.

Prominent real-world client examples highlighted here span many industries: manufacturing, oil and gas, life sciences and healthcare, and more. These examples demonstrate how AWS and NVIDIA help in reducing costs, enhancing productivity, increasing revenues and profits, and lowering risks for HPC clients.

For more details, please see – https://cabotpartners.com/wp-content/uploads/2018/10/FuelingHighPerformanceComputingCloudwithGPUs-September-2018.pdf