Group Synchronization Backends in Hue

Hue is the turn-key solution for Apache Hadoop. It hides the complexity of the ecosystem, including HDFS, Oozie, MapReduce, etc. Hue provides authentication and integrates with SAML, LDAP, and other systems. A new feature in Hue is the ability to synchronize groups with a third-party authority provider. In this blog post, we’ll cover the basics of creating a Group Synchronization Backend.

The Design

The purpose of the group synchronization backends is to keep Hue’s internal group lists fresh. The design is separated into two functional parts:

  1. A way to synchronize on every request.
  2. A definition of how and what to synchronize.

Image 1: Request cycle in Hue with a synchronization backend.

The first part is a Django middleware that is called on every request. It is not meant to be modified, only configured. The second part is a backend that can be customized. This lets developers choose how their groups and user-group memberships are synchronized. The middleware can be configured to use a particular synchronization backend and will call it on every request. If no backend is configured, the middleware is disabled.
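The split between middleware and backend can be sketched in plain Python. This is a simplified illustration, not Hue’s actual middleware: the names `load_backend`, `SyncMiddleware`, and `CountingBackend` are hypothetical, and the request is treated as an opaque object rather than a real Django request.

```python
import importlib


def load_backend(path):
    """Import a class from a dotted path ('package.module.ClassName') and instantiate it."""
    module_name, _, class_name = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)()


class SyncMiddleware:
    """Hypothetical middleware: calls the configured backend on every request."""

    def __init__(self, backend=None):
        # With no backend configured, the middleware does nothing.
        self.backend = backend

    @classmethod
    def from_config(cls, backend_path):
        """Build the middleware from an import path (an empty path disables it)."""
        return cls(load_backend(backend_path) if backend_path else None)

    def process_request(self, request):
        if self.backend is not None:
            self.backend.sync(request)


class CountingBackend:
    """Toy backend that only records how many times sync() was called."""

    def __init__(self):
        self.calls = 0

    def sync(self, request):
        self.calls += 1
```

The middleware itself contains no synchronization logic. Everything specific to a directory service lives behind the backend’s `sync` method, which is why the backend can be swapped through configuration alone.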

Creating Your Own Backend

A synchronization backend can be created by extending a class and providing your own logic. Here is an example backend that comes packaged with Hue:

    class LdapSynchronizationBackend(DesktopSynchronizationBackendBase):
      USER_CACHE_NAME = 'ldap_use_group_sync_cache'

      def sync(self, request):
        user = request.user

        if not user or not user.is_authenticated():
          return

        if not User.objects.filter(username=user.username, userprofile__creation_method=str(UserProfile.CreationMethod.EXTERNAL)).exists():
          LOG.warn("User %s is not an Ldap user" % user.username)
          return

        # Cache should be cleared when user logs out.
        if self.USER_CACHE_NAME not in request.session:
          request.session[self.USER_CACHE_NAME] = import_ldap_users(user.username, sync_groups=True, import_by_dn=False)
          request.session.modified = True

In the above code snippet, the synchronization backend is defined by extending “DesktopSynchronizationBackendBase”. Then, the method “sync(self, request)” is overridden and provides the syncing logic.


The synchronization middleware can be configured to use a backend by changing “desktop -> auth -> user_group_membership_synchronization_backend” to the full import path of your class. For example, setting this config to “desktop.auth.backend.LdapSynchronizationBackend” configures Hue to synchronize with the configured LDAP authority.
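In Hue’s ini-style configuration file, where nested sections are written with additional brackets, that setting would look roughly like the fragment below. This is a sketch showing only the relevant option; the same sections in a real configuration will typically contain other settings as well.

```ini
[desktop]
  [[auth]]
    user_group_membership_synchronization_backend=desktop.auth.backend.LdapSynchronizationBackend
```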

Design Intelligently

Because the configured backend runs on every request, it can significantly affect the performance of the server. Backends should therefore avoid operations that block for long periods of time. They should also handle the following appropriately:

  1. Throttling requests to whatever service contains the group information.
  2. Ensuring users are authenticated.
  3. Caching if appropriate.
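The three points above can be combined in one backend. The following is a minimal sketch under stated assumptions, not code from Hue: `DesktopSynchronizationBackendBase` is redefined here as a stand-in so the example runs on its own, `ThrottledGroupBackend`, `fetch_groups`, and the session key are hypothetical, and `is_authenticated` is read as a plain attribute on a stand-in user object rather than called as a method.

```python
import time


class DesktopSynchronizationBackendBase:
    """Stand-in for Hue's base class so this sketch is self-contained."""

    def sync(self, request):
        raise NotImplementedError


class ThrottledGroupBackend(DesktopSynchronizationBackendBase):
    """Hypothetical backend: queries the group service at most once per interval per session."""

    CACHE_KEY = "group_sync_cache"  # hypothetical session key

    def __init__(self, fetch_groups, min_interval=300.0, clock=time.time):
        self.fetch_groups = fetch_groups  # callable: username -> list of group names
        self.min_interval = min_interval  # minimum seconds between remote lookups
        self.clock = clock                # injectable so the timing can be tested

    def sync(self, request):
        user = getattr(request, "user", None)
        # Ensure the user is authenticated before doing any work.
        if user is None or not user.is_authenticated:
            return None

        now = self.clock()
        cached = request.session.get(self.CACHE_KEY)
        # Throttle and cache: reuse the stored result until the interval has elapsed.
        if cached is not None and now - cached["at"] < self.min_interval:
            return cached["groups"]

        groups = self.fetch_groups(user.username)
        request.session[self.CACHE_KEY] = {"at": now, "groups": groups}
        return groups
```

Storing the result in the session keeps the common path of `sync` cheap: most requests return after a dictionary lookup, and the remote service is contacted only when the cached entry is missing or older than `min_interval`.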


Hue is enterprise-grade software ready to integrate with LDAP, SAML, etc. The newest feature, Group Synchronization, keeps group information from the corporate authority fresh in Hue. Backends are easy to configure and create, and Hue comes with an LDAP backend.

Hue is undergoing heavy development and welcomes external contributions! Have any suggestions? Feel free to tell us what you think through hue-user or @gethue.